List: Trying to get taxonomic information from Biopython
I am trying to adapt an existing script that uses Biopython to retrieve the phylum of a species. The script was written to look up one species at a time. I would like to modify it so that it can handle 100 organisms in one run. This is the original code:
import sys
from Bio import Entrez

def get_tax_id(species):
    """to get data from ncbi taxonomy, we need to have the taxid. we can
    get that by passing the species name to esearch, which will return
    the tax id"""
    species = species.replace(" ", "+").strip()
    search = Entrez.esearch(term=species, db="taxonomy", retmode="xml")
    record = Entrez.read(search)
    return record['IdList'][0]

def get_tax_data(taxid):
    """once we have the taxid, we can fetch the record"""
    search = Entrez.efetch(id=taxid, db="taxonomy", retmode="xml")
    return Entrez.read(search)

Entrez.email = ""
if not Entrez.email:
    print("you must add your email address")
    sys.exit(2)
taxid = get_tax_id("Erodium carvifolium")
data = get_tax_data(taxid)
lineage = {d['Rank']: d['ScientificName'] for d in
           data[0]['LineageEx'] if d['Rank'] in ['family', 'order']}
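I have managed to modify the script so that it accepts a local file containing one of the organisms I am working with, but I need to extend it to 100 organisms.
So the idea is to build a list from my organism file and somehow feed each item of that list, one at a time, into the line taxid = get_tax_id("Erodium carvifolium"), replacing "Erodium carvifolium" with my organism name. But I don't know how to do that.
Below is a sample version of the code with some of my adjustments: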
import sys
from Bio import Entrez

def get_tax_id(species):
    """to get data from ncbi taxonomy, we need to have the taxid. we can
    get that by passing the species name to esearch, which will return
    the tax id"""
    species = species.replace(' ', "+").strip()
    search = Entrez.esearch(term=species, db="taxonomy", retmode="xml")
    record = Entrez.read(search)
    return record['IdList'][0]

def get_tax_data(taxid):
    """once we have the taxid, we can fetch the record"""
    search = Entrez.efetch(id=taxid, db="taxonomy", retmode="xml")
    return Entrez.read(search)

Entrez.email = ""
if not Entrez.email:
    print "you must add your email address"
    sys.exit(2)
list = ['Helicobacter pylori 26695', 'Thermotoga maritima MSB8', 'Deinococcus radiodurans R1', 'Treponema pallidum subsp. pallidum str. Nichols', 'Aquifex aeolicus VF5', 'Archaeoglobus fulgidus DSM 4304']
i = iter(list)
item = i.next()
for item in list:
    ???
    taxid = get_tax_id(?)
    data = get_tax_data(taxid)
    lineage = {d['Rank']: d['ScientificName'] for d in
               data[0]['LineageEx'] if d['Rank'] in ['phylum']}
    print lineage, taxid
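The question marks show where I am stuck. I don't know how to wire up my loop so that it replaces the ? in get_tax_id(?). Or do I need to somehow append each item of the list so that each one is rewritten to contain get_tax_id("Helicobacter pylori 26695"), and then find a way to place it on the line containing taxid =?
You should ask on Biostars.
Thanks for the suggestion.
This is what you need. Put it below your function definitions, that is, after the line with sys.exit(2):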
species_list = ['Helicobacter pylori 26695', 'Thermotoga maritima MSB8', 'Deinococcus radiodurans R1', 'Treponema pallidum subsp. pallidum str. Nichols', 'Aquifex aeolicus VF5', 'Archaeoglobus fulgidus DSM 4304']
taxid_list = []  # Initiate the lists to store the data to be parsed in
data_list = []
lineage_list = []
print('parsing taxonomic data...')  # message declaring the parser has begun
for species in species_list:
    print('\t' + species)  # progress messages
    taxid = get_tax_id(species)  # Apply your functions
    data = get_tax_data(taxid)
    lineage = {d['Rank']: d['ScientificName'] for d in data[0]['LineageEx'] if d['Rank'] in ['phylum']}
    taxid_list.append(taxid)  # Append the data to lists already initiated
    data_list.append(data)
    lineage_list.append(lineage)
print('complete!')
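Since the question mentions a local file of organisms, here is a rough sketch of how the same loop might read its names from that file and survive a name that returns no hits. The helper names read_species and collect_phyla, the one-name-per-line file layout, and the delay value are all assumptions for illustration; the two lookup functions are passed in as arguments and are expected to behave like get_tax_id and get_tax_data from the question.

```python
import time


def read_species(path):
    """Return the non-empty, stripped lines of a text file (one name per line)."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


def collect_phyla(species_names, get_tax_id, get_tax_data, delay=0.4):
    """Map each species name to its phylum, or None when it cannot be found.

    get_tax_id and get_tax_data are supplied by the caller; they are assumed
    to work like the two functions defined in the question.
    """
    phyla = {}
    for name in species_names:
        try:
            taxid = get_tax_id(name)
        except IndexError:
            # The question's get_tax_id returns record['IdList'][0], which
            # raises IndexError when the search comes back empty.
            phyla[name] = None
            continue
        data = get_tax_data(taxid)
        ranks = {d['Rank']: d['ScientificName'] for d in data[0]['LineageEx']}
        phyla[name] = ranks.get('phylum')  # None if the lineage has no phylum
        time.sleep(delay)  # pause between organisms; the value is an assumption
    return phyla
```

With the question's functions already defined, it could be called as phyla = collect_phyla(read_species("organisms.txt"), get_tax_id, get_tax_data), where organisms.txt is a hypothetical filename. Returning a dictionary keyed by name keeps each result attached to its organism, instead of relying on three parallel lists staying in step.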