Python XLRD/Entrez: Searching PubMed and extracting the counts
I am working on a project that requires me to search PubMed using input from an Excel spreadsheet and print the result counts. I have been using xlrd and Entrez to do this. Here is what I have tried.

I need to search PubMed for the author's name, his/her medical school, a range of years, and his/her mentor's name, all of which are in the Excel spreadsheet. I used xlrd to turn each column containing the required information into a list of strings:
import xlrd

sheet = xlrd.open_workbook("HEENT.xlsx").sheet_by_index(0)

# sheet.col() yields Cell objects, so take .value to get the cell contents
med_name = []
for cell in sheet.col(2):
    med_name.append(cell.value)
med_school = []
for cell in sheet.col(3):
    med_school.append(cell.value)
mentor = []
for cell in sheet.col(9):
    mentor.append(cell.value)
from Bio import Entrez

Entrez.email = "your@email.edu"
handle = Entrez.egquery(term="Jennifer Runch AND ((2012[Date - Publication] : 2017[Date - Publication])) ")
handle_1 = Entrez.egquery(term = "Jennifer Runch AND ((2012[Date - Publication] : 2017[Date - Publication])) AND Leoard P. Byk")
handle_2 = Entrez.egquery(term = "Jennifer Runch AND ((2012[Date - Publication] : 2017[Date - Publication])) AND Southern Illinois University School of Medicine")
record = Entrez.read(handle)
record_1 = Entrez.read(handle_1)
record_2 = Entrez.read(handle_2)
pubmed_count = []
for row in record["eGQueryResult"]:
    if row["DbName"] == "pubmed":
        pubmed_count.append(row["Count"])
for row in record_1["eGQueryResult"]:
    if row["DbName"] == "pubmed":
        pubmed_count.append(row["Count"])
for row in record_2["eGQueryResult"]:
    if row["DbName"] == "pubmed":
        pubmed_count.append(row["Count"])
print(pubmed_count)
>>> ['3', '0', '0']
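The problem is that I need to replace the student name ("Jennifer Runch") with the next student name in the list of student names (med_name), the medical school with the next school, and the current mentor's name with the next mentor's name in the list. I think I should write a for loop after declaring my email to PubMed, but I'm not sure how to link the two pieces of code together. Does anyone know an efficient way to connect them, or a more efficient way to do this than what I have tried?
Thanks, everyone!

Most of your code is already in place; it only needs to be modified slightly. Suppose your sheet looks like this: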
Jennifer Bunch |Southern Illinois University School of Medicine|Leonard P. Rybak
Philipp Robinson|Stanford University School of Medicine |Roger Kornberg
Then you can use the following code:
import xlrd
from Bio import Entrez

# NCBI asks for a contact address with every request
Entrez.email = "your@email.edu"

sheet = xlrd.open_workbook("HEENT.xlsx").sheet_by_index(0)

# One entry per spreadsheet row: [student name, medical school, mentor]
search_terms = list()
for row in range(0, sheet.nrows):
    search_terms.append([sheet.cell_value(row, 0), sheet.cell_value(row, 1), sheet.cell_value(row, 2)])

pubmed_counts = list()
for search_term in search_terms:
    handle = Entrez.egquery(term="{0} AND ((2012[Date - Publication] : 2017[Date - Publication])) ".format(search_term[0]))
    handle_1 = Entrez.egquery(term = "{0} AND ((2012[Date - Publication] : 2017[Date - Publication])) AND {1}".format(search_term[0], search_term[2]))
    handle_2 = Entrez.egquery(term = "{0} AND ((2012[Date - Publication] : 2017[Date - Publication])) AND {1}".format(search_term[0], search_term[1]))
    record = Entrez.read(handle)
    record_1 = Entrez.read(handle_1)
    record_2 = Entrez.read(handle_2)
    # Fixed positions: [name only, name + mentor, name + school]
    pubmed_count = ['', '', '']
    for row in record["eGQueryResult"]:
        if row["DbName"] == "pubmed":
            pubmed_count[0] = row["Count"]
    for row in record_1["eGQueryResult"]:
        if row["DbName"] == "pubmed":
            pubmed_count[1] = row["Count"]
    for row in record_2["eGQueryResult"]:
        if row["DbName"] == "pubmed":
            pubmed_count[2] = row["Count"]
    print(pubmed_count)
    pubmed_counts.append(pubmed_count)
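Output:
['3', '0', '0']
['1', '0', '0']

The required modification is to make the query variable by using str.format().
Some unnecessary but possibly useful other modifications:
- Loop over the Excel sheet only once.
- Store pubmed_count in a predefined list, because if a value comes back empty the size of the output would vary, making it hard to tell which value belongs to which query.
- Everything can be optimized and polished further, e.g. by storing the queries in a list and looping over them so that there is less code repetition, but for now it does the job.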
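The remark about storing the queries in a list and looping over them could look roughly like the sketch below. It is only an illustration under a few assumptions: the helper names build_queries, pubmed_count and count_rows are made up for this example, the e-mail address is a placeholder, and the count is fetched with Entrez.esearch restricted to db="pubmed" rather than with Entrez.egquery as in the code above (a deliberate swap, since only the PubMed count is needed here). Biopython is imported inside pubmed_count, so the query-building part can be tried without it.

```python
# Date filter copied from the queries used above.
DATE_FILTER = "((2012[Date - Publication] : 2017[Date - Publication]))"


def build_queries(name, school, mentor):
    """Return the three search terms for one row, always in the same order:
    [name only, name + mentor, name + school]."""
    base = "{0} AND {1}".format(name, DATE_FILTER)
    return [
        base,
        "{0} AND {1}".format(base, mentor),
        "{0} AND {1}".format(base, school),
    ]


def pubmed_count(term):
    """Return PubMed's hit count for `term` as a string.
    Needs Biopython and network access; hypothetical helper for this sketch."""
    from Bio import Entrez  # lazy import: the helpers above work without Biopython

    Entrez.email = "your@email.edu"  # placeholder, replace with a real address
    handle = Entrez.esearch(db="pubmed", term=term, retmax=0)
    record = Entrez.read(handle)
    handle.close()
    return record["Count"]


def count_rows(rows, count_fn=pubmed_count):
    """`rows` is an iterable of (name, school, mentor) entries.
    Returns one list of three counts per row, in build_queries order."""
    return [[count_fn(query) for query in build_queries(*row)] for row in rows]
```

With the search_terms list built above (name, school, mentor per row), count_rows(search_terms) would return the same kind of nested list as pubmed_counts. Passing a different count_fn makes it possible to try the loop without contacting NCBI.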