Python: Navigating to the next page with BeautifulSoup

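How can I use BeautifulSoup to go through all the pages of results? For example, I need to scrape this site:

The search query is "((oncology) AND breast cancer) AND resulted in", without the quotes.
How do I retrieve all of the pages? I tried looking at the form data in the request headers and modifying some of the fields. That let me get 200 entries per page, but no more. I really need to iterate through the pages to get everything. Any help would be greatly appreciated.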

Suppose that, for now, I only want to view the fourth page.

The relevant part of the code:

import urllib.parse
import urllib.request

from bs4 import BeautifulSoup

val = "((oncology) AND breast cancer) AND resulted in"  # the search term

post_params = {
    'term': val,
    'EntrezSystem2.PEntrez.PubMed.Pubmed_ResultsPanel.Pubmed_DisplayBar.PageSize': 20,
    'EntrezSystem2.PEntrez.PubMed.Pubmed_ResultsPanel.Pubmed_DisplayBar.sPageSize': 20,
    'coll_start': 61,
    'citman_count': 20,
    'citman_start': 61,
    'coll_start2': 61,
    'citman_count2': 20,
    'citman_start2': 61,
    'CollectionStartIndex': 1,
    'CitationManagerStartIndex': 1,
    'CitationManagerCustomRange': 'false',
    'EntrezSystem2.PEntrez.PubMed.Pubmed_ResultsPanel.Entrez_Pager.cPage': 3,
    'EntrezSystem2.PEntrez.PubMed.Pubmed_ResultsPanel.Entrez_Pager.CurrPage': 4,
}

# This part handles the scraping
post_args = urllib.parse.urlencode(post_params).encode('ascii')  # POST body must be bytes
baseurl = 'http://www.ncbi.nlm.nih.gov'
url = 'http://www.ncbi.nlm.nih.gov/pubmed/'
page = urllib.request.urlopen(url, post_args).read()
soup = BeautifulSoup(page, 'html.parser')
print(soup.prettify())

It still fetches the first page. Once this part works, I plan to run this code in a loop, changing the parameters on each pass.
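The loop planned above needs the page-dependent values computed from a page number instead of hard-coded. A minimal sketch, assuming (from the question's values for page 4 at 20 entries per page) that the `*_start` fields hold the 1-based index of the page's first record and that `cPage` is `CurrPage` minus one. The helper name `page_params` is hypothetical, and it does nothing about the server still returning page 1; it only shows the arithmetic.

```python
# Field names copied from the question's post_params. How the server
# interprets them is an assumption; the question reports it returns page 1.
DISPLAY_BAR = "EntrezSystem2.PEntrez.PubMed.Pubmed_ResultsPanel.Pubmed_DisplayBar."
PAGER = "EntrezSystem2.PEntrez.PubMed.Pubmed_ResultsPanel.Entrez_Pager."


def page_params(term, page, page_size=20):
    """Build the question's POST fields for a 1-based page number (hypothetical helper)."""
    # Assumed: 1-based index of the first record on the page,
    # e.g. page 4 at 20 per page -> 61, matching the question's values.
    start = (page - 1) * page_size + 1
    return {
        "term": term,
        DISPLAY_BAR + "PageSize": page_size,
        DISPLAY_BAR + "sPageSize": page_size,
        "coll_start": start,
        "citman_count": page_size,
        "citman_start": start,
        "coll_start2": start,
        "citman_count2": page_size,
        "citman_start2": start,
        "CollectionStartIndex": 1,
        "CitationManagerStartIndex": 1,
        "CitationManagerCustomRange": "false",
        PAGER + "cPage": page - 1,  # assumed 0-based pager index
        PAGER + "CurrPage": page,
    }
```

Inside a `for page in range(1, n + 1):` loop, `urllib.parse.urlencode(page_params(val, page)).encode('ascii')` would replace the fixed `post_args`.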

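Never scrape PubMed: there is always an easier way to retrieve the data directly. Install and use the Biopython package. Here is a simple script that uses your query to fetch the first 10 papers: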

from Bio import Entrez, Medline

# Always tell NCBI who you are
Entrez.email = "your_address@example.com"

term = "((oncology) AND breast cancer) AND resulted in"

handle = Entrez.esearch(db="pubmed", retmax=10, term=term)
record = Entrez.read(handle)
handle.close()

print(record['Count'])  # see how many hits in your search

for ref in record['IdList']:
    handle = Entrez.efetch(db="pubmed", id=ref,
                           rettype="medline",
                           retmode="text")
    paper = Medline.read(handle)
    handle.close()
    # Medline.read returns a dict-like record from which we can extract
    # the fields we want; not every record has a title or an abstract
    print('-' * 30)
    print(paper.get('TI', '(no title)'))
    print()
    print(paper.get('AB', '(no abstract)'))

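The manual is extensive, but you only need to read the sections on searching and fetching records with Biopython's Entrez module and on parsing the results with Biopython's Medline module.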
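The script above stops at the first 10 hits (`retmax=10`). To collect every result, which is what the question is really after, ESearch can be paged with its `retstart` and `retmax` parameters instead of clicking through HTML pages. A minimal sketch, assuming Biopython is installed; the function names and the batch size of 500 are illustrative choices, not part of the answer.

```python
def retstart_offsets(count, batch_size):
    """Return the retstart value of each batch needed to cover `count` hits."""
    return list(range(0, count, batch_size))


def fetch_all_pubmed_ids(term, email, batch_size=500):
    """Collect every PubMed ID matching `term`, one ESearch batch at a time."""
    # Imported here so retstart_offsets() can be used without Biopython.
    from Bio import Entrez

    Entrez.email = email

    # First request: retmax=0 asks only for the total number of hits.
    handle = Entrez.esearch(db="pubmed", term=term, retmax=0)
    count = int(Entrez.read(handle)["Count"])
    handle.close()

    ids = []
    for start in retstart_offsets(count, batch_size):
        handle = Entrez.esearch(db="pubmed", term=term,
                                retstart=start, retmax=batch_size)
        ids.extend(Entrez.read(handle)["IdList"])
        handle.close()
    return ids
```

Usage would be `fetch_all_pubmed_ids("((oncology) AND breast cancer) AND resulted in", "your_address@example.com")`, followed by `Entrez.efetch` on the IDs in batches. NCBI limits how far ESearch will page into PubMed results, so a very broad query may need narrowing; the E-utilities documentation states the current limits.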

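You should add your code.

@PadraicCunningham I have added the code.

Can't you just take

'EntrezSystem2.PEntrez.PubMed.PubMed_ResultsPanel.Entrez_Pager.CurrPage': 4

and do xrange(1, n), using the loop variable instead of 4?

@akira Suppose I only want to view page 4. I'm not iterating yet; I'm just trying to fetch the page with a given page number.

You fetch one page. If you want to fetch more, fetch more. I don't understand your question.

That's great. One more thing: I only need the abstracts. How can I extract them from handle.read()?

@user3286661 I've edited the snippet to include parsing the returned records into a Python dict with the Medline module.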
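Biopython's `Medline.read` (one record) or `Medline.parse` (many records) is the route the answer recommends. To show what "extracting the abstract from handle.read()" involves, here is a stdlib-only sketch, assuming the MEDLINE text layout that `efetch` returns with `rettype="medline"`: each field starts with a tag padded to four characters followed by `"- "`, continuation lines are indented six spaces, and records are separated by blank lines. The function names are hypothetical.

```python
import re


def parse_medline_fields(record_text):
    """Map each MEDLINE tag (e.g. 'PMID', 'AB') to the list of its values."""
    fields = {}
    tag = None
    for line in record_text.splitlines():
        if line.startswith("      ") and tag is not None:
            # Continuation line: extend the last value of the current tag.
            fields[tag][-1] += " " + line.strip()
        elif len(line) >= 6 and line[4:6] == "- ":
            # New field: 4-character padded tag, then "- ", then the value.
            tag = line[:4].rstrip()
            fields.setdefault(tag, []).append(line[6:].strip())
    return fields


def abstracts_from_medline(text):
    """Return {pmid: abstract} for every record in `text` that has an AB field."""
    abstracts = {}
    # Records are separated by blank lines.
    for record_text in re.split(r"\n\s*\n", text.strip()):
        fields = parse_medline_fields(record_text)
        if "PMID" in fields and "AB" in fields:
            abstracts[fields["PMID"][0]] = " ".join(fields["AB"])
    return abstracts
```

Since `efetch` accepts several IDs at once, a single request such as `Entrez.efetch(db="pubmed", id=",".join(record['IdList']), rettype="medline", retmode="text")` can replace the per-ID loop, and its `handle.read()` text can go straight into `abstracts_from_medline`.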