Python - web scraping pubmed.gov abstracts w/ BeautifulSoup - getting NoneType error

python text web-scraping

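I'm scraping abstracts from a website, and it mostly works, except for articles whose abstract has no text. I tried an if statement, but I'm clearly not doing it right. How can I make it skip URLs that have no abstract text? I've provided a URL where this happens.

I get this error: AttributeError: 'NoneType' object has no attribute 'find'

Thanks in advance.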

import requests
from bs4 import BeautifulSoup

listofa_urls = ['https://www.ncbi.nlm.nih.gov/pubmed/31103571']

for th in listofa_urls:

    response = requests.get(th)
    soup = BeautifulSoup(response.content, 'html.parser')

    if (soup.find(class_='abstr').find('div') is not None):
       div_ = soup.find(class_='abstr').find('div')
       if div_.find('h4'):
           h4_ = div_.find_all('h4')
           p_ = div_.find_all('p')
       else:
           h4_ = soup.find(class_='abstr').find_all('h3')
           p_ = soup.find(class_='abstr').find_all('p')

       mp = list(map(lambda x, y: [x.get_text(),y.get_text()], h4_, p_))
       print(mp)

As mentioned in the comments, you can't call .find() on None, so just check whether the first find found anything.

Simply remove the second find:
if (soup.find(class_='abstr').find('div') is not None):

becomes
if (soup.find(class_='abstr') is not None):
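
To illustrate why checking only the first find is enough to skip such pages, here is a minimal sketch. The HTML string is a made-up stand-in for a page without an abstract (it is not the real PubMed markup), and the `status` variable is only there for demonstration.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment standing in for a page that has no abstract.
html_without_abstract = "<html><body><p>No abstract available.</p></body></html>"
soup = BeautifulSoup(html_without_abstract, "html.parser")

# find() returns None when nothing matches, so test its result
# before calling .find() on it again.
abstr = soup.find(class_="abstr")

if abstr is not None:
    div_ = abstr.find("div")
    status = "abstract found"
else:
    status = "skipped"

print(status)
```

Storing the result of the first find in a variable also avoids searching the document for the same element several times.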

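Which line has the error? What happens if you print(dir(soup))?

The error I get is on: if (soup.find(class_='abstr').find('div') is not None):

I don't see any tag with class='abstr' in the HTML source, so .find('div') won't work, because you can't call .find() when soup.find(class_='abstr') didn't return anything.

@chitown88 As the OP said, that is exactly the problem! :) How should he handle the case where soup.find() returns None?

Thanks. This might miss a hypothetical case where the class='abstr' element exists but the 'div' doesn't, in which case it would throw the same "no attribute 'find'" error as before on the next line. To account for that possibility, in addition to this answer, I also changed the subsequent condition from "if div_.find('h4')" to "if div_ is not None and div_.find('h4')".

As far as I can see, if there is an abstract, there is an h4, so for this case I don't think it's necessary. But for other sources, you're completely right.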
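
Putting the answer and the last comments together, one possible sketch guards both lookups: the 'abstr' element and the 'div' inside it. The function name extract_sections and the three HTML fragments are hypothetical and only approximate the structure the question's code expects; they are not taken from PubMed.

```python
from bs4 import BeautifulSoup


def extract_sections(html):
    """Return [heading, text] pairs, or an empty list if there is no abstract."""
    soup = BeautifulSoup(html, "html.parser")

    abstr = soup.find(class_="abstr")
    if abstr is None:
        # No abstract container at all: nothing to extract, skip this page.
        return []

    div_ = abstr.find("div")
    if div_ is not None and div_.find("h4"):
        # Structured abstract: section headings are h4 tags inside the div.
        headings = div_.find_all("h4")
        paragraphs = div_.find_all("p")
    else:
        # No div or no h4: fall back to the h3 heading(s) of the container.
        headings = abstr.find_all("h3")
        paragraphs = abstr.find_all("p")

    # zip pairs headings with paragraphs and stops at the shorter list,
    # like the two-argument map() in the question.
    return [[h.get_text(), p.get_text()] for h, p in zip(headings, paragraphs)]


# Made-up fragments covering the three cases discussed above.
structured = ('<div class="abstr"><h3>Abstract</h3>'
              '<div><h4>BACKGROUND</h4><p>Some text.</p></div></div>')
no_div = '<div class="abstr"><h3>Abstract</h3><p>Plain text.</p></div>'
no_abstract = "<p>No abstract available.</p>"

for fragment in (structured, no_div, no_abstract):
    print(extract_sections(fragment))
```

In the original loop, the same function could be called with response.content for each URL, and an empty result would mean the page is skipped.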