Python 3.x: BeautifulSoup get_text returns a NoneType object

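I'm trying to use BeautifulSoup for web scraping, and I need to extract the headlines from a page, specifically from the "more" headlines section. This is the code I have tried so far: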

import requests
from bs4 import BeautifulSoup
from csv import writer

response = requests.get('https://www.cnbc.com/finance/?page=1')

soup = BeautifulSoup(response.text,'html.parser')

posts = soup.find_all(id='pipeline')

for post in posts:
    data = post.find_all('li')
    for entry in data:
        title = entry.find(class_='headline')
        print(title)

Running this code prints all the headlines on the page in the following output format:

<div class="headline">
<a class=" " data-nodeid="105372063" href="/2018/08/02/after-apple-rallies-to-1-trillion-even-the-uber-bullish-crowd-on-wal.html">
           {{{*HEADLINE TEXT HERE*}}}
</a> </div>
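
However, when I add .get_text() to the find() call to extract only the text, some headlines are printed and then I get this error: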

Traceback (most recent call last):
  File "C:\Users\Tanay Roman\Documents\python projects\scrapper.py", line 16, in <module>
    title = entry.find(class_='headline').get_text()
AttributeError: 'NoneType' object has no attribute 'get_text'

Why does adding the get_text() method return only partial results, and how can I fix it?

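You are misreading the error message. It is not the .get_text() call that returns a NoneType object; rather, an object of type NoneType has no such method.

There is only ever one object of type NoneType, the value None. Here it is returned by entry.find(class_='headline') because it could not find an element inside entry that matches the search criteria. In other words, that entry element has no child element with the class headline.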
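
To make the failure mode concrete, here is a minimal sketch that reproduces it offline. The markup, the ad-slot id and the promo class are made up for illustration and are not taken from the CNBC page; only find(), find_all() and get_text() are real BeautifulSoup calls.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second <li> has no element with class "headline".
html = """
<ul id="pipeline">
  <li><div class="headline"><a href="/a.html">First headline</a></div></li>
  <li id="ad-slot"><div class="promo">Sponsored</div></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")
items = soup.find_all("li")

found = items[0].find(class_="headline")    # a Tag object
missing = items[1].find(class_="headline")  # nothing matches, so find() returns None

print(found.get_text(strip=True))
print(missing is None)

# Calling a Tag method on None is what raises the AttributeError.
try:
    missing.get_text()
except AttributeError as exc:
    error_message = str(exc)
    print(error_message)
```

The loop in the question works for every item until it reaches one like the second item here, which is why some headlines are printed before the traceback appears.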

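There are two such li elements, one with the id nativedvriver3 and the other with the id nativedvriver9, and both of them trigger the error. You need to check first whether there is a matching element: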

    for entry in data:
        headline = entry.find(class_='headline')
        if headline is not None:
            title = headline.get_text()
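
As a self-contained sketch of the same guard, the check can be wrapped in a small helper that collects the titles into a list. The function name extract_titles and the sample markup are assumptions made for this example; they do not come from the question.

```python
from bs4 import BeautifulSoup


def extract_titles(soup):
    """Return the headline text of every <li> under #pipeline that has one."""
    titles = []
    for post in soup.find_all(id="pipeline"):
        for entry in post.find_all("li"):
            headline = entry.find(class_="headline")
            if headline is None:
                continue  # this item has no headline element, so skip it
            titles.append(headline.get_text(strip=True))
    return titles


# Hypothetical markup standing in for the real page.
sample_html = """
<ul id="pipeline">
  <li><div class="headline"><a href="/one.html"> Story one </a></div></li>
  <li id="sponsored"><span>No headline in this item</span></li>
  <li><div class="headline"><a href="/two.html"> Story two </a></div></li>
</ul>
"""

titles = extract_titles(BeautifulSoup(sample_html, "html.parser"))
print(titles)
```

Passing strip=True to get_text() removes the surrounding whitespace that otherwise shows up around each headline.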
    
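Your work becomes much easier if you use a CSS selector through soup.select(). The following session shows the call and the output it produced: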

    >>> headlines = soup.select('#pipeline li .headline')
    >>> for headline in headlines:
    ...     headline_text = headline.get_text(strip=True)
    ...     print(headline_text)
    ...
    Hedge funds fight back against tech in the war for talent
    Goldman Sachs sees more price pain ahead for bitcoin
    Dish Network shares rise 15% after subscriber losses are less than expected
    Bitcoin whale makes ‘enormous’ losing bet, so now other traders have to foot the bill
    The 'Netflix of fitness' looks to become a publicly traded stock as soon as next year
    Amazon slammed for ‘insult’ tax bill in the UK despite record profits
    Nasdaq could plunge 15 percent or more as ‘rolling bear market’ grips stocks: Morgan Stanley
    Take-Two shares surge 9% after gamemaker beats expectations due to 'Grand Theft Auto Online'
    UK bank RBS announces first dividend in 10 years
    Michael Cohen reportedly secured a $10 million deal with Trump donor to advance a nuclear project
    After-hours buzz: GPRO, AIG & more
    Bitcoin is still too 'unstable' to become mainstream money, UBS says
    Apple just hit a trillion but its stock performance has been dwarfed by the other tech giants
    The first company to ever reach $1 trillion in market value was in China and got crushed
    Apple at a trillion-dollar valuation isn’t crazy like the dot-com bubble
    After Apple rallies to $1 trillion, even the uber bullish crowd on Wall Street believes it may need to cool off
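
The reason no None check is needed with select() can be seen in a small offline sketch. The markup below is invented for illustration; the selector is the one used above.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: three list items, only two of which contain a headline.
sample_html = """
<ul id="pipeline">
  <li><div class="headline"><a href="/one.html">Story one</a></div></li>
  <li id="sponsored"><span>No headline in this item</span></li>
  <li><div class="headline"><a href="/two.html">Story two</a></div></li>
</ul>
"""

soup = BeautifulSoup(sample_html, "html.parser")

list_items = soup.select("#pipeline li")
headlines = soup.select("#pipeline li .headline")

# select() only returns elements that match the whole selector, so list items
# without a .headline descendant contribute nothing and None never appears.
texts = [headline.get_text(strip=True) for headline in headlines]

print(len(list_items), len(headlines))
print(texts)
```

Because select() always returns a list of matching tags, possibly empty, the loop body can call get_text() without checking each result first.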
    

It is not get_text() that returns a NoneType object. The .find() call returns None, which means that at least one li element does not contain an element with the class headline.