Python: scraping <h2> tags with BeautifulSoup


python web-scraping


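I'm scraping data from a website with Beautiful Soup. I want the anchor text shown below ("My name is nick"). I have searched Google a lot but could not find a solution that fits my problem.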

news_panel = soup.findAll('div', {'class': 'menuNewsPanel_MenuNews1'})
for news in news_panel:
    temp = news.find('h2')        
    print temp

Output:

<h2 class="menuNewsHl2_MenuNews1"><a href="index.php?ref=MjBfMDFfMDhfMTRfMV84XzFfOTk2NDA=">My name is nick</a></h2>

But I want output like this:

My name is nick

Just grab the text attribute:

>>> soup = BeautifulSoup('''<h2 class="menuNewsHl2_MenuNews1"><a href="index.php?ref=MjBfMDFfMDhfMTRfMV84XzFfOTk2NDA=">My name is nick</a></h2>''')
>>> soup.text
u'My name is nick'
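For readers on Python 3 with the bs4 package, the same idea can be sketched roughly as follows. The explicit "html.parser" argument and the variable names html, h2 and headline are assumptions added for illustration; the answer above does not name a parser.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question's output
html = '<h2 class="menuNewsHl2_MenuNews1"><a href="index.php?ref=MjBfMDFfMDhfMTRfMV84XzFfOTk2NDA=">My name is nick</a></h2>'

# "html.parser" is the parser bundled with Python (an assumption; any installed parser should work)
soup = BeautifulSoup(html, "html.parser")
h2 = soup.find("h2")

# get_text() returns the text of the tag and all of its descendants,
# so the <a> inside the <h2> does not need to be selected separately
headline = h2.get_text(strip=True)
print(headline)
```

Here h2.text gives the same string; get_text(strip=True) additionally trims surrounding whitespace.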



Your error is probably because the input contains no matching tag, so find() returns None. Check whether temp is None:

news_panel = soup.findAll('div', {'class': 'menuNewsPanel_MenuNews1'})
for news in news_panel:
    temp = news.find('h2')
    if temp:
        print temp.text

Or put the print statement in a try ... except block:

news_panel = soup.findAll('div', {'class': 'menuNewsPanel_MenuNews1'})
for news in news_panel:
    try:
        print news.find('h2').text
    except AttributeError:
        continue


Try the following:

all_string=soup.find_all("h2")[0].get_text()
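Note that indexing find_all("h2")[0] raises IndexError when the document contains no <h2>. A possible variant, sketched under the assumption of Python 3 and the built-in "html.parser", uses find() and falls back to a default; first_h2_text and its default parameter are hypothetical names.

```python
from bs4 import BeautifulSoup


def first_h2_text(html, default=None):
    """Return the text of the first <h2>, or `default` if there is none.

    (Hypothetical helper, shown only to illustrate the None check.)
    """
    soup = BeautifulSoup(html, "html.parser")
    # find() returns the first match, or None when there is no <h2> at all
    h2 = soup.find("h2")
    return h2.get_text(strip=True) if h2 is not None else default


print(first_h2_text('<h2><a href="#">My name is nick</a></h2>'))
print(first_h2_text("<p>no heading</p>", default=""))
```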


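I want it in a function. I tried temp.text, but it raises an error: temp3 = temp.text AttributeError: 'NoneType' object has no attribute 'text'. The code is:

for news in news_panel:
    temp = news.find('h2')
    temp3 = temp.text

Ah, that means news.find('h2') returned None for at least one div, i.e. not every div contains an <h2>. Do you want to skip the divs that have no <h2>? Check that the tag exists before reading its text:

    if temp is not None:
        temp3 = temp.text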
