Python: scraping the comments section of a web page


I am trying to scrape the comment counter from a web page. When I print letters with the code below, the output is an empty list. Why is that?

import urllib2
from bs4 import BeautifulSoup 
r2 = urllib2.urlopen("http://www.ign.com/articles/2016/01/03/steam-surpasses-12-million-concurrent-users").read()

soup2 = BeautifulSoup(r2)
letters = soup2.find_all("div",class_="fyre-comment-count")
print letters

Your code is very close and almost correct. You just missed a few things. Check the code below.

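from bs4 import BeautifulSoup
import urllib2

# Download the raw HTML of the article page (Python 2, urllib2).
r2 = urllib2.urlopen("http://www.ign.com/articles/2016/01/03/steam-surpasses-12-million-concurrent-users").read()
# Name the parser explicitly so BeautifulSoup does not have to guess one.
soup = BeautifulSoup(r2, 'html.parser')
# Print the text of every element carrying the comment-counter class.
for line in soup.find_all("div", class_="fyre-comment-count"):
    comments = ''.join(line.find_all(text=True))
    print(comments)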

The list is empty because the HTML you download contains no comments: div#livefyre-comment is empty, and div.fyre-comment-count does not exist.

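In the page's head there is a suspicious script tag that loads http://cdn.livefyre.com/Livefyre.js. I don't know what Livefyre is, but I assume it pulls the comments from some database and inserts them into div#livefyre-comment or the div.article-comments around it. Once the script has finished, div.fyre-comment-count probably shows up somewhere in the DOM as well.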
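To illustrate the point above, here is a minimal sketch of what a scraper sees when the counter is only created by JavaScript. The markup is a hypothetical reduction of the page as described in this answer, not a copy of the real one, and the helper names (ClassCollector, find_divs_by_class) are made up for the example. It is written for Python 3 and uses only the standard library's html.parser; a BeautifulSoup find_all call on the same string would behave the same way, since no parser runs the script.

```python
from html.parser import HTMLParser

# Hypothetical markup, reduced to the parts the answer describes: an empty
# placeholder div plus the script that would fill it in inside a browser.
STATIC_HTML = """
<html>
  <head><script src="http://cdn.livefyre.com/Livefyre.js"></script></head>
  <body>
    <div class="article-comments">
      <div id="livefyre-comment"></div>
    </div>
  </body>
</html>
"""


class ClassCollector(HTMLParser):
    """Record every <div> start tag whose class list contains `wanted`."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = wanted
        self.matches = []

    def handle_starttag(self, tag, attrs):
        # `attrs` is a list of (name, value) pairs; class may be absent.
        classes = (dict(attrs).get("class") or "").split()
        if tag == "div" and self.wanted in classes:
            self.matches.append(attrs)


def find_divs_by_class(html, wanted):
    """Return the attribute lists of all matching divs in the raw HTML."""
    collector = ClassCollector(wanted)
    collector.feed(html)
    return collector.matches


# The script tag is only text to the parser, so the counter never appears.
print(find_divs_by_class(STATIC_HTML, "fyre-comment-count"))
# The static wrapper, by contrast, is present in the downloaded HTML.
print(len(find_divs_by_class(STATIC_HTML, "article-comments")))
```

The first call finds nothing because the element is never part of the downloaded text, which is the same situation the question ran into.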

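This kind of... design decision is becoming more and more common on websites. To see what a web page really looks like, browse it with JavaScript and cookies turned off (and be prepared for the occasional "500 Internal Server Error" from sites that never expected such rogue behavior).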

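I don't know enough about screen scraping to tell you what to do next. You might be able to piece together a URL that fetches the comments (and their count) directly from Livefyre. As a start, I would read through the JavaScript functions they provide, along with the data-settings attribute of div#livefyre-comment, which appears to be a JSON dictionary containing the relevant parameters.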
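As a starting point for that last suggestion, the sketch below reads a data-settings attribute and decodes it as JSON. It assumes the attribute really holds a JSON object, as the answer suspects. The keys and values in the sample markup (siteId, articleId) are invented placeholders, and SettingsExtractor and read_settings are hypothetical helper names. It deliberately stops before building any Livefyre URL, because the format of such a URL is not known here. Python 3, standard library only.

```python
import json
from html.parser import HTMLParser

# Hypothetical markup: the keys and values inside data-settings are
# placeholders, not the parameters the real page uses.
SAMPLE_HTML = """
<div class="article-comments">
  <div id="livefyre-comment"
       data-settings='{"siteId": "000000", "articleId": "example-article"}'></div>
</div>
"""


class SettingsExtractor(HTMLParser):
    """Decode the data-settings attribute of the element with a given id."""

    def __init__(self, element_id):
        super().__init__()
        self.element_id = element_id
        self.settings = None

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if attr_map.get("id") == self.element_id and "data-settings" in attr_map:
            # Assumption: the attribute value is a JSON object.
            self.settings = json.loads(attr_map["data-settings"])


def read_settings(html, element_id="livefyre-comment"):
    """Return the decoded settings dict, or None if the element is missing."""
    extractor = SettingsExtractor(element_id)
    extractor.feed(html)
    return extractor.settings


settings = read_settings(SAMPLE_HTML)
print(settings)
```

Whatever parameters the real attribute contains would then have to be matched against the requests Livefyre.js makes, for example by watching the browser's network tab while the page loads.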