Web scraping: can't retrieve the values between tags

Tags: web-scraping, beautifulsoup, urllib2

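I am trying to retrieve values from a site, but I get nothing between the tags (except for the one without an id, Avg Played). I tried both Scrapy and BeautifulSoup, to no avail! Here is my BeautifulSoup/urllib2 code: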

import urllib2
from bs4 import BeautifulSoup


site = "http://www.lolking.net/champions/singed?#/overview"
request= urllib2.Request(site, headers={'User-Agent':'Chrome/44.0.2403.107'})
response = urllib2.urlopen(request)
html = response.read()

soup = BeautifulSoup(html, 'lxml')

champ_stats = soup.findAll('div', attrs={"class" : "champ-stats"})

champ_stats2 = soup.findAll('strong', attrs={"class" : "champ-stats"})


for x in champ_stats:
    print x.text, x

print '\n now showing more specifically: \n'    
for x in champ_stats2:
    print x.text, x

I also built a scraper with Scrapy (and got the same result):

This is how the HTML looks in the browser (what I want to retrieve):

html = """
48.3%
Win Rate
0.8%
Popularity
0.5%
Ban Rate
10.2
Avg Played
"""

I'm guessing the site has some way of preventing people from scraping this data? If so, is there a way around it?

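You would be better off using the requests module rather than urllib2; it is simply easier to use. I should mention, though, that BeautifulSoup may not be enough to scrape this page completely, depending on what you want to get from it. You may need to use Selenium or Scrapy.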

>>> import requests
>>> page = requests.get('http://www.lolking.net/champions/singed?#/overview').content
>>> import bs4
>>> soup = bs4.BeautifulSoup(page, 'lxml')
>>> champ_stats = soup.findAll('div', attrs={"class" : "champ-stats"})
>>> for x in champ_stats:
...     x.text, x
...     
('\n%\nWin Rate\n', <div class="champ-stats">
<strong id="winrate"></strong><small>%</small>
<span>Win Rate</span>
</div>)
('\n%\nPopularity\n', <div class="champ-stats">
<strong id="popularity"></strong><small>%</small>
<span>Popularity</span>
</div>)
('\n%\nBan Rate\n', <div class="champ-stats">
<strong id="banrate"></strong><small>%</small>
<span>Ban Rate</span>
</div>)
('\n10.2\nAvg Played\n', <div class="champ-stats">
<strong>10.2</strong>
<span>Avg Played</span>
</div>)
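
The output above suggests why the values are missing: the three `<strong>` tags that carry an id arrive empty in the raw response, which is consistent with the page filling them in afterwards with JavaScript. As a rough, dependency-free sketch in Python 3 (it uses the standard library's `html.parser` rather than BeautifulSoup, and `EmptyStrongFinder` and `empty_strong_ids` are names invented for this illustration), the same check can be run on the markup printed above:

```python
from html.parser import HTMLParser


class EmptyStrongFinder(HTMLParser):
    """Collect the id of every <strong> element that contains no text."""

    def __init__(self):
        super().__init__()
        self.empty_ids = []
        self._inside = False      # True while we are between <strong> and </strong>
        self._current_id = None   # id attribute of the <strong> being read, if any
        self._text = ""

    def handle_starttag(self, tag, attrs):
        if tag == "strong":
            self._inside = True
            self._current_id = dict(attrs).get("id")
            self._text = ""

    def handle_data(self, data):
        if self._inside:
            self._text += data

    def handle_endtag(self, tag):
        if tag == "strong" and self._inside:
            if not self._text.strip():
                self.empty_ids.append(self._current_id)
            self._inside = False


def empty_strong_ids(html):
    """Return the ids of the <strong> tags that have no text in the given HTML."""
    finder = EmptyStrongFinder()
    finder.feed(html)
    finder.close()
    return finder.empty_ids


# The markup as it came back from requests in the session above.
RAW_HTML = """
<div class="champ-stats">
<strong id="winrate"></strong><small>%</small>
<span>Win Rate</span>
</div>
<div class="champ-stats">
<strong id="popularity"></strong><small>%</small>
<span>Popularity</span>
</div>
<div class="champ-stats">
<strong id="banrate"></strong><small>%</small>
<span>Ban Rate</span>
</div>
<div class="champ-stats">
<strong>10.2</strong>
<span>Avg Played</span>
</div>
"""

print(empty_strong_ids(RAW_HTML))  # ['winrate', 'popularity', 'banrate']
```

If a list like this comes back non-empty for data that is visible in the browser, a plain HTTP client will probably not be enough, whichever parser is used.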

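Hi Bill, I turned to Scrapy, but without success. The problem is that I get no values between the strong tags with the ids "winrate", "banrate" and "popularity".

Hi Finn: please see the edit. If you need to use Scrapy, leave a comment.

Thanks! I gave it a try. However, I intend to use this for a mock website. I'm guessing Selenium would be too slow for displaying the data on a website?

You're welcome. What you probably need is a combination of Scrapy, Splash and scrapy-splash. Splash is a lightweight browser that "executes" all the JavaScript in a page and returns the DOM; scrapy-splash connects it to Scrapy. You can guess why I'm not offering to go into that in an answer.
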
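
To make the Splash suggestion from the last comment a little more concrete, here is a minimal Python 3 sketch that asks Splash's HTTP API (its `render.html` endpoint) for the page after JavaScript has run. It is only an illustration under assumptions: it presumes a Splash instance is already listening on `http://localhost:8050`, the helper names `splash_render_url` and `fetch_rendered_html` are made up here, and it talks to Splash directly rather than through Scrapy.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: a Splash instance is running locally on its usual port.
SPLASH_BASE = "http://localhost:8050"


def splash_render_url(page_url, wait=0.5, splash_base=SPLASH_BASE):
    """Build the render.html URL that makes Splash load page_url and run its JavaScript.

    `wait` is the number of seconds Splash waits after the page loads
    before it returns the HTML.
    """
    query = urlencode({"url": page_url, "wait": wait})
    return "{}/render.html?{}".format(splash_base, query)


def fetch_rendered_html(page_url, wait=0.5, timeout=30):
    """Return the HTML of page_url as rendered by Splash (needs a running Splash)."""
    with urlopen(splash_render_url(page_url, wait), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Only prints the request URL; nothing is fetched here.
    print(splash_render_url("http://www.lolking.net/champions/singed?#/overview"))
```

The string returned by `fetch_rendered_html` could then be handed to BeautifulSoup exactly like `page` in the answer's session, and the `strong` tags should be populated if the wait is long enough for the page's scripts to finish.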
For reference, the markup behind the stats block as the browser shows it, from the question:

html = """            <div class="champ-stats">
                <strong id="winrate">48.3</strong><small>%</small>
                <span>Win Rate</span>
            </div>
            <div class="divider"></div>
            <div class="champ-stats">
                <strong id="popularity">0.8</strong><small>%</small>
                <span>Popularity</span>
            </div>
            <div class="divider"></div>
            <div class="champ-stats">
                <strong id="banrate">0.5</strong><small>%</small>
                <span>Ban Rate</span>
            </div>
            <div class="divider"></div>
            <div class="champ-stats">
                <strong>10.2</strong>
                <span>Avg Played</span>
            </div>
        </div> """
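
Edit: driving a real browser with Selenium, so that the page's JavaScript actually runs, does return the values:

>>> from selenium import webdriver
>>> driver = webdriver.Chrome()
>>> driver.get('http://www.lolking.net/champions/singed?#/overview')
>>> # find_elements_by_xpath is the older Selenium API; current Selenium 4 releases use driver.find_elements(By.XPATH, ...) instead
>>> for item in driver.find_elements_by_xpath('.//div[@class="champ-stats"]/strong'):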
...     item.text
...     
'48.4'
'0.8'
'0.4'
'10.2'
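
For readers on Python 3 and Selenium 4, where `find_elements_by_xpath` is no longer available, the session above might be written roughly as follows. This is a sketch, not the answer's own code: the function names `fetch_stat_texts` and `to_floats` are invented for the example, the explicit wait is an addition (the original session read the elements immediately), and a Chrome installation that Selenium can drive is assumed.

```python
STATS_XPATH = './/div[@class="champ-stats"]/strong'  # same XPath as in the session above


def to_floats(texts):
    """Convert scraped strings such as '48.4' into floats."""
    return [float(text) for text in texts]


def fetch_stat_texts(url, timeout=10):
    """Open url in Chrome and return the text of each stats <strong> element.

    Selenium is imported here, not at module level, so that to_floats can be
    used without Selenium installed.
    """
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait

    def stats_filled(drv):
        # Truthy only once every matched element has some text,
        # i.e. once the page's JavaScript has filled the values in.
        texts = [element.text for element in drv.find_elements(By.XPATH, STATS_XPATH)]
        return texts if texts and all(texts) else False

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        # until() returns the first truthy result of stats_filled,
        # or raises TimeoutException after `timeout` seconds.
        return WebDriverWait(driver, timeout).until(stats_filled)
    finally:
        driver.quit()


# Usage (needs Selenium 4, Chrome, and network access):
# texts = fetch_stat_texts("http://www.lolking.net/champions/singed?#/overview")
# print(to_floats(texts))
```

Waiting until every element has text avoids reading the tags in the same empty state that requests and urllib2 saw.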