Python tables and BeautifulSoup issue

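I'm trying to scrape the td tags nested in a tr tag, but the identifier I need to find the right values is nested in another td in the same tr.

That is, I'm using this website and trying to get the stats for a given name (e.g. Ahri).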

The HTML is:

<tr>
            <td data-sorttype="string" data-sortval="Ahri" style="text-align: left;">
                <div style="display: table-cell;">
                <div class="champion-list-icon" style="background:url(//lkimg.zamimg.com/shared/riot/images/champions/103_32.png)">
                    <a style="display: inline-block; width: 28px; height: 28px;" href="/champions/ahri"></a>
                </div>
                </div>
                <div style="display: table-cell; vertical-align: middle; padding-top: 3px; padding-left: 5px;"><a href="/champions/ahri">Ahri</a></div>
            </td>
            <td style="text-align: center;"  data-sortval="975"><img src='//lkimg.zamimg.com/images/rp_logo.png' width='18' class='champion-price-icon'>975</td>
            <td style="text-align: center;" data-sortval="6300"><img src='//lkimg.zamimg.com/images/ip_logo.png' width='18' class='champion-price-icon'>6300</td>
            <td style="text-align: center;" data-sortval="10.98">10.98%</td>
            <td style="text-align: center;" data-sortval="48.44">48.44%</td>
            <td style="text-align: center;" data-sortval="18.85">18.85%</td>
            <td style="text-align: center;" data-sorttype="string" data-sortval="Middle Lane">Middle Lane</td>
            <td style="text-align: center;" data-sortval="1323849600">12/14/2011</td>
        </tr>

Just that one table.
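Because the name cell and the stat cells are siblings inside the same <tr>, one approach is to find the name cell by its data-sortval, step up to the enclosing row, and read the other cells. Below is a minimal Python 3 sketch using bs4 against the row above (trimmed of styling and wrapped in a <table>; the row_values helper is a hypothetical name). It reads each cell's data-sortval rather than its visible text, which gives bare numbers without the % sign:

```python
from bs4 import BeautifulSoup

# The row from the question, trimmed of styling and wrapped in a table.
HTML = """
<table><tbody>
<tr>
  <td data-sorttype="string" data-sortval="Ahri"><a href="/champions/ahri">Ahri</a></td>
  <td data-sortval="975">975</td>
  <td data-sortval="6300">6300</td>
  <td data-sortval="10.98">10.98%</td>
  <td data-sortval="48.44">48.44%</td>
  <td data-sortval="18.85">18.85%</td>
  <td data-sorttype="string" data-sortval="Middle Lane">Middle Lane</td>
  <td data-sortval="1323849600">12/14/2011</td>
</tr>
</tbody></table>
"""


def row_values(soup, name):
    """Return the data-sortval of every cell in the row for `name`, or None if not found."""
    name_cell = soup.find("td", attrs={"data-sortval": name})
    if name_cell is None:
        return None
    row = name_cell.find_parent("tr")  # the <tr> holding all of this champion's cells
    return [td["data-sortval"] for td in row.find_all("td", recursive=False)]


soup = BeautifulSoup(HTML, "html.parser")
print(row_values(soup, "Ahri"))
# ['Ahri', '975', '6300', '10.98', '48.44', '18.85', 'Middle Lane', '1323849600']
```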

Edit:

I tried the following with a variable:

for champ in champs:
    a = str(champ)
    print type(a) is str
    td_name = soup.find('td',{"data-sortval":a})

This confirms that a is a string, but it throws an error:

  File "lolrec.py", line 82, in StatScrape
    tr = td_name.parent
AttributeError: 'NoneType' object has no attribute 'parent'
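The traceback means soup.find matched nothing: find returns None when no tag matches, and None has no .parent. Matching an attribute against a plain string is exact and case-sensitive, so a key that differs from the page even by case or a trailing space finds nothing. A minimal Python 3 sketch of that behavior (the one-row table and the lookup helper are hypothetical):

```python
from bs4 import BeautifulSoup

HTML = ('<table><tr><td data-sortval="Ahri">Ahri</td>'
        '<td data-sortval="10.98">10.98%</td></tr></table>')
soup = BeautifulSoup(HTML, "html.parser")


def lookup(soup, key):
    """Return the second cell's text in the row for `key`, or None if no cell matches."""
    td_name = soup.find("td", attrs={"data-sortval": key})
    if td_name is None:  # find() returns None rather than raising
        return None
    return td_name.parent.find_all("td", recursive=False)[1].text


for key in ["Ahri", "ahri", "Ahri "]:
    print(repr(key), lookup(soup, key))
# 'Ahri' 10.98%
# 'ahri' None
# 'Ahri ' None
```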

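Good luck, haha.

If this is for commercial purposes, please read the site's Terms of Service before scraping.

(1) To get the stats for a list of champions, you can do the following, which follows the same logic you described: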

from bs4 import BeautifulSoup
import urllib2
html = urllib2.urlopen('http://www.lolking.net/champions/')
soup = BeautifulSoup(html, 'html.parser')
hero_list = ["Blitzcrank", "Ahri", "Akali"]
for hero in hero_list:
    # locate the cell whose data-sortval is the hero name, e.g. "Ahri"
    td_name = soup.find('td', {"data-sortval": hero})
    if td_name is None:
        # find() returns None when nothing matches; skip instead of raising AttributeError
        continue
    # the stats are sibling cells in the same row
    tr = td_name.parent
    popularity = tr.find_all('td', recursive=False)[3].text
    print hero, popularity

Output:

Blitzcrank 12.58%
Ahri 10.98%
Akali 7.52%
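urllib2 and the print statement exist only in Python 2. Below is a minimal Python 3 port of snippet (1), assuming the page still serves the same table at that URL (it may not). Fetching is split from parsing so the parsing can be tried on saved HTML; fetch and popularity_for are hypothetical names. Names with no matching cell are skipped rather than raising AttributeError:

```python
from urllib.request import urlopen

from bs4 import BeautifulSoup

URL = "http://www.lolking.net/champions/"  # from the answer; the page may have changed


def fetch(url=URL):
    """Download the raw page bytes (Python 3 replacement for urllib2.urlopen)."""
    with urlopen(url) as resp:
        return resp.read()


def popularity_for(html, heroes):
    """Map each hero found on the page to the text of its 4th cell (index 3, e.g. '10.98%')."""
    soup = BeautifulSoup(html, "html.parser")
    result = {}
    for hero in heroes:
        td_name = soup.find("td", attrs={"data-sortval": hero})
        if td_name is None:
            continue  # no cell with this exact name; skip instead of crashing
        cells = td_name.parent.find_all("td", recursive=False)
        result[hero] = cells[3].text
    return result


# Usage (needs network access):
# for hero, pop in popularity_for(fetch(), ["Blitzcrank", "Ahri", "Akali"]).items():
#     print(hero, pop)
```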

(2) To get all the champions:

from bs4 import BeautifulSoup
import urllib2
html = urllib2.urlopen('http://www.lolking.net/champions/')
soup = BeautifulSoup(html, 'html.parser')
# find the table first
table = soup.find('table', {"class": "clientsort champion-list"})
# iterate over the table's rows
for row in table.find('tbody').find_all("tr", recursive=False):
    cols = row.find_all("td")
    hero = cols[0].text.strip()  # first cell holds the champion name
    popularity = cols[3].text    # fourth cell holds the percentage, e.g. 10.98%
    print hero, popularity

Output:

Aatrox 6.86%
Ahri 10.98%
Akali 7.52%
Alistar 4.9%
Amumu 8.75%
...
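If you need many champions (for example every key of a dictionary, as asked in the comments), one alternative is to parse the table once into a dict keyed by champion name and look names up in it, instead of calling soup.find once per name. A minimal Python 3 sketch, assuming the same table structure as snippet (2); index_rows is a hypothetical helper:

```python
from bs4 import BeautifulSoup


def index_rows(html):
    """Map each champion name (first cell's text) to the texts of the remaining cells."""
    soup = BeautifulSoup(html, "html.parser")
    # class_ matches any one of the tag's classes, so "champion-list" also
    # matches class="clientsort champion-list"
    table = soup.find("table", class_="champion-list")
    rows = {}
    for row in table.find("tbody").find_all("tr", recursive=False):
        cols = [td.get_text(strip=True) for td in row.find_all("td", recursive=False)]
        if cols:
            rows[cols[0]] = cols[1:]
    return rows


# Usage: stats = index_rows(html), then stats.get(name) for each key of your own
# dict; .get returns None (no exception) for names that are not on the page.
```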

Thank you so much! This is actually for research purposes; I'm a student researcher at my university. I'd like to release it for free if possible, but I'll definitely follow your advice and read the ToS. I do have one question, though: how can I change soup.find('td', {"data-sortval": "Ahri"}) to use a variable instead of "Ahri", say, each key of a dictionary? Right now I'm casting the keys to strings and passing them in a for loop, but find doesn't seem to accept a variable.

soup.find("td", {"data-sortval": variable})

See, that was my intuition, so I made that change (see the edited post), and it threw an error (see the edited post).

Never mind, I figured it out. The strings on the website contain punctuation, while the strings in my data structure don't. That raises an interesting problem. Thanks for your help!
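One way to handle the punctuation mismatch is to compare normalized forms of both strings. bs4 accepts a function as an attribute filter and calls it with the attribute's value. A minimal Python 3 sketch, assuming the page keeps names such as "Kha'Zix" in data-sortval while your keys look like "KhaZix"; normalize and find_name_cell are hypothetical helpers, and keeping only letters and digits is just one possible rule (two different names could collapse to the same key):

```python
import re
import unicodedata

from bs4 import BeautifulSoup


def normalize(name):
    """Lowercase, drop accents, and keep only ASCII letters and digits."""
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]", "", ascii_name.lower())


def find_name_cell(soup, key):
    """Find the <td> whose data-sortval equals `key` after normalizing both sides."""
    target = normalize(key)
    return soup.find(
        "td",
        # the filter gets the attribute value; guard against tags that lack it
        attrs={"data-sortval": lambda value: value is not None and normalize(value) == target},
    )


HTML = """<table>
<tr><td data-sortval="Kha'Zix">Kha'Zix</td></tr>
<tr><td data-sortval="Dr. Mundo">Dr. Mundo</td></tr>
</table>"""
soup = BeautifulSoup(HTML, "html.parser")
print(find_name_cell(soup, "KhaZix").text)   # Kha'Zix
print(find_name_cell(soup, "DrMundo").text)  # Dr. Mundo
```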