Python web scraping Basketball Reference: how to exclude the April playoffs row


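I am trying to scrape Basketball Reference.

I managed to get the results for October through March, but I ran into a problem with April: the table contains a Playoffs row (a thead-style header row), and it raises ValueError: Unknown string format. I want to stop at this row or skip it.

Here is my code: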

data = [[td.getText() for td in data_rows[i].findAll(['th','td'])] for i in range(len(data_rows))]

Loop over the tr elements and make sure you are not on the "Playoffs" tr before processing the row:

from bs4 import BeautifulSoup
from urllib.request import urlopen

webpage = urlopen("https://www.basketball-reference.com/leagues/NBA_2017_games-april.html")
soup = BeautifulSoup(webpage, 'html.parser')
data_rows = soup.find('table', {"id": "schedule"}).find_all('tr') # find all the 'tr' elements

for tr in data_rows:
    if tr.text.strip() != "Playoffs":  # skip the 'Playoffs' title row
        data = [td.text for td in tr.find_all(["td", "th"])]
        print(data)
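The same filter can also be written as a list comprehension, which stays closer to the code in the question. The sketch below is only an illustration: SAMPLE_HTML is a made-up, trimmed-down stand-in for the schedule table (not the real page markup), extract_game_rows is a hypothetical helper name, and it assumes that header-style rows, including the "Playoffs" divider, contain only th cells while game rows contain at least one td.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the schedule table; the real page has more columns.
SAMPLE_HTML = """
<table id="schedule">
  <thead>
    <tr><th>Date</th><th>Visitor</th><th>PTS</th></tr>
  </thead>
  <tbody>
    <tr><th>Sat, Apr 1, 2017</th><td>Team A</td><td>100</td></tr>
    <tr class="thead"><th colspan="3">Playoffs</th></tr>
    <tr><th>Sat, Apr 15, 2017</th><td>Team B</td><td>95</td></tr>
  </tbody>
</table>
"""


def extract_game_rows(html):
    """Return the cell texts of each game row, skipping header-style rows."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", {"id": "schedule"})
    return [
        [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        for tr in table.find_all("tr")
        # Assumption: header rows and the "Playoffs" divider hold only <th>
        # cells, so requiring at least one <td> keeps only game rows.
        if tr.find("td") is not None
    ]


rows = extract_game_rows(SAMPLE_HTML)
for row in rows:
    print(row)
```

Filtering on structure rather than on the literal text "Playoffs" also drops the column-header row, which may or may not be what you want.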

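Note: if you are using a recent version of BeautifulSoup (bs4), prefer find_all() over findAll(). Similarly, use .text instead of getText(). The older camelCase names still work, but the PEP 8 style names are the recommended ones.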
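If the rows have already been collected as nested lists, as in the question, the stray row can also be dropped at the date-parsing step. This is a rough sketch using only the standard library, not the asker's actual parsing code (which is not shown): the sample rows, the DATE_FORMAT string, and the parse_game_rows helper are all assumptions made for illustration.

```python
from datetime import datetime

# Hypothetical rows in the shape produced by the list comprehension in the question.
rows = [
    ["Sat, Apr 1, 2017", "Team A", "100"],
    ["Playoffs"],
    ["Sat, Apr 15, 2017", "Team B", "95"],
]

DATE_FORMAT = "%a, %b %d, %Y"  # assumed format of the first column


def parse_game_rows(rows):
    """Yield (date, remaining cells), skipping rows whose first cell is not a date."""
    for row in rows:
        try:
            game_date = datetime.strptime(row[0], DATE_FORMAT).date()
        except (ValueError, IndexError):
            # "Playoffs" (or an empty row) does not parse as a date: skip it.
            continue
        yield game_date, row[1:]


games = list(parse_game_rows(rows))
for game_date, cells in games:
    print(game_date, cells)
```

Catching the exception keeps the loop going past the divider row; replacing `continue` with `break` would stop at it instead.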


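Which page are you scraping?

The site is [link]. The Playoffs row is what causes the error. I want to stop at, or skip, rows with that class.

Thanks. I might need to log in or something, but [link].

Oh, that is because you are going to the Sports Reference site. It should be basketball-reference.com. Then go to Seasons, click 2016-2017, then click Schedule and Results. Finally, go to April and you will see my problem. Thanks.

@Benchpress can you edit your question to include the information and links above? As it stands the question is unclear; one has to read the comments to fully understand the problem.

@SeanBreckenridge, methods such as findAll() and getText() have not been removed. bs4 is backward compatible with both bs2 and bs3. The source contains lines like this: findAll = find_all  # BS3

@KeyurPotdar the documentation says: "Although BS4 is mostly backward-compatible with BS3, most of its methods have been deprecated and given new names for PEP 8 compliance." My impression was that this only discourages using those functions, not that they stop working. Isn't "deprecated" the right word?

My bad. I was under the impression that "deprecated" means the function is obsolete (can no longer be used), but I have read that it is a warning that it may become obsolete in the future. "Deprecated" is the right word here.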
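The alias behaviour discussed in these comments can be checked directly. This is a small sketch on made-up HTML, assuming a bs4 release that still ships the camelCase names; some newer releases may emit a DeprecationWarning when they are called.

```python
from bs4 import BeautifulSoup

# Made-up HTML, only to have something to search.
soup = BeautifulSoup(
    "<table><tr><td>a</td></tr><tr><td>b</td></tr></table>", "html.parser"
)

new_style = soup.find_all("tr")
old_style = soup.findAll("tr")  # older camelCase alias; may warn in newer releases
same_rows = new_style == old_style

cell = soup.find("td")
# .text, get_text() and getText() all return the tag's text content.
same_text = cell.text == cell.get_text() == cell.getText()

print(same_rows, same_text)
```

Both spellings reach the same underlying search, so the choice between them is about style and future-proofing rather than behaviour.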