Python 3.x: building a web crawler, it never enters my for loop


python-3.x web-crawler


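I'm building a web crawler for fun. Basically, I want to crawl a Premier League results page and, as a first step, get all the home teams. Here is my code: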

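import requests
from bs4 import BeautifulSoup


def urslit_spider(max_years):
    year = 2010
    while year <= max_years:
        # Build the results-page URL for the current season
        url = 'http://www.premierleague.com/content/premierleague/en-gb/matchday/results.html?paramClubId=ALL&paramComp_8=true&paramSeasonId=' + str(year) + '&view=.dateSeason'
        source_code = requests.get(url)
        plain_text = source_code.text
        soup = BeautifulSoup(plain_text, "html.parser")
        # Print the text of every home-team link; the loop body is skipped
        # entirely when find_all() matches nothing in the returned HTML
        for link in soup.find_all('a', {'class': 'clubs rHome'}):
            lid = link.string
            print(lid)
        year += 1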

The link you provided redirected me to the home page. By modifying the URL I found a results page that does load.
On that page I use:
soup.findAll('td', {'class' : 'home'}):

How do you navigate to the link you provided? Maybe the HTML on that page is different.
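One way to confirm a redirect like this from code, rather than in the browser, is to look at the response's redirect history. The following is only a minimal sketch: `describe_redirect` is a hypothetical helper, it assumes a requests-style response object exposing `.url` and `.history`, and the stand-in object exists purely so the example runs offline.

```python
from types import SimpleNamespace


def describe_redirect(response):
    """Report whether a requests-style response was reached via a redirect.

    `response` only needs `.url` (the final URL) and `.history`
    (the list of intermediate redirect responses), as requests.Response has.
    """
    if response.history:
        return "redirected to " + response.url
    return "served directly from " + response.url


if __name__ == "__main__":
    # Stand-in object (an assumption for illustration); with requests you
    # would pass the return value of requests.get(url) instead.
    fake = SimpleNamespace(url="http://example.com/", history=[object()])
    print(describe_redirect(fake))
```

If the history is non-empty, the HTML being parsed belongs to the final page, not the one that was requested, which would explain why selectors written for the original page match nothing.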
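EDIT: The content of this site appears to be loaded from a separate URL that returns JSON.
By modifying the URL parameters you can find a lot of information.
I still can't open the URL you provided, it keeps redirecting me. On the page I found, though, I can't extract the table information from the HTML (with BeautifulSoup), because the table is populated from that JSON.
The best approach is to use the JSON directly to get the information you need. My suggestion is to use Python's json package.
If you are not familiar with JSON, an online JSON formatter can make it more readable.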
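As a rough illustration of the JSON approach, the sketch below decodes a payload with the standard `json` module and collects the home teams. The endpoint and its field names are not shown in this thread, so `SAMPLE_PAYLOAD`, the `matches` and `homeTeam` keys, and the `home_teams` helper are all assumptions made up for the example.

```python
import json

# Hypothetical payload: the real response structure is unknown, so these
# keys and values are placeholders for illustration only.
SAMPLE_PAYLOAD = """
{"matches": [
    {"homeTeam": "Team A", "awayTeam": "Team B"},
    {"homeTeam": "Team C", "awayTeam": "Team D"}
]}
"""


def home_teams(raw_json):
    """Return the home team of every match in the decoded payload."""
    data = json.loads(raw_json)
    # .get() with a default keeps the loop safe when the key is absent
    return [match["homeTeam"] for match in data.get("matches", [])]


if __name__ == "__main__":
    for team in home_teams(SAMPLE_PAYLOAD):
        print(team)
```

When the data is fetched with requests, `requests.get(url).json()` returns the already-decoded object, so the `json.loads` step can be skipped. The real keys would need to be read off the actual response first.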
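Your soup.findAll() statement doesn't find any links. Please check the page source and try again.

Sorry, I'm new to this, but doesn't this link take you to a page showing the results for the whole season? My idea was that you would only need to change 2010 to 2011 in the URL to get the 2011 season, and so on. When I use your URL I don't get the home teams either:

    for link in soup.findAll('td', {'class': 'home'}):
        print('hae')
        lid = link.string
        print(lid)

Hey @Sjonni, check the edit I made, maybe it helps.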