Warning: file_get_contents(/data/phpspider/zhask/data//catemap/8/python-3.x/19.json): failed to open stream: No such file or directory in /data/phpspider/zhask/libs/function.php on line 167

Warning: Invalid argument supplied for foreach() in /data/phpspider/zhask/libs/tag.function.php on line 1116

Notice: Undefined index: in /data/phpspider/zhask/libs/function.php on line 180

Warning: array_chunk() expects parameter 1 to be array, null given in /data/phpspider/zhask/libs/function.php on line 181
Python 3.x 为什么我会得到重复链接?如何获取下一页的链接?_Python 3.x_Beautifulsoup - Fatal编程技术网

Python 3.x 为什么我会得到重复链接?如何获取下一页的链接?

Python 3.x 为什么我会得到重复链接?如何获取下一页的链接?,python-3.x,beautifulsoup,Python 3.x,Beautifulsoup,我得到重复的链接时,我试图获得的链接,我不知道为什么。此外,我还试图获取所有的链接,就像我从所有页面获得的链接一样。但我不知道如何编写代码来单击下一页。有人能帮我理解我会怎么做吗 import requests from bs4 import BeautifulSoup url = 'http://www.gosugamers.net/counterstrike/teams' r = requests.get(url) page = r.text soup = BeautifulSoup(

我得到重复的链接时,我试图获得的链接,我不知道为什么。此外,我还试图获取所有的链接,就像我从所有页面获得的链接一样。但我不知道如何编写代码来单击下一页。有人能帮我理解我会怎么做吗

import requests

from bs4 import BeautifulSoup

url = 'http://www.gosugamers.net/counterstrike/teams'
r = requests.get(url)
page = r.text

soup = BeautifulSoup(page)

#all_teams = []

for team_links in soup.find_all('a', href=True):
    if team_links['href'] == '' or team_links['href'].startswith('/counterstrike/teams'):
       print (team_links.get('href').replace('/counterstrike/teams', url))
团队链接位于h3标记内的锚定标记中,h3标记位于具有详细信息类的div内:

这给了你:

http://www.gosugamers.net/counterstrike/teams/16338-motv
http://www.gosugamers.net/counterstrike/teams/16337-absolute-monster
http://www.gosugamers.net/counterstrike/teams/16258-immortals-cs
http://www.gosugamers.net/counterstrike/teams/16251-ireal-star-gaming
http://www.gosugamers.net/counterstrike/teams/16176-team-genesis
http://www.gosugamers.net/counterstrike/teams/16175-potadies
http://www.gosugamers.net/counterstrike/teams/16174-crowns-gg
http://www.gosugamers.net/counterstrike/teams/16173-visomvet
http://www.gosugamers.net/counterstrike/teams/16172-team-phenomenon
http://www.gosugamers.net/counterstrike/teams/16152-kriklekrakle
http://www.gosugamers.net/counterstrike/teams/16148-begenius
http://www.gosugamers.net/counterstrike/teams/16144-blubblub
http://www.gosugamers.net/counterstrike/teams/16142-team-1231
http://www.gosugamers.net/counterstrike/teams/16141-vsv
http://www.gosugamers.net/counterstrike/teams/16140-tbi
http://www.gosugamers.net/counterstrike/teams/16136-deadweight
http://www.gosugamers.net/counterstrike/teams/16135-me-myself-and-i
http://www.gosugamers.net/counterstrike/teams/16085-pur-esports
http://www.gosugamers.net/counterstrike/teams/15850-falken
http://www.gosugamers.net/counterstrike/teams/15815-team-abyssal
您实际上是在解析页面上的所有链接,这就是您看到重复的原因

为了获得所有团队,我们可以解析下一页链接,直到带有
“next”
文本的跨度不再存在,这只发生在最后一页:

def get_all(url, base):
    r = requests.get(url)
    page = r.text

    soup = BeautifulSoup(page)
    for team_links in soup.select("div.details h3 a"):
        yield (urljoin(base, team_links["href"]))
    nxt = soup.find("div", {"class": "pages"}).find("span", text="Next")
    while nxt:
        r = requests.get(urljoin(base, nxt.find_previous("a")["href"]))
        page = r.text
        soup = BeautifulSoup(page)
        for team_links in soup.select("div.details h3 a"):
            yield (urljoin(base, team_links["href"]))
        nxt = soup.find("div", {"class": "pages"}).find("span", text="Next")
如果我们运行几秒钟,您可以看到接下来的页面:

In [26]: for link in (get_all(url, base)):
   ....:         print(link)
   ....:     
http://www.gosugamers.net/counterstrike/teams/16386-cantonese-cs
http://www.gosugamers.net/counterstrike/teams/16338-motv
http://www.gosugamers.net/counterstrike/teams/16337-absolute-monster
http://www.gosugamers.net/counterstrike/teams/16258-immortals-cs
http://www.gosugamers.net/counterstrike/teams/16251-ireal-star-gaming
http://www.gosugamers.net/counterstrike/teams/16176-team-genesis
http://www.gosugamers.net/counterstrike/teams/16175-potadies
http://www.gosugamers.net/counterstrike/teams/16174-crowns-gg
http://www.gosugamers.net/counterstrike/teams/16173-visomvet
http://www.gosugamers.net/counterstrike/teams/16172-team-phenomenon
http://www.gosugamers.net/counterstrike/teams/16152-kriklekrakle
http://www.gosugamers.net/counterstrike/teams/16148-begenius
http://www.gosugamers.net/counterstrike/teams/16144-blubblub
http://www.gosugamers.net/counterstrike/teams/16142-team-1231
http://www.gosugamers.net/counterstrike/teams/16141-vsv
http://www.gosugamers.net/counterstrike/teams/16140-tbi
http://www.gosugamers.net/counterstrike/teams/16136-deadweight
http://www.gosugamers.net/counterstrike/teams/16135-me-myself-and-i
http://www.gosugamers.net/counterstrike/teams/16085-pur-esports
http://www.gosugamers.net/counterstrike/teams/15850-falken
http://www.gosugamers.net/counterstrike/teams/15815-team-abyssal
http://www.gosugamers.net/counterstrike/teams/15810-ex-deathtrap
http://www.gosugamers.net/counterstrike/teams/15808-mix123
http://www.gosugamers.net/counterstrike/teams/15651-undertake-esports
http://www.gosugamers.net/counterstrike/teams/15644-five
http://www.gosugamers.net/counterstrike/teams/15630-five
http://www.gosugamers.net/counterstrike/teams/15627-inetkoxtv
http://www.gosugamers.net/counterstrike/teams/15626-tetr-s
http://www.gosugamers.net/counterstrike/teams/15625-rozenoir-esports-white
http://www.gosugamers.net/counterstrike/teams/15619-fragment-gg
http://www.gosugamers.net/counterstrike/teams/15615-monarchs-gg
http://www.gosugamers.net/counterstrike/teams/15602-ottoman-fire
http://www.gosugamers.net/counterstrike/teams/15591-respect
http://www.gosugamers.net/counterstrike/teams/15569-moonbeam-gaming
http://www.gosugamers.net/counterstrike/teams/15563-team-tilt
http://www.gosugamers.net/counterstrike/teams/15534-dynasty-uk
http://www.gosugamers.net/counterstrike/teams/15507-urbantech
http://www.gosugamers.net/counterstrike/teams/15374-innova
http://www.gosugamers.net/counterstrike/teams/15373-g3x
http://www.gosugamers.net/counterstrike/teams/15372-cnb
http://www.gosugamers.net/counterstrike/teams/15370-intz
http://www.gosugamers.net/counterstrike/teams/15369-2kill
http://www.gosugamers.net/counterstrike/teams/15368-supernova
http://www.gosugamers.net/counterstrike/teams/15367-biggods
http://www.gosugamers.net/counterstrike/teams/15366-playzone
http://www.gosugamers.net/counterstrike/teams/15365-pride
http://www.gosugamers.net/counterstrike/teams/15359-rising-orkam
http://www.gosugamers.net/counterstrike/teams/15342-team-foxez
http://www.gosugamers.net/counterstrike/teams/15336-angels
http://www.gosugamers.net/counterstrike/teams/15331-atlando-esports
http://www.gosugamers.net/counterstrike/teams/15329-xfinity-esports
http://www.gosugamers.net/counterstrike/teams/15326-nano-reapers
http://www.gosugamers.net/counterstrike/teams/15322-erase-team
http://www.gosugamers.net/counterstrike/teams/15318-heyguys
http://www.gosugamers.net/counterstrike/teams/15317-illusory
http://www.gosugamers.net/counterstrike/teams/15285-dismay
http://www.gosugamers.net/counterstrike/teams/15284-kingdom-esports
http://www.gosugamers.net/counterstrike/teams/15283-team-rival
http://www.gosugamers.net/counterstrike/teams/15282-ze-pug-godz
http://www.gosugamers.net/counterstrike/teams/15281-unlimited-potential1
您可以在源代码的第一页和最后一页的任何栏中看到带有
Next
的跨距:

当我们到达最后一个时,只有上一个和第一个的跨度:


非常感谢,这帮了大忙。我还将研究我以前从未见过的新功能!没问题,选择使用css选择器,您可以将find和find_all链接起来,但我不喜欢键入,所以如果可能的话,我更喜欢选择器的简洁性。
In [26]: for link in (get_all(url, base)):
   ....:         print(link)
   ....:     
http://www.gosugamers.net/counterstrike/teams/16386-cantonese-cs
http://www.gosugamers.net/counterstrike/teams/16338-motv
http://www.gosugamers.net/counterstrike/teams/16337-absolute-monster
http://www.gosugamers.net/counterstrike/teams/16258-immortals-cs
http://www.gosugamers.net/counterstrike/teams/16251-ireal-star-gaming
http://www.gosugamers.net/counterstrike/teams/16176-team-genesis
http://www.gosugamers.net/counterstrike/teams/16175-potadies
http://www.gosugamers.net/counterstrike/teams/16174-crowns-gg
http://www.gosugamers.net/counterstrike/teams/16173-visomvet
http://www.gosugamers.net/counterstrike/teams/16172-team-phenomenon
http://www.gosugamers.net/counterstrike/teams/16152-kriklekrakle
http://www.gosugamers.net/counterstrike/teams/16148-begenius
http://www.gosugamers.net/counterstrike/teams/16144-blubblub
http://www.gosugamers.net/counterstrike/teams/16142-team-1231
http://www.gosugamers.net/counterstrike/teams/16141-vsv
http://www.gosugamers.net/counterstrike/teams/16140-tbi
http://www.gosugamers.net/counterstrike/teams/16136-deadweight
http://www.gosugamers.net/counterstrike/teams/16135-me-myself-and-i
http://www.gosugamers.net/counterstrike/teams/16085-pur-esports
http://www.gosugamers.net/counterstrike/teams/15850-falken
http://www.gosugamers.net/counterstrike/teams/15815-team-abyssal
http://www.gosugamers.net/counterstrike/teams/15810-ex-deathtrap
http://www.gosugamers.net/counterstrike/teams/15808-mix123
http://www.gosugamers.net/counterstrike/teams/15651-undertake-esports
http://www.gosugamers.net/counterstrike/teams/15644-five
http://www.gosugamers.net/counterstrike/teams/15630-five
http://www.gosugamers.net/counterstrike/teams/15627-inetkoxtv
http://www.gosugamers.net/counterstrike/teams/15626-tetr-s
http://www.gosugamers.net/counterstrike/teams/15625-rozenoir-esports-white
http://www.gosugamers.net/counterstrike/teams/15619-fragment-gg
http://www.gosugamers.net/counterstrike/teams/15615-monarchs-gg
http://www.gosugamers.net/counterstrike/teams/15602-ottoman-fire
http://www.gosugamers.net/counterstrike/teams/15591-respect
http://www.gosugamers.net/counterstrike/teams/15569-moonbeam-gaming
http://www.gosugamers.net/counterstrike/teams/15563-team-tilt
http://www.gosugamers.net/counterstrike/teams/15534-dynasty-uk
http://www.gosugamers.net/counterstrike/teams/15507-urbantech
http://www.gosugamers.net/counterstrike/teams/15374-innova
http://www.gosugamers.net/counterstrike/teams/15373-g3x
http://www.gosugamers.net/counterstrike/teams/15372-cnb
http://www.gosugamers.net/counterstrike/teams/15370-intz
http://www.gosugamers.net/counterstrike/teams/15369-2kill
http://www.gosugamers.net/counterstrike/teams/15368-supernova
http://www.gosugamers.net/counterstrike/teams/15367-biggods
http://www.gosugamers.net/counterstrike/teams/15366-playzone
http://www.gosugamers.net/counterstrike/teams/15365-pride
http://www.gosugamers.net/counterstrike/teams/15359-rising-orkam
http://www.gosugamers.net/counterstrike/teams/15342-team-foxez
http://www.gosugamers.net/counterstrike/teams/15336-angels
http://www.gosugamers.net/counterstrike/teams/15331-atlando-esports
http://www.gosugamers.net/counterstrike/teams/15329-xfinity-esports
http://www.gosugamers.net/counterstrike/teams/15326-nano-reapers
http://www.gosugamers.net/counterstrike/teams/15322-erase-team
http://www.gosugamers.net/counterstrike/teams/15318-heyguys
http://www.gosugamers.net/counterstrike/teams/15317-illusory
http://www.gosugamers.net/counterstrike/teams/15285-dismay
http://www.gosugamers.net/counterstrike/teams/15284-kingdom-esports
http://www.gosugamers.net/counterstrike/teams/15283-team-rival
http://www.gosugamers.net/counterstrike/teams/15282-ze-pug-godz
http://www.gosugamers.net/counterstrike/teams/15281-unlimited-potential1