Python 3.x: How to get the links under a specific class


python-3.x web-scraping


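Two days ago I was trying to parse the data between two identical classes, and Keyur helped me a lot, although some other problems were left behind :D

Now I want to get the links under a specific class. Here is my code, and here is the error.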

from bs4 import BeautifulSoup
import urllib.request
import datetime

headers = {}  # Headers describe the client, e.g. its operating system and browser.
headers['User-Agent'] = 'Mozilla/5.0'  # A user agent is set because HLTV otherwise treats the connection as a bot.
hltv = urllib.request.Request('https://www.hltv.org/matches', headers=headers)  # Build the request for the matches page
session = urllib.request.urlopen(hltv)
sauce = session.read()  # Getting the source of website
soup = BeautifulSoup(sauce, 'lxml')

a = 0
b = 1
# Getting the match pages' links.
for x in soup.find('span', text=datetime.date.today()).parent:
    print(x.find('a'))

Error:

There is actually no error, but the output is as follows:

None

None
None
-1
None
None
-1

I then looked into it and learned that find returns nothing when there is no matching data. So I tried find_all instead, which gave this output:

AttributeError: 'NavigableString' object has no attribute 'find_all'
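
One possible explanation for the None and -1 values and for the AttributeError, sketched on a made-up HTML fragment. The markup, the match-day and wrapper class names, and the hrefs below are assumptions for illustration, not HLTV's real page. Iterating over a tag yields its direct children, and those include whitespace text nodes (NavigableString) as well as tags. Text nodes have no find_all, and their find is the ordinary string method. Calling find_all on the parent tag itself avoids the loop entirely.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString, Tag

# Hypothetical, simplified markup (an assumption, not the real HLTV page).
html = """
<div class="match-day">
  <span class="standard-headline">2018-05-01</span>
  <a class="upcoming-match" href="/matches/1/first">first</a>
  <div class="wrapper"><a class="upcoming-match" href="/matches/2/second">second</a></div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
day = soup.find("span", string="2018-05-01").parent

# Iterating over a Tag walks its direct children: tags AND text nodes.
kinds = []
for child in day:
    if isinstance(child, Tag):
        # Tag.find searches the tag's descendants; it returns None if nothing matches.
        kinds.append("Tag")
        print("Tag", child.name, child.find("a"))
    else:
        # NavigableString subclasses str, so .find here is str.find,
        # which returns -1 when the substring "a" does not occur.
        kinds.append("NavigableString")
        print("NavigableString", child.find("a"))

# Searching from the parent tag covers all descendants in one call.
hrefs = [a["href"] for a in day.find_all("a")]
print(hrefs)
```

Filtering the children with isinstance(child, Tag) would also work, but searching from the parent is shorter and reaches nested links too.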


This is the class name:

<div class="standard-headline">2018-05-01</div>

2018-05-01
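
The headline text above is a date in YYYY-MM-DD form, which is also what str() gives for a datetime.date. A small sketch of building that string explicitly before searching for it; headline_text is a hypothetical helper name, and passing a plain string means not having to rely on how the parser library converts a date object.

```python
import datetime


def headline_text(day):
    """Return the date as YYYY-MM-DD, the form shown in the headline above."""
    return day.isoformat()


sample = datetime.date(2018, 5, 1)
print(headline_text(sample))                  # 2018-05-01
print(headline_text(sample) == str(sample))   # True

# For the current day:
today_text = headline_text(datetime.date.today())
print(today_text)
```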

I don't want to post all of the page's code here, so here is the hltv.org/matches/ link, which should make it easier for you to inspect the classes.

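I'm not quite sure I understand which links the OP really wants to grab, but I took a guess. The links sit inside elements with the compound class a-reset block upcoming-match standard-box; if you can spot the right class, a single class name is enough to fetch the data, much as it would be in a selector. Give it a try:
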
from bs4 import BeautifulSoup
from urllib.request import Request, urlopen
from urllib.parse import urljoin
import datetime

url = 'https://www.hltv.org/matches'

req = Request(url, headers={"User-Agent":"Mozilla/5.0"}) 
res = urlopen(req).read()
soup = BeautifulSoup(res, 'lxml')
for links in soup.find(class_="standard-headline",text=(datetime.date.today())).find_parent().find_all(class_="upcoming-match")[:-2]: 
    print(urljoin(url,links.get('href')))

Output:

https://www.hltv.org/matches/2322508/yeah-vs-sharks-ggbet-ascenso
https://www.hltv.org/matches/2322633/team-australia-vs-team-uk-showmatch-csgo
https://www.hltv.org/matches/2322638/sydney-saints-vs-control-fe-lil-suzi-winner-esl-womens-sydney-open-finals
https://www.hltv.org/matches/2322426/faze-vs-astralis-iem-sydney-2018
https://www.hltv.org/matches/2322601/max-vs-fierce-tiger-starseries-i-league-season-5-asian-qualifier

And so on...
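
Since the live page can change, here is an offline sketch of the same idea that can be run without network access. The HTML fragment, the match-day wrapper class, the second day's href, and the match_links helper are all assumptions for illustration; only the standard-headline and upcoming-match class names and the two 2018-05-01 URLs come from this thread. It uses string= (the newer name for the text= argument) and the html.parser backend so that lxml is not required.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://www.hltv.org/matches"

# Hypothetical, simplified markup; the real page may be structured differently.
SAMPLE_HTML = """
<div class="match-day">
  <span class="standard-headline">2018-05-01</span>
  <a class="a-reset block upcoming-match standard-box"
     href="/matches/2322508/yeah-vs-sharks-ggbet-ascenso">match</a>
  <a class="a-reset block upcoming-match standard-box"
     href="/matches/2322426/faze-vs-astralis-iem-sydney-2018">match</a>
</div>
<div class="match-day">
  <span class="standard-headline">2018-05-02</span>
  <a class="a-reset block upcoming-match standard-box"
     href="/matches/0000000/placeholder-other-day">match</a>
</div>
"""


def match_links(html, day_text, base_url=BASE_URL):
    """Return absolute match URLs listed under the headline whose text is day_text."""
    soup = BeautifulSoup(html, "html.parser")
    headline = soup.find(class_="standard-headline", string=day_text)
    if headline is None:
        # No section for that day: return an empty list instead of failing.
        return []
    section = headline.find_parent()
    # One class name is enough even though the element carries several classes.
    anchors = section.find_all("a", class_="upcoming-match", href=True)
    # urljoin turns the site-relative href into an absolute URL.
    return [urljoin(base_url, a["href"]) for a in anchors]


links = match_links(SAMPLE_HTML, "2018-05-01")
for link in links:
    print(link)
```

Returning an empty list when the headline is missing is a design choice of this sketch; on a day without matches, the chained call in the answer above would instead raise an AttributeError on None.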
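import bs4 as BeautifulSoup is not right. What does your code actually look like? Did you forget, accidentally or on purpose, to mention the class name you were talking about?

Do you mean for a in soup.find('span', text=datetime.date.today()).parent.find_all("a"): print(a)?

There are no links under standard-headline. It is all text. At least I couldn't find any. Please be specific.

@akagna, I ran the code earlier and it returned many anchor tags, so the code in the comment above should return multiple anchor tags. But the point is to get only today's matches, and I have already managed that. By the way, I like the way you shortened the code.

If you only want to grab today's links, then my latest edit should get you there. Thanks.

Glad we finally got it sorted out. Thanks.

You know SIM, everything works fine, but why did you use [:-2]? I know what it does, but I mean, why? :D

If you run the script without the index, you may get two extra links that you don't want. That's why I used it, to fit your needs. Thanks.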
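
For readers unsure about the [:-2] discussed above, a tiny sketch of what the slice does on a plain list. The list contents are made-up placeholders, not real scraped links.

```python
# Placeholder items standing in for scraped links (hypothetical values).
found = ["match-1", "match-2", "match-3", "extra-1", "extra-2"]

# [:-2] keeps everything except the last two items.
wanted = found[:-2]
print(wanted)

# With two or fewer items the slice is simply empty; it does not raise.
print(["only-one"][:-2])
```

Because the slice is positional, it only helps as long as the unwanted links really are the last two results, so it may need revisiting if the page layout changes.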