Python: parsing the attributes inside a tag with BeautifulSoup
I'm still learning how to use BeautifulSoup, and I'm trying to pull some information from an NFL website using Python 3 and BeautifulSoup. I parse the site with lxml:
soup = BeautifulSoup(source, 'lxml')
Then I find all the matchups:
matchups = soup.findAll("div", {"class": "cmg_game_data cmg_matchup_game_box"})
At this point, each matchup in the matchups list contains a lot of data, like this:
<div class="cmg_game_data cmg_matchup_game_box" data-away-conference="American Football Conference" data-away-team-city-search="Houston" data-away-team-fullname-search="Houston" data-away-team-nickname-search="Texans" data-away-team-shortname-search="HOU" data-competition-type="Week 1" data-conference="American Football Conference" data-event-id="80767" data-following="false" data-game-date="2020-09-10 20:20:00" data-game-odd="-10" data-game-total="54.5" data-handicap-difference="0" data-home-conference="American Football Conference" data-home-team-city-search="Kansas City" data-home-team-fullname-search="Kansas City" data-home-team-nickname-search="Chiefs" data-home-team-shortname-search="KC" data-index="0" data-last-update="2020-05-07T22:50:26.5700000" data-link="/sport/football/nfl/matchup/201993" data-sdi-event-id="/sport/football/competition:80767" data-top-twenty-five="false">
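However, what I have tried returns no results. What is the correct way to extract these values from the <div>?
Use [] to access a tag's attributes: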
from bs4 import BeautifulSoup
txt = '''
<div class="cmg_game_data cmg_matchup_game_box" data-away-conference="American Football Conference" data-away-team-city-search="Houston" data-away-team-fullname-search="Houston" data-away-team-nickname-search="Texans" data-away-team-shortname-search="HOU" data-competition-type="Week 1" data-conference="American Football Conference" data-event-id="80767" data-following="false" data-game-date="2020-09-10 20:20:00" data-game-odd="-10" data-game-total="54.5" data-handicap-difference="0" data-home-conference="American Football Conference" data-home-team-city-search="Kansas City" data-home-team-fullname-search="Kansas City" data-home-team-nickname-search="Chiefs" data-home-team-shortname-search="KC" data-index="0" data-last-update="2020-05-07T22:50:26.5700000" data-link="/sport/football/nfl/matchup/201993" data-sdi-event-id="/sport/football/competition:80767" data-top-twenty-five="false"></div>
'''
soup = BeautifulSoup(txt, 'html.parser')
matchups = soup.findAll("div", {"class": "cmg_game_data cmg_matchup_game_box"})
for matchup in matchups:
    awayconference = matchup["data-away-conference"]  # or you can use matchup.get("data-away-conference")
    print(awayconference)
Prints:
American Football Conference
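If you want every data-* value rather than just one, here is a minimal sketch of one way to do it. It assumes a shortened, made-up version of the markup from the question, and the names `html`, `games`, and `missing` are only for illustration. It uses `tag.attrs` (a plain dict of all attributes on the tag) and `.get()` with a default for attributes that may be absent. It also swaps in a CSS selector via `soup.select`, which matches on both classes regardless of their order in the class attribute.

```python
from bs4 import BeautifulSoup

# Shortened, hypothetical version of the markup from the question.
html = '''
<div class="cmg_game_data cmg_matchup_game_box"
     data-away-team-shortname-search="HOU"
     data-home-team-shortname-search="KC"
     data-game-date="2020-09-10 20:20:00"
     data-game-total="54.5"></div>
'''

soup = BeautifulSoup(html, "html.parser")

# CSS selector: matches divs that carry both classes, in any order.
matchups = soup.select("div.cmg_game_data.cmg_matchup_game_box")

games = []
for matchup in matchups:
    # Keep only the data-* attributes and drop the "data-" prefix from each key.
    data = {
        name[len("data-"):]: value
        for name, value in matchup.attrs.items()
        if name.startswith("data-")
    }
    games.append(data)

# .get() returns the default instead of raising KeyError when the attribute is absent.
missing = matchups[0].get("data-game-odd", "n/a")

print(games[0]["home-team-shortname-search"])
print(games[0]["game-total"])
print(missing)
```

Note that attribute values always come back as strings, so something like the game total would need `float(...)` before doing arithmetic with it.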