Python: finding a table inside a div with BeautifulSoup


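I'm trying to write something that pulls the point spreads for NFL games. I already have the Pro Football Reference page that lists all the games, and I'm trying to go one step further by scraping each game's own box score page.

My code so far is: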

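from urllib.request import urlopen  # assuming Python 3
from bs4 import BeautifulSoup

def get_spread(row):
    # Follow the last link in the row to that game's box score page
    a = row.findAll('a', href=True)
    box_link = 'https://www.pro-football-reference.com/' + a[-1]['href']
    temp_soup = BeautifulSoup(urlopen(box_link), 'html.parser')
    table = temp_soup.find('div', {'id': 'all_game_info'})
    return table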
where row is defined as
soup.findAll('tbody', limit=1)[0].findAll('tr')[0]

Setting that aside and scraping just one example page, if I use
table = temp_soup.find('div', {'id': 'all_game_info'})
then table contains:

<div class="table_wrapper setup_commented commented" id="all_game_info">
<div class="section_heading">
<span class="section_anchor" data-label="Game Info" id="game_info_link"></span><h2>Game Info</h2> <div class="section_heading_text">
<ul>
</ul>
</div>
</div>
<div class="placeholder"></div>
<!--
   <div class="table_outer_container">
      <div class="overthrow table_container" id="div_game_info">
      
  <table class="suppress_all sortable stats_table" id="game_info" data-cols-to-freeze="0"><caption>Game Info Table</caption><tr class="thead onecell" ><td class="right center" data-stat="onecell" colspan="2" >Game Info</td></tr>
<tr ><th scope="row" class="center " data-stat="info" >Won Toss</th><td class="center " data-stat="stat" >Chiefs (deferred)</td></tr>
<tr ><th scope="row" class="center " data-stat="info" >Roof</th><td class="center " data-stat="stat" >outdoors</td></tr>
<tr ><th scope="row" class="center " data-stat="info" >Surface</th><td class="center " data-stat="stat" >fieldturf </td></tr>
<tr ><th scope="row" class="center " data-stat="info" >Duration</th><td class="center " data-stat="stat" >3:37</td></tr>
<tr ><th scope="row" class="center " data-stat="info" >Attendance</th><td class="center " data-stat="stat" ><a href="/years/2017/attendance.htm">65,878</a></td></tr>
<tr ><th scope="row" class="center " data-stat="info" >Weather</th><td class="center " data-stat="stat" >63 degrees, wind 8 mph</td></tr>
<tr ><th scope="row" class="center " data-stat="info" >Vegas Line</th><td class="center " data-stat="stat" >New England Patriots -8.0</td></tr>
<tr ><th scope="row" class="center " data-stat="info" >Over/Under</th><td class="center " data-stat="stat" >47.5 <b>(over)</b></td></tr>

</table>

      </div>
   </div>
-->
</div>

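I want the last two rows ('Vegas Line' and 'Over/Under'), but if I run
table.findAll('tr')
it returns nothing, and the same happens when I look for 'td', 'table' or 'th'. So I'm curious how to pull those values out of the table variable.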

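The <table> sits inside an HTML comment (<!-- ... -->), so the parser treats it as one block of comment text rather than as tags. It takes an extra step to extract it: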

import requests
from bs4 import BeautifulSoup, Comment


url = 'https://www.pro-football-reference.com/boxscores/201709070nwe.htm'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')
table = soup.select_one('h2:contains("Game Info")').find_next(text=lambda t: isinstance(t, Comment))

# load <table> from HTML comments <!-- ... -->
soup = BeautifulSoup(str(table), 'html.parser')
vegas_line = soup.select_one('th:contains("Vegas Line")').find_next('td').text
over_under = soup.select_one('th:contains("Over/Under")').find_next('td').text

print(vegas_line)
print(over_under)

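Prints:

New England Patriots -8.0
47.5 (over)

Oh, I didn't know that about HTML. I guess that's why it doesn't show up in soup.findAll('table')?

@yankefan11 Yes, that's exactly the reason.

This gives me "NotImplementedError: Only the following pseudo-classes are implemented: nth-of-type." on the table = line?

@yankefan11 You are using an ancient version of beautifulsoup. I'm using version beautifulsoup4==4.9.1. Try updating the module.

Thanks. Google Colab isn't up to date.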
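
If upgrading is not an option, the same idea can probably be written without CSS pseudo-classes such as :contains, which is what triggers the NotImplementedError on old versions. The following is only a sketch under assumptions: SAMPLE_HTML is a trimmed stand-in for the markup shown in the question, and extract_game_info is a hypothetical helper name, not part of BeautifulSoup. It finds the Comment node inside the all_game_info div, re-parses the comment text, and collects every th/td pair into a dict.

```python
from bs4 import BeautifulSoup, Comment

# Trimmed stand-in for the box score markup shown in the question
# (assumption: the live page keeps the same id and row layout).
SAMPLE_HTML = """
<div class="table_wrapper setup_commented commented" id="all_game_info">
<div class="placeholder"></div>
<!--
<table id="game_info">
<tr><td colspan="2">Game Info</td></tr>
<tr><th data-stat="info">Vegas Line</th><td data-stat="stat">New England Patriots -8.0</td></tr>
<tr><th data-stat="info">Over/Under</th><td data-stat="stat">47.5 <b>(over)</b></td></tr>
</table>
-->
</div>
"""


def extract_game_info(html):
    """Return {label: value} from the commented-out table (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    wrapper = soup.find("div", {"id": "all_game_info"})
    if wrapper is None:
        return {}
    # The table markup is one Comment node, not real tags, so look it up by type.
    comment = next(
        (node for node in wrapper.descendants if isinstance(node, Comment)),
        None,
    )
    if comment is None:
        return {}
    # Re-parse the comment text so its <tr>/<th>/<td> become searchable tags.
    inner = BeautifulSoup(str(comment), "html.parser")
    info = {}
    for row in inner.find_all("tr"):
        th, td = row.find("th"), row.find("td")
        # Skip rows that lack a label cell, such as the one-cell heading row.
        if th is not None and td is not None:
            info[th.get_text(strip=True)] = td.get_text(" ", strip=True)
    return info


game_info = extract_game_info(SAMPLE_HTML)
print(game_info.get("Vegas Line"))
print(game_info.get("Over/Under"))
```

To use it on the real page, pass the downloaded HTML (for example requests.get(url).content) to extract_game_info instead of SAMPLE_HTML. Returning a dict keyed by the row label also gives access to the other rows, such as Roof or Weather, without a separate selector for each.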