Python: Finding a table inside a div with BeautifulSoup


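I'm trying to write something that extracts the point spread for NFL games. Pro Football Reference lists all the games, and I'm trying to go one step further by following each game's box score link. An example of a page I want to scrape is:

My code so far is: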

from urllib.request import urlopen
from bs4 import BeautifulSoup

def get_spread(row):
  a = row.findAll('a',href=True)
  box_link = 'https://www.pro-football-reference.com/'+a[-1]['href']
  temp_soup = BeautifulSoup(urlopen(box_link),'html.parser')
  table = temp_soup.find('div', {'id':'all_game_info'})
  return table

where row is defined as

soup.findAll('tbody', limit=1)[0].findAll('tr')[0]

Ignoring that and trying to scrape just that example page, if I use table = temp_soup.find('div', {'id': 'all_game_info'}), this is what I get for table:

<div class="table_wrapper setup_commented commented" id="all_game_info">
<div class="section_heading">
<span class="section_anchor" data-label="Game Info" id="game_info_link"></span><h2>Game Info</h2> <div class="section_heading_text">
<ul>
</ul>
</div>
</div>
<div class="placeholder"></div>
<!--
   <div class="table_outer_container">
      <div class="overthrow table_container" id="div_game_info">
      
  <table class="suppress_all sortable stats_table" id="game_info" data-cols-to-freeze="0"><caption>Game Info Table</caption><tr class="thead onecell" ><td class="right center" data-stat="onecell" colspan="2" >Game Info</td></tr>
<tr ><th scope="row" class="center " data-stat="info" >Won Toss</th><td class="center " data-stat="stat" >Chiefs (deferred)</td></tr>
<tr ><th scope="row" class="center " data-stat="info" >Roof</th><td class="center " data-stat="stat" >outdoors</td></tr>
<tr ><th scope="row" class="center " data-stat="info" >Surface</th><td class="center " data-stat="stat" >fieldturf </td></tr>
<tr ><th scope="row" class="center " data-stat="info" >Duration</th><td class="center " data-stat="stat" >3:37</td></tr>
<tr ><th scope="row" class="center " data-stat="info" >Attendance</th><td class="center " data-stat="stat" ><a href="/years/2017/attendance.htm">65,878</a></td></tr>
<tr ><th scope="row" class="center " data-stat="info" >Weather</th><td class="center " data-stat="stat" >63 degrees, wind 8 mph</td></tr>
<tr ><th scope="row" class="center " data-stat="info" >Vegas Line</th><td class="center " data-stat="stat" >New England Patriots -8.0</td></tr>
<tr ><th scope="row" class="center " data-stat="info" >Over/Under</th><td class="center " data-stat="stat" >47.5 <b>(over)</b></td></tr>

</table>

      </div>
   </div>
-->
</div>


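I want the last two rows ('Vegas Line' and 'Over/Under'), but if I run table.findAll('tr') it returns nothing, and the same happens when I look for 'td', 'table', or 'th'. So I'm curious how to extract these values from the table variable.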

The <table> is inside an HTML comment (<!-- ... -->), so an extra step is needed to extract it:

import requests
from bs4 import BeautifulSoup, Comment


url = 'https://www.pro-football-reference.com/boxscores/201709070nwe.htm'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')
table = soup.select_one('h2:contains("Game Info")').find_next(text=lambda t: isinstance(t, Comment))

# load <table> from HTML comments <!-- ... -->
soup = BeautifulSoup(str(table), 'html.parser')
vegas_line = soup.select_one('th:contains("Vegas Line")').find_next('td').text
over_under = soup.select_one('th:contains("Over/Under")').find_next('td').text

print(vegas_line)
print(over_under)

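Prints:

New England Patriots -8.0
47.5 (over)

Comments:

Oh, I didn't know that about HTML. I guess that's why it doesn't show up in soup.findAll('table')?

@yankefan11 Yes, that's exactly the reason.

This gives me "NotImplementedError: Only the following pseudo-classes are implemented: nth-of-type." on the table = line?

@yankefan11 You're using an ancient version of beautifulsoup. I'm using beautifulsoup4==4.9.1. Try updating the module.

Thanks. Google Colab isn't up to date.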
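
If upgrading isn't an option (as with the Colab install mentioned in the comments), the same idea should also work without the :contains selector. The following is only a sketch, not the answerer's code: it looks for the Comment node by walking the div's descendants and then uses plain find_all, so it doesn't rely on CSS pseudo-classes. The HTML string is a trimmed copy of the markup from the question rather than a live request, and the helper name game_info_from_comment is made up for illustration.

```python
from bs4 import BeautifulSoup, Comment

# Trimmed copy of the markup shown in the question (not fetched live).
HTML = """
<div class="table_wrapper setup_commented commented" id="all_game_info">
<div class="section_heading"><h2>Game Info</h2></div>
<div class="placeholder"></div>
<!--
<table id="game_info">
<tr><th data-stat="info">Vegas Line</th><td data-stat="stat">New England Patriots -8.0</td></tr>
<tr><th data-stat="info">Over/Under</th><td data-stat="stat">47.5 <b>(over)</b></td></tr>
</table>
-->
</div>
"""


def game_info_from_comment(html):
    """Return {label: value} for the table hidden in the div's HTML comment.

    Hypothetical helper for illustration; the name is not from the thread.
    """
    soup = BeautifulSoup(html, "html.parser")
    wrapper = soup.find("div", {"id": "all_game_info"})
    if wrapper is None:
        return {}

    # The parser stores a comment as a single Comment node (a string type),
    # so the markup inside it is never turned into tags.
    comment = next(
        (node for node in wrapper.descendants if isinstance(node, Comment)),
        None,
    )
    if comment is None:
        return {}

    # Parse the comment's text as its own document to get real tags.
    inner = BeautifulSoup(str(comment), "html.parser")
    info = {}
    for row in inner.find_all("tr"):
        th, td = row.find("th"), row.find("td")
        if th is not None and td is not None:
            info[th.get_text(strip=True)] = td.get_text(" ", strip=True)
    return info


info = game_info_from_comment(HTML)
print(info["Vegas Line"])
print(info["Over/Under"])
```

Returning a dict keyed by the row label means the other rows (Roof, Surface, Weather, and so on) come along for free if you pass in the full page instead of this trimmed snippet.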