Scraping a table with Python/BS4


I'm trying to scrape the "Team Stats" table with BS4 and Python 2.7, but I can't get anywhere close to it.

url = 'http://www.pro-football-reference.com/boxscores/201602070den.htm'
page = requests.get(url)
soup = BeautifulSoup(page.text, "html5lib")
table=soup.findAll('table', {'id':"team_stats", "class":"stats_table"})  
print table

I thought the code above would work, but no luck.

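The problem in this case is that the "Team Stats" table sits inside a comment in the HTML source that you download with requests. Locate the comment and reparse it with BeautifulSoup into a "soup" object.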
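
The reparse step can be sketched roughly as follows. This is only an illustration under a few assumptions: the HTML string is a made-up stand-in for the downloaded page (so the snippet runs without network access), the built-in "html.parser" replaces html5lib to avoid an extra dependency, and the lookup filters on bs4's Comment class. The `string=` argument needs Beautiful Soup 4.4.0 or later; older versions use `text=`.

```python
from bs4 import BeautifulSoup, Comment

# Made-up stand-in for the downloaded page: the table exists only inside a comment.
html = """
<div id="all_team_stats">
<!--
<table id="team_stats" class="stats_table">
  <tr><th></th><th>DEN</th><th>CAR</th></tr>
  <tr><th>First Downs</th><td>11</td><td>21</td></tr>
</table>
-->
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# A direct search finds nothing, because the commented-out markup is not parsed as tags.
direct_hits = soup.find_all("table", id="team_stats")

# Locate the comment node whose text mentions the table id.
comment = soup.find(string=lambda s: isinstance(s, Comment) and "team_stats" in s)

# Parse the comment's contents as HTML in their own right.
table = BeautifulSoup(comment, "html.parser").find("table", id="team_stats")

# Collect the cell text row by row.
rows = [[cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        for tr in table.find_all("tr")]
print(rows)
```

The empty `direct_hits` list matches what the code in the question runs into: the `<table>` markup is there, but only as the text of a comment node.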

And/or, you can load the table into, for example, a pandas DataFrame, which is very convenient to work with:

import pandas as pd
import requests
from bs4 import BeautifulSoup
from bs4 import NavigableString

url = 'http://www.pro-football-reference.com/boxscores/201602070den.htm'
# Send a browser-like User-Agent header with the request.
page = requests.get(url, headers={'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36'})

soup = BeautifulSoup(page.content, "html5lib")
# Find the text node (here, the HTML comment) that contains the "team_stats" table.
comment = soup.find(text=lambda x: isinstance(x, NavigableString) and "team_stats" in x)

# Parse the commented-out markup; read_html returns a list of DataFrames.
df = pd.read_html(comment)[0]
print(df)
Prints:

            Unnamed: 0            DEN            CAR
0          First Downs             11             21
1         Rush-Yds-TDs        28-90-1       27-118-1
2    Cmp-Att-Yd-TD-INT  13-23-141-0-1  18-41-265-0-1
3         Sacked-Yards           5-37           7-68
4       Net Pass Yards            104            197
5          Total Yards            194            315
6         Fumbles-Lost            3-1            4-3
7            Turnovers              2              4
8      Penalties-Yards           6-51         12-102
9     Third Down Conv.           1-14           3-15
10   Fourth Down Conv.            0-0            0-0
11  Time of Possession          27:13          32:47
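
Once the table is in a DataFrame, individual stats can be looked up by name. The sketch below is only an illustration: it rebuilds three rows of the output above by hand instead of downloading the page, assumes the values arrive as strings (as they would in a column that also holds entries like "28-90-1"), and uses `Stat` as a hypothetical name for the unnamed first column.

```python
import pandas as pd

# Three rows copied from the printed output; in practice df comes from pd.read_html.
df = pd.DataFrame({
    "Unnamed: 0": ["First Downs", "Total Yards", "Turnovers"],
    "DEN": ["11", "194", "2"],
    "CAR": ["21", "315", "4"],
})

# Name the label column ("Stat" is a name chosen here) and use it as the index.
stats = df.rename(columns={"Unnamed: 0": "Stat"}).set_index("Stat")

# Look up a single value by row label and team column.
total_yards_den = stats.loc["Total Yards", "DEN"]
print(total_yards_den)
```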

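What exactly are you trying to scrape? Just the table? To get effective help, you need to provide more information (in your original post, not in comments where it can't be seen). "Not working": does it not run? Or does it run but give incorrect results? What were you expecting? What happened? Also include any error messages (if applicable). Also, it looks like you're missing some import statements.

It's loaded with JavaScript... so you'd need something like ghost.js or selenium...

Wow, it will take me a while to understand what's going on here, lol. Thanks.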