Python: scraping a table from a website with BeautifulSoup ends with an error


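I'm trying to scrape a table from the NFL website, but I keep getting an error and can't figure out what I'm doing wrong.

The code I'm using is: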

import pandas
import urllib2

#specify the url
NFLpage = "http://www.espn.com/nhl/statistics/player/_/stat/points/sort/points/year/2015/seasontype/2"

#Query the website and return the html to the variable 'page'
page = urllib2.urlopen(NFLpage)

#import the Beautiful soup functions to parse the data returned from the website
from bs4 import BeautifulSoup

#Parse the html in the 'page' variable, and store it in Beautiful Soup format
soup = BeautifulSoup(page)

print soup.prettify(page)


#Find the right table
all_tables=soup.find_all('table')
right_table=soup.find('table', class_='tablehead')
right_table 

for row in right_table.findAll("tr"):

    col = row.find_all('td')

    column_1 = col[0].string.strip()
    RK.append(column_1)

    column_2 = col[1].string.strip()
    PLAYER.append(column_2)

    column_3 = col[2].string.strip()
    TEAM.append(column_3)

    column_4 = col[3].string.strip()
    GP.append(column_4)

    column_5 = col[4].string.strip()
    G1.append(column_5)

    column_6 = col[5].string.strip()
    A1.append(column_6)

    column_7 = col[6].string.strip()
    PTS.append(column_7)

    column_8 = col[7].string.strip()
    Diff.append(column_8)

    column_9 = col[8].string.strip()
    PIM.append(column_9)

    column_10 = col[9].string.strip()
    PTSG.append(column_10)

    column_11 = col[10].string.strip()
    SOG.append(column_11)

    column_12 = col[11].string.strip()
    PCT.append(column_12)

    column_13 = col[12].string.strip()
    GWG.append(column_13)


    column_14 = col[13].string.strip()
    G2.append(column_14)

    column_15 = col[14].string.strip()
    A2.append(column_15)

    column_16 = col[15].string.strip()
    G3.append(column_16)

    column_17 = col[15].string.strip()
    A3.append(column_17)


columns = {'RK': RK, 'PLAYER':PLAYER, 'TEAM'=TEAM, 'GP': GP, 'G1': G1, 'A1': A1, 'PTS': PTS, 'Diff'=Diff, 'PIM'=PIM, 'PTSG'=PTSG, 'SOG'=SOG, 'PCT'=PCT, 'GWG'=GWG, 'G2'=G2, 'A2'=A2, 'G3'=G3,'A3'=A3}

df = pd.DataFrame(columns)

df

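I'm currently getting an error on the line that defines columns (the third line from the end). Can you help me see what I'm doing wrong?

Cheers,

Andreia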
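For reference, the error on the `columns = {...}` line is most likely a SyntaxError: inside a dict literal, each key is separated from its value by `:`, not `=`, so entries such as `'TEAM'=TEAM` do not parse. Once that is fixed, a few other problems in the posted code would probably surface: the lists `RK`, `PLAYER`, and so on are never created, pandas is imported as `pandas` but used as `pd`, `col[15]` is read twice where `col[16]` was presumably intended, and `.string` returns None for a cell with more than one child node. The following is only a minimal Python 3 sketch under those assumptions. The helper names `cells_to_columns` and `scrape_table` are hypothetical, and it assumes the page still serves a table with class `tablehead` and 17 cells per data row.

```python
from urllib.request import urlopen

# Column labels copied from the question's dict (17 in total).
COLUMN_NAMES = ['RK', 'PLAYER', 'TEAM', 'GP', 'G1', 'A1', 'PTS', 'Diff', 'PIM',
                'PTSG', 'SOG', 'PCT', 'GWG', 'G2', 'A2', 'G3', 'A3']


def cells_to_columns(rows, names=COLUMN_NAMES):
    """Turn rows of cell strings into a dict of column lists.

    Rows with a different number of cells than `names`, and rows whose first
    cell repeats the first column label (assumed to be header rows), are skipped.
    """
    columns = {name: [] for name in names}  # note ':' between key and value
    for cells in rows:
        cells = [cell.strip() for cell in cells]
        if len(cells) != len(names) or cells[0] == names[0]:
            continue
        for name, cell in zip(names, cells):
            columns[name].append(cell)
    return columns


def scrape_table(url):
    """Fetch `url` and return the 'tablehead' table as a DataFrame."""
    # Third-party imports are kept local so cells_to_columns works without them.
    import pandas as pd
    from bs4 import BeautifulSoup

    with urlopen(url) as page:
        soup = BeautifulSoup(page.read(), 'html.parser')
    table = soup.find('table', class_='tablehead')
    if table is None:
        raise ValueError('no table with class "tablehead" found')
    # get_text() also works for cells that contain nested tags,
    # where .string would be None.
    rows = ([td.get_text() for td in tr.find_all('td')]
            for tr in table.find_all('tr'))
    return pd.DataFrame(cells_to_columns(rows))
```

Calling `scrape_table(NFLpage)` with the URL from the question would then be expected to return one DataFrame, provided the page layout matches the assumptions above.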

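pandas can read tables directly from a URL with read_html; you can refer to the pandas documentation for read_html.

import pandas as pd

pd.read_html('http://www.espn.com/nhl/statistics/player/_/stat/points/sort/points/year/2015/seasontype/2')

Output:

[     0                       1     2    3    4    5    6    7    8      9   \
 0   NaN                      PP    SH  NaN  NaN  NaN  NaN  NaN  NaN    NaN
 1    RK                  PLAYER  TEAM   GP    G    A  PTS  +/-  PIM  PTS/G
 2     1          Jamie Benn, LW   DAL   82   35   52   87    1   64   1.06
 3     2         John Tavares, C   NYI   82   38   48   86    5   46   1.05
 4     3        Sidney Crosby, C   PIT   77   28   56   84    5   47   1.09
 5     4       Alex Ovechkin, LW   WSH   81   53   28   81   10   58   1.00
 6   NaN       Jakub Voracek, RW   PHI   82   22   59   81    1   78   0.99
 7     6    Nicklas Backstrom, C   WSH   82   18   60   78    5   40   0.95
 8     7         Tyler Seguin, C   DAL   71   37   40   77   -1   20   1.08
 9     8         Jiri Hudler, LW   CGY   78   31   45   76   17   14   0.97
 10  NaN        Daniel Sedin, LW   VAN   82   20   56   76    5   18   0.93

Comments:

Welcome to Stackoverflow. Is this really an example?

My first question, still learning how best to use it.

Thanks. I get the message: No module named lxml. And I can't seem to install it. I'm using Anaconda.

Had to install lxml and html5lib with the conda install command, and then yes, I was able to see the table.

I have another problem: I need the table to be a DataFrame, and I guess this isn't one.

I concatenated the output of this (a list of DataFrames) into a single DataFrame with df = pd.concat(list of DataFrames). That seems to have worked.

@Andreia Domz this returns a list of DataFrames, and you are right to use concat. I'm glad you found the answer. Please accept my answer to close the question.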
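As the comments note, read_html returns a list of DataFrames rather than a single one, and pd.concat can join them. The sketch below illustrates that step, and optionally promotes a header row (such as the RK/PLAYER/TEAM row in the output) to column labels. The function name `combine_tables` and its `header_row` parameter are hypothetical, and the small frames in the usage comment are stand-ins for real read_html output.

```python
import pandas as pd


def combine_tables(tables, header_row=None):
    """Concatenate a list of DataFrames (e.g. from pd.read_html) into one.

    If header_row is given, that row of the combined frame becomes the column
    labels, and it and every row above it are dropped.
    """
    df = pd.concat(tables, ignore_index=True)
    if header_row is not None:
        df.columns = df.iloc[header_row].tolist()
        df = df.iloc[header_row + 1:].reset_index(drop=True)
    return df


# Usage with the page from the question (read_html needs lxml, or
# BeautifulSoup plus html5lib, to be installed):
#   tables = pd.read_html(url)
#   df = combine_tables(tables, header_row=1)
```

In the output shown, G and A appear more than once in the header row, so promoting that row would give duplicate column labels. pandas allows this, but renaming the columns afterwards may be easier to work with.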