Python: converting multiple columns into a single column based on another column's value

python python-2.7 pandas web-scraping

I am struggling with some analysis. My goal is output like the following:

0 1st Marc Gasol 2014-2015
1 1st Anthony Davis 2014-2015
2 1st LeBron James 2014-2015
3 1st James Harden 2014-2015
4 1st Stephen Curry 2014-2015
5 2nd Pau Gasol 2014-2015 etc.

This is my code so far. Is there another way to do this? Any suggestions or help would be much appreciated.

import requests
import pandas as pd
from bs4 import BeautifulSoup

r = requests.get('http://www.basketball-reference.com/awards/all_league.html')
soup=BeautifulSoup(r.text.replace('&nbsp;','').replace('&gt;','').encode('ascii','ignore'),"html.parser")
# Column order matches the order in which each row is filled in below
all_league_data = pd.DataFrame(columns = ['team','player','year'])


stw_list = soup.findAll('div', attrs={'class': 'stw'}) # Find all 'stw's'
for stw in stw_list:
    table = stw.find('table', attrs = {'class':'no_highlight stats_table'})
    for row in table.findAll('tr'):
        col = row.findAll('td')
        if col:  # skip header rows, which contain no <td> cells
            year = col[0].find(text=True)
            team = col[2].find(text=True)
            player = col[3].find(text=True)
            all_league_data.loc[len(all_league_data)] = [team, player, year]

print(all_league_data)

It looks like your code should work fine, but here is a working version without pandas:

import requests
from bs4 import BeautifulSoup

r = requests.get('http://www.basketball-reference.com/awards/all_league.html')
soup=BeautifulSoup(r.text.replace('&nbsp;','').replace('&gt;','').encode('ascii','ignore'),"html.parser")
all_league_data = []

stw_list = soup.findAll('div', attrs={'class': 'stw'}) # Find all 'stw's'
for stw in stw_list:
    table = stw.find('table', attrs = {'class':'no_highlight stats_table'})
    for row in table.findAll('tr'):
        col = row.findAll('td')
        if col:
            year = col[0].find(text=True)
            team = col[2].find(text=True)
            player = col[3].find(text=True)
            all_league_data.append([team, player, year])

for i, line in enumerate(all_league_data):
    print(i, *line)
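
If a DataFrame is still wanted at the end, the collected rows can be passed to the constructor in one go rather than appended one at a time with .loc. A minimal sketch, in which a few hard-coded rows stand in for the scraped list (they are sample values, not fetched from the site):

```python
import pandas as pd

# Stand-in for the list built by the scraping loop: [team, player, year].
all_league_data = [
    ["1st", "Marc Gasol", "2014-15"],
    ["1st", "Anthony Davis", "2014-15"],
    ["2nd", "Pau Gasol", "2014-15"],
]

# Build the frame once from the finished list instead of growing it row by row.
df = pd.DataFrame(all_league_data, columns=["team", "player", "year"])
print(df)
```

Growing a frame row by row copies data on each assignment, so building it once from a list is usually the cheaper option for anything beyond a handful of rows.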

You are already using pandas, so use read_html.
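
The call itself is not shown here, so the following is only a sketch of how it might look. A small inline table stands in for the downloaded page (an assumption, so that the snippet runs offline), and read_html needs an HTML parser such as lxml, or bs4 with html5lib, to be installed:

```python
import io
import pandas as pd

# Tiny inline table standing in for the downloaded page (sample rows only).
html = """
<table>
  <tr><td>2014-15</td><td>NBA</td><td>1st</td><td>Marc Gasol C</td><td>Anthony Davis F</td></tr>
  <tr><td>2014-15</td><td>NBA</td><td>2nd</td><td>Pau Gasol C</td><td>DeMarcus Cousins C</td></tr>
</table>
"""

# read_html returns a list of DataFrames, one per <table> it finds.
# It also accepts a URL string in place of the file-like object used here.
all_league_data = pd.read_html(io.StringIO(html))
print(all_league_data[0])
```

Because the rows contain only td cells and no header row, the columns come back numbered from 0.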

This gives you all of the table data as DataFrames:

  In [7]:  print(all_league_data[0].dropna().head(5))
         0    1    2                 3                   4  \
0  2014-15  NBA  1st      Marc Gasol C     Anthony Davis F   
1  2014-15  NBA  2nd       Pau Gasol C  DeMarcus Cousins C   
2  2014-15  NBA  3rd  DeAndre Jordan C        Tim Duncan F   
4  2013-14  NBA  1st     Joakim Noah C      LeBron James F   
5  2013-14  NBA  2nd   Dwight Howard C     Blake Griffin F   

                     5                6                    7  
0       LeBron James F   James Harden G      Stephen Curry G  
1  LaMarcus Aldridge F     Chris Paul G  Russell Westbrook G  
2      Blake Griffin F   Kyrie Irving G      Klay Thompson G  
4       Kevin Durant F   James Harden G         Chris Paul G  
5         Kevin Love F  Stephen Curry G        Tony Parker G

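From there, rearranging or dropping columns however you like is straightforward. read_html also takes some parameters, such as attrs, which you can apply as well; they are all described in the linked documentation.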
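
One possible way to do that rearranging, so that each player ends up on a row of their own, is DataFrame.melt. This is a sketch under assumptions: the two rows are hand-copied sample values in the same numbered-column layout, the names wide and tidy are made up for illustration, and the position letter is assumed to always be a single trailing C, F or G.

```python
import pandas as pd

# Sample in the same layout as the read_html result:
# column 0 = season, 1 = league, 2 = team, 3..7 = one player per column.
wide = pd.DataFrame([
    ["2014-15", "NBA", "1st", "Marc Gasol C", "Anthony Davis F",
     "LeBron James F", "James Harden G", "Stephen Curry G"],
    ["2014-15", "NBA", "2nd", "Pau Gasol C", "DeMarcus Cousins C",
     "LaMarcus Aldridge F", "Chris Paul G", "Russell Westbrook G"],
])

# Keep team and season as identifiers; stack the five player columns into one.
tidy = wide.melt(id_vars=[2, 0], value_vars=[3, 4, 5, 6, 7], value_name="player")

# Drop the helper column that records the source column, then name the rest.
tidy = tidy.drop(columns="variable").rename(columns={2: "team", 0: "year"})

# A stable sort groups rows by team while keeping the original player order.
tidy = tidy.sort_values("team", kind="mergesort").reset_index(drop=True)
tidy = tidy[["team", "player", "year"]]

# Strip the trailing position letter (" C", " F", " G") from each name.
tidy["player"] = tidy["player"].str.replace(r"\s[CFG]$", "", regex=True)
print(tidy)
```

On the real data, dropna() would be applied first, as in the printed example, to remove the empty separator rows between seasons.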