Python: How to merge rows in a DataFrame based on a column value?


python python-3.x pandas dataframe data-structures


I have a dataset like this, where each row represents one team in a particular game, and the game is identified by gameID:

gameID      Won/Lost  Home  Away  metric2  metric3  metric4  team1  team2  team3  team4
2017020001  1         1     0     10       10       10       1      0      0      0
2017020001  0         0     1     10       10       10       0      1      0      0

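What I want is a function that takes the rows sharing the same gameID and concatenates them side by side. As you can see in the sample data, these two rows describe a single game, split into the home team (row 1) and the away team (row 2). I want the two rows to become a single row: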

Won/Lost  h_metric2  h_metric3  h_metric4  a_metric2  a_metric3  a_metric4  h_team1  h_team2  h_team3  h_team4  a_team1  a_team2  a_team3  a_team4
1         10         10         10         10         10         10         1        0        0        0        0        1        0        0

How can I get this result?

Edit: I caused too much confusion, so here is my code to help you understand the problem I'm trying to solve:

import numpy as np
import pandas as pd
import requests
import json
from sklearn import preprocessing
from sklearn.preprocessing import OneHotEncoder

results = []
for game_id in range(2017020001, 2017020010, 1):
    url = 'https://statsapi.web.nhl.com/api/v1/game/{}/boxscore'.format(game_id)
    r = requests.get(url)
    game_data = r.json()

    # One row per team in this game: home first, then away
    for homeaway in ['home', 'away']:
        game_dict = game_data.get('teams').get(homeaway).get('teamStats').get('teamSkaterStats')
        game_dict['team'] = game_data.get('teams').get(homeaway).get('team').get('name')
        game_dict['homeaway'] = homeaway
        game_dict['game_id'] = game_id
        results.append(game_dict)

df = pd.DataFrame(results)

# 1 if the team scored the most goals in its game, else 0;
# transform('max') keeps the result aligned with df's row index
df['Won/Lost'] = (df['goals'] == df.groupby('game_id')['goals'].transform('max')).astype(int)

df["faceOffWinPercentage"] = df["faceOffWinPercentage"].astype('float')
df["powerPlayPercentage"] = df["powerPlayPercentage"].astype('float')
df["team"] = df["team"].astype('category')
df = pd.get_dummies(df, columns=['homeaway'])
df = pd.get_dummies(df, columns=['team'])
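The Won/Lost step can be checked offline on a small hand-made frame. This is a minimal sketch assuming only the game_id, homeaway and goals columns, with invented values rather than real boxscore data. groupby(...).transform('max') returns a Series aligned with the original rows, which is why it can be assigned straight back as a column. As with the original lambda, a tie in goals would mark both teams as 1.

```python
import pandas as pd

# Hypothetical stand-in for the boxscore rows (values are made up)
df = pd.DataFrame({
    "game_id": [2017020001, 2017020001, 2017020002, 2017020002],
    "homeaway": ["home", "away", "home", "away"],
    "goals": [3, 2, 1, 4],
})

# Broadcast each game's maximum goals back onto every row of that game
game_max = df.groupby("game_id")["goals"].transform("max")

# 1 for the team that matched the game's maximum, 0 otherwise
df["Won/Lost"] = (df["goals"] == game_max).astype(int)

print(df["Won/Lost"].tolist())  # [1, 0, 0, 1]
```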

I'm assuming you are working with the bread and butter: numpy and pandas.

If so, I further assume your table is currently stored in a pandas.DataFrame instance called df.

Split your df into two DataFrames, then join them:

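df_team1 = df[df['Won/Lost'] == 1]  # winning team of each game
df_team2 = df[df['Won/Lost'] == 0]  # losing team of each game
# join(on=...) matches a column of the caller against the *index* of the other frame,
# so gameID has to become df_team2's index first
final_df = df_team1.join(df_team2.set_index('gameID'), lsuffix='_team1', rsuffix='_team2', on='gameID')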

Of course, you can adapt this to fit your purpose better, for example by splitting df on the home/away column instead.
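A minimal sketch of that home/away variant, assuming Home/Away indicator columns as in the question's table (the data here is a made-up subset of it). Splitting on Home rather than Won/Lost keeps the home team on the left side even when the away team wins, and merge's suffixes argument tags every column that appears in both halves. Won/Lost is kept only from the home half, so it needs no suffix.

```python
import pandas as pd

# Made-up subset of the question's columns
df = pd.DataFrame({
    "gameID": [2017020001, 2017020001],
    "Won/Lost": [1, 0],
    "Home": [1, 0],
    "Away": [0, 1],
    "metric2": [10, 10],
    "team1": [1, 0],
    "team2": [0, 1],
})

is_home = df["Home"] == 1
home = df[is_home].drop(columns=["Home", "Away"])                 # keeps Won/Lost
away = df[~is_home].drop(columns=["Won/Lost", "Home", "Away"])

# Overlapping non-key columns (metric2, team1, team2) receive the suffixes
wide = home.merge(away, on="gameID", suffixes=("_home", "_away"))
print(wide.columns.tolist())
```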


This assumes that each gameID has exactly two rows and that you want to group by that ID. (It also assumes I understood the question.)
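Both assumptions can be checked instead of trusted. The sketch below uses a made-up frame in which game 2017020002 is missing its away row: groupby(...).size() finds gameIDs that do not have exactly two rows, and merge's validate="one_to_one" option raises pandas.errors.MergeError if a key repeats on either side.

```python
import pandas as pd

# Made-up data: the second game has only one row
df = pd.DataFrame({
    "gameID": [2017020001, 2017020001, 2017020002],
    "Home": [1, 0, 1],
})

# gameIDs that break the "exactly one home and one away row" assumption
counts = df.groupby("gameID").size()
bad_ids = counts[counts != 2].index.tolist()
print(bad_ids)  # [2017020002]

home = df[df["Home"] == 1]
away = df[df["Home"] == 0]
# Inner merge drops the incomplete game; validate guards against duplicated keys
merged = home.merge(away, on="gameID", suffixes=("_h", "_a"), validate="one_to_one")
print(len(merged))  # 1

# A duplicated home row makes the same merge fail loudly
try:
    pd.concat([home, home]).merge(away, on="gameID", validate="one_to_one")
    raised = False
except pd.errors.MergeError:
    raised = True
print(raised)  # True
```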

Improved solution

Given a DataFrame df such as

       gameID  Won/Lost  Home  Away  metric2  metric3  metric4  team1  team2  team3  team4
0  2017020001         1     1     0       10       10       10      1      0      0      0
1  2017020001         0     0     1       10       10       10      0      1      0      0
2  2017020002         1     1     0       10       10       10      1      0      0      0
3  2017020002         0     0     1       10       10       10      0      1      0      0

you can use pd.merge (plus a bit of data munging) like this:

>>> is_home = df['Home'] == 1
>>> home = df[is_home].drop(['Home', 'Away'], axis=1).add_prefix('h_').rename(columns={'h_gameID':'gameID'})
>>> away = df[~is_home].drop(['Won/Lost', 'Home', 'Away'], axis=1).add_prefix('a_').rename(columns={'a_gameID':'gameID'})
>>> pd.merge(home, away, on='gameID')
       gameID  h_Won/Lost  h_metric2  h_metric3  h_metric4  h_team1  h_team2  h_team3  h_team4  a_metric2  a_metric3  a_metric4  a_team1  a_team2  a_team3  a_team4
0  2017020001           1         10         10         10        1        0        0        0         10         10         10        0        1        0        0
1  2017020002           1         10         10         10        1        0        0        0         10         10         10        0        1        0        0


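(I kept the prefix on Won/Lost because it shows that this is the home team's statistic. Also, if anyone knows a more elegant way to add the prefixes without having to rename gameID back afterwards, please leave a comment.)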
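One possible answer to that question, offered as a sketch rather than the answerer's method: move gameID into the index before calling add_prefix. On a DataFrame, add_prefix renames only the column labels, so the index is left alone and the two halves can then be aligned with join. The data below is a made-up subset of the example frame.

```python
import pandas as pd

# Made-up subset of the example frame
df = pd.DataFrame({
    "gameID": [2017020001, 2017020001, 2017020002, 2017020002],
    "Won/Lost": [1, 0, 1, 0],
    "Home": [1, 0, 1, 0],
    "Away": [0, 1, 0, 1],
    "metric2": [10, 10, 10, 10],
    "team1": [1, 0, 1, 0],
    "team2": [0, 1, 0, 1],
})

indexed = df.set_index("gameID")
mask = (indexed["Home"] == 1).to_numpy()  # plain boolean array: home rows

# add_prefix touches only column labels, so gameID (now the index) keeps its name
home = indexed[mask].drop(columns=["Home", "Away"]).add_prefix("h_")
away = indexed[~mask].drop(columns=["Won/Lost", "Home", "Away"]).add_prefix("a_")

# join aligns the halves on the shared gameID index; reset_index turns it back into a column
wide = home.join(away).reset_index()
print(wide.columns.tolist())
```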

Original attempt

After grouping by gameID, you can apply the following function

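def munge(group):
    # Assumes exactly one home row and one away row per game
    is_home = group.Home == 1
    # Keep only the home team's result as the game's Won/Lost value
    wonlost = group.loc[is_home, 'Won/Lost'].reset_index(drop=True)
    # All columns from metric2 onward (metrics and team dummies)
    group = group.loc[:, 'metric2':]
    home = group[is_home].add_prefix('h_').reset_index(drop=True)
    away = group[~is_home].add_prefix('a_').reset_index(drop=True)
    # Place the three pieces side by side in a single row
    return pd.concat([wonlost, home, away], axis=1)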

... like this:

>>> df.groupby('gameID').apply(munge).reset_index(level=1, drop=True)
            Won/Lost  h_metric2  h_metric3  h_metric4  h_team1  h_team2  h_team3  h_team4  a_metric2  a_metric3  a_metric4  a_team1  a_team2  a_team3  a_team4
gameID                                                                                                                                                        
2017020001         1         10         10         10        1        0        0        0         10         10         10        0        1        0        0
2017020002         1         10         10         10        1        0        0        0         10         10         10        0        1        0        0
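For comparison, the same one-row-per-game reshape can be done without a custom function by pivoting a home/away label into the columns with set_index plus unstack. This is a swapped-in technique, not part of the answer above; the toy data (including an away metric2 of 12) is invented, and the side column is a hypothetical helper. Like munge, it relies on each (gameID, side) pair being unique: unstack raises a ValueError on duplicate entries.

```python
import pandas as pd

# Invented example data for one game
df = pd.DataFrame({
    "gameID": [2017020001, 2017020001],
    "Won/Lost": [1, 0],
    "Home": [1, 0],
    "metric2": [10, 12],
    "team1": [1, 0],
})

# Hypothetical helper column: 'h' for the home row, 'a' for the away row
labelled = df.assign(side=df["Home"].map({1: "h", 0: "a"})).drop(columns=["Home"])

# Move 'side' from the rows into a second column level: one row per gameID
wide = labelled.set_index(["gameID", "side"]).unstack("side")

# Flatten ('metric2', 'h') into 'h_metric2', then keep only the home Won/Lost
wide.columns = [f"{side}_{col}" for col, side in wide.columns]
wide = wide.drop(columns=["a_Won/Lost"])

print(wide.loc[2017020001, "h_metric2"], wide.loc[2017020001, "a_metric2"])  # 10 12
```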

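What does the "Won/Lost" column mean in the desired output?

Sorry for being unclear: the Won/Lost column should be the home team's.

Thank you! It does what I asked, and all your assumptions are correct. I posted an edit with my code so you can better understand what the data looks like.

I can tell you understood it completely and executed it even better. Thank you so much, this is exactly what I wanted!!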