Python: Computing head-to-head statistics with Pandas


I have a DataFrame that looks like this:

  home_team  away_team  score  home_goals  away_goals  winner
1  Arsenal    Chelsea    3-0        3          0       Arsenal
2  ManCity    Arsenal    1-1        1          1       draw
3  Chelsea    Arsenal    2-1        2          1       Chelsea
4  Arsenal    Chelsea    5-5        5          5       draw
5  Arsenal    ManCity    1-2        1          2       ManCity
My question is: how can I calculate Arsenal's win-draw-loss (head-to-head) record against each other team?

A possible expected result could look like this:

   team      opponent  games_played  wins  draws  losses  goals_scored  goals_conceded
1  Arsenal   Chelsea        3          1     1      1          9              7
2  Arsenal   ManCity        2          0     1      1          2              3

Any help is much appreciated. Note that the DataFrame is not real (in case any Premier League experts are lurking).

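First, you need to duplicate the data and flip the home and away teams to get it into the team/opponent shape you want for the statistics.

This is because each match has to be counted twice, once for the winner and once for the loser. Copy the df, swap the fields, and then use
pd.concat
to put the DataFrames together.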
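
The duplicate-and-flip step could be sketched as follows. This is only an illustration built on the sample data from the question. The names matches, home_view, away_view and long_df are assumptions, and the score and winner columns are left out for brevity.

```python
import pandas as pd

# Sample data from the question (score and winner columns omitted).
matches = pd.DataFrame({
    "home_team": ["Arsenal", "ManCity", "Chelsea", "Arsenal", "Arsenal"],
    "away_team": ["Chelsea", "Arsenal", "Arsenal", "Chelsea", "ManCity"],
    "home_goals": [3, 1, 2, 5, 1],
    "away_goals": [0, 1, 1, 5, 2],
})

# Each match as seen by the home side.
home_view = matches.rename(columns={
    "home_team": "team", "away_team": "opponent",
    "home_goals": "goals_scored", "away_goals": "goals_conceded",
})

# Flipped copy: the same matches as seen by the away side.
away_view = matches.rename(columns={
    "away_team": "team", "home_team": "opponent",
    "away_goals": "goals_scored", "home_goals": "goals_conceded",
})

# pd.concat aligns on column names, so every match now appears twice,
# once per participating team.
long_df = pd.concat([home_view, away_view], ignore_index=True)
print(long_df)
```

Renaming rather than reassigning columns keeps the flip explicit, and because pd.concat matches columns by name, the differing column order of the two views does not matter.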

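Now you can aggregate.

Group by home team, away team, and winner, and count the rows and sum the goals at this step. Use
df.groupby(dimensions).agg(metrics)

Next, reset the index back into the df so that the winner column can be used again. Use
df.reset_index(inplace=True)
to do this.

Once you have this, you can create the new columns win, loss, and draw by comparing the winner against the home team column or against the static string "draw".

You can now aggregate the df again and sum up the win/loss/draw columns.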
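
Put together, the steps above might look like the sketch below. It is a simplified variant, not necessarily what the answer had in mind: the win/draw/loss flags are derived from the goal columns instead of the winner column, so a single groupby is enough. The helper name head_to_head and the variable names are assumptions made for illustration.

```python
import pandas as pd

# Sample data from the question (score and winner columns omitted).
matches = pd.DataFrame({
    "home_team": ["Arsenal", "ManCity", "Chelsea", "Arsenal", "Arsenal"],
    "away_team": ["Chelsea", "Arsenal", "Arsenal", "Chelsea", "ManCity"],
    "home_goals": [3, 1, 2, 5, 1],
    "away_goals": [0, 1, 1, 5, 2],
})


def head_to_head(matches: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: one row of totals per (team, opponent) pair."""
    home = matches.rename(columns={
        "home_team": "team", "away_team": "opponent",
        "home_goals": "goals_scored", "away_goals": "goals_conceded",
    })
    away = matches.rename(columns={
        "away_team": "team", "home_team": "opponent",
        "away_goals": "goals_scored", "home_goals": "goals_conceded",
    })
    cols = ["team", "opponent", "goals_scored", "goals_conceded"]
    long_df = pd.concat([home[cols], away[cols]], ignore_index=True)

    # 0/1 flags for the result from the point of view of `team`.
    scored, conceded = long_df["goals_scored"], long_df["goals_conceded"]
    long_df["wins"] = (scored > conceded).astype(int)
    long_df["draws"] = (scored == conceded).astype(int)
    long_df["losses"] = (scored < conceded).astype(int)

    # Named aggregation: output column = (input column, function).
    return (
        long_df.groupby(["team", "opponent"])
        .agg(
            games_played=("wins", "count"),
            wins=("wins", "sum"),
            draws=("draws", "sum"),
            losses=("losses", "sum"),
            goals_scored=("goals_scored", "sum"),
            goals_conceded=("goals_conceded", "sum"),
        )
        .reset_index()
    )


stats = head_to_head(matches)
arsenal = stats[stats["team"] == "Arsenal"].reset_index(drop=True)
print(arsenal)
```

For the sample data, the two Arsenal rows match the expected result shown in the question. The unfiltered stats frame also contains the mirrored rows (for example Chelsea against Arsenal), which is a side effect of counting every match once per team.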

Check this code:

import pandas as pd

df_in = pd.read_csv('data.csv')
df_out = pd.DataFrame(columns = ['team', 'opponent', 'games_played', 'wins', 'draws', 'losses', 'goals_scored', 'goals_conceded'])

team = 'Arsenal'

for index, row in df_in.iterrows():
    if row['home_team'] == team:
        opponent = row['away_team']
        if row['home_goals'] > row['away_goals']:
            win = 1
            draw = 0
            loss = 0
        elif row['home_goals'] < row['away_goals']:
            win = 0
            draw = 0
            loss = 1
        else:
            win = 0
            draw = 1
            loss = 0
        goals_scored = row['home_goals']
        goals_conceded = row['away_goals']
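    elif row['away_team'] == team:
        opponent = row['home_team']
        if row['home_goals'] > row['away_goals']:
            win = 0
            draw = 0
            loss = 1
        elif row['home_goals'] < row['away_goals']:
            win = 1
            draw = 0
            loss = 0
        else:
            win = 0
            draw = 1
            loss = 0
        goals_scored = row['away_goals']
        goals_conceded = row['home_goals']
    else:
        # Skip matches that do not involve `team`; a bare else would
        # wrongly count them as away games.
        continue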

    games_played = 1



    if opponent not in df_out['opponent'].unique():
        match = pd.DataFrame({'team': team,
                              'opponent': opponent,
                              'games_played': games_played,
                              'wins': win,
                              'draws': draw,
                              'losses': loss,
                              'goals_scored': goals_scored,
                              'goals_conceded': goals_conceded},
                             index = [0])
        df_out = pd.concat([df_out, match], ignore_index = True)
    else:
        df_out.loc[df_out['opponent'] == opponent, 'games_played'] += games_played
        df_out.loc[df_out['opponent'] == opponent, 'wins'] += win
        df_out.loc[df_out['opponent'] == opponent, 'draws'] += draw
        df_out.loc[df_out['opponent'] == opponent, 'losses'] += loss
        df_out.loc[df_out['opponent'] == opponent, 'goals_scored'] += goals_scored
        df_out.loc[df_out['opponent'] == opponent, 'goals_conceded'] += goals_conceded

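Fantastic, thank you, Andrea. Do you know whether this could be broken down further (e.g. home games, away games, ..., home losses, away losses)? I suppose it would be best to ask a separate question.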
      team opponent games_played wins draws losses goals_scored goals_conceded
0  Arsenal  Chelsea            3    1     1      1            9              7
1  Arsenal  ManCity            2    0     1      1            2              3