Python: Calculating head-to-head statistics with Pandas
I have a DataFrame that looks like this:
home_team away_team score home_goals away_goals winner
1 Arsenal Chelsea 3-0 3 0 Arsenal
2 ManCity Arsenal 1-1 1 1 draw
3 Chelsea Arsenal 2-1 2 1 Chelsea
4 Arsenal Chelsea 5-5 5 5 draw
5 Arsenal ManCity 1-2 1 2 ManCity
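My question is: how can I calculate Arsenal's win-draw-loss (head-to-head) record against each other team?
A possible expected result could look like this: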
team opponent games_played wins draws losses goals_scored goals_conceded
1 Arsenal Chelsea 3 1 1 1 9 7
2 Arsenal ManCity 2 0 1 1 2 3
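Any help would be much appreciated. Note that the DataFrame is not real (in case any Premier League experts are lurking).
First, you need to duplicate the data and flip the home and away teams to get the team/opponent-style statistics you want. This is because each match has to be counted twice, once for the winner and once for the loser. Copy the df, flip the fields, then use pd.concat (a top-level pandas function, not a DataFrame method) to put the two DataFrames together.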
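A minimal sketch of the duplicate-and-flip step, assuming the sample frame from the question. The variable names (`home`, `away`, `stacked`) and the renamed columns (`team`, `opponent`, `goals_scored`, `goals_conceded`) are illustrative choices, not part of the original answer.

```python
import pandas as pd

# Sample data from the question (not real results).
df = pd.DataFrame({
    "home_team": ["Arsenal", "ManCity", "Chelsea", "Arsenal", "Arsenal"],
    "away_team": ["Chelsea", "Arsenal", "Arsenal", "Chelsea", "ManCity"],
    "home_goals": [3, 1, 2, 5, 1],
    "away_goals": [0, 1, 1, 5, 2],
    "winner": ["Arsenal", "draw", "Chelsea", "draw", "ManCity"],
})

# Home perspective: the home side becomes "team".
home = df.rename(columns={
    "home_team": "team", "away_team": "opponent",
    "home_goals": "goals_scored", "away_goals": "goals_conceded",
})

# Away perspective: the same matches with the roles swapped.
away = df.rename(columns={
    "away_team": "team", "home_team": "opponent",
    "away_goals": "goals_scored", "home_goals": "goals_conceded",
})

# pd.concat aligns on column names, so every match now appears twice,
# once for each participating team.
stacked = pd.concat([home, away], ignore_index=True)
```

Renaming instead of manually swapping values keeps the original frame untouched and lets `pd.concat` handle the column alignment.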
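Now you can aggregate. Group by home team, away team and winner, and in this step count the rows and sum the goals. Use df.groupby(dimensions).agg(metrics).
Next, reset the index so that winner is available as a regular column again. Use df.reset_index(inplace=True) to do this.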
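A sketch of this aggregation step on the question's sample data, assuming that "dimensions" means the three grouping columns and "metrics" means a row count plus goal sums. The output column names are illustrative. Here the index is reset by chaining `reset_index()` and assigning the result, which has the same effect as the in-place call.

```python
import pandas as pd

# Sample data from the question (not real results).
df = pd.DataFrame({
    "home_team": ["Arsenal", "ManCity", "Chelsea", "Arsenal", "Arsenal"],
    "away_team": ["Chelsea", "Arsenal", "Arsenal", "Chelsea", "ManCity"],
    "home_goals": [3, 1, 2, 5, 1],
    "away_goals": [0, 1, 1, 5, 2],
    "winner": ["Arsenal", "draw", "Chelsea", "draw", "ManCity"],
})

grouped = (
    df.groupby(["home_team", "away_team", "winner"])
      .agg(
          # Number of non-null rows (matches) in each group.
          games_played=("home_goals", "count"),
          home_goals=("home_goals", "sum"),
          away_goals=("away_goals", "sum"),
      )
      # Turn the three group keys back into ordinary columns.
      .reset_index()
)
```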
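Once you have this, you can create the new columns win, loss and draw by comparing the winner column to the home team column or to the static string 'draw'.
You can now aggregate the df again and sum the win/loss/draw columns.
Check this code: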
import pandas as pd

df_in = pd.read_csv('data.csv')
df_out = pd.DataFrame(columns=['team', 'opponent', 'games_played', 'wins', 'draws', 'losses', 'goals_scored', 'goals_conceded'])
team = 'Arsenal'

for index, row in df_in.iterrows():
    if row['home_team'] == team:
        # team played at home
        opponent = row['away_team']
        if row['home_goals'] > row['away_goals']:
            win = 1
            draw = 0
            loss = 0
        elif row['home_goals'] < row['away_goals']:
            win = 0
            draw = 0
            loss = 1
        else:
            win = 0
            draw = 1
            loss = 0
        goals_scored = row['home_goals']
        goals_conceded = row['away_goals']
    elif row['away_team'] == team:
        # team played away, so the home/away roles are reversed
        opponent = row['home_team']
        if row['home_goals'] > row['away_goals']:
            win = 0
            draw = 0
            loss = 1
        elif row['home_goals'] < row['away_goals']:
            win = 1
            draw = 0
            loss = 0
        else:
            win = 0
            draw = 1
            loss = 0
        goals_scored = row['away_goals']
        goals_conceded = row['home_goals']
    else:
        # skip matches that do not involve team
        continue
    games_played = 1
    if opponent not in df_out['opponent'].unique():
        # first match against this opponent: add a new row
        match = pd.DataFrame({'team': team,
                              'opponent': opponent,
                              'games_played': games_played,
                              'wins': win,
                              'draws': draw,
                              'losses': loss,
                              'goals_scored': goals_scored,
                              'goals_conceded': goals_conceded},
                             index=[0])
        df_out = pd.concat([df_out, match], ignore_index=True)
    else:
        # opponent already present: add this match to the running totals
        df_out.loc[df_out['opponent'] == opponent, 'games_played'] += games_played
        df_out.loc[df_out['opponent'] == opponent, 'wins'] += win
        df_out.loc[df_out['opponent'] == opponent, 'draws'] += draw
        df_out.loc[df_out['opponent'] == opponent, 'losses'] += loss
        df_out.loc[df_out['opponent'] == opponent, 'goals_scored'] += goals_scored
        df_out.loc[df_out['opponent'] == opponent, 'goals_conceded'] += goals_conceded
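Awesome, thank you, Andrea. Do you know if it is possible to break this down further (e.g. home games, away games, games ..., home losses, away losses)? I guess it is best to ask that as a separate question.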
team opponent games_played wins draws losses goals_scored goals_conceded
0 Arsenal Chelsea 3 1 1 1 9 7
1 Arsenal ManCity 2 0 1 1 2 3
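The duplicate-and-flip steps described in the first answer can also be combined into one vectorized pipeline without a Python loop. The following is only a sketch under assumptions: the helper name `head_to_head` is hypothetical, and it derives the win/draw/loss flags before a single `groupby` rather than grouping twice as that answer outlines. It relies on the `winner` column holding either a team name or the string "draw", as in the sample data.

```python
import pandas as pd


def head_to_head(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per (team, opponent) pair with W/D/L and goal totals."""
    # View every match from the home side and from the away side.
    home = df.rename(columns={
        "home_team": "team", "away_team": "opponent",
        "home_goals": "goals_scored", "away_goals": "goals_conceded",
    })
    away = df.rename(columns={
        "away_team": "team", "home_team": "opponent",
        "away_goals": "goals_scored", "home_goals": "goals_conceded",
    })
    both = pd.concat([home, away], ignore_index=True)

    # Compare winner with the team, the opponent, or the literal "draw".
    both["wins"] = (both["winner"] == both["team"]).astype(int)
    both["draws"] = (both["winner"] == "draw").astype(int)
    both["losses"] = (both["winner"] == both["opponent"]).astype(int)

    return (
        both.groupby(["team", "opponent"], as_index=False)
            .agg(
                games_played=("winner", "count"),
                wins=("wins", "sum"),
                draws=("draws", "sum"),
                losses=("losses", "sum"),
                goals_scored=("goals_scored", "sum"),
                goals_conceded=("goals_conceded", "sum"),
            )
    )


# Sample data from the question (not real results).
df = pd.DataFrame({
    "home_team": ["Arsenal", "ManCity", "Chelsea", "Arsenal", "Arsenal"],
    "away_team": ["Chelsea", "Arsenal", "Arsenal", "Chelsea", "ManCity"],
    "home_goals": [3, 1, 2, 5, 1],
    "away_goals": [0, 1, 1, 5, 2],
    "winner": ["Arsenal", "draw", "Chelsea", "draw", "ManCity"],
})

table = head_to_head(df)
# Keep only Arsenal's records, as asked in the question.
arsenal = table[table["team"] == "Arsenal"].reset_index(drop=True)
```

Because the result covers every team, filtering on `team` gives the record for any club, and for Arsenal it reproduces the totals shown in the expected result.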