Python Pandas: conditional cumulative sum over two columns
I want to compute points for football teams. For every match I have the points each side earned, home and away. I can't work out how to get each team's running total (home and away points combined). This is what I have so far:
import pandas as pd

df = pd.DataFrame([
    ["Gothenburg", "Malmo", 2018, 1, 1],
    ["Malmo", "Gothenburg", 2018, 1, 1],
    ["Malmo", "Gothenburg", 2018, 0, 3],
    ["Gothenburg", "Malmo", 2018, 1, 1],
    ["Gothenburg", "Malmo", 2018, 0, 3],
    ["Gothenburg", "Malmo", 2018, 1, 1],
    ["Gothenburg", "Malmo", 2018, 0, 3],
    ["Malmo", "Gothenburg", 2018, 0, 3],
    ["Gothenburg", "Malmo", 2018, 1, 1],
    ["Malmo", "Gothenburg", 2018, 0, 3],
    ["Malmo", "Gothenburg", 2018, 1, 1],
    ["Malmo", "Gothenburg", 2018, 0, 3],
])
df.columns = ['H_team', 'A_team', "Year", 'H_points', 'A_points']
# Cumulative sum for the home/away team, shifted by one row
df["H_cumsum"] = df.groupby(['H_team', "Year"])['H_points'].transform(
    lambda x: x.cumsum().shift())
df["A_cumsum"] = df.groupby(['A_team', "Year"])['A_points'].transform(
    lambda x: x.cumsum().shift())
print(df)
        H_team      A_team  Year  H_points  A_points  H_cumsum  A_cumsum
0   Gothenburg       Malmo  2018         1         1       NaN       NaN
1        Malmo  Gothenburg  2018         1         1       NaN       NaN
2        Malmo  Gothenburg  2018         0         3       1.0       1.0
3   Gothenburg       Malmo  2018         1         1       1.0       1.0
4   Gothenburg       Malmo  2018         0         3       2.0       2.0
5   Gothenburg       Malmo  2018         1         1       2.0       5.0
6   Gothenburg       Malmo  2018         0         3       3.0       6.0
7        Malmo  Gothenburg  2018         0         3       1.0       4.0
8   Gothenburg       Malmo  2018         1         1       3.0       9.0
9        Malmo  Gothenburg  2018         0         3       1.0       7.0
10       Malmo  Gothenburg  2018         1         1       1.0      10.0
11       Malmo  Gothenburg  2018         0         3       2.0      11.0
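This table gives me each team's cumulative home points and cumulative away points separately, shifted by one row. What I need is the total over both home and away matches: H_cumsum and A_cumsum should each add up the points from all of that team's previous matches, whether it played at home or away.
Expected output: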
row 0: Malmo = NaN, Gothenburg = NaN
row 1: Gothenburg = 1, Malmo = 1
row 2: Malmo = 1 + 1 = 2, Gothenburg = 1 + 1 = 2
row 3: Gothenburg = 1 + 1 + 3 = 5, Malmo = 1 + 1 + 0 = 2
row 4: Gothenburg = 1 + 1 + 3 + 1 = 6, Malmo = 1 + 1 + 0 + 1 = 3
And so on...
The last row (row 11) should be:
H_cumsum (team Malmo) = 12, A_cumsum (team Gothenburg) = 15
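The rule behind these expected values can be written out as a plain loop: before each match, look up what each side has collected so far that year (home or away), then add the match's points. The sketch below is only a reference to check a pandas version against, not a pandas solution; the names `matches`, `running_totals` and `rows` are made up for illustration, and it assumes the rows are already in match order.

```python
# The twelve matches from the question: (home, away, year, home_pts, away_pts)
matches = [
    ("Gothenburg", "Malmo", 2018, 1, 1),
    ("Malmo", "Gothenburg", 2018, 1, 1),
    ("Malmo", "Gothenburg", 2018, 0, 3),
    ("Gothenburg", "Malmo", 2018, 1, 1),
    ("Gothenburg", "Malmo", 2018, 0, 3),
    ("Gothenburg", "Malmo", 2018, 1, 1),
    ("Gothenburg", "Malmo", 2018, 0, 3),
    ("Malmo", "Gothenburg", 2018, 0, 3),
    ("Gothenburg", "Malmo", 2018, 1, 1),
    ("Malmo", "Gothenburg", 2018, 0, 3),
    ("Malmo", "Gothenburg", 2018, 1, 1),
    ("Malmo", "Gothenburg", 2018, 0, 3),
]


def running_totals(matches):
    """Return (home_before, away_before) for each match.

    None means the team has no earlier match in that year
    (the NaN in the expected output).
    """
    totals = {}  # (team, year) -> points collected so far
    out = []
    for home, away, year, h_pts, a_pts in matches:
        # Record the totals *before* this match is counted.
        out.append((totals.get((home, year)), totals.get((away, year))))
        # Then credit this match to both sides.
        totals[(home, year)] = totals.get((home, year), 0) + h_pts
        totals[(away, year)] = totals.get((away, year), 0) + a_pts
    return out


rows = running_totals(matches)
for i, (h_before, a_before) in enumerate(rows):
    print(i, h_before, a_before)
```

The key point is that the totals are keyed by team, not by which column the team appears in, which is what the two separate groupby calls miss.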
This seemed to compute fine on my end. It's a bit long-winded, though.
df.columns = ['H_team', 'A_team', "Year", 'H_points', 'A_points']
# H_team cumsum() for science.
df['H_cumsum'] = df.groupby('H_team')['H_points'].cumsum()
# A_team cumsum() for more science.
df['A_cumsum'] = df.groupby('A_team')['A_points'].cumsum()
# Creating a column for the sum of the two, or total points scored by either side.
df['T_sum'] = df['H_points'] + df['A_points']
# Creating the cumsum() column for T_sum
df['T_cumsum'] = df['T_sum'].cumsum()
print(df)
I found a solution using stack, but it isn't a good one:
df = pd.DataFrame([
    ["Gothenburg", "Malmo", 2018, 1, 1],
    ["Malmo", "Gothenburg", 2018, 1, 1],
    ["Malmo", "Gothenburg", 2018, 0, 3],
    ["Gothenburg", "Malmo", 2018, 1, 1],
    ["Gothenburg", "Malmo", 2018, 0, 3],
    ["Gothenburg", "Malmo", 2018, 1, 1],
    ["Gothenburg", "Malmo", 2018, 0, 3],
    ["Malmo", "Gothenburg", 2018, 0, 3],
    ["Gothenburg", "Malmo", 2018, 1, 1],
    ["Malmo", "Gothenburg", 2018, 0, 3],
    ["Malmo", "Gothenburg", 2018, 1, 1],
    ["Malmo", "Gothenburg", 2018, 0, 3],
])
df.columns = [['Team', 'Team', "Year", 'Points', 'Points'],
              ['Home', 'Away', 'Year', 'Home', 'Away']]
d1 = df.stack()
total = d1.groupby('Team').Points.apply(lambda x: x.shift().cumsum())
df = d1.assign(Total=total).unstack()
print(df)
    Points                    Team                    Year                Total
      Away  Home  Year        Away        Home  Year  Away  Home    Year   Away   Home  Year
0      1.0   1.0   NaN       Malmo  Gothenburg   NaN   NaN   NaN  2018.0    NaN    NaN   NaN
1      1.0   1.0   NaN  Gothenburg       Malmo   NaN   NaN   NaN  2018.0    1.0    1.0   NaN
2      3.0   0.0   NaN  Gothenburg       Malmo   NaN   NaN   NaN  2018.0    2.0    2.0   NaN
3      1.0   1.0   NaN       Malmo  Gothenburg   NaN   NaN   NaN  2018.0    2.0    5.0   NaN
4      3.0   0.0   NaN       Malmo  Gothenburg   NaN   NaN   NaN  2018.0    3.0    6.0   NaN
5      1.0   1.0   NaN       Malmo  Gothenburg   NaN   NaN   NaN  2018.0    6.0    6.0   NaN
6      3.0   0.0   NaN       Malmo  Gothenburg   NaN   NaN   NaN  2018.0    7.0    7.0   NaN
7      3.0   0.0   NaN  Gothenburg       Malmo   NaN   NaN   NaN  2018.0    7.0   10.0   NaN
8      1.0   1.0   NaN       Malmo  Gothenburg   NaN   NaN   NaN  2018.0   10.0   10.0   NaN
9      3.0   0.0   NaN  Gothenburg       Malmo   NaN   NaN   NaN  2018.0   11.0   11.0   NaN
10     1.0   1.0   NaN  Gothenburg       Malmo   NaN   NaN   NaN  2018.0   14.0   11.0   NaN
11     3.0   0.0   NaN  Gothenburg       Malmo   NaN   NaN   NaN  2018.0   15.0   12.0   NaN
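The Total/Away and Total/Home values are correct. However, with all the extra, unnecessary columns the table becomes very hard to read. (My real data has another 10 columns per row that I left out of this example, so it really is a mess.)
The desired output is: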
        H_team      A_team  Year  H_points  A_points  H_cumsum  A_cumsum
0   Gothenburg       Malmo  2018         1         1       NaN       NaN
1        Malmo  Gothenburg  2018         1         1       1.0       1.0
2        Malmo  Gothenburg  2018         0         3       2.0       2.0
3   Gothenburg       Malmo  2018         1         1       5.0       2.0
4   Gothenburg       Malmo  2018         0         3       6.0       3.0
5   Gothenburg       Malmo  2018         1         1       6.0       6.0
6   Gothenburg       Malmo  2018         0         3       7.0       7.0
7        Malmo  Gothenburg  2018         0         3      10.0       7.0
8   Gothenburg       Malmo  2018         1         1      10.0      10.0
9        Malmo  Gothenburg  2018         0         3      11.0      11.0
10       Malmo  Gothenburg  2018         1         1      11.0      14.0
11       Malmo  Gothenburg  2018         0         3      12.0      15.0
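One way to get this layout without the extra columns might be to do the cumulative sum on a temporary long table (one row per team per match) and copy only the two result columns back. This is a sketch, not a tested drop-in: it assumes the frame has a default integer index in match order, and the names `one_side`, `long` and `before` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    [
        ["Gothenburg", "Malmo", 2018, 1, 1],
        ["Malmo", "Gothenburg", 2018, 1, 1],
        ["Malmo", "Gothenburg", 2018, 0, 3],
        ["Gothenburg", "Malmo", 2018, 1, 1],
        ["Gothenburg", "Malmo", 2018, 0, 3],
        ["Gothenburg", "Malmo", 2018, 1, 1],
        ["Gothenburg", "Malmo", 2018, 0, 3],
        ["Malmo", "Gothenburg", 2018, 0, 3],
        ["Gothenburg", "Malmo", 2018, 1, 1],
        ["Malmo", "Gothenburg", 2018, 0, 3],
        ["Malmo", "Gothenburg", 2018, 1, 1],
        ["Malmo", "Gothenburg", 2018, 0, 3],
    ],
    columns=["H_team", "A_team", "Year", "H_points", "A_points"],
)


def one_side(frame, side):
    """Return one row per match for the given side ('H' or 'A')."""
    view = frame[[f"{side}_team", "Year", f"{side}_points"]]
    view = view.rename(columns={f"{side}_team": "team", f"{side}_points": "points"})
    # Keep the original row label as a 'match' column so we can map back later.
    return view.assign(side=side).rename_axis("match").reset_index()


# Long table: every match appears twice, once per side, in match order.
long = pd.concat([one_side(df, "H"), one_side(df, "A")], ignore_index=True)
long = long.sort_values("match", kind="mergesort")

# Points collected by each team before the current match, home and away combined.
long["before"] = long.groupby(["team", "Year"])["points"].transform(
    lambda s: s.cumsum().shift()
)

# Back to one row per match, with one column per side.
before = long.pivot(index="match", columns="side", values="before")
df["H_cumsum"] = before["H"]
df["A_cumsum"] = before["A"]
print(df)
```

Because only `H_cumsum` and `A_cumsum` are written back, any other columns in the original frame stay as they are.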
What is the expected output?
I've added the output to the question.
Thanks, but that's not what I'm after. I've edited my question; hopefully it's clearer now.