Python Pandas: conditional cumulative sum across two columns

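I want to calculate points for football teams. I have the points for every match, and a team earns points whether it plays at home or away. What I can't figure out is how to get each team's running total across both home and away matches.

This is what I have done so far: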

import pandas as pd

df = pd.DataFrame([
    ["Gothenburg", "Malmo", 2018, 1, 1],
    ["Malmo", "Gothenburg", 2018, 1, 1],
    ["Malmo", "Gothenburg", 2018, 0, 3],
    ["Gothenburg", "Malmo", 2018, 1, 1],
    ["Gothenburg", "Malmo", 2018, 0, 3],
    ["Gothenburg", "Malmo", 2018, 1, 1],
    ["Gothenburg", "Malmo", 2018, 0, 3],
    ["Malmo", "Gothenburg", 2018, 0, 3],
    ["Gothenburg", "Malmo", 2018, 1, 1],
    ["Malmo", "Gothenburg", 2018, 0, 3],
    ["Malmo", "Gothenburg", 2018, 1, 1],
    ["Malmo", "Gothenburg", 2018, 0, 3],
])

df.columns = ['H_team', 'A_team', "Year", 'H_points', 'A_points']

# Cumulative sum per home / away team, shifted down one row so that
# each row only counts points from earlier matches
df["H_cumsum"] = df.groupby(['H_team', "Year"])['H_points'].transform(
                             lambda x: x.cumsum().shift())
df["A_cumsum"] = df.groupby(['A_team', "Year"])['A_points'].transform(
                             lambda x: x.cumsum().shift())

print(df)

        H_team      A_team  Year  H_points  A_points  H_cumsum  A_cumsum
0   Gothenburg       Malmo  2018         1         1       NaN       NaN
1        Malmo  Gothenburg  2018         1         1       NaN       NaN
2        Malmo  Gothenburg  2018         0         3       1.0       1.0
3   Gothenburg       Malmo  2018         1         1       1.0       1.0
4   Gothenburg       Malmo  2018         0         3       2.0       2.0
5   Gothenburg       Malmo  2018         1         1       2.0       5.0
6   Gothenburg       Malmo  2018         0         3       3.0       6.0
7        Malmo  Gothenburg  2018         0         3       1.0       4.0
8   Gothenburg       Malmo  2018         1         1       3.0       9.0
9        Malmo  Gothenburg  2018         0         3       1.0       7.0
10       Malmo  Gothenburg  2018         1         1       1.0      10.0
11       Malmo  Gothenburg  2018         0         3       2.0      11.0

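This table gives me each team's cumulative home points and cumulative away points separately, shifted by one row. What I need is the total over home and away matches combined: H_cumsum and A_cumsum should each include the points from all of that team's previous matches, both home and away.

Expected output: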

row 0: Malmo = NaN, Gothenburg = NaN
row 1: Gothenburg = 1, Malmo = 1
row 2: Malmo = 1 + 1 = 2, Gothenburg = 1 + 1 = 2
row 3: Gothenburg = 1 + 1 + 3 = 5, Malmo = 1 + 1 + 0 = 2
row 4: Gothenburg = 1 + 1 + 3 + 1 = 6, Malmo = 1 + 1 + 0 + 1 = 3
And so on...

For the last row (row 11) it should be:

H_cumsum (team Malmo) = 12     A_cumsum (team Gothenburg) = 15
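
To double-check the arithmetic in the expected output, here is a minimal plain-Python sketch that walks the matches in order and records each team's total before the match is played. The function name `prior_totals` and the tuple layout are illustrative assumptions, not part of the original code.

```python
def prior_totals(matches):
    """For each match return (home_prior, away_prior): the points each team
    collected in earlier matches of the same year, home or away.
    None means the team has no earlier match (NaN in the tables)."""
    totals = {}   # (team, year) -> points collected so far
    result = []
    for home, away, year, h_pts, a_pts in matches:
        # Record the totals *before* this match is counted
        result.append((totals.get((home, year)), totals.get((away, year))))
        totals[(home, year)] = totals.get((home, year), 0) + h_pts
        totals[(away, year)] = totals.get((away, year), 0) + a_pts
    return result


matches = [
    ("Gothenburg", "Malmo", 2018, 1, 1),
    ("Malmo", "Gothenburg", 2018, 1, 1),
    ("Malmo", "Gothenburg", 2018, 0, 3),
    ("Gothenburg", "Malmo", 2018, 1, 1),
    ("Gothenburg", "Malmo", 2018, 0, 3),
    ("Gothenburg", "Malmo", 2018, 1, 1),
    ("Gothenburg", "Malmo", 2018, 0, 3),
    ("Malmo", "Gothenburg", 2018, 0, 3),
    ("Gothenburg", "Malmo", 2018, 1, 1),
    ("Malmo", "Gothenburg", 2018, 0, 3),
    ("Malmo", "Gothenburg", 2018, 1, 1),
    ("Malmo", "Gothenburg", 2018, 0, 3),
]

for row, (home_prior, away_prior) in enumerate(prior_totals(matches)):
    print(row, home_prior, away_prior)
```

The last line printed is `11 12 15`, which agrees with the values stated for row 11.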

This seems to compute fine on my end, though it is a bit long-winded.

df.columns = ['H_team', 'A_team', "Year", 'H_points', 'A_points']

# Running total of home points per home team
df['H_cumsum'] = df.groupby('H_team')['H_points'].cumsum()

# Running total of away points per away team
df['A_cumsum'] = df.groupby('A_team')['A_points'].cumsum()

# Points awarded in each match, home and away sides added together
df['T_sum'] = df['H_points'] + df['A_points']

# Running total of T_sum over all matches
df['T_cumsum'] = df['T_sum'].cumsum()

print(df)

I found a solution using stack, but it is not a good one:

import pandas as pd

df = pd.DataFrame([
    ["Gothenburg", "Malmo", 2018, 1, 1],
    ["Malmo", "Gothenburg", 2018, 1, 1],
    ["Malmo", "Gothenburg", 2018, 0, 3],
    ["Gothenburg", "Malmo", 2018, 1, 1],
    ["Gothenburg", "Malmo", 2018, 0, 3],
    ["Gothenburg", "Malmo", 2018, 1, 1],
    ["Gothenburg", "Malmo", 2018, 0, 3],
    ["Malmo", "Gothenburg", 2018, 0, 3],
    ["Gothenburg", "Malmo", 2018, 1, 1],
    ["Malmo", "Gothenburg", 2018, 0, 3],
    ["Malmo", "Gothenburg", 2018, 1, 1],
    ["Malmo", "Gothenburg", 2018, 0, 3],
])

# Two-level column index: (Team/Points/Year, Home/Away/Year)
df.columns = [['Team', 'Team', "Year", 'Points', 'Points'],
    ['Home', 'Away', 'Year', 'Home', 'Away']]

# One row per (match, side), then a shifted running total per team
d1 = df.stack()
# group_keys=False keeps the stacked index so assign() can align on it
total = d1.groupby('Team', group_keys=False).Points.apply(lambda x: x.shift().cumsum())
df = d1.assign(Total=total).unstack()

print(df)

   Points                  Team                  Year              Total           
     Away Home Year        Away        Home Year Away Home    Year  Away  Home Year
0     1.0  1.0  NaN       Malmo  Gothenburg  NaN  NaN  NaN  2018.0   NaN   NaN  NaN
1     1.0  1.0  NaN  Gothenburg       Malmo  NaN  NaN  NaN  2018.0   1.0   1.0  NaN
2     3.0  0.0  NaN  Gothenburg       Malmo  NaN  NaN  NaN  2018.0   2.0   2.0  NaN
3     1.0  1.0  NaN       Malmo  Gothenburg  NaN  NaN  NaN  2018.0   2.0   5.0  NaN
4     3.0  0.0  NaN       Malmo  Gothenburg  NaN  NaN  NaN  2018.0   3.0   6.0  NaN
5     1.0  1.0  NaN       Malmo  Gothenburg  NaN  NaN  NaN  2018.0   6.0   6.0  NaN
6     3.0  0.0  NaN       Malmo  Gothenburg  NaN  NaN  NaN  2018.0   7.0   7.0  NaN
7     3.0  0.0  NaN  Gothenburg       Malmo  NaN  NaN  NaN  2018.0   7.0  10.0  NaN
8     1.0  1.0  NaN       Malmo  Gothenburg  NaN  NaN  NaN  2018.0  10.0  10.0  NaN
9     3.0  0.0  NaN  Gothenburg       Malmo  NaN  NaN  NaN  2018.0  11.0  11.0  NaN
10    1.0  1.0  NaN  Gothenburg       Malmo  NaN  NaN  NaN  2018.0  14.0  11.0  NaN
11    3.0  0.0  NaN  Gothenburg       Malmo  NaN  NaN  NaN  2018.0  15.0  12.0  NaN

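The Total/Away and Total/Home values are correct. However, with all the extra, unnecessary columns the table becomes very hard to read. (In this example I have left out another 10 columns per row, so in practice it is a real mess.)

The desired output is: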

        H_team      A_team  Year  H_points  A_points  H_cumsum  A_cumsum
0   Gothenburg       Malmo  2018         1         1       NaN       NaN
1        Malmo  Gothenburg  2018         1         1       1.0       1.0
2        Malmo  Gothenburg  2018         0         3       2.0       2.0
3   Gothenburg       Malmo  2018         1         1       5.0       2.0
4   Gothenburg       Malmo  2018         0         3       6.0       3.0
5   Gothenburg       Malmo  2018         1         1       6.0       6.0
6   Gothenburg       Malmo  2018         0         3       7.0       7.0
7        Malmo  Gothenburg  2018         0         3      10.0       7.0
8   Gothenburg       Malmo  2018         1         1      10.0      10.0
9        Malmo  Gothenburg  2018         0         3      11.0      11.0
10       Malmo  Gothenburg  2018         1         1      11.0      14.0
11       Malmo  Gothenburg  2018         0         3      12.0      15.0
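
One way to get this layout without the extra stacked columns is to build a temporary long table with one row per team per match, take the shifted running total there, and write only the two result columns back. The following is a sketch under that assumption, not a definitive solution; the helper names `home`, `away`, `long`, `match`, `side` and `prior` are made up for illustration.

```python
import pandas as pd

rows = [
    ["Gothenburg", "Malmo", 2018, 1, 1],
    ["Malmo", "Gothenburg", 2018, 1, 1],
    ["Malmo", "Gothenburg", 2018, 0, 3],
    ["Gothenburg", "Malmo", 2018, 1, 1],
    ["Gothenburg", "Malmo", 2018, 0, 3],
    ["Gothenburg", "Malmo", 2018, 1, 1],
    ["Gothenburg", "Malmo", 2018, 0, 3],
    ["Malmo", "Gothenburg", 2018, 0, 3],
    ["Gothenburg", "Malmo", 2018, 1, 1],
    ["Malmo", "Gothenburg", 2018, 0, 3],
    ["Malmo", "Gothenburg", 2018, 1, 1],
    ["Malmo", "Gothenburg", 2018, 0, 3],
]
df = pd.DataFrame(rows, columns=["H_team", "A_team", "Year", "H_points", "A_points"])

# One row per team per match; "match" remembers the original row label
home = (df[["Year", "H_team", "H_points"]]
        .rename(columns={"H_team": "team", "H_points": "points"})
        .assign(match=df.index, side="H"))
away = (df[["Year", "A_team", "A_points"]]
        .rename(columns={"A_team": "team", "A_points": "points"})
        .assign(match=df.index, side="A"))
long = (pd.concat([home, away], ignore_index=True)
        .sort_values("match", kind="stable"))  # chronological order

# Points a team collected before each match, home and away combined
long["prior"] = (long.groupby(["team", "Year"])["points"]
                 .transform(lambda s: s.cumsum().shift()))

# Write the two result columns back, aligned on the original row label
is_home = long["side"] == "H"
df["H_cumsum"] = long.loc[is_home].set_index("match")["prior"]
df["A_cumsum"] = long.loc[~is_home].set_index("match")["prior"]

print(df)
```

Because only `H_cumsum` and `A_cumsum` are assigned back, any additional columns in the original frame stay exactly as they were. This assumes the rows are already in chronological order.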

What is the expected output?

I have added the output to the question.

Thanks, but this is not what I am looking for. I have edited my question; hopefully it is clearer now.