Pandas: calculate the lagged mean of value per name, ordered by round


I need to calculate a lagged mean for each group in a DataFrame. My df looks like this:

  name value  round
0    a     5      3
1    b     4      3
2    c     3      2
3    d     1      2
4    a     2      1
5    c     1      1
0    c     1      3
1    d     4      3
2    b     3      2
3    a     1      2
4    b     5      1
5    d     2      1

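I want to calculate the lagged mean of the value column for each name, ordered by round. That is, for name a in round 3, I need value_mean = 1.5 (because (1+2)/2, the mean of a's values in rounds 1 and 2). Of course, there will be NaN values when round=1, since no earlier round exists.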

I tried this:

df['value_mean'] = df.groupby('name').expanding().mean().groupby('name').shift(1)['value'].values

But the result is nonsense:

  name value  round  value_mean
0    a     5      3         NaN
1    b     4      3         5.0
2    c     3      2         3.5
3    d     1      2         NaN
4    a     2      1         4.0
5    c     1      1         3.5
0    c     1      3         NaN
1    d     4      3         3.0
2    b     3      2         2.0
3    a     1      2         NaN
4    b     5      1         1.0
5    d     2      1         2.5

Do you have any idea how I can do this? I found this, but it does not seem to apply to my problem:

You can do it as follows:

import numpy as np

# sort the values so that, within each name, rows are in round order
df.sort_values(['name', 'round'], inplace=True)
df.reset_index(drop=True, inplace=True)

# create a grouper to calculate the running count
# and running sum as the basis of the average
grouper = df.groupby('name')
ser_sum = grouper['value'].cumsum()
ser_count = grouper['value'].cumcount() + 1
ser_mean = ser_sum.div(ser_count)
# True where the previous row belongs to the same name
ser_same_name = df['name'] == df['name'].shift(1)
# finally you just have to set the first entry
# in each name-group to NaN (this usually would
# set the entries for each name and round=1 to NaN)
# note: np.NaN was removed in NumPy 2.0, so use np.nan
df['value_mean'] = ser_mean.shift(1).where(ser_same_name, np.nan)

# if you want to see the intermediate products, 
# you can uncomment the following lines
#df['sum']= ser_sum
#df['count']= ser_count
df
Output:

   name  value  round  value_mean
0     a      2      1         NaN
1     a      1      2         2.0
2     a      5      3         1.5
3     b      5      1         NaN
4     b      3      2         5.0
5     b      4      3         4.0
6     c      1      1         NaN
7     c      3      2         1.0
8     c      1      3         2.0
9     d      2      1         NaN
10    d      1      2         2.0
11    d      4      3         1.5
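
As a possible alternative, the same lag can be sketched with a per-group shift followed by an expanding mean, which leaves the original row order of df untouched. This is only a sketch under a few assumptions: the DataFrame is rebuilt here from the question's data with a unique default index (the index shown in the question repeats 0 to 5, which would break the index-aligned assignment at the end), and the name `ordered` is an illustrative helper, not something from the original post.

```python
import pandas as pd

# Data from the question, rebuilt with a unique RangeIndex (assumption).
df = pd.DataFrame({
    "name":  ["a", "b", "c", "d", "a", "c", "c", "d", "b", "a", "b", "d"],
    "value": [5, 4, 3, 1, 2, 1, 1, 4, 3, 1, 5, 2],
    "round": [3, 3, 2, 2, 1, 1, 3, 3, 2, 2, 1, 1],
})

# Sort a copy so that "previous rows" within a name means "earlier rounds".
ordered = df.sort_values(["name", "round"])

# Within each name: shift by one row first, then take the expanding mean,
# so each row only sees values from strictly earlier rounds.
lagged = ordered.groupby("name")["value"].transform(
    lambda s: s.shift(1).expanding().mean()
)

# Assignment aligns on the index, so df keeps its original row order.
df["value_mean"] = lagged
print(df)
```

Shifting before the expanding mean is what keeps the current round out of its own average; doing it inside the group means values never leak from one name into the next, so no separate same-name mask is needed.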
