Python 3.x: Calculate groups' multiple current values based on pct_change and previous values in Pandas

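For the following dataframe, I want to recalculate `value`: if `predicted` equals `1`, `value` should be computed from the current date's `pct` and the previous date's `value`.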

   city district     date  value       pct  predicted
0     a        c  2019-09   9.48  0.004237          0
1     a        c  2019-10   9.35 -0.013713          0
2     a        c  2019-11   9.05 -0.032086          0
3     a        c  2019-12   9.04 -0.001105          1    --> need to recalculate values based on  pct and previous values
4     a        c  2020-01   8.80 -0.020000          1    --> need to recalculate values based on  pct and previous values
5     a        c  2020-02   8.91  0.012500          1    --> need to recalculate values based on  pct and previous values
6     b        d  2019-09   9.48  0.004237          0
7     b        d  2019-10   9.35 -0.013713          0
8     b        d  2019-11   9.05 -0.032086          0
9     b        d  2019-12   9.04 -0.001105          1    --> need to recalculate values based on  pct and previous values
10    b        d  2020-01   8.80 -0.020000          1   --> need to recalculate values based on  pct and previous values
11    b        d  2020-02   8.91  0.012500          1   --> need to recalculate values based on  pct and previous values
I tried the following code, but the result differs from what I calculated with an Excel formula:

df.loc[df["predicted"]==1, "value"] = np.nan
df['value'] = df['value'].ffill().mul(df['pct']).add(df['value'].ffill(), fill_value=0)
print(df)
Output:

   district     date     value       pct  predicted
0         c  2019-09  9.520169  0.004237          0
1         c  2019-10  9.221783 -0.013713          0
2         c  2019-11  8.759626 -0.032086          0
3         c  2019-12  9.040000 -0.001105          1
4         c  2020-01  8.869000 -0.020000          1
5         c  2020-02  9.163125  0.012500          1
6         d  2019-09  9.520169  0.004237          0
7         d  2019-10  9.221783 -0.013713          0
8         d  2019-11  8.759626 -0.032086          0
9         d  2019-12  9.040000 -0.001105          1
10        d  2020-01  8.869000 -0.020000          1
11        d  2020-02  9.163125  0.012500          1
The formula I use to calculate `value` for 2019-12: `value` in 2019-12 = (1 + `pct` in 2019-12) * `value` in 2019-11. The same logic applies to the other months.

   district     date    value       pct  predicted
0         c  2019-09  9.48000  0.004237          0
1         c  2019-10  9.35000 -0.013713          0
2         c  2019-11  9.05000 -0.032086          0
3         c  2019-12  9.04000 -0.001105          1
4         c  2020-01  8.85920 -0.020000          1
5         c  2020-02  8.96994  0.012500          1
6         d  2019-09  9.48000  0.004237          0
7         d  2019-10  9.35000 -0.013713          0
8         d  2019-11  9.05000 -0.032086          0
9         d  2019-12  9.04000 -0.001105          1
10        d  2020-01  8.85920 -0.020000          1
11        d  2020-02  8.96994  0.012500          1
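Applied step by step to one district, the formula above can be sketched as follows. This is only a hand-rolled illustration: the names `last_observed`, `pcts` and `recalculated` are made up for this sketch, and the numbers are copied from the sample data.

```python
# Values copied from the sample data for one district.
last_observed = 9.05  # value in 2019-11, the last row with predicted == 0
pcts = {"2019-12": -0.001105, "2020-01": -0.02, "2020-02": 0.0125}

recalculated = {}
previous = last_observed
for month, pct in pcts.items():
    # Each month is built on the recalculated value of the month before it.
    previous = (1 + pct) * previous
    recalculated[month] = previous

for month, value in recalculated.items():
    print(month, round(value, 5))
```

Each step reuses the value produced by the previous step, not the original value stored in the dataframe.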
How can I correct the code? Many thanks.

Update:

df:

After running the following code:

m = df["predicted"]==1
s = df[m].groupby('district')['value'].shift()
df['value'] = (1 + df['pct']).mul(s).fillna(df['value'])

df['new_pct'] = df.groupby('city')['value'].apply(lambda x: x.pct_change())
print(df)
Normally the `pct` and `new_pct` columns should hold the same values, but as you can see, they differ for some rows:

   city district     date      value       pct  predicted   new_pct
0     a        c  2018-12  10.170000       NaN          0       NaN
1     a        c  2019-01   9.990000 -0.017699          0 -0.017699
2     a        c  2019-02  10.660000  0.067067          0  0.067067
3     a        c  2019-03  10.560000 -0.009381          0 -0.009381
4     a        c  2019-04  10.060000 -0.047348          0 -0.047348
5     a        c  2019-05  10.690000  0.062624          0  0.062624
6     a        c  2019-06  10.770000  0.007484          0  0.007484
7     a        c  2019-07  10.670000 -0.009285          0 -0.009285
8     a        c  2019-08  10.510000 -0.014995          0 -0.014995
9     a        c  2019-09  10.280000 -0.021884          0 -0.021884
10    a        c  2019-10  10.050000 -0.022374          0 -0.022374
11    a        c  2019-11   9.720000 -0.032836          0 -0.032836
12    a        c  2019-12   9.840000  0.012346          1  0.012346
13    a        c  2020-01  10.036800  0.020000          1  0.020000
14    a        c  2020-02   9.988546 -0.004808          1 -0.004808
15    a        c  2020-03  10.143000 -0.020000          1  0.015463
16    a        c  2020-04   9.940140 -0.020000          1 -0.020000
17    a        c  2020-05   9.690436 -0.020000          1 -0.025121
18    a        c  2020-06   9.335088 -0.020000          1 -0.036670
19    a        c  2020-07   9.136344  0.020000          1 -0.021290
20    a        c  2020-08   9.269964  0.020000          1  0.014625
21    a        c  2020-09   9.488448  0.020000          1  0.023569
22    a        c  2020-10   9.884626 -0.001976          1  0.041754
23    a        c  2020-11   9.898000 -0.020000          1  0.001353
24    b        d  2018-12   6.320000       NaN          0       NaN
25    b        d  2019-01   6.320000  0.000000          0  0.000000
26    b        d  2019-02   6.320000  0.000000          0  0.000000
27    b        d  2019-03   6.320000  0.000000          0  0.000000
28    b        d  2019-04   6.320000  0.000000          0  0.000000
29    b        d  2019-05   6.320000  0.000000          0  0.000000
30    b        d  2019-06   6.000000 -0.050633          0 -0.050633
31    b        d  2019-07   6.000000  0.000000          0  0.000000
32    b        d  2019-08   6.000000  0.000000          0  0.000000
33    b        d  2019-09   6.000000  0.000000          0  0.000000
34    b        d  2019-10   6.000000  0.000000          0  0.000000
35    b        d  2019-11   6.000000  0.000000          0  0.000000
36    b        d  2019-12   5.780000 -0.020000          1 -0.036667
37    b        d  2020-01   5.895600  0.020000          1  0.020000
38    b        d  2020-02   5.777688 -0.020000          1 -0.020000
39    b        d  2020-03   5.897640  0.020000          1  0.020761
40    b        d  2020-04   5.677728 -0.020000          1 -0.037288
41    b        d  2020-05   5.857656  0.020000          1  0.031690
42    b        d  2020-06   5.607756 -0.020000          1 -0.042662
43    b        d  2020-07   5.857656  0.020000          1  0.044563
44    b        d  2020-08   5.427828 -0.020000          1 -0.073379
45    b        d  2020-09   5.897640  0.020000          1  0.086556
46    b        d  2020-10   5.207916 -0.020000          1 -0.116949
47    b        d  2020-11   6.007596  0.020000          1  0.153551
Reference link:

I think you can use:

df['value'] = (1 + df['pct']).mul(df.groupby('district')['value'].shift()).fillna(df['value'])
print(df)
   city district     date     value       pct  predicted
0     a        c  2019-09  9.480000  0.004237          0
1     a        c  2019-10  9.350001 -0.013713          0
2     a        c  2019-11  9.049996 -0.032086          0
3     a        c  2019-12  9.040000 -0.001105          1
4     a        c  2020-01  8.859200 -0.020000          1
5     a        c  2020-02  8.910000  0.012500          1
6     b        d  2019-09  9.480000  0.004237          0
7     b        d  2019-10  9.350001 -0.013713          0
8     b        d  2019-11  9.049996 -0.032086          0
9     b        d  2019-12  9.040000 -0.001105          1
10    b        d  2020-01  8.859200 -0.020000          1
11    b        d  2020-02  8.910000  0.012500          1
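How it works:

Shift the values within each group with `shift` to get the previous date's value, multiply by `pct` plus `1`, and finally use `fillna` to restore the original value in the first row of each group, which has no previous value.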

df = df.assign(add = (1 + df['pct']),
               shifted=df.groupby('district')['value'].shift(),
               mult = (1 + df['pct']).mul(df.groupby('district')['value'].shift()),
               fin = (1 + df['pct']).mul(df.groupby('district')['value'].shift()).fillna(df['value']))
print(df)
   city district     date  value       pct  predicted       add  shifted  \
0     a        c  2019-09   9.48  0.004237          0  1.004237      NaN   
1     a        c  2019-10   9.35 -0.013713          0  0.986287     9.48   
2     a        c  2019-11   9.05 -0.032086          0  0.967914     9.35   
3     a        c  2019-12   9.04 -0.001105          1  0.998895     9.05   
4     a        c  2020-01   8.80 -0.020000          1  0.980000     9.04   
5     a        c  2020-02   8.91  0.012500          1  1.012500     8.80   
6     b        d  2019-09   9.48  0.004237          0  1.004237      NaN   
7     b        d  2019-10   9.35 -0.013713          0  0.986287     9.48   
8     b        d  2019-11   9.05 -0.032086          0  0.967914     9.35   
9     b        d  2019-12   9.04 -0.001105          1  0.998895     9.05   
10    b        d  2020-01   8.80 -0.020000          1  0.980000     9.04   
11    b        d  2020-02   8.91  0.012500          1  1.012500     8.80   

        mult       fin  
0        NaN  9.480000  
1   9.350001  9.350001  
2   9.049996  9.049996  
3   9.040000  9.040000  
4   8.859200  8.859200  
5   8.910000  8.910000  
6        NaN  9.480000  
7   9.350001  9.350001  
8   9.049996  9.049996  
9   9.040000  9.040000  
10  8.859200  8.859200  
11  8.910000  8.910000  
If you want to process only the rows that match the condition:

m = df["predicted"]==1
s = df[m].groupby('district')['value'].shift()
df['value'] = (1 + df['pct']).mul(s).fillna(df['value'])
print(df)
   city district     date   value       pct  predicted
0     a        c  2019-09  9.4800  0.004237          0
1     a        c  2019-10  9.3500 -0.013713          0
2     a        c  2019-11  9.0500 -0.032086          0
3     a        c  2019-12  9.0400 -0.001105          1
4     a        c  2020-01  8.8592 -0.020000          1
5     a        c  2020-02  8.9100  0.012500          1
6     b        d  2019-09  9.4800  0.004237          0
7     b        d  2019-10  9.3500 -0.013713          0
8     b        d  2019-11  9.0500 -0.032086          0
9     b        d  2019-12  9.0400 -0.001105          1
10    b        d  2020-01  8.8592 -0.020000          1
11    b        d  2020-02  8.9100  0.012500          1
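Note that the 2020-02 row above is still 8.91, because `shift()` reads the original 2020-01 value (8.80) rather than the recalculated 8.8592. One way to spot such rows, sketched here for a single district of the sample data, is to recompute the implied change and compare it with `pct`. The names `one_pass`, `implied`, `close` and `mismatch` are made up for this sketch, and the tolerance `1e-6` is an assumption chosen to absorb the rounding in `pct`.

```python
import numpy as np
import pandas as pd

# One district from the sample data in the question.
df = pd.DataFrame({
    "district": ["c"] * 6,
    "date": ["2019-09", "2019-10", "2019-11", "2019-12", "2020-01", "2020-02"],
    "value": [9.48, 9.35, 9.05, 9.04, 8.80, 8.91],
    "pct": [0.004237, -0.013713, -0.032086, -0.001105, -0.02, 0.0125],
    "predicted": [0, 0, 0, 1, 1, 1],
})

m = df["predicted"].eq(1)

# Single pass: shift() sees the original values, so every predicted row
# is computed from the old value of the row before it.
s = df[m].groupby("district")["value"].shift()
one_pass = (1 + df["pct"]).mul(s).fillna(df["value"])

# Change implied by the result, compared with the pct that was requested.
implied = one_pass.groupby(df["district"]).pct_change()
close = pd.Series(np.isclose(implied, df["pct"], atol=1e-6), index=df.index)
mismatch = m & ~close

print(df.loc[mismatch, "date"].tolist())
```

For the sample data this flags only 2020-02, since 1.0125 * 8.80 = 8.91 was computed from the original 2020-01 value.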

This seems to solve the problem:

import numpy as np

# Blank out the predicted rows so they can be rebuilt from the previous value.
df.loc[df["predicted"]==1, "value"] = np.nan
# Each pass fills one more NaN per group; repeat until none remain.
while df['value'].isna().any():
    df['value'] = (1 + df['pct']).mul(df.groupby('district')['value'].shift()).fillna(df['value'])
df['new_pct'] = df.groupby('district')['value'].pct_change()

print(df)
Output:

   city district     date      value       pct  predicted   new_pct
0     a        c  2018-12  10.170000       NaN          0       NaN
1     a        c  2019-01   9.990001 -0.017699          0 -0.017699
2     a        c  2019-02  10.660001  0.067067          0  0.067067
3     a        c  2019-03  10.559999 -0.009381          0 -0.009381
4     a        c  2019-04  10.060004 -0.047348          0 -0.047348
5     a        c  2019-05  10.690002  0.062624          0  0.062624
6     a        c  2019-06  10.770006  0.007484          0  0.007484
7     a        c  2019-07  10.670006 -0.009285          0 -0.009285
8     a        c  2019-08  10.510010 -0.014995          0 -0.014995
9     a        c  2019-09  10.280009 -0.021884          0 -0.021884
10    a        c  2019-10  10.050004 -0.022374          0 -0.022374
11    a        c  2019-11   9.720002 -0.032836          0 -0.032836
12    a        c  2019-12   9.840005  0.012346          1  0.012346
13    a        c  2020-01  10.036804  0.020000          1  0.020000
14    a        c  2020-02   9.988548 -0.004808          1 -0.004808
15    a        c  2020-03   9.788778 -0.020000          1 -0.020000
16    a        c  2020-04   9.592998 -0.020000          1 -0.020000
17    a        c  2020-05   9.401140 -0.020000          1 -0.020000
18    a        c  2020-06   9.213114 -0.020000          1 -0.020000
19    a        c  2020-07   9.397375  0.020000          1  0.020000
20    a        c  2020-08   9.585320  0.020000          1  0.020000
21    a        c  2020-09   9.777027  0.020000          1  0.020000
22    a        c  2020-10   9.757712 -0.001976          1 -0.001976
23    a        c  2020-11   9.562560 -0.020000          1 -0.020000
24    b        d  2018-12   6.320000       NaN          0       NaN
25    b        d  2019-01   6.320000  0.000000          0  0.000000
26    b        d  2019-02   6.320000  0.000000          0  0.000000
27    b        d  2019-03   6.320000  0.000000          0  0.000000
28    b        d  2019-04   6.320000  0.000000          0  0.000000
29    b        d  2019-05   6.320000  0.000000          0  0.000000
30    b        d  2019-06   5.999999 -0.050633          0 -0.050633
31    b        d  2019-07   5.999999  0.000000          0  0.000000
32    b        d  2019-08   5.999999  0.000000          0  0.000000
33    b        d  2019-09   5.999999  0.000000          0  0.000000
34    b        d  2019-10   5.999999  0.000000          0  0.000000
35    b        d  2019-11   5.999999  0.000000          0  0.000000
36    b        d  2019-12   5.879999 -0.020000          1 -0.020000
37    b        d  2020-01   5.997599  0.020000          1  0.020000
38    b        d  2020-02   5.877647 -0.020000          1 -0.020000
39    b        d  2020-03   5.995200  0.020000          1  0.020000
40    b        d  2020-04   5.875296 -0.020000          1 -0.020000
41    b        d  2020-05   5.992802  0.020000          1  0.020000
42    b        d  2020-06   5.872947 -0.020000          1 -0.020000
43    b        d  2020-07   5.990406  0.020000          1  0.020000
44    b        d  2020-08   5.870598 -0.020000          1 -0.020000
45    b        d  2020-09   5.988010  0.020000          1  0.020000
46    b        d  2020-10   5.868249 -0.020000          1 -0.020000
47    b        d  2020-11   5.985614  0.020000          1  0.020000
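The loop needs one pass per predicted month. A possible vectorized alternative, which is not the method used in this answer, is a grouped cumulative product. The sketch below assumes that the rows of each district are sorted by date and that the `predicted == 1` rows form a single trailing block. `value_chained` is a hypothetical column name used only here.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "city": ["a"] * 6 + ["b"] * 6,
    "district": ["c"] * 6 + ["d"] * 6,
    "date": ["2019-09", "2019-10", "2019-11", "2019-12", "2020-01", "2020-02"] * 2,
    "value": [9.48, 9.35, 9.05, 9.04, 8.80, 8.91] * 2,
    "pct": [0.004237, -0.013713, -0.032086, -0.001105, -0.02, 0.0125] * 2,
    "predicted": [0, 0, 0, 1, 1, 1] * 2,
})

m = df["predicted"].eq(1)

# Growth factor per row: 1 + pct on predicted rows, 1 on observed rows,
# so the running product only compounds over the predicted block.
factor = (1 + df["pct"]).where(m, 1.0)
compound = factor.groupby(df["district"]).cumprod()

# Last observed value of each district, carried forward onto the predicted rows.
base = df["value"].mask(m).groupby(df["district"]).ffill()

# value_chained is a hypothetical column name used only in this sketch.
df["value_chained"] = base * compound
print(df[["district", "date", "value", "value_chained"]])
```

Unlike the loop, this leaves the rows with `predicted == 0` untouched, so the observed values are not recomputed from the rounded `pct` column.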

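Thanks, but it's strange: with my real data I don't get any results for `value`, except where `predicted` is `0`.

@ahbon - Not sure I understand. Can you add your output from the sample data to the question? I get the same results after running your code, including `df.loc[df["predicted"]==1, "value"] = np.nan`.

@ahbon - If you omit `df.loc[df["predicted"]==1, "value"] = np.nan`, does it work?

Yes, but I don't know why `pct` and `new_pct` are different. `pct` in `2020-02` should be `(value in 2020-02 / value in 2020-01) - 1`, which is the inverse calculation (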