Python 3.x: Calculate multiple current values per group based on pct_change and previous values in Pandas
Tags: python-3.x, pandas, dataframe
For the following DataFrame, I want to recalculate the value column: where predicted equals 1, the value should be computed from the current date's pct and the previous date's value.
city district date value pct predicted
0 a c 2019-09 9.48 0.004237 0
1 a c 2019-10 9.35 -0.013713 0
2 a c 2019-11 9.05 -0.032086 0
3 a c 2019-12 9.04 -0.001105 1 --> need to recalculate values based on pct and previous values
4 a c 2020-01 8.80 -0.020000 1 --> need to recalculate values based on pct and previous values
5 a c 2020-02 8.91 0.012500 1 --> need to recalculate values based on pct and previous values
6 b d 2019-09 9.48 0.004237 0
7 b d 2019-10 9.35 -0.013713 0
8 b d 2019-11 9.05 -0.032086 0
9 b d 2019-12 9.04 -0.001105 1 --> need to recalculate values based on pct and previous values
10 b d 2020-01 8.80 -0.020000 1 --> need to recalculate values based on pct and previous values
11 b d 2020-02 8.91 0.012500 1 --> need to recalculate values based on pct and previous values
I tried the following code, but the result differs from what I get with an Excel formula:
df.loc[df["predicted"]==1, "value"] = np.nan
df['value'] = df['value'].ffill().mul(df['pct']).add(df['value'].ffill(), fill_value=0)
print(df)
Output:
district date value pct predicted
0 c 2019-09 9.520169 0.004237 0
1 c 2019-10 9.221783 -0.013713 0
2 c 2019-11 8.759626 -0.032086 0
3 c 2019-12 9.040000 -0.001105 1
4 c 2020-01 8.869000 -0.020000 1
5 c 2020-02 9.163125 0.012500 1
6 d 2019-09 9.520169 0.004237 0
7 d 2019-10 9.221783 -0.013713 0
8 d 2019-11 8.759626 -0.032086 0
9 d 2019-12 9.040000 -0.001105 1
10 d 2020-01 8.869000 -0.020000 1
11 d 2020-02 9.163125 0.012500 1
The formula I used to calculate value for 2019-12: value(2019-12) = (1 + pct(2019-12)) * value(2019-11). The same logic applies to the other months. Expected result:
district date value pct predicted
0 c 2019-09 9.48000 0.004237 0
1 c 2019-10 9.35000 -0.013713 0
2 c 2019-11 9.05000 -0.032086 0
3 c 2019-12 9.04000 -0.001105 1
4 c 2020-01 8.85920 -0.020000 1
5 c 2020-02 8.96994 0.012500 1
6 d 2019-09 9.48000 0.004237 0
7 d 2019-10 9.35000 -0.013713 0
8 d 2019-11 9.05000 -0.032086 0
9 d 2019-12 9.04000 -0.001105 1
10 d 2020-01 8.85920 -0.020000 1
11 d 2020-02 8.96994 0.012500 1
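The formula above is a recurrence: every predicted month builds on the already recalculated previous month. A minimal sketch in plain Python, using the sample numbers for one district (the variable names are illustrative only):

```python
# Values and pct taken from the sample data for one district.
last_observed = 9.05                      # value for 2019-11 (predicted == 0)
pcts = [-0.001105, -0.020000, 0.012500]   # pct for 2019-12, 2020-01, 2020-02

values = []
prev = last_observed
for pct in pcts:
    # value_t = (1 + pct_t) * value_{t-1}, where value_{t-1} is the recalculated one
    prev = (1 + pct) * prev
    values.append(prev)

print([round(v, 4) for v in values])
```

The three results come out close to 9.04, 8.8592 and 8.9699, matching the expected table up to rounding.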
How can I correct the code? Thanks, everyone.
Update:
After running the following code on df:
m = df["predicted"]==1
s = df[m].groupby('district')['value'].shift()
df['value'] = (1 + df['pct']).mul(s).fillna(df['value'])
df['new_pct'] = df.groupby('city')['value'].apply(lambda x: x.pct_change())
print(df)
Normally the columns pct and new_pct should hold the same values, but as you can see, they differ for some rows:
city district date value pct predicted new_pct
0 a c 2018-12 10.170000 NaN 0 NaN
1 a c 2019-01 9.990000 -0.017699 0 -0.017699
2 a c 2019-02 10.660000 0.067067 0 0.067067
3 a c 2019-03 10.560000 -0.009381 0 -0.009381
4 a c 2019-04 10.060000 -0.047348 0 -0.047348
5 a c 2019-05 10.690000 0.062624 0 0.062624
6 a c 2019-06 10.770000 0.007484 0 0.007484
7 a c 2019-07 10.670000 -0.009285 0 -0.009285
8 a c 2019-08 10.510000 -0.014995 0 -0.014995
9 a c 2019-09 10.280000 -0.021884 0 -0.021884
10 a c 2019-10 10.050000 -0.022374 0 -0.022374
11 a c 2019-11 9.720000 -0.032836 0 -0.032836
12 a c 2019-12 9.840000 0.012346 1 0.012346
13 a c 2020-01 10.036800 0.020000 1 0.020000
14 a c 2020-02 9.988546 -0.004808 1 -0.004808
15 a c 2020-03 10.143000 -0.020000 1 0.015463
16 a c 2020-04 9.940140 -0.020000 1 -0.020000
17 a c 2020-05 9.690436 -0.020000 1 -0.025121
18 a c 2020-06 9.335088 -0.020000 1 -0.036670
19 a c 2020-07 9.136344 0.020000 1 -0.021290
20 a c 2020-08 9.269964 0.020000 1 0.014625
21 a c 2020-09 9.488448 0.020000 1 0.023569
22 a c 2020-10 9.884626 -0.001976 1 0.041754
23 a c 2020-11 9.898000 -0.020000 1 0.001353
24 b d 2018-12 6.320000 NaN 0 NaN
25 b d 2019-01 6.320000 0.000000 0 0.000000
26 b d 2019-02 6.320000 0.000000 0 0.000000
27 b d 2019-03 6.320000 0.000000 0 0.000000
28 b d 2019-04 6.320000 0.000000 0 0.000000
29 b d 2019-05 6.320000 0.000000 0 0.000000
30 b d 2019-06 6.000000 -0.050633 0 -0.050633
31 b d 2019-07 6.000000 0.000000 0 0.000000
32 b d 2019-08 6.000000 0.000000 0 0.000000
33 b d 2019-09 6.000000 0.000000 0 0.000000
34 b d 2019-10 6.000000 0.000000 0 0.000000
35 b d 2019-11 6.000000 0.000000 0 0.000000
36 b d 2019-12 5.780000 -0.020000 1 -0.036667
37 b d 2020-01 5.895600 0.020000 1 0.020000
38 b d 2020-02 5.777688 -0.020000 1 -0.020000
39 b d 2020-03 5.897640 0.020000 1 0.020761
40 b d 2020-04 5.677728 -0.020000 1 -0.037288
41 b d 2020-05 5.857656 0.020000 1 0.031690
42 b d 2020-06 5.607756 -0.020000 1 -0.042662
43 b d 2020-07 5.857656 0.020000 1 0.044563
44 b d 2020-08 5.427828 -0.020000 1 -0.073379
45 b d 2020-09 5.897640 0.020000 1 0.086556
46 b d 2020-10 5.207916 -0.020000 1 -0.116949
47 b d 2020-11 6.007596 0.020000 1 0.153551
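One way to pinpoint the rows where the two columns disagree is to recompute the percentage change per group and compare it with pct under a small tolerance. This is only a sketch: it uses four rows copied from the table above, and the tolerance of 1e-5 is an assumption chosen to absorb the rounding in the printed values.

```python
import numpy as np
import pandas as pd

# Four rows (2019-12 .. 2020-03, district c) copied from the table above.
df = pd.DataFrame({
    "district": ["c"] * 4,
    "date": ["2019-12", "2020-01", "2020-02", "2020-03"],
    "value": [9.84, 10.0368, 9.988546, 10.143],
    "pct": [0.012346, 0.02, -0.004808, -0.02],
})

# Percentage change of value within each district (first row of a group is NaN).
new_pct = df.groupby("district")["value"].pct_change()

# Flag rows where the recomputed change is defined but not close to pct.
mismatch = new_pct.notna() & ~np.isclose(new_pct, df["pct"], atol=1e-5)
print(df[mismatch])
```

With these four rows only 2020-03 is flagged, which is the first row in the table where new_pct departs from pct.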
I think you can use:
df['value'] = (1 + df['pct']).mul(df.groupby('district')['value'].shift()).fillna(df['value'])
print(df)
city district date value pct predicted
0 a c 2019-09 9.480000 0.004237 0
1 a c 2019-10 9.350001 -0.013713 0
2 a c 2019-11 9.049996 -0.032086 0
3 a c 2019-12 9.040000 -0.001105 1
4 a c 2020-01 8.859200 -0.020000 1
5 a c 2020-02 8.910000 0.012500 1
6 b d 2019-09 9.480000 0.004237 0
7 b d 2019-10 9.350001 -0.013713 0
8 b d 2019-11 9.049996 -0.032086 0
9 b d 2019-12 9.040000 -0.001105 1
10 b d 2020-01 8.859200 -0.020000 1
11 b d 2020-02 8.910000 0.012500 1
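How it works:
Shift the values within each group so that every row sees the previous date's value, multiply by pct plus 1, and finally use fillna to restore the first value of each group (which has no predecessor) from the original column: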
df = df.assign(add = (1 + df['pct']),
shifted=df.groupby('district')['value'].shift(),
mult = (1 + df['pct']).mul(df.groupby('district')['value'].shift()),
fin = (1 + df['pct']).mul(df.groupby('district')['value'].shift()).fillna(df['value']))
print(df)
city district date value pct predicted add shifted \
0 a c 2019-09 9.48 0.004237 0 1.004237 NaN
1 a c 2019-10 9.35 -0.013713 0 0.986287 9.48
2 a c 2019-11 9.05 -0.032086 0 0.967914 9.35
3 a c 2019-12 9.04 -0.001105 1 0.998895 9.05
4 a c 2020-01 8.80 -0.020000 1 0.980000 9.04
5 a c 2020-02 8.91 0.012500 1 1.012500 8.80
6 b d 2019-09 9.48 0.004237 0 1.004237 NaN
7 b d 2019-10 9.35 -0.013713 0 0.986287 9.48
8 b d 2019-11 9.05 -0.032086 0 0.967914 9.35
9 b d 2019-12 9.04 -0.001105 1 0.998895 9.05
10 b d 2020-01 8.80 -0.020000 1 0.980000 9.04
11 b d 2020-02 8.91 0.012500 1 1.012500 8.80
mult fin
0 NaN 9.480000
1 9.350001 9.350001
2 9.049996 9.049996
3 9.040000 9.040000
4 8.859200 8.859200
5 8.910000 8.910000
6 NaN 9.480000
7 9.350001 9.350001
8 9.049996 9.049996
9 9.040000 9.040000
10 8.859200 8.859200
11 8.910000 8.910000
If you want to process only the rows that match the condition:
m = df["predicted"]==1
s = df[m].groupby('district')['value'].shift()
df['value'] = (1 + df['pct']).mul(s).fillna(df['value'])
print(df)
city district date value pct predicted
0 a c 2019-09 9.4800 0.004237 0
1 a c 2019-10 9.3500 -0.013713 0
2 a c 2019-11 9.0500 -0.032086 0
3 a c 2019-12 9.0400 -0.001105 1
4 a c 2020-01 8.8592 -0.020000 1
5 a c 2020-02 8.9100 0.012500 1
6 b d 2019-09 9.4800 0.004237 0
7 b d 2019-10 9.3500 -0.013713 0
8 b d 2019-11 9.0500 -0.032086 0
9 b d 2019-12 9.0400 -0.001105 1
10 b d 2020-01 8.8592 -0.020000 1
11 b d 2020-02 8.9100 0.012500 1
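Note that in the outputs above 2020-02 stays at 8.91 rather than the expected 8.96994. A single shifted multiplication uses the original previous value (8.80), not the recalculated one (8.8592). The sketch below contrasts the one-pass result with a chained calculation on one district of the sample data; the names one_pass and chained are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "district": ["c"] * 6,
    "value": [9.48, 9.35, 9.05, 9.04, 8.80, 8.91],
    "pct": [0.004237, -0.013713, -0.032086, -0.001105, -0.02, 0.0125],
    "predicted": [0, 0, 0, 1, 1, 1],
})

m = df["predicted"] == 1
# One pass: each predicted row multiplies the ORIGINAL previous predicted value.
s = df[m].groupby("district")["value"].shift()
one_pass = (1 + df["pct"]).mul(s).fillna(df["value"])

# Chained: each predicted row multiplies the already recalculated previous value.
# (A single district is used here, so row i - 1 is always in the same group.)
chained = df["value"].tolist()
for i in range(1, len(df)):
    if df.loc[i, "predicted"] == 1:
        chained[i] = (1 + df.loc[i, "pct"]) * chained[i - 1]

print(pd.DataFrame({"one_pass": one_pass, "chained": chained}))
```

The two columns agree up to 2020-01 and diverge at 2020-02, which is where the chaining first matters in this sample.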
This seems to solve the problem:
import numpy as np

df.loc[df["predicted"]==1, "value"] = np.nan
# Each pass fills one more NaN per district from the value before it; repeat until none remain
while len(df.loc[df['value'].isin(['', np.nan])]) > 0:
    df['value'] = (1 + df['pct']).mul(df.groupby('district')['value'].shift()).fillna(df['value'])
df['new_pct'] = df.groupby('district')['value'].apply(lambda x: x.pct_change())
print(df)
Output:
city district date value pct predicted new_pct
0 a c 2018-12 10.170000 NaN 0 NaN
1 a c 2019-01 9.990001 -0.017699 0 -0.017699
2 a c 2019-02 10.660001 0.067067 0 0.067067
3 a c 2019-03 10.559999 -0.009381 0 -0.009381
4 a c 2019-04 10.060004 -0.047348 0 -0.047348
5 a c 2019-05 10.690002 0.062624 0 0.062624
6 a c 2019-06 10.770006 0.007484 0 0.007484
7 a c 2019-07 10.670006 -0.009285 0 -0.009285
8 a c 2019-08 10.510010 -0.014995 0 -0.014995
9 a c 2019-09 10.280009 -0.021884 0 -0.021884
10 a c 2019-10 10.050004 -0.022374 0 -0.022374
11 a c 2019-11 9.720002 -0.032836 0 -0.032836
12 a c 2019-12 9.840005 0.012346 1 0.012346
13 a c 2020-01 10.036804 0.020000 1 0.020000
14 a c 2020-02 9.988548 -0.004808 1 -0.004808
15 a c 2020-03 9.788778 -0.020000 1 -0.020000
16 a c 2020-04 9.592998 -0.020000 1 -0.020000
17 a c 2020-05 9.401140 -0.020000 1 -0.020000
18 a c 2020-06 9.213114 -0.020000 1 -0.020000
19 a c 2020-07 9.397375 0.020000 1 0.020000
20 a c 2020-08 9.585320 0.020000 1 0.020000
21 a c 2020-09 9.777027 0.020000 1 0.020000
22 a c 2020-10 9.757712 -0.001976 1 -0.001976
23 a c 2020-11 9.562560 -0.020000 1 -0.020000
24 b d 2018-12 6.320000 NaN 0 NaN
25 b d 2019-01 6.320000 0.000000 0 0.000000
26 b d 2019-02 6.320000 0.000000 0 0.000000
27 b d 2019-03 6.320000 0.000000 0 0.000000
28 b d 2019-04 6.320000 0.000000 0 0.000000
29 b d 2019-05 6.320000 0.000000 0 0.000000
30 b d 2019-06 5.999999 -0.050633 0 -0.050633
31 b d 2019-07 5.999999 0.000000 0 0.000000
32 b d 2019-08 5.999999 0.000000 0 0.000000
33 b d 2019-09 5.999999 0.000000 0 0.000000
34 b d 2019-10 5.999999 0.000000 0 0.000000
35 b d 2019-11 5.999999 0.000000 0 0.000000
36 b d 2019-12 5.879999 -0.020000 1 -0.020000
37 b d 2020-01 5.997599 0.020000 1 0.020000
38 b d 2020-02 5.877647 -0.020000 1 -0.020000
39 b d 2020-03 5.995200 0.020000 1 0.020000
40 b d 2020-04 5.875296 -0.020000 1 -0.020000
41 b d 2020-05 5.992802 0.020000 1 0.020000
42 b d 2020-06 5.872947 -0.020000 1 -0.020000
43 b d 2020-07 5.990406 0.020000 1 0.020000
44 b d 2020-08 5.870598 -0.020000 1 -0.020000
45 b d 2020-09 5.988010 0.020000 1 0.020000
46 b d 2020-10 5.868249 -0.020000 1 -0.020000
47 b d 2020-11 5.985614 0.020000 1 0.020000
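As an aside that is not taken from the answers above, the same chained result can probably be obtained without a loop by multiplying the last observed value of each group by the cumulative product of (1 + pct) over the predicted rows. This sketch assumes that the rows are sorted by date within each district and that all predicted rows come after the observed ones; the column name value_new is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "district": ["c"] * 6 + ["d"] * 6,
    "date": ["2019-09", "2019-10", "2019-11", "2019-12", "2020-01", "2020-02"] * 2,
    "value": [9.48, 9.35, 9.05, 9.04, 8.80, 8.91] * 2,
    "pct": [0.004237, -0.013713, -0.032086, -0.001105, -0.02, 0.0125] * 2,
    "predicted": [0, 0, 0, 1, 1, 1] * 2,
})

m = df["predicted"].eq(1)

# Growth factor: 1 for observed rows, (1 + pct) for predicted rows.
factor = (1 + df["pct"]).where(m, 1.0)
cum_factor = factor.groupby(df["district"]).cumprod()

# Base: the last observed value per district, carried forward over predicted rows.
base = df["value"].where(~m).groupby(df["district"]).ffill()

df["value_new"] = base * cum_factor

# Sanity check: the per-group percentage change should reproduce pct on predicted rows.
check = df.groupby("district")["value_new"].pct_change()
print(df.assign(check=check))
```

Observed rows are left unchanged because their factor is 1, so this version avoids the small floating-point drift that the repeated passes introduce on those rows.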
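Thanks, but this is strange: on my real data I don't get any results for value except where predicted is 0.
@ahbon - Not sure I understand. Can you add your output for the sample data to the question? I get the same result after running your code, e.g. df.loc[df["predicted"]==1, "value"] = np.nan
@ahbon - If you omit df.loc[df["predicted"]==1, "value"] = np.nan, does it work?
Yes, but I don't know why pct differs from new_pct. The pct for 2020-02 should be (value in 2020-02 / value in 2020-01) - 1; this is the reverse calculation (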