Python: groupby and fill NaN with the mean of the previous and next values in pandas
Tags: python, python-3.x, pandas

I'm trying to fill NaN cells with the mean of the values before and after them:
type date v1 v2
0 a 2018-09 21511.11 17696.8
1 a 2018-10 NaN NaN
2 a 2018-11 NaN NaN
3 a 2018-12 30319.98 24553.6
4 a 2019-01 NaN NaN
5 a 2019-02 NaN NaN
6 a 2019-03 7409.61 6110.0
7 a 2019-04 NaN NaN
8 a 2019-05 NaN NaN
9 a 2019-06 15212.51 12590.5
10 a 2019-07 NaN NaN
11 a 2019-08 NaN NaN
12 a 2019-09 23129.96 19160.9
13 a 2019-10 NaN NaN
14 a 2019-11 NaN NaN
15 b 2018-09 21511.11 17696.8
16 b 2018-10 NaN NaN
17 b 2018-11 NaN NaN
18 b 2018-12 30319.98 24553.6
19 b 2019-01 NaN NaN
20 b 2019-02 NaN NaN
21 b 2019-03 7409.61 6110.0
22 b 2019-04 NaN NaN
23 b 2019-05 NaN NaN
24 b 2019-06 15212.51 12590.5
25 b 2019-07 NaN NaN
26 b 2019-08 NaN NaN
27 b 2019-09 23129.96 19160.9
28 b 2019-10 NaN NaN
29 b 2019-11 NaN NaN
I tried the following code, with reference to:
and I get:
type date v1 v2
0 a 2018-09 21511.110 17696.80
1 a 2018-10 25915.545 21125.20
2 a 2018-11 25915.545 21125.20
3 a 2018-12 30319.980 24553.60
4 a 2019-01 18864.795 15331.80
5 a 2019-02 18864.795 15331.80
6 a 2019-03 7409.610 6110.00
7 a 2019-04 11311.060 9350.25
8 a 2019-05 11311.060 9350.25
9 a 2019-06 15212.510 12590.50
10 a 2019-07 19171.235 15875.70
11 a 2019-08 19171.235 15875.70
12 a 2019-09 23129.960 19160.90
13 a 2019-10 22320.535 18428.85
14 a 2019-11 22320.535 18428.85
15 b 2018-09 21511.110 17696.80
16 b 2018-10 25915.545 21125.20
17 b 2018-11 25915.545 21125.20
18 b 2018-12 30319.980 24553.60
19 b 2019-01 18864.795 15331.80
20 b 2019-02 18864.795 15331.80
21 b 2019-03 7409.610 6110.00
22 b 2019-04 11311.060 9350.25
23 b 2019-05 11311.060 9350.25
24 b 2019-06 15212.510 12590.50
25 b 2019-07 19171.235 15875.70
26 b 2019-08 19171.235 15875.70
27 b 2019-09 23129.960 19160.90
28 b 2019-10 23129.960 19160.90
29 b 2019-11 23129.960 19160.90
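But I don't know how to group by type and apply the code above. Can anyone help? Thanks.

Add groupby with the list of columns to process. The first and last missing values of each group are then filled within that group only, so that values from one group never replace values in another (if only some NaN values exist in a group):
cols = ['v1', 'v2']
g = df.groupby('type')[cols]
# interior gaps: mean of the previous and next valid value within each group
df[cols] = (g.ffill() + g.bfill()) / 2
# leading/trailing gaps: per-group fill on a fresh groupby, so the updated values are used
df[cols] = df.groupby('type')[cols].transform(lambda x: x.bfill().ffill())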
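Solution for numeric columns only:
import numpy as np
cols = df.select_dtypes(np.number).columns
g = df.groupby('type')[cols]
# interior gaps: mean of the previous and next valid value within each group
df[cols] = (g.ffill() + g.bfill()) / 2
# leading/trailing gaps: per-group fill on a fresh groupby, so the updated values are used
df[cols] = df.groupby('type')[cols].transform(lambda x: x.bfill().ffill())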
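As a quick check that select_dtypes(np.number) leaves string columns such as type and date alone, here is a small sketch on made-up data (the values are illustrative, not taken from the question):

```python
import numpy as np
import pandas as pd

# Illustrative frame: two string columns and two numeric ones.
df = pd.DataFrame({
    "type": ["a", "a", "a", "b", "b", "b"],
    "date": ["2018-09", "2018-10", "2018-11"] * 2,
    "v1": [1.0, np.nan, 3.0, 10.0, np.nan, 30.0],
    "v2": [2.0, np.nan, 6.0, np.nan, 40.0, np.nan],
})

# Only numeric columns are selected; 'type' and 'date' are left out.
cols = df.select_dtypes(np.number).columns.tolist()
print(cols)

g = df.groupby("type")[cols]
# Mean of the previous and next valid values within each group.
mean_filled = (g.ffill() + g.bfill()) / 2
# Remaining gaps at the start or end of a group take the nearest value.
df[cols] = mean_filled.groupby(df["type"]).transform(lambda s: s.bfill().ffill())
print(df)
```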
Just as you said:
df[['v1','v2']] = (df.groupby('type')[['v1','v2']]
.agg(['bfill','ffill'])
.groupby(level=0, axis=1)
.mean()
)
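Note that groupby(..., axis=1) is deprecated in recent pandas releases. One possible way to get the same effect without it is to stack the backward- and forward-filled frames and average the rows that share an index label. Since mean() skips NaN, a gap at the edge of a group takes the single value that is available. This is an alternative formulation on illustrative data, not the answerer's code.

```python
import numpy as np
import pandas as pd

# Illustrative data with interior, leading and trailing gaps.
df = pd.DataFrame({
    "type": ["a", "a", "a", "a", "b", "b", "b", "b"],
    "v1": [1.0, np.nan, 3.0, np.nan, 10.0, np.nan, 30.0, np.nan],
    "v2": [np.nan, 4.0, np.nan, 8.0, 20.0, np.nan, np.nan, 80.0],
})
cols = ["v1", "v2"]

g = df.groupby("type")[cols]
# Stack both fills: every original index label now appears twice.
stacked = pd.concat([g.bfill(), g.ffill()])
# Average per index label; NaN values are ignored by mean().
df[cols] = stacked.groupby(level=0).mean()
print(df)
```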
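Comments:
Use df.groupby('type') and apply your logic on the resulting groupby DataFrame.
What if I want to apply it to all numeric columns instead of specifying v1, v2, etc.? Thanks.
Is numerics = ['int16', 'int32', 'int64', 'float16', 'float32', 'float64'], cols = df.select_dtypes(include=numerics).columns the same as cols = df.select_dtypes(np.number).columns?
@ahbon - I think so, and I think complex numbers are included as well ;)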
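The difference between the two selections can be checked directly. In this sketch the column names i, f and c are made up for illustration; the explicit dtype list is the one quoted in the comment.

```python
import numpy as np
import pandas as pd

# Illustrative frame with an integer, a float and a complex column.
df = pd.DataFrame({
    "type": ["a", "b"],
    "i": np.array([1, 2], dtype="int64"),
    "f": np.array([1.5, 2.5], dtype="float64"),
    "c": np.array([1 + 2j, 3 + 4j], dtype="complex128"),
})

numerics = ["int16", "int32", "int64", "float16", "float32", "float64"]
explicit_cols = df.select_dtypes(include=numerics).columns.tolist()
generic_cols = df.select_dtypes(np.number).columns.tolist()

print(explicit_cols)  # the complex column is not covered by the explicit list
print(generic_cols)   # np.number also matches the complex column
```

So the two are equivalent only when the frame holds none of the dtypes missing from the explicit list, such as complex or unsigned integer columns.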