Python: fill NaN within pandas groups by adding consecutive numbers
I have a dataframe such as:
Groups NAME Number
G1 A 1
G1 B 2
G1 D NaN
G1 D NaN
G1 I 3
G1 H NaN
G2 E 1
G2 E 1
G2 F NaN
G2 J 2
G3 K NaN
G3 L 1
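For anyone who wants to reproduce the problem, the frame above could be built roughly as follows. This is a minimal sketch; the variable name `df` and the use of `np.nan` for the missing entries are assumptions, since the question only shows the printed table.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question (NaN marks the missing numbers).
df = pd.DataFrame({
    "Groups": ["G1"] * 6 + ["G2"] * 4 + ["G3"] * 2,
    "NAME": ["A", "B", "D", "D", "I", "H", "E", "E", "F", "J", "K", "L"],
    "Number": [1, 2, np.nan, np.nan, 3, np.nan, 1, 1, np.nan, 2, np.nan, 1],
})

print(df)
```

Because the column contains NaN, pandas stores `Number` as float64 at this point.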
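I would like to fill the NaN values within each group with the next unused numbers.
For example, D in G1 gets the number 4, because 1, 2 and 3 already exist. Then H in G1 gets the number 5, and so on.
In the end, I should get: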
Groups NAME Number
G1 A 1
G1 B 2
G1 D 4
G1 D 4
G1 I 3
G1 H 5
G2 E 1
G2 E 1
G2 F 3
G2 J 2
G3 K 2
G3 L 1
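Does anyone have any ideas?
One suggested approach uses pd.factorize().
Alternatively, you can use groupby + ngroup to label the null rows with an incrementing integer for each Groups/NAME pair. We then subtract the minimum ngroup within each group, which rebases the labels so that every group starts at 0, add 1, and add the maximum Number that already exists within the group.
Finally, we fillna with this Series.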
s = df[df['Number'].isnull()].groupby(['Groups', 'NAME']).ngroup()
#2 0 #<- G1/D (Series index is DataFrame index)
#3 0 #<- G1/D
#5 1 #<- G1/H
#8 2 #<- G2/F
#10 3 #<- G3/K
to_fill = (s - s.groupby(df['Groups']).transform('min') + 1
+ df.groupby('Groups')['Number'].transform('max'))
#0 NaN
#1 NaN
#2 4.0
#3 4.0
#4 NaN
#5 5.0
#6 NaN
#7 NaN
#8 3.0
#9 NaN
#10 2.0
#11 NaN
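# The `downcast` keyword of fillna is deprecated in recent pandas releases,
# so cast back to integers explicitly once no NaN remains.
df['Number'] = df['Number'].fillna(to_fill).astype(int)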
# Groups NAME Number
#0 G1 A 1
#1 G1 B 2
#2 G1 D 4
#3 G1 D 4
#4 G1 I 3
#5 G1 H 5
#6 G2 E 1
#7 G2 E 1
#8 G2 F 3
#9 G2 J 2
#10 G3 K 2
#11 G3 L 1
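The pd.factorize() approach is mentioned in the thread, but its code is not shown. The following is only a sketch of what such an approach might look like, not the original answer's code. The helper name `fill_missing_numbers` and the explicit loop over groups are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Groups": ["G1"] * 6 + ["G2"] * 4 + ["G3"] * 2,
    "NAME": ["A", "B", "D", "D", "I", "H", "E", "E", "F", "J", "K", "L"],
    "Number": [1, 2, np.nan, np.nan, 3, np.nan, 1, 1, np.nan, 2, np.nan, 1],
})


def fill_missing_numbers(frame):
    """Hypothetical helper: return 'Number' with NaN filled per group."""
    out = frame["Number"].copy()
    for _, g in frame.groupby("Groups"):
        missing = g["Number"].isna().to_numpy()
        if not missing.any():
            continue
        # Largest number already used in this group (0 if the group has none).
        start = g["Number"].max()
        start = 0 if pd.isna(start) else start
        # factorize gives 0, 1, 2, ... per distinct NAME, in order of appearance,
        # so rows that share a NAME receive the same new number.
        codes, _ = pd.factorize(g.loc[missing, "NAME"])
        out.loc[g.index[missing]] = start + 1 + codes
    return out.astype(int)


df["Number"] = fill_missing_numbers(df)
print(df)
```

One difference from the ngroup version: pd.factorize() numbers the names in order of first appearance, whereas groupby sorts its keys by default, so ngroup follows sorted order. On this data both give the same result, but they can differ when the missing names within a group are not already in sorted order.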