Python: fill DataFrame values for each group of rows
Suppose I have the following dataset:
Time Geography Sex Population
1990 Northern Ireland Male NA
1990 Northern Ireland Female NA
1990 Northern Ireland Total NA
1991 Northern Ireland Male NA
1991 Northern Ireland Female NA
1991 Northern Ireland Total NA
1992 Northern Ireland Male 792100
1992 Northern Ireland Female 831100
1992 Northern Ireland Total 1623300
1993 Northern Ireland Male 812100
1993 Northern Ireland Female 851100
1993 Northern Ireland Total 1663200
In the end, I would like to get the following:
Time Geography Sex Population
1990 Northern Ireland Male 792100
1990 Northern Ireland Female 831100
1990 Northern Ireland Total 1623300
1991 Northern Ireland Male 792100
1991 Northern Ireland Female 831100
1991 Northern Ireland Total 1623300
1992 Northern Ireland Male 792100
1992 Northern Ireland Female 831100
1992 Northern Ireland Total 1623300
1993 Northern Ireland Male 812100
1993 Northern Ireland Female 851100
1993 Northern Ireland Total 1663200
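This basically means that I want to fill the earlier years with the values from the first year that has no NAs.
How can I do this?
You can try the following: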
df.set_index(['Time','Geography','Sex']).unstack().bfill().stack().reset_index()
Output:
Time Geography Sex Population
0 1990 Northern Ireland Female 831100.0
1 1990 Northern Ireland Male 792100.0
2 1990 Northern Ireland Total 1623300.0
3 1991 Northern Ireland Female 831100.0
4 1991 Northern Ireland Male 792100.0
5 1991 Northern Ireland Total 1623300.0
6 1992 Northern Ireland Female 831100.0
7 1992 Northern Ireland Male 792100.0
8 1992 Northern Ireland Total 1623300.0
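The one-liner above assumes that df already exists. As a minimal sketch, the sample data from the question (including the 1993 rows) can be rebuilt and passed through the same unstack / bfill / stack chain. The names years, sexes, values, wide and filled are illustrative only and do not come from the original answer.

```python
import numpy as np
import pandas as pd

# Rebuild the sample data from the question; 1990 and 1991 are missing.
years = [1990, 1991, 1992, 1993]
sexes = ["Male", "Female", "Total"]
values = {
    1992: [792100, 831100, 1623300],
    1993: [812100, 851100, 1663200],
}

rows = []
for year in years:
    for i, sex in enumerate(sexes):
        pop = values[year][i] if year in values else np.nan
        rows.append((year, "Northern Ireland", sex, pop))

df = pd.DataFrame(rows, columns=["Time", "Geography", "Sex", "Population"])

# One column per Sex, one row per (Time, Geography); back-fill down the years,
# then return to the long layout.
wide = df.set_index(["Time", "Geography", "Sex"]).unstack()
filled = wide.bfill().stack().reset_index()

print(filled)
```

Because unstack sorts the column labels, the Sex values come back in alphabetical order within each year rather than in the Male, Female, Total order of the input. Note also that the back-fill runs down all rows of the wide table, so with more than one Geography a value could be taken from a different region.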
You can sort by Sex, chain bfill, and then restore the original index order with sort_index:
# A stable sort keeps the years in order within each Sex before back-filling
df = df.sort_values('Sex', kind='stable').bfill().sort_index()
print(df)
Time Geography Sex Population
0 1990 Northern Ireland Male 792100.0
1 1990 Northern Ireland Female 831100.0
2 1990 Northern Ireland Total 1623300.0
3 1991 Northern Ireland Male 792100.0
4 1991 Northern Ireland Female 831100.0
5 1991 Northern Ireland Total 1623300.0
6 1992 Northern Ireland Male 792100.0
7 1992 Northern Ireland Female 831100.0
8 1992 Northern Ireland Total 1623300.0
I would use groupby with bfill and ffill (I chain ffill with bfill just as a safeguard).
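I like your approach better. Nice answer, +1
Thank you, sir, much appreciated.
@ScottBoston Hey, thank you. I think this works. Just to be sure, though: I want to fill the earlier years with the values from the first year that has no NAs. I don't want any change in the rows that already have Population values.
I understand. This works for the example shown, but the key to this solution is getting the sort right, so without a larger sample dataset we can't test it. If the dataset you provided is representative of your real data, this works. If you have more questions, please ask, happy to help @PoeteMaudit
No, I think it still works. I just edited my post to add some values for 1993, to show that these values, and any other values after 1992 that contain no NAs, should not be affected.
Thank you, but I would like to keep the row order that I show in my output above.
Hey, thank you. I think this works. Just to be sure, though: I want to fill the earlier years with the values from the first year that has no NAs. I don't want any change in the rows that already have Population values.
@PoeteMaudit In that case, use df['Population'] = df.groupby(['Geography','Sex']).Population.bfill()
Also, when the last year of a group is NaN, the output will be incorrect without groupby.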
# transform returns a Series aligned to df's original index, so the row order is kept
df['Population'] = df.groupby(['Geography', 'Sex'])['Population'].transform(lambda s: s.ffill().bfill())
df
Time Geography Sex Population
0 1990 NorthernIreland Male 792100.0
1 1990 NorthernIreland Female 831100.0
2 1990 NorthernIreland Total 1623300.0
3 1991 NorthernIreland Male 792100.0
4 1991 NorthernIreland Female 831100.0
5 1991 NorthernIreland Total 1623300.0
6 1992 NorthernIreland Male 792100.0
7 1992 NorthernIreland Female 831100.0
8 1992 NorthernIreland Total 1623300.0
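The remark that the output is incorrect without groupby when a group's last year is NaN can be illustrated with a small sketch. The frame below is a hypothetical rearrangement of the question's figures (the Female value for the last year is deliberately set to NaN), and the names leaky and grouped are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical data: Female is missing in the last year (1992) as well.
df = pd.DataFrame({
    "Time": [1990, 1990, 1991, 1991, 1992, 1992],
    "Geography": ["Northern Ireland"] * 6,
    "Sex": ["Male", "Female"] * 3,
    "Population": [np.nan, np.nan, 792100, 831100, 812100, np.nan],
})

# Sort + bfill: the trailing Female NaN is followed by the Male rows in the
# sorted frame, so it is filled with a Male value.
leaky = df.sort_values("Sex", kind="stable")["Population"].bfill().sort_index()

# Group-wise bfill: each (Geography, Sex) group is filled on its own, so the
# trailing Female NaN stays NaN and the original row order is preserved.
grouped = df.groupby(["Geography", "Sex"])["Population"].bfill()

print(pd.DataFrame({"sort_bfill": leaky, "groupby_bfill": grouped}))
```

With a group-wise bfill alone, earlier years are filled and rows that already hold a value are left unchanged. Chaining ffill as well would additionally fill such a trailing gap from the same group's previous year, which may or may not be wanted.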