Python: fill DataFrame values for each group of rows


python pandas


Suppose I have the following dataset:

Time    Geography           Sex     Population
1990    Northern Ireland    Male    NA
1990    Northern Ireland    Female  NA
1990    Northern Ireland    Total   NA
1991    Northern Ireland    Male    NA
1991    Northern Ireland    Female  NA
1991    Northern Ireland    Total   NA
1992    Northern Ireland    Male    792100
1992    Northern Ireland    Female  831100
1992    Northern Ireland    Total   1623300
1993    Northern Ireland    Male    812100
1993    Northern Ireland    Female  851100
1993    Northern Ireland    Total   1663200

In the end, I would like to get the following:

Time    Geography           Sex     Population
1990    Northern Ireland    Male    792100
1990    Northern Ireland    Female  831100
1990    Northern Ireland    Total   1623300
1991    Northern Ireland    Male    792100
1991    Northern Ireland    Female  831100
1991    Northern Ireland    Total   1623300
1992    Northern Ireland    Male    792100
1992    Northern Ireland    Female  831100
1992    Northern Ireland    Total   1623300
1993    Northern Ireland    Male    812100
1993    Northern Ireland    Female  851100
1993    Northern Ireland    Total   1663200

In other words, I want to fill the earlier years with the values from the first year that has no NAs.

How can I do this?

You can try the following:

df.set_index(['Time','Geography','Sex']).unstack().bfill().stack().reset_index()

Output:

   Time         Geography     Sex  Population
0  1990  Northern Ireland  Female    831100.0
1  1990  Northern Ireland    Male    792100.0
2  1990  Northern Ireland   Total   1623300.0
3  1991  Northern Ireland  Female    831100.0
4  1991  Northern Ireland    Male    792100.0
5  1991  Northern Ireland   Total   1623300.0
6  1992  Northern Ireland  Female    831100.0
7  1992  Northern Ireland    Male    792100.0
8  1992  Northern Ireland   Total   1623300.0
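As a rough sketch of the reshape idea above: put each (Geography, Sex) pair in its own column, back-fill down the years, and go back to long form. The variable names (wide, long_form, filled) and the final merge that restores the question's row order are my own additions for illustration, not part of the answer, and the data is just the sample from the question.

```python
import numpy as np
import pandas as pd

# Sample data from the question (NA entered as NaN).
df = pd.DataFrame({
    "Time": [1990] * 3 + [1991] * 3 + [1992] * 3 + [1993] * 3,
    "Geography": ["Northern Ireland"] * 12,
    "Sex": ["Male", "Female", "Total"] * 4,
    "Population": [np.nan] * 6
                  + [792100, 831100, 1623300, 812100, 851100, 1663200],
})

keys = ["Time", "Geography", "Sex"]

# One row per year, one column per (Geography, Sex) pair.
wide = df.set_index(keys)["Population"].unstack(["Geography", "Sex"])

# Back-fill down the years, so each column is filled only from its own later values.
wide = wide.sort_index().bfill()

# Back to long form: a Series indexed by (Geography, Sex, Time).
long_form = wide.unstack().rename("Population").reset_index()

# A left merge on the key columns keeps the row order of the original frame.
filled = df[keys].merge(long_form, on=keys, how="left")
print(filled)
```

Unstacking both Geography and Sex, rather than only Sex, is a deliberate choice here: with several geographies in the data, it stops a value from one geography being back-filled into another.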

You can sort by Sex, back-fill with bfill, and then restore the original row order with sort_index:

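# A stable sort keeps the years in their existing order within each Sex;
# bfill() replaces the deprecated fillna(method='bfill')
df = df.sort_values('Sex', kind='mergesort').bfill().sort_index()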

print(df)
   Time         Geography     Sex  Population
0  1990  Northern Ireland    Male    792100.0
1  1990  Northern Ireland  Female    831100.0
2  1990  Northern Ireland   Total   1623300.0
3  1991  Northern Ireland    Male    792100.0
4  1991  Northern Ireland  Female    831100.0
5  1991  Northern Ireland   Total   1623300.0
6  1992  Northern Ireland    Male    792100.0
7  1992  Northern Ireland  Female    831100.0
8  1992  Northern Ireland   Total   1623300.0
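Since this approach depends entirely on the sort order, here is a small sketch that makes the ordering explicit. Sorting on both Sex and Time is an assumption I am adding (the answer sorts on Sex only), and the data is just the sample from the question.

```python
import numpy as np
import pandas as pd

# Sample data from the question (NA entered as NaN).
df = pd.DataFrame({
    "Time": [1990] * 3 + [1991] * 3 + [1992] * 3 + [1993] * 3,
    "Geography": ["Northern Ireland"] * 12,
    "Sex": ["Male", "Female", "Total"] * 4,
    "Population": [np.nan] * 6
                  + [792100, 831100, 1623300, 812100, 851100, 1663200],
})

# Sort so that each Sex forms a contiguous block in chronological order,
# back-fill down the rows, then restore the original row order via the index.
filled = df.sort_values(["Sex", "Time"]).bfill().sort_index()
print(filled)
```

This only works while every Sex block ends with a non-missing value and there is a single geography; otherwise a value from the next block would be pulled in.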

I would use groupby together with bfill and ffill (I add ffill alongside bfill just as a safeguard).

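I like your approach better. Nice answer, +1.

Thank you, sir, much appreciated @ScottBoston

Hey, thank you. I think this works. Just to make sure, though: I want to fill the earlier years with the values from the first year that has no NAs. I do not want any change in the rows that already have Population values.

I understand. This works for the example shown, but the key to this solution is getting the sort order right, so we cannot test it further without a larger sample dataset. If the dataset you provided is representative of your real data, this works. If you have more questions, please ask; happy to help @PoeteMaudit

No, I think it still works. I just edited my post to add some values for 1993, to show that these values, and any others after 1992 that contain no NAs, should not be affected.

Thank you, but I would like to keep the row order shown in my output above.

@PoeteMaudit If so:

df['Population'] = df.groupby(['Geography', 'Sex']).Population.bfill()

Also, if your last year is NaN, the output will be incorrect without groupby.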
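To illustrate that last point, here is a minimal sketch on a made-up four-row frame (the frame and the names leaky and grouped are hypothetical, not from the question) in which the Female series ends with a missing value:

```python
import numpy as np
import pandas as pd

# Hypothetical data: Female is missing in the LAST year, Male in the first.
df = pd.DataFrame({
    "Time": [1990, 1990, 1991, 1991],
    "Geography": ["Northern Ireland"] * 4,
    "Sex": ["Male", "Female", "Male", "Female"],
    "Population": [np.nan, 831100, 792100, np.nan],
})

# Without groupby: after sorting, the trailing Female NaN is followed by the
# Male rows, so bfill fills it with a Male value.
leaky = df.sort_values(["Sex", "Time"]).bfill().sort_index()

# With groupby: bfill never crosses a (Geography, Sex) boundary, so the
# trailing Female NaN stays missing.
grouped = df.groupby(["Geography", "Sex"])["Population"].bfill()

print(leaky["Population"].tolist())
print(grouped.tolist())
```

In the first result the Female 1991 row ends up with the Male figure, while the grouped version leaves it as NaN, which is the safer failure mode.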

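# Fill within each (Geography, Sex) group; transform keeps the original index, so the result aligns on assignment
df['Population'] = df.groupby(['Geography', 'Sex']).Population.transform(lambda x: x.ffill().bfill())
df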
   Time        Geography     Sex  Population
0  1990  NorthernIreland    Male    792100.0
1  1990  NorthernIreland  Female    831100.0
2  1990  NorthernIreland   Total   1623300.0
3  1991  NorthernIreland    Male    792100.0
4  1991  NorthernIreland  Female    831100.0
5  1991  NorthernIreland   Total   1623300.0
6  1992  NorthernIreland    Male    792100.0
7  1992  NorthernIreland  Female    831100.0
8  1992  NorthernIreland   Total   1623300.0
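Regarding the concern raised in the comments that rows which already have a Population must not change, the following sketch runs the grouped fill on the full sample from the question (including 1993) and checks exactly that. It uses transform, which returns a result aligned to the original index; the names original, known and unchanged are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data from the question (NA entered as NaN).
df = pd.DataFrame({
    "Time": [1990] * 3 + [1991] * 3 + [1992] * 3 + [1993] * 3,
    "Geography": ["Northern Ireland"] * 12,
    "Sex": ["Male", "Female", "Total"] * 4,
    "Population": [np.nan] * 6
                  + [792100, 831100, 1623300, 812100, 851100, 1663200],
})

original = df["Population"].copy()

# Fill inside each (Geography, Sex) group; rows are assumed to be in
# chronological order within each group, as they are in the question.
df["Population"] = (
    df.groupby(["Geography", "Sex"])["Population"]
      .transform(lambda s: s.ffill().bfill())
)

# Rows that had a value before the fill must be identical afterwards.
known = original.notna()
unchanged = df.loc[known, "Population"].equals(original[known])
print(unchanged)
print(df)
```

Because the fill assigns back into the existing column, the row order of the frame is left as it was.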