Python: fill NaN values based on another column and row in a table


python pandas dataframe


I have a DataFrame like this:

Name      Food      Year_eaten      Month_eaten

Maria     Rice        2014               3
Maria     Rice        2015              NaN
Maria     Rice        2016              NaN
Jack      Steak       2011              NaN
Jack      Steak       2012               5
Jack      Steak       2013              NaN
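
For anyone who wants to reproduce the problem, the table above can be built roughly as follows. This is a minimal sketch: the variable name df and the use of np.nan for the missing months are assumptions, since the question only shows the printed frame.

```python
import numpy as np
import pandas as pd

# Sample data copied from the table above; np.nan marks a missing month.
df = pd.DataFrame({
    "Name": ["Maria", "Maria", "Maria", "Jack", "Jack", "Jack"],
    "Food": ["Rice", "Rice", "Rice", "Steak", "Steak", "Steak"],
    "Year_eaten": [2014, 2015, 2016, 2011, 2012, 2013],
    "Month_eaten": [3, np.nan, np.nan, np.nan, 5, np.nan],
})
print(df)
```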

I would like the output to look like this:

Name      Food      Year_eaten      Month_eaten

Maria     Rice        2014               3
Maria     Rice        2015               3
Maria     Rice        2016               3
Jack      Steak       2011               5
Jack      Steak       2012               5
Jack      Steak       2013               5

I want to fill the NaNs according to the following condition:

If the rows have the same Name and Food and their years are consecutive:
     fill the NaNs with the Month_eaten from the row in that run that is not NaN

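Some people will have NaN for Month_eaten in every row, but I don't need to worry about those for now. I only care about the ones that have at least one Month_eaten value in any of their years.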

Any ideas would be much appreciated.

You can group by Name, Food and a custom key created with diff over the rows sorted by Year_eaten.
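
The code for this answer did not survive on this page, so the following is only a sketch of one way the idea could look, not the answerer's original code. It assumes the sample frame from the question, and the names df_sorted and run_id are made up for illustration. Here transform("first") is used to broadcast the first non-null month across each group.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["Maria", "Maria", "Maria", "Jack", "Jack", "Jack"],
    "Food": ["Rice", "Rice", "Rice", "Steak", "Steak", "Steak"],
    "Year_eaten": [2014, 2015, 2016, 2011, 2012, 2013],
    "Month_eaten": [3, np.nan, np.nan, np.nan, 5, np.nan],
})

# Sort so that each person's years are adjacent and ascending.
df_sorted = df.sort_values(["Name", "Food", "Year_eaten"])

# A new run starts wherever the year is not exactly 1 more than the previous row.
run_id = df_sorted["Year_eaten"].diff().ne(1).cumsum().rename("run_id")

# "first" picks the first non-null month in each group; transform broadcasts it
# back to every row of that group. A group with only NaN stays NaN.
df_sorted["Month_eaten"] = (
    df_sorted.groupby(["Name", "Food", run_id])["Month_eaten"].transform("first")
)
print(df_sorted.sort_index())
```

Because Name and Food are part of the grouping keys, two different people never end up in the same group even when their years happen to follow one another in the sorted frame.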

Another solution, if there is no group in which every row is NaN, is to use groupby with ffill (everything else stays the same).

Use diff().ne(1).cumsum() to create a group key for consecutive years:

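# transform returns a result aligned with df's index, so it can be used directly as a grouping key
continueyear = df.groupby(['Name', 'Food'])['Year_eaten'].transform(lambda x: x.diff().ne(1).cumsum())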
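
To see why diff().ne(1).cumsum() yields one label per run of consecutive years, it may help to look at the intermediate steps on a small series. The years below are hypothetical and chosen to contain a gap after 2013; they are not taken from the question.

```python
import pandas as pd

# Hypothetical years with a gap between 2013 and 2015.
years = pd.Series([2011, 2012, 2013, 2015, 2016])

step = years.diff()        # NaN, 1, 1, 2, 1  (difference to the previous row)
new_run = step.ne(1)       # True, False, False, True, False  (NaN != 1 is True)
run_id = new_run.cumsum()  # running count of run starts

print(run_id.tolist())  # [1, 1, 1, 2, 2]
```

Every row where the year does not follow the previous one by exactly 1 starts a new run, and the cumulative sum turns those run starts into a label that can be passed to groupby.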

Then group by that key as well and fill each group with ffill and bfill:

df.groupby([df.Name, df.Food, continueyear]).Month_eaten.transform(lambda x: x.ffill().bfill().astype(int))
Out[26]:
0    3
1    3
2    3
3    5
4    5
5    5
Name: Month_eaten, dtype: int32
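
The snippet above only displays the filled values, and astype(int) would raise for a group whose months are all NaN, which is the case the question sets aside. As a hedged sketch of how both points could be handled, the version below writes the result back to the frame and uses the nullable Int64 dtype. The person "Ann" is a hypothetical addition to exercise the all-NaN case, and the code assumes the rows of each Name/Food group are already in year order.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    # "Ann" is a made-up person whose months are all missing.
    "Name": ["Maria", "Maria", "Maria", "Jack", "Jack", "Jack", "Ann", "Ann"],
    "Food": ["Rice", "Rice", "Rice", "Steak", "Steak", "Steak", "Soup", "Soup"],
    "Year_eaten": [2014, 2015, 2016, 2011, 2012, 2013, 2010, 2011],
    "Month_eaten": [3, np.nan, np.nan, np.nan, 5, np.nan, np.nan, np.nan],
})

# Label runs of consecutive years within each Name/Food group.
run_id = (
    df.groupby(["Name", "Food"])["Year_eaten"]
      .transform(lambda s: s.diff().ne(1).cumsum())
      .rename("run_id")
)

# Fill forward, then backward, inside each run; all-NaN runs stay NaN.
filled = (
    df.groupby(["Name", "Food", run_id])["Month_eaten"]
      .transform(lambda s: s.ffill().bfill())
)

# Int64 (capital I) is pandas' nullable integer dtype, so missing values survive as <NA>.
df["Month_eaten"] = filled.astype("Int64")
print(df)
```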

It looks much better now :-)
