Python pandas: replace the last item in a groupby with another column


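I am trying to replace the last row of each group in a groupby with the value from another column, but only if that row is null. I can do each of these two things separately, but I can't seem to combine them. Does anyone have any ideas?

These are the separate parts: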

# replace any NaN values with values from 'target'
df.loc[df['target'].isnull(),'target'] = df['value']

# replace last value in groupby with value from 'target'
df.loc[df.groupby('id').tail(1).index,'target'] = df['value']

Original data:

    date        id      value       target
0   2020-08-07  id01    0.100775    NaN
1   2020-08-08  id01    0.215885    0.215885
2   2020-08-09  id01    0.012154    0.012154
3   2020-08-10  id01    0.374503    NaN
4   2020-08-07  id02    0.369707    0.369707
5   2020-08-08  id02    0.676743    0.676743
6   2020-08-09  id02    0.659521    0.659521
7   2020-08-10  id02    0.799071    NaN
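
For anyone who wants to reproduce the problem, the table above can be rebuilt roughly as follows. This is only a sketch: the values are copied from the post, and the dates are kept as plain strings, which is an assumption (the original frame may hold real datetimes).

```python
import numpy as np
import pandas as pd

# Sample frame copied from the table above (dates kept as strings by assumption)
df = pd.DataFrame({
    "date": ["2020-08-07", "2020-08-08", "2020-08-09", "2020-08-10"] * 2,
    "id": ["id01"] * 4 + ["id02"] * 4,
    "value": [0.100775, 0.215885, 0.012154, 0.374503,
              0.369707, 0.676743, 0.659521, 0.799071],
    "target": [np.nan, 0.215885, 0.012154, np.nan,
               0.369707, 0.676743, 0.659521, np.nan],
})

# Index labels of the rows where 'target' is missing
missing = df[df["target"].isna()].index.tolist()
print(missing)  # [0, 3, 7]
```

Rows 3 and 7 are the last rows of their groups and should be filled; row 0 is not and should stay NaN.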

Desired result, with 'target' in the last row of each groupby('id') replaced by 'value':

    date        id      value       target
0   2020-08-07  id01    0.100775    NaN
1   2020-08-08  id01    0.215885    0.215885
2   2020-08-09  id01    0.012154    0.012154
3   2020-08-10  id01    0.374503    0.374503
4   2020-08-07  id02    0.369707    0.369707
5   2020-08-08  id02    0.676743    0.676743
6   2020-08-09  id02    0.659521    0.659521
7   2020-08-10  id02    0.799071    0.799071

This should work. I added the `tail` variable only to make the syntax easier to read:

# last row of each id
tail = df.groupby('id').tail(1)
# on those rows only, fill a missing 'target' from 'value'
df.loc[tail.index, 'target'] = tail['target'].fillna(tail['value'])

Output:

         date    id     value    target
0  2020-08-07  id01  0.100775       NaN
1  2020-08-08  id01  0.215885  0.215885
2  2020-08-09  id01  0.012154  0.012154
3  2020-08-10  id01  0.374503  0.374503
4  2020-08-07  id02  0.369707  0.369707
5  2020-08-08  id02  0.676743  0.676743
6  2020-08-09  id02  0.659521  0.659521
7  2020-08-10  id02  0.799071  0.799071
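
The same idea can be wrapped in a small helper that leaves the input untouched. This is a sketch only: the function name `fill_last_per_group` and its parameters are made up for illustration, and the tiny `demo` frame is invented to show that a non-null last row is preserved.

```python
import numpy as np
import pandas as pd

def fill_last_per_group(df, key="id", src="value", dst="target"):
    """Fill dst from src on the last row of each key group, only where dst is NaN."""
    out = df.copy()
    last_idx = out.groupby(key).tail(1).index
    # fillna touches only the NaN entries among the last rows
    out.loc[last_idx, dst] = out.loc[last_idx, dst].fillna(out.loc[last_idx, src])
    return out

# Hypothetical data: group "a" ends in NaN, group "b" ends in a real value
demo = pd.DataFrame({
    "id": ["a", "a", "b", "b"],
    "value": [1.0, 2.0, 3.0, 4.0],
    "target": [np.nan, np.nan, np.nan, 9.0],
})
result = fill_last_per_group(demo)
print(result["target"].tolist())  # [nan, 2.0, nan, 9.0]
```

Only row 1 changes: it is the last row of "a" and was NaN. Row 3 keeps its existing 9.0, and the earlier NaNs are left alone.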

Call `fillna` on the whole column, then mask the result back to `NaN` wherever the missing value was not in the last row of its 'id':

m = df['target'].isnull() & df['id'].duplicated(keep='last')
df['target'] = df['target'].fillna(df['value']).mask(m)
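
To see why this works, it may help to look at the intermediate mask on a small frame. The sketch below uses invented data (`demo`) and an extra variable name (`not_last`) that are not in the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical data: group "a" ends in NaN, group "b" has a value then NaN
demo = pd.DataFrame({
    "id": ["a", "a", "b", "b"],
    "value": [1.0, 2.0, 3.0, 4.0],
    "target": [np.nan, np.nan, 5.0, np.nan],
})

# duplicated(keep="last") is True for every row except the last occurrence of each id
not_last = demo["id"].duplicated(keep="last")
print(not_last.tolist())  # [True, False, True, False]

# Missing targets that are NOT in a last row must stay missing
m = demo["target"].isna() & not_last

# Fill everything, then put NaN back where m is True
filled = demo["target"].fillna(demo["value"]).mask(m)
print(filled.tolist())  # [nan, 2.0, 5.0, 4.0]
```

Because the mask is built from `duplicated` rather than from `groupby`, it depends only on row order, so it should also behave sensibly when the rows of one id are not contiguous.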

With `combine_first` you are spoilt for choice.

First option

Chain `.groupby()` with `nth(value)`.

Second option

Use `combine_first` with the `.iloc` accessor in a one-line `groupby`:

df.groupby('id').apply(lambda x: x.iloc[-1:, 3].combine_first(x.iloc[-1:, 2]))\
    .reset_index(level=0).combine_first(df)

Third option

Select the last row of each group, fill the `target` column as needed, and update df with `combine_first`:

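# last row of each group, keeping the original row index
g = df.groupby('id').apply(lambda x: x.iloc[-1:]).reset_index(level=0, drop=True)
# fill only the missing targets from 'value'
g['target'] = g['target'].combine_first(g['value'])
# rows in g take precedence; all other rows come from df
g.combine_first(df)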



         date    id     value    target
0  2020-08-07  id01  0.100775       NaN
1  2020-08-08  id01  0.215885  0.215885
2  2020-08-09  id01  0.012154  0.012154
3  2020-08-10  id01  0.374503  0.374503
4  2020-08-07  id02  0.369707  0.369707
5  2020-08-08  id02  0.676743  0.676743
6  2020-08-09  id02  0.659521  0.659521
7  2020-08-10  id02  0.799071  0.799071
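
The options above lean on how `combine_first` resolves conflicts: the calling object's non-null values win, its NaNs are filled from the argument, and the result covers the union of both indexes. A minimal sketch with two made-up Series `a` and `b`:

```python
import numpy as np
import pandas as pd

# a has a hole at label 0 and no label 2 at all
a = pd.Series([np.nan, 2.0], index=[0, 1])
b = pd.Series([10.0, 20.0, 30.0], index=[0, 1, 2])

# a's values take precedence; b fills a's NaN and supplies the missing label
merged = a.combine_first(b)
print(merged.tolist())  # [10.0, 2.0, 30.0]
```

This is why `g.combine_first(df)` keeps the patched last rows from `g` and takes every other row from `df`.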

Find the index of the last row of each group with `groupby()`, then replace only the null values using `combine_first()`:

indexes = df.groupby('id').tail(1).index
df.loc[indexes, 'target'] = df['target'].combine_first(df['value'])

# result
         date    id     value    target
0  2020-08-07  id01  0.100775       NaN
1  2020-08-08  id01  0.215885  0.215885
2  2020-08-09  id01  0.012154  0.012154
3  2020-08-10  id01  0.374503  0.374503
4  2020-08-07  id02  0.369707  0.369707
5  2020-08-08  id02  0.676743  0.676743
6  2020-08-09  id02  0.659521  0.659521
7  2020-08-10  id02  0.799071  0.799071

@野狗 Does this help? Happy to help further.
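
As a rough sanity check, three of the approaches in this thread can be compared on the sample data. The sketch below rebuilds the frame from the question (dates omitted, since they play no part) and uses illustrative names `v1`, `v2`, `v3` that are not from the answers.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "id": ["id01"] * 4 + ["id02"] * 4,
    "value": [0.100775, 0.215885, 0.012154, 0.374503,
              0.369707, 0.676743, 0.659521, 0.799071],
    "target": [np.nan, 0.215885, 0.012154, np.nan,
               0.369707, 0.676743, 0.659521, np.nan],
})
last_idx = df.groupby("id").tail(1).index

# Variant 1: fillna restricted to the last row of each group
v1 = df["target"].copy()
v1.loc[last_idx] = v1.loc[last_idx].fillna(df.loc[last_idx, "value"])

# Variant 2: fill everything, then mask non-last missing rows back to NaN
m = df["target"].isna() & df["id"].duplicated(keep="last")
v2 = df["target"].fillna(df["value"]).mask(m)

# Variant 3: combine_first, written back only at the last-row labels
v3 = df["target"].copy()
v3.loc[last_idx] = df["target"].combine_first(df["value"]).loc[last_idx]

same = v1.equals(v2) and v2.equals(v3)
print(same)  # True
```

`Series.equals` treats NaNs in matching positions as equal, so row 0 staying NaN in all three variants does not break the comparison.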