Groupby and ffill specified columns in Python
I want to sort the values by id_, Code and Timestamp (because chronological order matters), then group d1 by id_ and Code and forward-fill the NaN values within each group using ffill on the V1 and V2 columns only, returning the full table with the other columns left unchanged.
d1:
Type_x id_ Timestamp V1 Code Type_y V2
0 abcd 39-38-30-34 2012-09-20 23:46:05.870 35.5 2 NaN 0
1 abcd 39-38-30-34 2012-09-20 23:46:23.870 44.5 0 NaN 1
2 abcd 39-38-30-34 2012-09-20 23:48:07.870 43.5 0 NaN 1
3 abcd 39-38-30-34 2012-09-20 23:49:48.870 42.5 0 NaN NaN
4 abcd 39-38-30-34 2012-09-20 23:50:44.870 34.5 2 NaN NaN
My attempt:
d2 = d1.sort_values(by = ['id_', 'Code', 'Timestamp']).groupby(['id_', 'Code'])['V1', 'V2'].ffill()
It returns only the two columns:
V1 V2
69659 21.5 NaN
300886 21.5 1.0
300887 21.5 0.0
70086 23.0 0.0
300955 23.0 1.0
How should I do this correctly?

What do you need returned? Either:
d2 = d1.sort_values(by = ['id_', 'Code', 'Timestamp']).groupby(['id_', 'Code']).ffill()
Type_x Timestamp V1 Type_y V2
1 abcd 39-38-30-34 23:46:23.870 44.5 NaN 1.0
2 abcd 39-38-30-34 23:48:07.870 43.5 NaN 1.0
3 abcd 39-38-30-34 23:49:48.870 42.5 NaN 1.0
0 abcd 39-38-30-34 23:46:05.870 35.5 NaN 0.0
4 abcd 39-38-30-34- 23:50:44.870 34.5 NaN 0.0
Or, dropping the columns that still contain NaN:
d2 = d1.sort_values(by = ['id_', 'Code', 'Timestamp']).groupby(['id_', 'Code']).ffill().dropna(axis=1)
print(d2)
Type_x Timestamp V1 V2
1 abcd 39-38-30-34 23:46:23.870 44.5 1.0
2 abcd 39-38-30-34 23:48:07.870 43.5 1.0
3 abcd 39-38-30-34 23:49:48.870 42.5 1.0
0 abcd 39-38-30-34 23:46:05.870 35.5 0.0
4 abcd 39-38-30-34- 23:50:44.870 34.5 0.0
If your actual DataFrame has other columns besides the ones you groupby and the ones you want to ffill, you can use transform and do this column by column:
d2 = d1.sort_values(by = ['id_', 'Code', 'Timestamp'])
d2['V1'] = d2.groupby(['id_', 'Code'])['V1'].transform(lambda x: x.ffill())
d2['V2'] = d2.groupby(['id_', 'Code'])['V2'].transform(lambda x: x.ffill())
d2
Out[1]:
Type_x id_ Timestamp V1 Code Type_y V2
1 abcd 39-38-30-34 2012-09-20 23:46:23.870 44.5 0 NaN 1.0
2 abcd 39-38-30-34 2012-09-20 23:48:07.870 43.5 0 NaN 1.0
3 abcd 39-38-30-34 2012-09-20 23:49:48.870 42.5 0 NaN 1.0
0 abcd 39-38-30-34 2012-09-20 23:46:05.870 35.5 2 NaN 0.0
4 abcd 39-38-30-34 2012-09-20 23:50:44.870 34.5 2 NaN 0.0
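The same idea can probably be written without a per-column lambda, by selecting both columns on the grouped object with a list and assigning the result back. The sketch below is only an illustration: the small frame is rebuilt by hand from the sample rows in the question, and the `cols` name is an assumption, not something from the original post.

```python
import numpy as np
import pandas as pd

# Hand-built frame mirroring the five sample rows of d1 from the question.
d1 = pd.DataFrame({
    "Type_x": ["abcd"] * 5,
    "id_": ["39-38-30-34"] * 5,
    "Timestamp": pd.to_datetime([
        "2012-09-20 23:46:05.870",
        "2012-09-20 23:46:23.870",
        "2012-09-20 23:48:07.870",
        "2012-09-20 23:49:48.870",
        "2012-09-20 23:50:44.870",
    ]),
    "V1": [35.5, 44.5, 43.5, 42.5, 34.5],
    "Code": [2, 0, 0, 0, 2],
    "Type_y": [np.nan] * 5,
    "V2": [0, 1, 1, np.nan, np.nan],
})

# Columns to forward-fill (hypothetical helper name).
cols = ["V1", "V2"]

# Sort first so that ffill follows chronological order inside each group.
d2 = d1.sort_values(by=["id_", "Code", "Timestamp"]).copy()

# Forward-fill only the selected columns per (id_, Code) group.
# The result keeps d2's index, so assignment aligns row by row
# and every other column stays untouched.
d2[cols] = d2.groupby(["id_", "Code"])[cols].ffill()

print(d2)
```

Note the double brackets in `[cols]`: a list selects several columns, and the result is assigned back into the full frame rather than replacing it, which is why all the original columns survive.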
Comments:
Please post your expected output. Note: row 5 of your id column may contain a typo: 39-38-30-34- (the trailing dash).
@nilsinelabore Was this helpful? If you need anything else, I can look into it.