Python: How do I create a pivot table on a date column and calculate the time difference?
I have the following dataframe:
D_DATE      BIN Number  Disposition    Unit Assigned
2018-01-04  10005       SWO Issued     PLUMBING DIVISION
2016-06-23  10005       SWO Issued     SCAFFOLD UNIT
2016-06-23  10005       SWO Rescinded  SCAFFOLD UNIT
2018-01-17  10005       SWO Rescinded  PLUMBING DIVISION
2019-01-04  10006       SWO Rescinded  BEST SQUAD
2018-12-21  10006       SWO Issued     BEST SQUAD
2020-02-10  10006       SWO Issued     BEST SQUAD
2020-02-25  10006       SWO Rescinded  BEST SQUAD
import pandas as pd

df = pd.DataFrame({'D_DATE': ['2018-01-04', '2016-06-23', '2016-06-23', '2018-01-17', '2019-01-04', '2018-12-21', '2020-02-10', '2020-02-25'],
                   'BIN Number': ['10005', '10005', '10005', '10005', '10006', '10006', '10006', '10006'],
                   'Disposition': ['SWO Issued', 'SWO Issued', 'SWO Rescinded', 'SWO Rescinded', 'SWO Rescinded', 'SWO Issued', 'SWO Issued', 'SWO Rescinded'],
                   'Unit Assigned': ['PLUMBING DIVISION', 'SCAFFOLD UNIT', 'SCAFFOLD UNIT', 'PLUMBING DIVISION', 'BEST SQUAD', 'BEST SQUAD', 'BEST SQUAD', 'BEST SQUAD']})
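If possible, I would like to create a pivot table with two date columns, one for the issue date and one for the rescind date. The pivot also needs to keep the unit, so I should end up with three columns:
Unit, Issue Date, Rescind Date
Next, I want to calculate the time difference between the issue date and the rescind date.
Expected output: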
Unit Assigned      SWO Issued  SWO Rescinded  Time Difference
PLUMBING DIVISION  2018-01-04  2018-01-17     13 days
SCAFFOLD UNIT      2016-06-23  2016-06-23     0 days
BEST SQUAD         2018-12-21  2019-01-04     14 days
BEST SQUAD         2020-02-10  2020-02-25     15 days
Thank you for your help.

I believe this is a job for pivot/pivot_table:
# convert to datetime if it isn't already
df['D_DATE'] = pd.to_datetime(df['D_DATE'])

# idx numbers repeated rows within each (BIN, Disposition, Unit) group,
# so the n-th "Issued" lines up with the n-th "Rescinded"
(df.assign(idx=df.groupby(['BIN Number', 'Disposition', 'Unit Assigned']).cumcount())
   .pivot_table(index=['idx', 'BIN Number', 'Unit Assigned'],
                columns='Disposition',
                values='D_DATE',
                aggfunc='first')
   .reset_index()
   .assign(Time_Different=lambda x: x['SWO Rescinded'] - x['SWO Issued'])
   .drop('idx', axis=1)
)
Output:
Disposition  BIN Number  Unit Assigned      SWO Issued  SWO Rescinded  Time_Different
0            10005       PLUMBING DIVISION  2018-01-04  2018-01-17     13 days
1            10005       SCAFFOLD UNIT      2016-06-23  2016-06-23     0 days
2            10006       BEST SQUAD         2018-12-21  2019-01-04     14 days
3            10006       BEST SQUAD         2020-02-10  2020-02-25     15 days
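
The difference between two datetime columns is a Timedelta column, which prints as "13 days". If a plain integer number of days is easier to work with, the `.dt.days` accessor can extract it. A minimal sketch on two of the rows above; the `Days` column name is made up for illustration and is not part of the original answer:

```python
import pandas as pd

# Two rows taken from the pivoted result above
out = pd.DataFrame({
    'SWO Issued': pd.to_datetime(['2018-01-04', '2016-06-23']),
    'SWO Rescinded': pd.to_datetime(['2018-01-17', '2016-06-23']),
})

# Subtracting datetime columns yields a timedelta64 column
out['Time_Different'] = out['SWO Rescinded'] - out['SWO Issued']

# 'Days' is a hypothetical column name holding whole days as integers
out['Days'] = out['Time_Different'].dt.days
print(out)
```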
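Hi, thanks, it works, but there is one small problem. If the same unit issues and rescinds SWOs for the same BIN number at different times, the difference can't be calculated. For example, if the Plumbing Division issued another SWO for BIN 10005, that SWO doesn't appear. Only the first one is shown.
The post is updated now. Please take a look.
Thanks, that did it! Great. Thank you for your help.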
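
The case raised in the comments can be reproduced with a small sketch. It assumes a second issue/rescind cycle by the Plumbing Division on BIN 10005; the 2019 dates are invented purely for illustration. Sorting by date before `cumcount` is also an added assumption, so that the n-th issue pairs with the n-th rescind chronologically rather than by row order:

```python
import pandas as pd

# Same unit, same BIN, two issue/rescind cycles.
# The 2019 dates are hypothetical and not from the original post.
df = pd.DataFrame({
    'D_DATE': ['2018-01-04', '2018-01-17', '2019-03-01', '2019-03-10'],
    'BIN Number': ['10005'] * 4,
    'Disposition': ['SWO Issued', 'SWO Rescinded', 'SWO Issued', 'SWO Rescinded'],
    'Unit Assigned': ['PLUMBING DIVISION'] * 4,
})
df['D_DATE'] = pd.to_datetime(df['D_DATE'])
df = df.sort_values('D_DATE')  # assumption: pair events chronologically

keys = ['BIN Number', 'Disposition', 'Unit Assigned']
idx = df.groupby(keys).cumcount()  # running count within each group

# Without idx, aggfunc='first' keeps only the first cycle
collapsed = df.pivot_table(index=['BIN Number', 'Unit Assigned'],
                           columns='Disposition',
                           values='D_DATE',
                           aggfunc='first')

# With idx in the index, each cycle gets its own row
paired = (df.assign(idx=idx)
            .pivot_table(index=['idx', 'BIN Number', 'Unit Assigned'],
                         columns='Disposition',
                         values='D_DATE',
                         aggfunc='first')
            .reset_index()
            .drop(columns='idx'))
paired['Time_Different'] = paired['SWO Rescinded'] - paired['SWO Issued']

print(len(collapsed), len(paired))
print(paired)
```

Pivoting on the unit and BIN alone collapses both cycles into one row, while the `idx` counter keeps them apart.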