Python Dataframe groupby - list of values
I have the following dataframe:
driver_id status dttm
9f8f9bf3ee8f4874873288c246bd2d05 free 2018-02-04 00:19
9f8f9bf3ee8f4874873288c246bd2d05 busy 2018-02-04 01:03
8f174ffd446c456eaf3cca0915d0368d free 2018-02-03 15:43
8f174ffd446c456eaf3cca0915d0368d enroute 2018-02-03 17:02
3 columns: driver_id, status, dttm
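What I need to do is group by driver_id and collect every status, together with its corresponding dttm value, into a list in a new column named 'driver_info':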
driver_id driver_info
9f8f9bf3ee8f4874873288c246bd2d05 [("free", 2018-02-04 00:19), ("busy", 2018-02-04 01:03)]
8f174ffd446c456eaf3cca0915d0368d [("free", 2018-02-03 15:43), ("enroute", 2018-02-03 17:02) ...]
How can I do this in Python 3?
I tried:
dfg = df.groupby("driver_id").apply(lambda x: pd.concat((x["status"], x["dttm"])))
but the result is not what I expected...
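A likely reason the attempt above does not produce pairs: `pd.concat` on two Series stacks them end to end (along `axis=0` by default), so each group yields all statuses followed by all timestamps rather than (status, dttm) pairs. The sketch below uses one driver's rows from the sample data as a stand-in for the group passed to the lambda; the `group` variable is purely illustrative, and `dttm` is assumed to hold plain strings.

```python
import pandas as pd

# One driver's rows, standing in for the group passed to the lambda.
group = pd.DataFrame({
    "status": ["free", "busy"],
    "dttm": ["2018-02-04 00:19", "2018-02-04 01:03"],
})

# pd.concat stacks the two Series vertically: statuses first, then timestamps.
stacked = pd.concat((group["status"], group["dttm"]))
print(stacked.tolist())
# ['free', 'busy', '2018-02-04 00:19', '2018-02-04 01:03']

# zip pairs the i-th status with the i-th timestamp instead.
paired = list(zip(group["status"], group["dttm"]))
print(paired)
# [('free', '2018-02-04 00:19'), ('busy', '2018-02-04 01:03')]
```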
Try `zip` inside `apply`: use `list` with `zip` to build a list of tuples:
df1 = (df.groupby('driver_id')
.apply(lambda x: list(zip(x['status'], x['dttm'])))
.reset_index(name='driver_info'))
print (df1)
driver_id \
0 8f174ffd446c456eaf3cca0915d0368d
1 9f8f9bf3ee8f4874873288c246bd2d05
driver_info
0 [(free, 2018-02-03 15:43), (enroute, 2018-02-0...
1 [(free, 2018-02-04 00:19), (busy, 2018-02-04 0...
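An alternative that avoids `groupby.apply` (a sketch, not the answerer's method) is to build one `(status, dttm)` tuple per row first and then collect each driver's tuples with `agg(list)`. The sample frame is rebuilt from the question's data so the block runs on its own; keeping `dttm` as strings is an assumption, since the question does not show the column's dtype.

```python
import pandas as pd

# Sample data from the question; dttm kept as strings (assumed dtype).
df = pd.DataFrame({
    "driver_id": ["9f8f9bf3ee8f4874873288c246bd2d05"] * 2
                 + ["8f174ffd446c456eaf3cca0915d0368d"] * 2,
    "status": ["free", "busy", "free", "enroute"],
    "dttm": ["2018-02-04 00:19", "2018-02-04 01:03",
             "2018-02-03 15:43", "2018-02-03 17:02"],
})

# One (status, dttm) tuple per row, aligned on df's index.
pairs = pd.Series(list(zip(df["status"], df["dttm"])), index=df.index)

# Gather each driver's tuples into a list; sort=False keeps drivers in
# order of first appearance instead of sorting them by driver_id.
df1 = (pairs.groupby(df["driver_id"], sort=False)
            .agg(list)
            .reset_index(name="driver_info"))
print(df1)
```

Dropping `sort=False` gives the same sorted driver order as the `apply` version above.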
Works great! This is exactly the result I needed. Thanks a lot.