Python: How to transform a dataframe to get the times at which various events occurred?


Given the following dataframe:

+-------+-----+-------+-----+--------+---------------------------+
|  DID  | CID | Event | OID | Source |         TimeStamp         |
+-------+-----+-------+-----+--------+---------------------------+
| 25078 |  14 | QBT   |   0 | EMS    | 2019-10-15 10:54:35 +0000 |
| 25078 |  14 | NDOBT |   0 | EMS    | 2019-10-15 10:54:48 +0000 |
| 25078 |  14 | SBT   |   0 | EMS    | 2019-10-15 10:54:52 +0000 |
| 25078 |  14 | SBT-1 |   0 | ECS    | 2019-10-15 11:00:01 +0000 |
| 25078 |  14 | SBT-1 |   0 | ECS    | 2019-10-15 11:00:26 +0000 |
| 25078 |  14 | SBT-1 |   0 | ECS    | 2019-10-15 11:00:50 +0000 |
| 25078 |  14 | SBT   |   0 | EMS    | 2019-10-15T14:27:45       |
| 25078 |  14 | SBT   |   0 | EMS    | 2019-10-15T14:27:45       |
| 25078 |  14 | LSFA  |   0 | SPDLS  | 2019-10-15T14:28:16       |
| 25078 |  14 | LSFA  |   0 | SPDLS  | 2019-10-15T14:28:16       |
| 25078 |  14 | FEAR  |   0 | CBS    | 2019-10-15T14:28:18       |
| 25078 |  14 | FEAR  |   0 | CBS    | 2019-10-15T14:28:18       |
| 25078 |  14 | SBT   |   0 | EMS    | 2019-10-15T14:28:44       |
| 25078 |  14 | SBT   |   0 | EMS    | 2019-10-15T14:28:44       |
| 25078 |  14 | LSFA  |   0 | SPDLS  | 2019-10-15T14:30:55       |
| 25078 |  14 | LSFA  |   0 | SPDLS  | 2019-10-15T14:30:55       |
| 25078 |  14 | SBT   |   0 | EMS-1  | 2019-10-15T15:28:43       |
| 25078 |  14 | SBT   |   0 | EMS-1  | 2019-10-15T15:29:02       |
| 25078 |  14 | FEAR  |   0 | CBS    | 2019-10-15T15:30:51       |
| 25078 |  14 | FEAR  |   0 | CBS    | 2019-10-15T15:30:51       |
| 25078 |  14 | DBT   |   0 | RS     | 2019-10-15T15:44:23       |
| 25078 |  14 | QBT   |   0 | EMS-1  | 2019-10-15T16:02:16       |
+-------+-----+-------+-----+--------+---------------------------+
I want to get the first and last occurrence of certain events and sources, so that the final output looks like this:

+-------+-----+---------------------+--------------------+---------------------+--------------------+---------------------------+---------------------------+---------------------------+---------------------+
|  DID  | CID |  Event-QBT-Last-DT  | Event-QBT-First-DT |  Event-SBT-Last-DT  | Event-SBT-First-DT |    Screen-ECS-First-DT    |    Screen-ECS-Last-DT     |      FirstTimeUsage       |   LastTime Usage    |
+-------+-----+---------------------+--------------------+---------------------+--------------------+---------------------------+---------------------------+---------------------------+---------------------+
| 25078 |  14 | 2019-10-15T16:02:16 | 10/15/19 10:54 AM  | 2019-10-15T15:29:02 | 10/15/19 10:54 AM  | 2019-10-15 11:00:01 +0000 | 2019-10-15 11:00:50 +0000 | 2019-10-15 10:54:35 +0000 | 2019-10-15T16:02:16 |
+-------+-----+---------------------+--------------------+---------------------+--------------------+---------------------------+---------------------------+---------------------------+---------------------+

How can I achieve this with pandas?

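The idea is to filter the rows with isin to keep only the events of interest, then use groupby with the first and last aggregations, reshape with unstack, and finally flatten the MultiIndex in the columns: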

L = ['QBT','SBT']

df1 = (df[df['Event'].isin(L)]
         .groupby(['OID','DID','CID','Event'])['TimeStamp']
         .agg([('Last-DT','last'), ('First-DT','first')])
         .unstack()
         .sort_index(axis=1, level=1))
df1.columns = [f'Event-{b}-{a}' for a, b in df1.columns]
#print (df1)
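To make the flattening step easier to follow, here is a minimal sketch on a small made-up sample (the `sample` frame and its values are assumptions for illustration, not the question's full data). It shows that after `unstack` each column label is a tuple of (aggregation name, Event value), which is why the list comprehension unpacks two names per column.

```python
import pandas as pd

# Hypothetical miniature of the question's data, using one timestamp layout.
sample = pd.DataFrame({
    'OID': [0, 0, 0, 0],
    'DID': [25078] * 4,
    'CID': [14] * 4,
    'Event': ['QBT', 'SBT', 'SBT', 'QBT'],
    'TimeStamp': ['2019-10-15T10:54:35', '2019-10-15T10:54:52',
                  '2019-10-15T15:29:02', '2019-10-15T16:02:16'],
})

wide = (sample.groupby(['OID', 'DID', 'CID', 'Event'])['TimeStamp']
              .agg(['first', 'last'])
              .unstack())

# Each column label is a tuple: (aggregation name, Event value).
print(wide.columns.tolist())

# Build one flat string per tuple, event first and aggregation second.
flat = [f'Event-{event}-{agg}' for agg, event in wide.columns]
print(flat)
```

The answer's version additionally renames the aggregations to `First-DT` and `Last-DT` and sorts the columns by event before flattening.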
For the overall first and last occurrence, use the same approach as above but without the filter, without Event in the groupby, and without unstack:

df2 = (df.groupby(['OID','DID','CID'])['TimeStamp']
         .agg([('FirstTimeUsage','first'), ('LastTime Usage','last')]))
#print (df2)
Finally, combine the two with join:

df = df1.join(df2).reset_index()
print (df)
   OID    DID  CID         Event-QBT-First-DT    Event-QBT-Last-DT  \
0    0  25078   14  2019-10-15 10:54:35 +0000  2019-10-15T16:02:16   

          Event-SBT-First-DT    Event-SBT-Last-DT             FirstTimeUsage  \
0  2019-10-15 10:54:52 +0000  2019-10-15T15:29:02  2019-10-15 10:54:35 +0000   

        LastTime Usage  
0  2019-10-15T16:02:16  
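EDIT: To handle the additional columns, build a second frame, df11, in the same way as df1 but filtered and grouped on Source: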

L = ['QBT','SBT']

df1 = (df[df['Event'].isin(L)]
         .groupby(['OID','DID','CID','Event'])['TimeStamp']
         .agg([('Last-DT','last'), ('First-DT','first')])
         .unstack()
         .sort_index(axis=1, level=1))
df1.columns = [f'Event-{b}-{a}' for a, b in df1.columns]
#print (df1)

L2 = ['ECS']
df11 = (df[df['Source'].isin(L2)]
         .groupby(['OID','DID','CID','Source'])['TimeStamp']
         .agg([('Last-DT','last'), ('First-DT','first')])
         .unstack()
         .sort_index(axis=1, level=1))
df11.columns = [f'Screen-{b}-{a}' for a, b in df11.columns]

df2 = (df.groupby(['OID','DID','CID'])['TimeStamp']
         .agg([('FirstTimeUsage','first'), ('LastTime Usage','last')]))
Finally, use concat:

df = pd.concat([df1, df11, df2], axis=1).reset_index()
print (df)
   OID    DID  CID         Event-QBT-First-DT    Event-QBT-Last-DT  \
0    0  25078   14  2019-10-15 10:54:35 +0000  2019-10-15T16:02:16   

          Event-SBT-First-DT    Event-SBT-Last-DT        Screen-ECS-First-DT  \
0  2019-10-15 10:54:52 +0000  2019-10-15T15:29:02  2019-10-15 11:00:01 +0000   

          Screen-ECS-Last-DT             FirstTimeUsage       LastTime Usage  
0  2019-10-15 11:00:50 +0000  2019-10-15 10:54:35 +0000  2019-10-15T16:02:16  
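One caveat worth sketching: `first` and `last` return the first and last rows in the frame's current order, so they only give the earliest and latest times if the rows are already sorted chronologically. The question's TimeStamp column also mixes two string layouts. The sketch below is one possible way to handle both, assuming every value is in UTC (an assumption, not something the question states); the `sample` frame is made up for illustration.

```python
import pandas as pd

# Hypothetical sample mixing the two timestamp layouts seen in the question,
# deliberately listed out of chronological order.
sample = pd.DataFrame({
    'DID': [25078] * 4,
    'CID': [14] * 4,
    'Event': ['SBT', 'QBT', 'QBT', 'SBT'],
    'TimeStamp': ['2019-10-15T15:29:02', '2019-10-15 10:54:35 +0000',
                  '2019-10-15T16:02:16', '2019-10-15 10:54:52 +0000'],
})

# Assumption: all values are UTC, so the " +0000" suffix can be dropped
# and the "T" separator replaced to obtain a single layout.
cleaned = (sample['TimeStamp']
           .str.replace(' +0000', '', regex=False)
           .str.replace('T', ' ', regex=False))
sample['TimeStamp'] = pd.to_datetime(cleaned, format='%Y-%m-%d %H:%M:%S')

# min/max do not depend on the order of the rows, unlike first/last.
usage = (sample.groupby(['DID', 'CID'])['TimeStamp']
               .agg(['min', 'max']))
print(usage)
```

With real datetimes in place, `min` and `max` can be swapped in for `first` and `last` in the groupby calls above without sorting the frame first.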

You can also write a function and run it on the dataframe with apply, like so:


import pandas as pd

def f(x):
    # x holds the rows of one group. With string timestamps, min/max
    # compare lexicographically.
    first_used_DT = x['TimeStamp'].min()
    last_used_DT = x['TimeStamp'].max()
    first_ECS = x.loc[x['Source'] == 'ECS', 'TimeStamp'].min()
    last_ECS = x.loc[x['Source'] == 'ECS', 'TimeStamp'].max()
    first_QBT = x.loc[x['Event'] == 'QBT', 'TimeStamp'].min()
    last_QBT = x.loc[x['Event'] == 'QBT', 'TimeStamp'].max()
    first_SBT = x.loc[x['Event'] == 'SBT', 'TimeStamp'].min()
    last_SBT = x.loc[x['Event'] == 'SBT', 'TimeStamp'].max()

    return pd.DataFrame({'FirstTimeUsage': first_used_DT, 'LastTime Usage': last_used_DT,
                         'Screen-ECS-First-DT': first_ECS, 'Screen-ECS-Last-DT': last_ECS,
                         'Event-QBT-First-DT': first_QBT, 'Event-QBT-Last-DT': last_QBT,
                         'Event-SBT-First-DT': first_SBT, 'Event-SBT-Last-DT': last_SBT
                         }, index=[0])

# Run the function once per (DID, CID) group.
out = df.groupby(['DID', 'CID']).apply(f)

It may be a bit slower, but it gets the job done.

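What have you tried so far?

Thanks for the answer, but I have edited my question. I also want to find an entry from the Screen column.

Hi @jezrael, a quick off-topic question. Any idea how to flatten a MultiIndex into a single index as part of a method chain, other than df.columns = ?

@MarkWang - hmm, I have not found a solution yet. I understand what is needed, but it does not seem to exist yet. In my mind it should be something like df.set_axis accepting a callable, like assign.

@mu S\N I got it, groupby is needed because the function works per group. This solution was actually inspired by @jezrael's other posts.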
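Regarding the comment about flattening a MultiIndex inside a method chain: one workaround is to wrap `set_axis` in `pipe`, so that the new labels are computed from the intermediate frame. This is only a sketch, assuming pandas 1.0 or later (where `set_axis` returns a new object by default); the `sample` frame is made up for illustration.

```python
import pandas as pd

# Hypothetical miniature of the question's data.
sample = pd.DataFrame({
    'OID': [0, 0, 0, 0],
    'DID': [25078] * 4,
    'CID': [14] * 4,
    'Event': ['QBT', 'SBT', 'SBT', 'QBT'],
    'TimeStamp': ['2019-10-15T10:54:35', '2019-10-15T10:54:52',
                  '2019-10-15T15:29:02', '2019-10-15T16:02:16'],
})

flat = (sample.groupby(['OID', 'DID', 'CID', 'Event'])['TimeStamp']
              .agg(['first', 'last'])
              .unstack()
              # pipe hands the intermediate frame to the lambda, which
              # derives the flat labels from its MultiIndex columns.
              .pipe(lambda d: d.set_axis(
                  [f'Event-{event}-{agg}' for agg, event in d.columns],
                  axis=1))
              .reset_index())
print(flat.columns.tolist())
```

This keeps everything in a single expression, at the cost of a lambda that is a little harder to read than a separate `df.columns = ...` assignment.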