Calculating time differences between certain events in R
This is a problem I've been trying to solve for several days. I think I may have to drop some of the data or something, but I'm really not sure. I have some data that looks like this:
email Action ActionType TD cnt Date_Time
aaaa Company trial TD 1 10/12/14 19:17
aaaa Task Call 0 NA 10/13/14 17:00
bbbb Task Call 0 NA 12/9/14 16:17
bbbb Task Call 0 NA 12/9/14 16:17
bbbb Task Call 0 NA 12/10/14 16:31
bbbb Task Call 0 NA 12/12/14 16:45
bbbb Company demo TD 1 12/12/14 17:17
bbbb Event Demo TD 2 2/9/15 15:09
cccc Company trial TD 1 8/18/14 14:28
cccc Company demo TD 2 8/20/14 13:21
cccc Event Demo TD 3 2/9/15 15:08
dddd Company trial TD 1 12/14/14 0:09
eeee Company demo TD 1 8/27/14 21:57
eeee Event Demo TD 2 2/9/15 15:08
eeee Event Demo TD 3 2/9/15 15:08
ffff Company trial TD 1 3/19/14 21:15
gggg Company trial TD 1 7/30/14 18:06
hhhh Company trial TD 1 4/3/14 0:26
iiiii Company trial TD 1 5/29/14 20:10
iiiii Task Call 0 NA 5/29/14 22:01
jjjjj Task Call 0 NA 10/15/14 19:46
jjjjj Company trial TD 1 11/12/14 19:05
jjjjj Task Call 0 NA 11/12/14 19:16
jjjjj Task Call 0 NA 11/12/14 19:16
jjjjj Task Call 0 NA 11/12/14 19:31
jjjjj Task Call 0 NA 11/12/14 22:10
jjjjj Task Call 0 NA 11/13/14 19:46
jjjjj Task Call 0 NA 11/26/14 17:31
jjjjj Task Call 0 NA 11/26/14 17:31
jjjjj Task Call 0 NA 11/26/14 17:31
jjjjj Task Call 0 NA 11/26/14 17:31
kkkk Company trial TD 1 1/10/14 3:37
kkkk Task Call 0 NA 10/24/14 0:06
kkkk Task Call 0 NA 10/24/14 0:06
kkkk Task Call 0 NA 10/24/14 13:30
kkkk Company trial TD 2 10/27/14 12:45
kkkk Task Call 0 NA 1/23/15 14:31
kkkk Task Call 0 NA 1/26/15 21:15
kkkk Company Trial TD 3 1/27/15 21:15
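The goal is to calculate the time differences between a demo or trial and the calls that came before it. For example, for each email address I need to find the first demo/trial, then look back and calculate the difference between that demo/trial and the call before it, then the difference between that call and the call before it, and so on.
I don't care about any calls after the first demo/trial, unless there is another demo/trial a few calls later. In that case the process should start again at the second demo/trial and calculate the differences between the second demo/trial and the calls preceding it. I have a column "TD" that marks rows containing a demo/trial. The "cnt" column is the running number of the TD within that email address. For example, if the same email has two back-to-back trials, 1 and 2 will appear in the "cnt" column for that email address.
So basically I'd like the data to look like this: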
email Action ActionType TD cnt Date_Time Time_Diff
aaaa Company trial TD 1 10/12/14 19:17
aaaa Task Call 0 NA 10/13/14 17:00
bbbb Task Call 0 NA 12/9/14 16:17
bbbb Task Call 0 NA 12/9/14 16:17 0
bbbb Task Call 0 NA 12/10/14 16:31 1 d 14 m
bbbb Task Call 0 NA 12/12/14 16:45 2 d 14 m
bbbb Company demo TD 1 12/12/14 17:17 32 m
bbbb Event Demo TD 2 2/9/15 15:09
cccc Company trial TD 1 8/18/14 14:28
cccc Company demo TD 2 8/20/14 13:21
cccc Event Demo TD 3 2/9/15 15:08
dddd Company trial TD 1 12/14/14 0:09
eeee Company demo TD 1 8/27/14 21:57
eeee Event Demo TD 2 2/9/15 15:08
eeee Event Demo TD 3 2/9/15 15:08
ffff Company trial TD 1 3/19/14 21:15
gggg Company trial TD 1 7/30/14 18:06
hhhh Company trial TD 1 4/3/14 0:26
iiiii Company trial TD 1 5/29/14 20:10
iiiii Task Call 0 NA 5/29/14 22:01
jjjjj Task Call 0 NA 10/15/14 19:46
jjjjj Company trial TD 1 11/12/14 19:05 27 d, 23 h, 19 m
jjjjj Task Call 0 NA 11/12/14 19:16
jjjjj Task Call 0 NA 11/12/14 19:16
jjjjj Task Call 0 NA 11/12/14 19:31
jjjjj Task Call 0 NA 11/12/14 22:10
jjjjj Task Call 0 NA 11/13/14 19:46
jjjjj Task Call 0 NA 11/26/14 17:31
jjjjj Task Call 0 NA 11/26/14 17:31
jjjjj Task Call 0 NA 11/26/14 17:31
jjjjj Task Call 0 NA 11/26/14 17:31
kkkk Company trial TD 1 1/10/14 3:37
kkkk Task Call 0 NA 10/24/14 0:06
kkkk Task Call 0 NA 10/24/14 0:06 0
kkkk Task Call 0 NA 10/24/14 13:30 13 h, 24 m
kkkk Company trial TD 2 10/27/14 12:45 2 d, 23 h, 15 m
kkkk Task Call 0 NA 1/23/15 14:31
kkkk Task Call 0 NA 1/26/15 21:15 3 d, 6 h, 44 m
kkkk Company trial TD 3 1/27/15 21:15 1 d
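The format of the time difference doesn't matter to me.
This kind of data manipulation is probably easier to do in SQL. To do it in R, you'll want to use data.table on your data frame.
The solution below isn't the most elegant, but it should scale. Maybe it will give you an idea of how to do this without creating a bunch of new columns like I did. Worst case, you can put it in a loop until all TDs are covered.
Basically, I just ran a series of conditional statements across rows.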
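The code in this answer takes for granted that `dt` is already a data.table and that `Date_Time` is stored as a date-time (POSIXct) rather than as text. A minimal sketch of that setup, using three rows copied from the sample data; the `UTC` time zone and the `%m/%d/%y %H:%M` format string are assumptions based on how the dates are printed in the question:

```r
library(data.table)

# Three rows copied from the sample data in the question
dt <- data.table(
  email      = c("bbbb", "bbbb", "bbbb"),
  Action     = c("Task", "Task", "Company"),
  ActionType = c("Call", "Call", "demo"),
  TD         = c("0", "0", "TD"),
  cnt        = c(NA, NA, 1L),
  Date_Time  = c("12/10/14 16:31", "12/12/14 16:45", "12/12/14 17:17")
)

# Parse the m/d/yy H:M strings into POSIXct.
# UTC is an assumption made here so that DST changes do not distort differences.
dt[, Date_Time := as.POSIXct(Date_Time, format = "%m/%d/%y %H:%M", tz = "UTC")]
```

If the real data lives in a data.frame, `setDT()` converts it to a data.table by reference.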
library(data.table)
# Assumes dt is a data.table and Date_Time is already POSIXct
setkey(dt,email)
dt[ActionType=="Call",call_times:=Date_Time] # field with call times only, for taking mins
dt[TD=="TD",TDtime:=Date_Time] # same thing with TD
dt[,first_call:=min(call_times,na.rm=TRUE),by=email] # date time of first call for all records from an email
legit<-unique(dt[TDtime>first_call,email]) # only keeping records for emails where there was a TD after the first call
dt<-dt[.(legit)]
dt<-dt[Date_Time>first_call|ActionType=="Call"] # also removing TDs that happened before first call
dt[,first_TD:=min(TDtime,na.rm=TRUE),by=email] # same with TD
dt[call_times>first_TD,call_times_2:=Date_Time] # find all calls after the first TD
dt[,second_call:=min(call_times_2,na.rm=TRUE),by=email] # find the time of the first call after the first TD
dt[TDtime>second_call,TDtimes_2:=Date_Time] # find all TDs after the second group of calls
dt[,second_TD:=min(TDtimes_2,na.rm=TRUE),by=email] # find the first TD after second group of calls starts
dt[Date_Time<=first_TD,call_group:=1] # group calls
dt[Date_Time>first_TD&Date_Time<=second_TD&second_TD!=Inf,call_group:=2]
dt[!is.na(call_group),time_diff:=c(0,(diff(as.numeric(Date_Time))/3600)),by=.(email,call_group)] # lagged differences between the call times within each call group (in hours)
dt[!is.na(time_diff),.(email,ActionType,Date_Time,time_diff)]
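The approach above handles two TDs per email through explicitly named columns. One possible way to cover any number of TDs without a loop is to number the blocks with a cumulative sum. This is a different technique from the code above, not a rewrite of it: it keeps every row rather than filtering emails, and the column names `is_td`, `block` and `has_td` are made up for this sketch. It uses the `kkkk` rows from the question and again assumes UTC and the `%m/%d/%y %H:%M` format:

```r
library(data.table)

# Rows for one email copied from the sample data in the question
dt <- data.table(
  email      = "kkkk",
  ActionType = c("trial", "Call", "Call", "Call", "trial", "Call", "Call", "trial"),
  TD         = c("TD", "0", "0", "0", "TD", "0", "0", "TD"),
  Date_Time  = c("1/10/14 3:37", "10/24/14 0:06", "10/24/14 0:06", "10/24/14 13:30",
                 "10/27/14 12:45", "1/23/15 14:31", "1/26/15 21:15", "1/27/15 21:15")
)
dt[, Date_Time := as.POSIXct(Date_Time, format = "%m/%d/%y %H:%M", tz = "UTC")]
setorder(dt, email, Date_Time)

# block = number of TDs strictly before the row within the email,
# so every TD closes the block formed by the calls leading up to it
dt[, is_td := TD == "TD"]
dt[, block := cumsum(is_td) - is_td, by = email]

# Lagged differences in hours within each block; the first row of a block gets NA
dt[, time_diff := c(NA_real_, diff(as.numeric(Date_Time)) / 3600), by = .(email, block)]

# Blocks that contain no TD are calls after the last TD, which should be ignored
dt[, has_td := any(is_td), by = .(email, block)]
dt[has_td == FALSE, time_diff := NA_real_]

dt[, .(email, ActionType, Date_Time, time_diff)]
```

On these rows `time_diff` comes out as NA, NA, 0, 13.4, 71.25, NA, about 78.73 and 24 hours, which corresponds to the 0, 13 h 24 m, 2 d 23 h 15 m, 3 d 6 h 44 m and 1 d shown in the desired output.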
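I have some code that calculates the differences between all rows by email, but I'm not sure how to drop the differences when there is a trial/demo followed only by calls, or how to drop the differences between a trial and a demo within the same email group.
Nice! That's pretty slick and gives me exactly what I wanted!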