R: Join two data frames based on a time-period condition
As a newcomer to R, I am trying to merge two data frames by taking a time-period condition into account.
df1 <- data.frame("first_event" = c("4f7d", "a10a", "e79b"), "second_event" = c("9346","a839", "d939"), "device_serial" = c("123","123","123") , "start_timestamp" = c("2019-12-06 11:47:0", "2019-09-06 11:47:0", "2019-09-05 10:00:00"),"end_timestamp" = c("2020-01-10 12:59:38", "2019-11-22 12:06:28", "2019-11-22 12:06:28"), "exp_id" = NA)
df2 <- data.frame("device_serial" = c("123","123") , exp_id= c("a","b") , start_timestamp = c("2019-12-03 07:12:20", "2019-09-04 10:00:00") , end_timestamp = c("2020-01-17 00:05:10", NULL) , current_event_id = c("1", "2") ,current_event_timestamp= c("2020-01-17 00:05:09", "2020-01-17 00:05:09"))
The result I am looking for is a table like the following df3:
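Thank you for reading this question and helping me to solve it.

If I understand correctly, here are some suggestions.

First, your data, which needs some editing. Following @r2evans's comment, I assumed that NULL should be NA_real_. The current_event_timestamp of df2 in your first data block does not match what you typed in the second block; I used the datetime from the second block because it leads to the answer you are looking for.

library(dplyr)      # %>%, as_tibble, select, rename, mutate, mutate_at
library(lubridate)  # as_datetime

df1 <- as_tibble(df1) %>% # convert to tibble; prints the data type of each column
  select(-exp_id, evnt_start = start_timestamp, evnt_end = end_timestamp) %>% # drop exp_id (not needed, and it would mess up the join) and rename the time columns
  mutate(evnt_start = as_datetime(evnt_start), # convert the time columns to datetime type
         evnt_end = as_datetime(evnt_end))
df1
# A tibble: 3 x 5
  first_event second_event device_serial evnt_start          evnt_end
1 4f7d        9346         123           2019-12-06 11:47:00 2020-01-10 12:59:38
2 a10a        a839         123           2019-09-06 11:47:00 2019-11-22 12:06:28
3 e79b        d939         123           2019-09-05 10:00:00 2019-11-22 12:06:28

df2 <- as_tibble(df2) %>% # convert to tibble
  rename(exp_start = start_timestamp, exp_end = end_timestamp) %>% # rename the time columns
  mutate_at(.vars = c("exp_start", "exp_end", "current_event_timestamp"), ~ as_datetime(.)) # convert the time columns from factor to datetime type
df2

Result of the join:

# A tibble: 3 x 8
  first_event second_event device_serial evnt_start          evnt_end            exp_id exp_start           exp_end_or_current
1 4f7d        9346         123           2019-12-06 11:47:00 2020-01-10 12:59:38 a      2019-12-03 07:12:20 2020-01-17 00:05:10
2 a10a        a839         123           2019-09-06 11:47:00 2019-11-22 12:06:28 b      2019-09-04 10:00:00 2019-11-23 12:06:28
3 e79b        d939         123           2019-09-05 10:00:00 2019-11-22 12:06:28 b      2019-09-04 10:00:00 2019-11-23 12:06:28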
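The code for the join step that produces the 3 x 8 table did not survive in the text above. The following is only a minimal sketch of one way to get that table, not necessarily what the answer used. It assumes that a missing exp_end is filled with current_event_timestamp (suggested by the column name exp_end_or_current) and that an event belongs to an experiment when its interval lies entirely inside the experiment interval. The use of coalesce() and the exact filter condition are assumptions.

```r
library(dplyr)
library(lubridate)

# Event intervals, as in df1 after renaming the time columns
df1 <- tibble(
  first_event   = c("4f7d", "a10a", "e79b"),
  second_event  = c("9346", "a839", "d939"),
  device_serial = c("123", "123", "123"),
  evnt_start = as_datetime(c("2019-12-06 11:47:00", "2019-09-06 11:47:00", "2019-09-05 10:00:00")),
  evnt_end   = as_datetime(c("2020-01-10 12:59:38", "2019-11-22 12:06:28", "2019-11-22 12:06:28"))
)

# Experiment intervals, as in df2 after renaming; experiment "b" has no end yet
df2 <- tibble(
  device_serial = c("123", "123"),
  exp_id        = c("a", "b"),
  exp_start = as_datetime(c("2019-12-03 07:12:20", "2019-09-04 10:00:00")),
  exp_end   = as_datetime(c("2020-01-17 00:05:10", NA)),
  current_event_timestamp = as_datetime(c("2020-01-17 00:05:09", "2019-11-23 12:06:28"))
)

df3 <- df1 %>%
  # pair every event with every experiment on the same device
  # (recent dplyr versions may warn about a many-to-many relationship here)
  inner_join(df2, by = "device_serial") %>%
  # assumption: an open-ended experiment runs until its latest event
  mutate(exp_end_or_current = coalesce(exp_end, current_event_timestamp)) %>%
  # keep only the pairs where the event lies inside the experiment interval
  filter(evnt_start >= exp_start, evnt_end <= exp_end_or_current) %>%
  select(first_event, second_event, device_serial, evnt_start, evnt_end,
         exp_id, exp_start, exp_end_or_current)

df3
```

Joining first and filtering afterwards builds every event/experiment pair per device, which is fine for small tables but can become large when a device has many rows on both sides.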
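dplyr does not do joins on time ranges, but data.table does, using foverlaps or non-equi merges. For elegance and reasonable performance, I suggest data.table, at least for this merging.

BTW, you should not have NULL in your df2$end_timestamp. The effect is that the NULL is dropped, so the vector now has length 1, and data.frame happily recycles it into the column for all 2 rows, which is almost certainly not what you want. Did you mean to use NA?

This is an elegant way to solve my problem, which was not well written. Thanks :)

@Soren, happy to help! Managing, wrangling, and interpreting date-time data can be a puzzle.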
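The data.table route mentioned in the first comment could look roughly like the sketch below, which uses a non-equi join. The table names events and exps are hypothetical, and the end of experiment "b" is taken to be 2019-11-23 12:06:28, an assumption borrowed from the exp_end_or_current column of the answer.

```r
library(data.table)

# Event intervals (hypothetical name: events)
events <- data.table(
  first_event   = c("4f7d", "a10a", "e79b"),
  device_serial = c("123", "123", "123"),
  evnt_start = as.POSIXct(c("2019-12-06 11:47:00", "2019-09-06 11:47:00",
                            "2019-09-05 10:00:00"), tz = "UTC"),
  evnt_end   = as.POSIXct(c("2020-01-10 12:59:38", "2019-11-22 12:06:28",
                            "2019-11-22 12:06:28"), tz = "UTC")
)

# Experiment intervals (hypothetical name: exps); the end of "b" is an assumption
exps <- data.table(
  device_serial = c("123", "123"),
  exp_id        = c("a", "b"),
  exp_start = as.POSIXct(c("2019-12-03 07:12:20", "2019-09-04 10:00:00"), tz = "UTC"),
  exp_end   = as.POSIXct(c("2020-01-17 00:05:10", "2019-11-23 12:06:28"), tz = "UTC")
)

# Non-equi join: for each event, find the experiment on the same device
# whose interval contains the whole event interval
matched <- exps[events,
                on = .(device_serial, exp_start <= evnt_start, exp_end >= evnt_end),
                .(first_event, device_serial, exp_id)]

matched
```

Unlike a join followed by a filter, the interval conditions are applied during the join itself, so the full set of event/experiment pairs is never built.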
> df1
first_event second_event device_serial start_timestamp     end_timestamp       exp_id
4f7d        9346         123           2019-12-06 11:47:0  2020-01-10 12:59:38 NA
a10a        a839         123           2019-09-06 11:47:0  2019-11-22 12:06:28 NA
e79b        d939         123           2019-09-05 10:00:00 2019-11-22 12:06:28 NA
> df2
device_serial exp_id start_timestamp     end_timestamp       current_event_id current_event_timestamp
123           a      2019-12-03 07:12:20 2020-01-17 00:05:10 1                2020-01-17 00:05:09
123           b      2019-09-04 10:00:00 NULL                2                2019-11-23 12:06:28
> df3
first_event second_event device_serial start_timestamp     end_timestamp       exp_id
4f7d        9346         123           2019-12-06 11:47:0  2020-01-10 12:59:38 a
a10a        a839         123           2019-09-06 11:47:0  2019-11-22 12:06:28 b
e79b        d939         123           2019-09-05 10:00:00 2019-11-22 12:06:28 b
df1 <- data.frame("first_event" = c("4f7d", "a10a", "e79b"),
                  "second_event" = c("9346", "a839", "d939"),
                  "device_serial" = c("123", "123", "123"),
                  "start_timestamp" = c("2019-12-06 11:47:0", "2019-09-06 11:47:0", "2019-09-05 10:00:00"),
                  "end_timestamp" = c("2020-01-10 12:59:38", "2019-11-22 12:06:28", "2019-11-22 12:06:28"),
                  "exp_id" = NA)
df2 <- data.frame("device_serial" = c("123", "123"),
                  exp_id = c("a", "b"),
                  start_timestamp = c("2019-12-03 07:12:20", "2019-09-04 10:00:00"),
                  end_timestamp = c("2020-01-17 00:05:10", NA_real_),
                  current_event_id = c("1", "2"),
                  current_event_timestamp = c("2020-01-17 00:05:09", "2019-11-23 12:06:28"))
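Replacing NULL with an NA value in end_timestamp matters because c() silently drops NULL, while NA keeps its slot. The short sketch below illustrates this with hypothetical variable names (with_null, with_na, df_null, df_na) that are not part of the original post.

```r
# c() drops NULL, so this vector has one element instead of two
with_null <- c("2020-01-17 00:05:10", NULL)
length(with_null)

# NA keeps its position, so this vector has two elements
with_na <- c("2020-01-17 00:05:10", NA)
length(with_na)

# data.frame() recycles the length-1 vector: both rows get the same end time
df_null <- data.frame(exp_id = c("a", "b"), end_timestamp = with_null)
df_null

# with NA, the second experiment correctly has a missing end time
df_na <- data.frame(exp_id = c("a", "b"), end_timestamp = with_na)
is.na(df_na$end_timestamp)
```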
# A tibble: 2 x 6
  device_serial exp_id exp_start           exp_end             current_event_id current_event_timestamp
  <fct>         <fct>  <dttm>              <dttm>              <fct>            <dttm>
1 123           a      2019-12-03 07:12:20 2020-01-17 00:05:10 1                2020-01-17 00:05:09
2 123           b      2019-09-04 10:00:00 NA                  2                2019-11-23 12:06:28