Finding a rolling count of timestamps with conditional restrictions in R

Tags: r, datetime, dplyr, data.table, rolling-computation

Background

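I have two tables. One is a table of lab values with a timestamp (charttime). The other is a table of medications with a starttime and an endtime (when administration of the medication started and ended, respectively). There is also a subject_id, which is a unique ID for each patient, and an admission ID (hadm_id) associated with each hospital stay. The same patient can have multiple hadm_ids.

Goal

The goal is to get the number of lab values (charttime) recorded before the starttime of a given medication, going back to the previous dose of that medication or at most 24 hours. If possible I would also like to do the same in the forward direction, but I am starting with one direction. To be clearer: I am basically trying to distinguish scenario B from scenario C in the bottom-most image (multiple lab values within 24 hours rather than a single lab value).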
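If anyone has a solution using the data.table package, I would be very happy to accept it, since I think it would ultimately be the more efficient and elegant solution. However, I have more experience with dplyr, so I tried that approach first.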
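Because the question asks for data.table, here is a minimal sketch of the goal above using a non-equi join. It rests on a few assumptions. The toy values are made up. The window is open at both ends, so a lab exactly 24 h before the dose or exactly at the previous dose is not counted. The column names prev_start, win_start and n_labs are hypothetical.

```r
library(data.table)

# Hypothetical toy data: one admission, five labs, two doses
labs <- data.table(
  subject_id = 1L, hadm_id = 10L,
  charttime = as.POSIXct(c("2156-09-22 04:00:00", "2156-09-22 20:00:00",
                           "2156-09-23 04:00:00", "2156-09-23 11:15:00",
                           "2156-09-24 04:00:00"), tz = "UTC")
)
meds <- data.table(
  subject_id = 1L, hadm_id = 10L, linkorderid = c(5810095L, 1068514L),
  starttime = as.POSIXct(c("2156-09-23 10:00:00", "2156-09-23 11:45:00"), tz = "UTC")
)

# Previous dose within the same admission (NA for the first dose)
setorder(meds, subject_id, hadm_id, starttime)
meds[, prev_start := shift(starttime), by = .(subject_id, hadm_id)]

# Lower bound of the window: 24 h before the dose, or the previous dose if that is later
meds[, win_start := pmax(starttime - 24 * 3600, prev_start, na.rm = TRUE)]

# Non-equi join: for each dose, count labs with win_start < charttime < starttime.
# x.charttime is NA when a dose has no matching lab, so the sum is 0 in that case.
counts <- labs[meds,
               on = .(subject_id, hadm_id, charttime > win_start, charttime < starttime),
               .(n_labs = sum(!is.na(x.charttime))),
               by = .EACHI]
meds[, n_labs := counts$n_labs]
meds[, .(linkorderid, starttime, n_labs)]
```

With by = .EACHI the join returns one row per dose, in the order of meds. For the first dose, 2 labs fall in the window. The second dose only looks back to the first dose's start, so only the 11:15 lab counts. The forward direction should work the same way with the bounds swapped: use charttime > endtime together with an upper bound taken as the earlier of endtime + 24 h and the next dose, shift(starttime, type = "lead").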
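What I've tried
In a previous attempt, I managed to get the most recent lab value before the start time and after the end time of a given medication. Essentially, I did a Cartesian join and filtered out the irrelevant values with grouping and filter statements.
Examples of the initial data frames and the output are shown below.
Below is my attempt at selecting all values before the previous dose of the medication (or 24 hours), rather than only the single most recent value.
LabEvents example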
    subject_id hadm_id valuenum           charttime
 1:       7216  109208      3.8 2156-09-20 04:00:00
 2:       7216  109208      3.7 2156-09-21 04:00:00
 3:       7216  109208      3.5 2156-09-21 04:00:00
 4:       7216  109208      4.4 2156-09-22 04:00:00
 5:       7216  109208      3.3 2156-09-23 04:00:00
 6:       7216  109208      3.5 2156-09-24 04:00:00
 7:       7216  109208      3.1 2156-09-25 04:00:00
 8:       7216  109208      3.8 2156-09-26 04:00:00
 9:       7216  109208      3.8 2156-09-27 04:00:00
10:       7216  109208      3.2 2156-09-28 04:00:00

    subject_id hadm_id linkorderid           starttime             endtime
1:       7216  109208     5810095 2156-09-23 10:00:00 2156-09-23 11:00:00
2:       7216  109208     1068514 2156-09-23 11:45:00 2156-09-23 12:45:00


repEventsKExample %>% 
  inner_join(labEventsKExample, by=c("subject_id" = "subject_id", "hadm_id" = "hadm_id")) %>%
  distinct() %>%
  rename(charttime.lab = charttime) %>%
  collect() -> k_lab_repletions_MV_new_example

k_lab_repletions_MV_new_example %>%
  mutate(isRecentPre = difftime(starttime, charttime.lab, units = "hours") <= 24 & difftime(starttime, charttime.lab, units = "hours") > 0 ) %>%
  mutate(isRecentPost = difftime(endtime, charttime.lab, units = "hours") >= -24 & difftime(endtime, charttime.lab, units = "hours") < 0 )  -> Rep.LE.joined_example 

Rep.LE.joined_example %>%
  filter(isRecentPre) %>% 
  group_by(subject_id, hadm_id,charttime.lab) %>%
  mutate(isMostRecentRepletion = starttime == min(starttime)) %>%
  filter(isMostRecentRepletion) %>%
  ungroup() %>% 
  group_by(subject_id, hadm_id, starttime,endtime) %>%
  arrange(subject_id,starttime) %>%
  mutate(isMostRecentLabEvent = charttime.lab == max(charttime.lab)) %>%
  mutate(recentPreLVs = charttime.lab > dplyr::lag(starttime)) %>%
  filter(recentPreLVs == TRUE)
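In the attempt above, dplyr::lag(starttime) is evaluated inside group_by(subject_id, hadm_id, starttime, endtime). Within each of those groups starttime is constant, so the lag is either NA or the row's own starttime. isRecentPre already requires charttime.lab < starttime, so the final filter() appears to drop every row. Below is a minimal sketch of the same Cartesian-join-and-filter idea that takes the lag over doses before joining. The toy data are made up, and prev_start, hrs_before, in_window and n_labs are hypothetical names. The relationship = argument requires dplyr 1.1.0 or later.

```r
library(dplyr)

# Hypothetical toy data with the same shape as the question's tables
labs <- tibble(
  subject_id = 1L, hadm_id = 10L,
  charttime = as.POSIXct(c("2156-09-22 04:00:00", "2156-09-22 20:00:00",
                           "2156-09-23 04:00:00", "2156-09-23 11:15:00",
                           "2156-09-24 04:00:00"), tz = "UTC")
)
meds <- tibble(
  subject_id = 1L, hadm_id = 10L, linkorderid = c(5810095L, 1068514L),
  starttime = as.POSIXct(c("2156-09-23 10:00:00", "2156-09-23 11:45:00"), tz = "UTC")
)

counts <- meds %>%
  # Take the lag across doses *before* joining, so it really is the previous dose
  group_by(subject_id, hadm_id) %>%
  arrange(starttime, .by_group = TRUE) %>%
  mutate(prev_start = dplyr::lag(starttime)) %>%
  ungroup() %>%
  # Cartesian join within each admission (relationship = needs dplyr >= 1.1.0)
  left_join(labs, by = c("subject_id", "hadm_id"), relationship = "many-to-many") %>%
  mutate(
    hrs_before = as.numeric(difftime(starttime, charttime, units = "hours")),
    # Same 24 h test as isRecentPre, plus "after the previous dose" when there is one
    in_window = !is.na(charttime) & hrs_before > 0 & hrs_before <= 24 &
      (is.na(prev_start) | charttime > prev_start)
  ) %>%
  group_by(linkorderid, starttime) %>%
  summarise(n_labs = sum(in_window), .groups = "drop") %>%
  arrange(starttime)

counts
```

A left join replaces the question's inner join so that a dose with no labs in its admission still appears, with a count of 0.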

Data
Below is some toy data for trying the join approach:
structure(list(subject_id = c(7216L, 7216L, 7216L, 7216L, 7216L, 
7216L, 7216L, 7216L, 7216L, 7216L, 7216L, 7216L, 7216L, 7216L, 
7216L, 7216L, 7216L, 7216L, 7216L, 7216L, 7216L, 7216L, 7216L, 
7216L, 7216L, 7216L, 7216L), hadm_id = c(109208L, 109208L, 109208L, 
109208L, 109208L, 109208L, 109208L, 109208L, 109208L, 109208L, 
109208L, 109208L, 109208L, 109208L, 109208L, 109208L, 109208L, 
132876L, 132876L, 132876L, 132876L, 132876L, 132876L, 132876L, 
132876L, 132876L, 132876L), valuenum = c(3.8, 3.7, 3.5, 4.4, 
3.3, 3.5, 3.1, 3.8, 3.8, 3.2, 4.4, 4.1, 4.5, 4.1, 4, 4, 3.8, 
3.8, 3.7, 3.1, 3.4, 3.6, 3.5, 3.8, 3, 3.3, 3.1), charttime = structure(c(5892321600, 
5892408000, 5892408000, 5892494400, 5892580800, 5892667200, 5892753600, 
5892840000, 5892926400, 5893012800, 5893012800, 5893099200, 5893099200, 
5893185600, 5893185600, 5893272000, 5893358400, 5817499200, 5817585600, 
5817585600, 5817672000, 5817672000, 5817758400, 5817844800, 5817931200, 
5818017600, 5818104000), tzone = "UTC", class = c("POSIXct", 
"POSIXt"))), row.names = c(NA, -27L), class = c("data.table", 
"data.frame")) -> 

structure(list(subject_id = c(7216L, 7216L), hadm_id = c(109208L, 
109208L), linkorderid = c(5810095L, 1068514L), starttime = structure(c(5892602400, 
5892608700), class = c("POSIXct", "POSIXt"), tzone = "UTC"), 
    endtime = structure(c(5892606000, 5892612300), class = c("POSIXct", 
    "POSIXt"), tzone = "UTC")), row.names = c(NA, -2L), class = c("data.table", 
"data.frame")) -> repEventsKExample


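Here is a solution using foverlaps(). This use of foverlaps is not the shortest route to your goal, but sometimes it helps to see a solution in long form and trim it down from there.
I used lubridate only to create the toy data.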
    library(data.table)
    library(lubridate)
    
    firsttime <- as.POSIXct(today() + hours(1))
    times <- firsttime + hours(1:96)
    data <- data.table(subject_id = 7216,
                       charttimes = times,
                       valuenums = sample(c(3.8, 3.7, 3.5, 4.4, 3.3, 3.5, 3.1, 3.8, 3.8,
                                            3.2, 4.4, 4.1, 4.5, 4.1, 4, 4, 3.8, 3.8, 3.7,
                                            3.1, 3.4, 3.6, 3.5, 3.8, 3, 3.3, 3.1),
                                          replace = TRUE,
                                  size = 96))

    # Pick a random point to act as the reference, around which we want a window
    # of 24 hours on either side
    ref <- data[40]

    # We create the start and end window as variables in the reference data 
    ref[, start_window := charttimes - hours(24)]
    ref[, end_window := charttimes + hours(24)]

    # foverlaps() needs an interval in both tables, so duplicate the chart times
    # to give each lab a zero-width [t, t] interval
    data[, charttimes_dup := charttimes]

    # Set the keys, including subject_id and the duplicate chart times
    setkey(data, subject_id, charttimes, charttimes_dup)
    setkey(ref, subject_id, start_window, end_window)

    # foverlaps returns all the matches of charttimes occurring between start
    # and end window. What you want to do with that afterwards can shorten the process.
    data_within_window <- foverlaps(ref, data)
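The code above builds a window around a single reference row (data[40]). Under some assumptions, the same foverlaps() idea can be applied to every dose at once, counting labs in the 24 h before each starttime and the 24 h after each endtime. The sketch below uses made-up toy data and a hypothetical helper called count_within. The windows are closed intervals, because type = "within" includes the endpoints. The previous-dose bound from the question is not applied here.

```r
library(data.table)

# Hypothetical toy data
labs <- data.table(
  subject_id = 1L, hadm_id = 10L,
  charttime = as.POSIXct(c("2156-09-22 04:00:00", "2156-09-22 20:00:00",
                           "2156-09-23 04:00:00", "2156-09-23 11:15:00",
                           "2156-09-24 04:00:00"), tz = "UTC")
)
meds <- data.table(
  subject_id = 1L, hadm_id = 10L, linkorderid = c(5810095L, 1068514L),
  starttime = as.POSIXct(c("2156-09-23 10:00:00", "2156-09-23 11:45:00"), tz = "UTC"),
  endtime   = as.POSIXct(c("2156-09-23 11:00:00", "2156-09-23 12:45:00"), tz = "UTC")
)

# Zero-width interval [t, t] for each lab
labs[, charttime_end := charttime]
# 24 h before each start and 24 h after each end
meds[, `:=`(pre_lo = starttime - 24 * 3600, post_hi = endtime + 24 * 3600)]

# Hypothetical helper: number of labs inside [lo_col, hi_col] for each dose, in meds order
count_within <- function(meds, labs, lo_col, hi_col) {
  # Copy the window bounds under fixed names so they can be keyed for foverlaps()
  w <- meds[, .(subject_id, hadm_id, linkorderid, lo = get(lo_col), hi = get(hi_col))]
  setkey(w, subject_id, hadm_id, lo, hi)
  # type = "within": a lab matches when its [t, t] lies inside a dose's window
  hits <- foverlaps(labs, w,
                    by.x = c("subject_id", "hadm_id", "charttime", "charttime_end"),
                    type = "within", nomatch = NULL)
  n <- hits[, .N, by = linkorderid]
  # Re-align with meds; doses with no lab in their window get 0
  n[meds, on = "linkorderid"][, fifelse(is.na(N), 0L, N)]
}

meds[, n_pre  := count_within(meds, labs, "pre_lo", "starttime")]
meds[, n_post := count_within(meds, labs, "endtime", "post_hi")]
meds[, .(linkorderid, starttime, endtime, n_pre, n_post)]
```

The window is copied into a separate table w so that setkey() does not reorder meds. The counts therefore line up with the original dose order.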

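Could you also include the expected output data/format for the toy data?
There is a problem with your timestamps. I'll answer with data that fits your specification, but you might consider fixing your data in case this doesn't work.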