Determining overlapping date intervals by date type in R


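I have a simple dataset containing a healthcare facility's "date in" and "date out", along with a date type for each patient (inpatient, outpatient, and infectious period). I need to determine whether one patient overlaps with another patient's infectious period. I can usually do this with the interval and int_overlaps functions from the lubridate package. My specific problem arises when several infectious periods overlap.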

I am using R. The code to reproduce the sample data is shown below.

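I want to flag each visit with a logical T/F indicating whether it falls within the interval of an infectious period. The figure below may help to visualize the data. Red rectangles are inpatient stays and red circles are outpatient visits. Purple is the infectious period during a patient's inpatient stay. Only inpatient/outpatient visits that overlap a purple interval should be flagged (that is, each visit gets a logical TRUE or FALSE). Ideally, the patient who gave rise to the infectious period would not be flagged (this applies to the long inpatient stay K00005), but if that causes complications I can work around it.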

I tried:

library(tidyverse); library(lubridate);

test <- have %>% mutate(Int=interval(datein, dateout),
                        overlaps=map(seq_along(Int), function(x){
                                      y=setdiff(seq_along(Int),x)
                                      return(any(int_overlaps(Int[x],Int[y])))
                                      }))

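Here is a Base R approach (no libraries) that uses a very basic for loop. It should flag the overlap as TRUE if the patient is admitted during an infectious period (started_during), leaves during an infectious period (ended_during), or both starts and ends within an infectious period (in_during):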
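# Keep only the rows that describe infectious periods
infectious_periods <- have[which(have$datetype == "Infectious Period"), ]
have$overlap <- FALSE # initializes a new column

for (i in seq_len(nrow(have))) {
  # Only visits are flagged; infectious-period rows stay FALSE
  if (have$datetype[i] != "Infectious Period") {
    # Visit starts inside at least one infectious period
    started_during <- any(have$datein[i] >= infectious_periods$datein &
                            have$datein[i] <= infectious_periods$dateout)
    # Visit ends inside at least one infectious period
    ended_during <- any(have$dateout[i] >= infectious_periods$datein &
                          have$dateout[i] <= infectious_periods$dateout)
    # Visit lies entirely within a single infectious period
    in_during <- any(have$datein[i] >= infectious_periods$datein &
                       have$dateout[i] <= infectious_periods$dateout)
    if (started_during || ended_during || in_during) {
      have$overlap[i] <- TRUE
    }
  }
}
have # print the data with the new overlap column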
# A tibble: 44 x 6
#   id     datetype   datein     dateout    color   overlap
#   <chr>  <chr>      <date>     <date>     <chr>   <lgl>  
# 1 K00005 Inpatient  2018-01-11 2018-07-21 #DD4B39 TRUE   
# 2 K52253 Outpatient 2018-01-13 2018-01-13 #DD4B39 TRUE   
# 3 K32022 Inpatient  2018-01-25 2018-01-29 #DD4B39 TRUE   
# 4 K20113 Outpatient 2018-01-28 2018-01-28 #DD4B39 TRUE   
# 5 K52253 Outpatient 2018-02-24 2018-02-24 #DD4B39 FALSE  
# 6 K00164 Outpatient 2018-03-12 2018-03-12 #DD4B39 FALSE  
# 7 K00164 Outpatient 2018-03-18 2018-03-18 #DD4B39 FALSE  
# 8 K10003 Outpatient 2018-04-02 2018-04-02 #DD4B39 FALSE  
# 9 K00347 Outpatient 2018-04-05 2018-04-05 #DD4B39 TRUE   
#10 K00046 Inpatient  2018-04-05 2018-04-17 #DD4B39 TRUE  
# ... with 34 more rows
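
One case the three checks in the loop do not cover is a visit that starts before an infectious period and ends after it. The following is a minimal sketch, not the answerer's code, of a single overlap condition that handles that case as well. The toy data and the helper flag_overlap are hypothetical. The exclude_own option assumes that the id on an infectious-period row identifies the patient who caused it.

```r
# Hypothetical toy data; column names mirror the thread's `have` data frame.
have <- data.frame(
  id       = c("P1", "P1", "P2", "P3", "P4"),
  datetype = c("Inpatient", "Infectious Period", "Outpatient",
               "Inpatient", "Outpatient"),
  datein   = as.Date(c("2018-01-01", "2018-01-10", "2018-01-12",
                       "2018-01-05", "2018-03-01")),
  dateout  = as.Date(c("2018-02-01", "2018-01-20", "2018-01-12",
                       "2018-01-25", "2018-03-01")),
  stringsAsFactors = FALSE
)

is_period <- have$datetype == "Infectious Period"
periods   <- have[is_period, ]

# Closed intervals [a, b] and [c, d] overlap exactly when a <= d and b >= c.
# This also catches a visit that starts before and ends after a period.
flag_overlap <- function(i, exclude_own = FALSE) {
  if (is_period[i]) return(FALSE)  # period rows themselves are never flagged
  hit <- have$datein[i] <= periods$dateout & have$dateout[i] >= periods$datein
  # Assumption: a period row's id names the patient who caused the period.
  if (exclude_own) hit <- hit & periods$id != have$id[i]
  any(hit)
}

rows <- seq_len(nrow(have))
have$overlap       <- vapply(rows, flag_overlap, logical(1))
have$overlap_other <- vapply(rows, flag_overlap, logical(1), exclude_own = TRUE)
have
```

In this toy data the long P1 stay encloses P1's own infectious period, so overlap is TRUE for it while overlap_other is FALSE, which matches the wish in the question not to flag the patient who caused the period.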

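This is both beautiful and perfect. To complicate things for you, these data are just one group out of a few hundred in the same dataset, so there is an additional variable, ovgroup (the have data shown here are a single ovgroup). I would normally do realdata %>% group_by(ovgroup) %>% ..., but to accommodate this base code I wrapped your code in for (j in unique(realdata$ovgroup)) { ...your code... }. Is this the best approach? Also, I really like your figure, so thank you very much for that.

Great! Glad this was helpful. Your question about the best way to work with groups really depends on the data. Here you say it is another column, so are you simply running this analysis on a larger dataset, but one group at a time? If so, I would generally say that split works well.

Yes, there is an additional column with ovgroup, so each ovgroup represents one overlap event that is completely separated from the other events in time and space, and each event should therefore be examined on its own. I used your (superior) approach of defining a function, splitting into a list, and then using lapply. I am still working out a few kinks, but I certainly don't expect you to do my work for me, and you have helped me a great deal. Thanks again!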
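
The split-then-lapply pattern mentioned in these comments could look roughly like the sketch below. It is an illustration only: the realdata values and the helper flag_group are hypothetical, and the overlap test is a compact interval check rather than the exact loop from the answer.

```r
# Hypothetical stand-in for `realdata`: two independent ovgroup events.
realdata <- data.frame(
  ovgroup  = c(1, 1, 1, 2, 2),
  id       = c("P1", "P2", "P3", "P4", "P5"),
  datetype = c("Infectious Period", "Outpatient", "Outpatient",
               "Infectious Period", "Inpatient"),
  datein   = as.Date(c("2018-01-10", "2018-01-12", "2018-03-01",
                       "2018-03-01", "2018-01-11")),
  dateout  = as.Date(c("2018-01-20", "2018-01-12", "2018-03-01",
                       "2018-03-05", "2018-01-13")),
  stringsAsFactors = FALSE
)

# Flag visits that overlap any infectious period within ONE group.
flag_group <- function(d) {
  is_period <- d$datetype == "Infectious Period"
  p <- d[is_period, ]
  d$overlap <- vapply(seq_len(nrow(d)), function(i) {
    !is_period[i] &&
      any(d$datein[i] <= p$dateout & d$dateout[i] >= p$datein)
  }, logical(1))
  d
}

# Split by group, process each piece independently, then stack the pieces.
pieces <- lapply(split(realdata, realdata$ovgroup), flag_group)
result <- do.call(rbind, pieces)
rownames(result) <- NULL
result
```

Because each piece only sees its own infectious periods, the P3 and P5 visits are not flagged even though their dates fall inside the other group's period. A group with no infectious periods simply gets FALSE throughout.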

library(ggplot2)
have$size <- ifelse(have$overlap,2,1)
ggplot(have, aes(datein,datetype,col=datetype,shape=datetype,cex = size)) + geom_point() + 
  facet_grid(rows = vars(id),switch = "y") + 
  geom_vline(xintercept=infectious_periods$datein) + 
  geom_vline(xintercept=infectious_periods$dateout) +  
  theme(strip.text.y.left = element_text(angle = 0)) +
  geom_linerange(aes(xmin = datein, xmax = dateout), color = have$color,size = 2)