
R: Partial or total overlap of two date periods


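I am trying to find out which observations are active during specific time periods.

My real goal is to find out which diagnoses (dx) are active while a patient is pregnant (9 months). Bear in mind that a patient can have several pregnancies over her lifetime, and also several diagnoses (each dx may or may not be active).

I have tried approaches like those suggested in other posts, but they are not exactly what I want. The underlying problem is well documented, but not for R.

I think they got it working in SQL, so I hope it can be solved.

I have also tried a non-equi join, but I cannot get it to work the way I want.

Let's get started:

I have a database of patients (id) and diagnoses (dx), with the registration date (InD_dx) and end date (EndD_dx) of each diagnosis, like this: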

library(data.table)

id <- rep("a", 11)
InD_dx <- as.Date(c("2005-10-04","2005-10-06","2005-10-06", "2008-04-07", "2010-05-10", "2012-04-24", "2012-04-24", "2012-05-15", "2014-03-20", "2014-04-22", "2017-11-30"), format = "%Y-%m-%d")
EndD_dx <- as.Date(c("2020-12-31","2020-12-31","2020-12-31", "2008-11-05", "2011-01-17", "2015-07-20", "2013-01-01", "2015-07-20", "2015-12-04", "2020-12-31", "2020-12-31"), format = "%Y-%m-%d")
dx <- c("A", "B", "C", "P", "P", "D", "P", "E", "F", "G", "H")

DT <- data.table(id, InD_dx, EndD_dx, dx)

DT
    id   InD_dx    EndD_dx   dx
 1:  a 2005-10-04 2020-12-31  A
 2:  a 2005-10-06 2020-12-31  B
 3:  a 2005-10-06 2020-12-31  C
 4:  a 2008-04-07 2008-11-05  P
 5:  a 2010-05-10 2011-01-17  P
 6:  a 2012-04-24 2015-07-20  D
 7:  a 2012-04-24 2013-01-01  P
 8:  a 2012-05-15 2015-07-20  E
 9:  a 2014-03-20 2015-12-04  F
10:  a 2014-04-22 2020-12-31  G
11:  a 2017-11-30 2020-12-31  H

Pregnancies <- copy(DT[dx == "P"])
Pregnancies

   id     InD_dx    EndD_dx dx
1:  a 2008-04-07 2008-11-05  P
2:  a 2010-05-10 2011-01-17  P
3:  a 2012-04-24 2013-01-01  P

Dx_Other_than_Pregnancies <- copy(DT[dx != "P"])
Dx_Other_than_Pregnancies

   id     InD_dx    EndD_dx dx
1:  a 2005-10-04 2020-12-31  A
2:  a 2005-10-06 2020-12-31  B
3:  a 2005-10-06 2020-12-31  C
4:  a 2012-04-24 2015-07-20  D
5:  a 2012-05-15 2015-07-20  E
6:  a 2014-03-20 2015-12-04  F
7:  a 2014-04-22 2020-12-31  G
8:  a 2017-11-30 2020-12-31  H

And the non-equi join:

Dx_Other_than_Pregnancies[Pregnancies, on = .(id, InD_dx <= InD_dx, EndD_dx >= EndD_dx)]

With the non-equi join, i.InD_dx and i.EndD_dx are not output, and EndD_dx takes the value of i.EndD_dx.

Desired result:

    id   InD_dx    EndD_dx   dx   i.InD_dx  i.EndD_dx i.dx
 1:  a 2005-10-04 2020-12-31  A 2008-04-07 2008-11-05    P
 2:  a 2005-10-06 2020-12-31  B 2008-04-07 2008-11-05    P
 3:  a 2005-10-06 2020-12-31  C 2008-04-07 2008-11-05    P
 4:  a 2005-10-04 2020-12-31  A 2010-05-10 2011-01-17    P
 5:  a 2005-10-06 2020-12-31  B 2010-05-10 2011-01-17    P
 6:  a 2005-10-06 2020-12-31  C 2010-05-10 2011-01-17    P
 7:  a 2005-10-04 2020-12-31  A 2012-04-24 2013-01-01    P
 8:  a 2005-10-06 2020-12-31  B 2012-04-24 2013-01-01    P
 9:  a 2005-10-06 2020-12-31  C 2012-04-24 2013-01-01    P
10:  a 2012-04-24 2015-07-20  D 2012-04-24 2013-01-01    P
11:  a 2012-05-15 2015-07-20  E 2012-04-24 2013-01-01    P
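
I don't know whether I am overcomplicating things. Perhaps with a one-to-many join followed by a comparison of the dates I could get what I want. Would that be efficient enough?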
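
The one-to-many join followed by a date comparison could look roughly like the sketch below. It is only an illustration under stated assumptions: the small tables `dx_other` and `pregnancies` and the column names `preg_start` and `preg_end` are made up for the example and are not part of the data above. Two periods overlap, fully or partially, when each one starts on or before the day the other one ends.

```r
library(data.table)

# Hypothetical toy data: three diagnoses and two pregnancies for one patient.
dx_other <- data.table(
  id      = "a",
  InD_dx  = as.Date(c("2005-10-04", "2012-05-15", "2017-11-30")),
  EndD_dx = as.Date(c("2020-12-31", "2015-07-20", "2020-12-31")),
  dx      = c("A", "E", "H")
)
pregnancies <- data.table(
  id         = "a",
  preg_start = as.Date(c("2008-04-07", "2012-04-24")),
  preg_end   = as.Date(c("2008-11-05", "2013-01-01"))
)

# One-to-many join on id: every diagnosis is paired with every pregnancy
# of the same patient (3 x 2 = 6 candidate pairs here).
pairs <- merge(dx_other, pregnancies, by = "id", allow.cartesian = TRUE)

# Keep the pairs whose periods overlap, fully or partially.
active <- pairs[InD_dx <= preg_end & EndD_dx >= preg_start]
active
```

The intermediate table grows with the number of diagnoses times the number of pregnancies per patient, so this is probably fine for moderate data but may become costly when patients have many records.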

Any help would be greatly appreciated.

Thanks in advance.

type = 'within' excludes the partial overlaps you are looking for.

Try foverlaps with its default overlap type, which also matches partial overlaps:

DT <- data.table(id, InD_dx, EndD_dx, dx)

# foverlaps needs the table being searched to be keyed; the last two key columns are the interval
setkey(DT, id, InD_dx, EndD_dx)

foverlaps(DT[dx == 'P'], DT[dx != 'P'],
          by.x = c("id", "InD_dx", "EndD_dx"),
          by.y = c("id", "InD_dx", "EndD_dx"))

    id     InD_dx    EndD_dx dx   i.InD_dx  i.EndD_dx i.dx
 1:  a 2005-10-04 2020-12-31  A 2008-04-07 2008-11-05    P
 2:  a 2005-10-06 2020-12-31  B 2008-04-07 2008-11-05    P
 3:  a 2005-10-06 2020-12-31  C 2008-04-07 2008-11-05    P
 4:  a 2005-10-04 2020-12-31  A 2010-05-10 2011-01-17    P
 5:  a 2005-10-06 2020-12-31  B 2010-05-10 2011-01-17    P
 6:  a 2005-10-06 2020-12-31  C 2010-05-10 2011-01-17    P
 7:  a 2005-10-04 2020-12-31  A 2012-04-24 2013-01-01    P
 8:  a 2005-10-06 2020-12-31  B 2012-04-24 2013-01-01    P
 9:  a 2005-10-06 2020-12-31  C 2012-04-24 2013-01-01    P
10:  a 2012-04-24 2015-07-20  D 2012-04-24 2013-01-01    P
11:  a 2012-05-15 2015-07-20  E 2012-04-24 2013-01-01    P

For comparison, type = 'within' returns only the 10 rows in which the diagnosis spans the whole pregnancy; row 11 (dx E, which starts after the pregnancy has begun) is missing:

    id     InD_dx    EndD_dx dx   i.InD_dx  i.EndD_dx i.dx
 1:  a 2005-10-04 2020-12-31  A 2008-04-07 2008-11-05    P
 2:  a 2005-10-06 2020-12-31  B 2008-04-07 2008-11-05    P
 3:  a 2005-10-06 2020-12-31  C 2008-04-07 2008-11-05    P
 4:  a 2005-10-04 2020-12-31  A 2010-05-10 2011-01-17    P
 5:  a 2005-10-06 2020-12-31  B 2010-05-10 2011-01-17    P
 6:  a 2005-10-06 2020-12-31  C 2010-05-10 2011-01-17    P
 7:  a 2005-10-04 2020-12-31  A 2012-04-24 2013-01-01    P
 8:  a 2005-10-06 2020-12-31  B 2012-04-24 2013-01-01    P
 9:  a 2005-10-06 2020-12-31  C 2012-04-24 2013-01-01    P
10:  a 2012-04-24 2015-07-20  D 2012-04-24 2013-01-01    P

Also note that you can always specify any non-equi join you need with x[y] notation. foverlaps is really just a convenience function for x[y] non-equi joins, implemented in the exact style of a particular Bioconductor function.

@Michael, overlaps can be found with x[y], but scenarios 1, 2 and 4 described in the OP's question would each have to be tested, which makes foverlaps more practical.
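
For readers who prefer the x[y] form, the following is a minimal sketch of an overlap join written as a non-equi join, using the data from the question. It is one possible formulation, not the method given in the answer. It assumes that a single pair of conditions (diagnosis starts on or before the pregnancy ends, and ends on or after the pregnancy starts) is enough to cover both full and partial overlaps. The names `preg`, `other`, `res`, `preg_start` and `preg_end` are introduced here for illustration. The `x.` and `i.` prefixes in j keep the original dates of both tables, which avoids the overwritten columns described in the question.

```r
library(data.table)

DT <- data.table(
  id      = "a",
  InD_dx  = as.Date(c("2005-10-04", "2005-10-06", "2005-10-06", "2008-04-07",
                      "2010-05-10", "2012-04-24", "2012-04-24", "2012-05-15",
                      "2014-03-20", "2014-04-22", "2017-11-30")),
  EndD_dx = as.Date(c("2020-12-31", "2020-12-31", "2020-12-31", "2008-11-05",
                      "2011-01-17", "2015-07-20", "2013-01-01", "2015-07-20",
                      "2015-12-04", "2020-12-31", "2020-12-31")),
  dx      = c("A", "B", "C", "P", "P", "D", "P", "E", "F", "G", "H")
)

preg  <- DT[dx == "P"]   # pregnancy periods (the i table)
other <- DT[dx != "P"]   # all other diagnoses (the x table)

# In on=, the left-hand name is a column of x (other) and the right-hand name
# is a column of i (preg): the diagnosis starts on or before the pregnancy
# ends, and ends on or after the pregnancy starts.
res <- other[preg,
  on = .(id, InD_dx <= EndD_dx, EndD_dx >= InD_dx),
  nomatch = NULL,
  .(id,
    InD_dx     = x.InD_dx,   # original diagnosis dates, taken from x
    EndD_dx    = x.EndD_dx,
    dx         = x.dx,
    preg_start = i.InD_dx,   # pregnancy dates, taken from i
    preg_end   = i.EndD_dx)]
res
```

On this data the join returns the same 11 diagnosis and pregnancy pairs as the foverlaps call, including the partially overlapping diagnosis E.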