R: partial or total overlap of two date periods
I am trying to find out which observations are active during specific periods of time. My actual goal is to find which diagnoses (dx) a patient has active during a pregnancy (9 months). Keep in mind that a patient can have several pregnancies over her lifetime, and also several diagnoses (each of which may or may not be active at a given time).

I have tried a couple of existing approaches, but they are not quite what I want. The real problem is that this is well documented, but not for R; as far as I can tell, the existing solutions are written for SQL. So I hope it can be solved here as well.

I also tried a non-equi join, but I cannot get it to work the way I want.

Let's get started: I have a DB with patients (id) and their diagnoses (dx), with the registration date (InD_dx) and end date (EndD_dx) of each diagnosis, as follows:
library(data.table)
id <- rep("a", 11)
InD_dx <- as.Date(c("2005-10-04","2005-10-06","2005-10-06", "2008-04-07", "2010-05-10", "2012-04-24", "2012-04-24", "2012-05-15", "2014-03-20", "2014-04-22", "2017-11-30"), format = "%Y-%m-%d")
EndD_dx <- as.Date(c("2020-12-31","2020-12-31","2020-12-31", "2008-11-05", "2011-01-17", "2015-07-20", "2013-01-01", "2015-07-20", "2015-12-04", "2020-12-31", "2020-12-31"), format = "%Y-%m-%d")
dx <- c("A", "B", "C", "P", "P", "D", "P", "E", "F", "G", "H")
DT = data.table(id,InD_dx,EndD_dx, dx)
DT
id InD_dx EndD_dx dx
1: a 2005-10-04 2020-12-31 A
2: a 2005-10-06 2020-12-31 B
3: a 2005-10-06 2020-12-31 C
4: a 2008-04-07 2008-11-05 P
5: a 2010-05-10 2011-01-17 P
6: a 2012-04-24 2015-07-20 D
7: a 2012-04-24 2013-01-01 P
8: a 2012-05-15 2015-07-20 E
9: a 2014-03-20 2015-12-04 F
10: a 2014-04-22 2020-12-31 G
11: a 2017-11-30 2020-12-31 H
Pregnancies <- copy(DT[dx== "P"])
Pregnancies
id InD_dx EndD_dx dx
1: a 2008-04-07 2008-11-05 P
2: a 2010-05-10 2011-01-17 P
3: a 2012-04-24 2013-01-01 P
Dx_Other_than_Pregnancies <- copy(DT[dx!= "P"])
Dx_Other_than_Pregnancies
id InD_dx EndD_dx dx
1: a 2005-10-04 2020-12-31 A
2: a 2005-10-06 2020-12-31 B
3: a 2005-10-06 2020-12-31 C
4: a 2012-04-24 2015-07-20 D
5: a 2012-05-15 2015-07-20 E
6: a 2014-03-20 2015-12-04 F
7: a 2014-04-22 2020-12-31 G
8: a 2017-11-30 2020-12-31 H
And the non-equi join:
Dx_Other_than_Pregnancies[Pregnancies, on=.(id, InD_dx<=InD_dx, EndD_dx>=EndD_dx)]
With the non-equi join, i.InD_dx and i.EndD_dx are not included in the output, and EndD_dx takes the value of i.EndD_dx.
Desired result:
id InD_dx EndD_dx dx i.InD_dx i.EndD_dx i.dx
1: a 2005-10-04 2020-12-31 A 2008-04-07 2008-11-05 P
2: a 2005-10-06 2020-12-31 B 2008-04-07 2008-11-05 P
3: a 2005-10-06 2020-12-31 C 2008-04-07 2008-11-05 P
4: a 2005-10-04 2020-12-31 A 2010-05-10 2011-01-17 P
5: a 2005-10-06 2020-12-31 B 2010-05-10 2011-01-17 P
6: a 2005-10-06 2020-12-31 C 2010-05-10 2011-01-17 P
7: a 2005-10-04 2020-12-31 A 2012-04-24 2013-01-01 P
8: a 2005-10-06 2020-12-31 B 2012-04-24 2013-01-01 P
9: a 2005-10-06 2020-12-31 C 2012-04-24 2013-01-01 P
10: a 2012-04-24 2015-07-20 D 2012-04-24 2013-01-01 P
11: a 2012-05-15 2015-07-20 E 2012-04-24 2013-01-01 P
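I don't know whether I am overcomplicating things. Perhaps with a one-to-many join, followed by a comparison of the dates, I could get what I want. Would that be efficient enough?
Any help would be appreciated.
Thanks in advance.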
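The one-to-many idea raised in the question can be sketched as follows. This is only a minimal sketch, assuming the same columns as above; the names `preg`, `dx_other`, `preg_start`, `preg_end`, `all_pairs` and `overlapping` are made up for illustration. Every diagnosis is paired with every pregnancy of the same patient, and the pairs whose periods do not intersect are then filtered out. Two closed periods overlap, partially or totally, when each one starts no later than the other one ends.

```r
library(data.table)

# Same data as in the question, rebuilt so the block runs on its own.
DT <- data.table(
  id      = "a",
  InD_dx  = as.Date(c("2005-10-04", "2005-10-06", "2005-10-06", "2008-04-07",
                      "2010-05-10", "2012-04-24", "2012-04-24", "2012-05-15",
                      "2014-03-20", "2014-04-22", "2017-11-30")),
  EndD_dx = as.Date(c("2020-12-31", "2020-12-31", "2020-12-31", "2008-11-05",
                      "2011-01-17", "2015-07-20", "2013-01-01", "2015-07-20",
                      "2015-12-04", "2020-12-31", "2020-12-31")),
  dx      = c("A", "B", "C", "P", "P", "D", "P", "E", "F", "G", "H")
)

# Hypothetical helper tables; the pregnancy dates are renamed to avoid clashes.
preg     <- DT[dx == "P", .(id, preg_start = InD_dx, preg_end = EndD_dx)]
dx_other <- DT[dx != "P"]

# One-to-many join on id: every diagnosis is paired with every pregnancy.
all_pairs <- merge(dx_other, preg, by = "id", allow.cartesian = TRUE)

# Keep only the pairs whose periods intersect (partial or total overlap).
overlapping <- all_pairs[InD_dx <= preg_end & EndD_dx >= preg_start]
overlapping
```

The intermediate table has one row per diagnosis and pregnancy pair within each patient, so this is fine for small data but can become costly on a large DB.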
type='within' excludes the partial overlaps you are looking for. With it, only the rows where the pregnancy lies entirely inside the diagnosis period are kept, so the partially overlapping diagnosis E is missing:
id InD_dx EndD_dx dx i.InD_dx i.EndD_dx i.dx
1: a 2005-10-04 2020-12-31 A 2008-04-07 2008-11-05 P
2: a 2005-10-06 2020-12-31 B 2008-04-07 2008-11-05 P
3: a 2005-10-06 2020-12-31 C 2008-04-07 2008-11-05 P
4: a 2005-10-04 2020-12-31 A 2010-05-10 2011-01-17 P
5: a 2005-10-06 2020-12-31 B 2010-05-10 2011-01-17 P
6: a 2005-10-06 2020-12-31 C 2010-05-10 2011-01-17 P
7: a 2005-10-04 2020-12-31 A 2012-04-24 2013-01-01 P
8: a 2005-10-06 2020-12-31 B 2012-04-24 2013-01-01 P
9: a 2005-10-06 2020-12-31 C 2012-04-24 2013-01-01 P
10: a 2012-04-24 2015-07-20 D 2012-04-24 2013-01-01 P
Try foverlaps with its default type (type = "any"), which also matches partial overlaps:
DT = data.table(id,InD_dx,EndD_dx, dx)
# foverlaps needs y to be keyed, with the interval start and end as the last two key columns
setkey(DT,id,InD_dx,EndD_dx)
# x = pregnancies, y = other diagnoses; the default type = "any" keeps partial and total overlaps
foverlaps(DT[dx=='P'],DT[dx!='P'],
  by.x = c("id", "InD_dx", "EndD_dx"),
  by.y = c("id", "InD_dx", "EndD_dx"))
id InD_dx EndD_dx dx i.InD_dx i.EndD_dx i.dx
1: a 2005-10-04 2020-12-31 A 2008-04-07 2008-11-05 P
2: a 2005-10-06 2020-12-31 B 2008-04-07 2008-11-05 P
3: a 2005-10-06 2020-12-31 C 2008-04-07 2008-11-05 P
4: a 2005-10-04 2020-12-31 A 2010-05-10 2011-01-17 P
5: a 2005-10-06 2020-12-31 B 2010-05-10 2011-01-17 P
6: a 2005-10-06 2020-12-31 C 2010-05-10 2011-01-17 P
7: a 2005-10-04 2020-12-31 A 2012-04-24 2013-01-01 P
8: a 2005-10-06 2020-12-31 B 2012-04-24 2013-01-01 P
9: a 2005-10-06 2020-12-31 C 2012-04-24 2013-01-01 P
10: a 2012-04-24 2015-07-20 D 2012-04-24 2013-01-01 P
11: a 2012-05-15 2015-07-20 E 2012-04-24 2013-01-01 P
Also note that you can always use the x[y] notation to specify any non-equi join you need. foverlaps is really just a convenience function for x[y] non-equi joins, implemented in the exact style of a specific Bioconductor function.
@Michael, overlaps can be found with x[y], but then scenarios 1, 2 and 4 described in the OP's question have to be tested, which makes foverlaps more practical.
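As a complement to foverlaps, the same partial-or-total overlap can probably be written as an x[y] non-equi join. What follows is a minimal sketch, assuming the data from the question and data.table 1.12.0 or later (for `nomatch = NULL`); the names `preg`, `dx_other`, `res`, `preg_start` and `preg_end` are made up for illustration. In `on`, the left-hand side of each condition refers to the diagnosis table and the right-hand side to the pregnancy table. The `x.` and `i.` prefixes in `j` recover the original dates of both tables, which a plain non-equi join would otherwise replace with the values from `i`.

```r
library(data.table)

# Same data as in the question, rebuilt so the block runs on its own.
DT <- data.table(
  id      = "a",
  InD_dx  = as.Date(c("2005-10-04", "2005-10-06", "2005-10-06", "2008-04-07",
                      "2010-05-10", "2012-04-24", "2012-04-24", "2012-05-15",
                      "2014-03-20", "2014-04-22", "2017-11-30")),
  EndD_dx = as.Date(c("2020-12-31", "2020-12-31", "2020-12-31", "2008-11-05",
                      "2011-01-17", "2015-07-20", "2013-01-01", "2015-07-20",
                      "2015-12-04", "2020-12-31", "2020-12-31")),
  dx      = c("A", "B", "C", "P", "P", "D", "P", "E", "F", "G", "H")
)

preg     <- DT[dx == "P"]   # hypothetical name: pregnancy periods
dx_other <- DT[dx != "P"]   # hypothetical name: all other diagnoses

# Overlap condition: the diagnosis starts on or before the pregnancy ends
# and ends on or after the pregnancy starts.
res <- dx_other[
  preg,
  .(id,
    InD_dx     = x.InD_dx,    # original diagnosis dates
    EndD_dx    = x.EndD_dx,
    dx         = x.dx,
    preg_start = i.InD_dx,    # original pregnancy dates
    preg_end   = i.EndD_dx),
  on = .(id, InD_dx <= EndD_dx, EndD_dx >= InD_dx),
  nomatch = NULL              # drop pregnancies with no overlapping diagnosis
]
res
```

Unlike the within-style condition tried in the question, this single pair of inequalities covers every way two periods can intersect, so no separate test per scenario is needed.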