Speeding up time interval filtering in R
I have two datasets: hospital admissions with admission dates (admission) and lab results with test dates (test). Patients have a personal ID (patient_id), and each admission has its own admission ID (admission_id). The lab test dataset contains only the patient ID. Some reproducible example data:
admission <- data.frame(
patient_id = c("a", "a", "b", "b", "c", "c", "d", "d", "e", "e"),
admission_id = c(1, 2, 1, 2, 1, 2, 1, 2, 1, 2),
start_date = as.Date(
c(
"2010-10-22",
"2013-04-30",
"2009-02-08",
"2015-12-12",
"2013-01-08",
"2015-02-27",
"2009-08-02",
"2011-12-19",
"2011-09-02",
"2016-05-25"
)
),
end_date = as.Date(
c(
"2010-10-23",
"2013-05-03",
"2009-02-12",
"2015-12-12",
"2013-01-15",
"2015-02-27",
"2009-08-06",
"2011-12-26",
"2011-09-06",
"2016-05-31"
)
)
)
test <- data.frame(
patient_id = c("a", "a", "b", "b", "c", "c", "d", "d", "e", "e"),
test_date = as.Date(
c(
"2010-10-23",
"2013-04-01",
"2009-02-08",
"2015-12-12",
"2013-06-01",
"2015-02-28",
"2009-10-08",
"2011-12-21",
"2011-09-02",
"2016-05-26"
)
)
)
Result:
patient test_date admission_id start_date end_date
1 a 2010-10-23 1 2010-10-22 2010-10-23
2 b 2009-02-08 1 2009-02-08 2009-02-12
3 b 2015-12-12 2 2015-12-12 2015-12-12
4 d 2011-12-21 2 2011-12-19 2011-12-26
5 e 2011-09-02 1 2011-09-02 2011-09-06
6 e 2016-05-26 2 2016-05-25 2016-05-31
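This works for this small example, but it is very slow for larger datasets (>100,000 rows/observations).
Do you know another approach that would speed this up?

Use data.table's foverlaps, which is fast for large data objects: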
> # here is a solution using the 'foverlaps' function in 'data.table'
> library(data.table)
> admission <- data.frame(
+ patient_id = c("a", "a", "b", "b", "c", "c", "d", "d", "e", "e"),
+ admission_id = c(1, 2, 1, 2, 1, 2, 1, 2, 1, 2),
+ .... [TRUNCATED]
> test <- data.frame(
+ patient_id = c("a", "a", "b", "b", "c", "c", "d", "d", "e", "e"),
+ test_date = as.Date(
+ c(
+ "2010-10-23",
+ .... [TRUNCATED]
> # add dummy dates to test after making data.tables
> setDT(admission)
> setDT(test)
> test[, `:=`(start_date = test_date, end_date = test_date)]
> setkey(admission, start_date, end_date) # set the key that is required
> foverlaps(test, admission)[
+ !is.na(patient_id)][, # remove non-matches
+ `:=`(i.patient_id = NULL, i.start_date = NULL, i.end_date = NULL)] .... [TRUNCATED]
patient_id admission_id start_date end_date test_date
1: a 1 2010-10-22 2010-10-23 2010-10-23
2: b 1 2009-02-08 2009-02-12 2009-02-08
3: b 2 2015-12-12 2015-12-12 2015-12-12
4: d 2 2011-12-19 2011-12-26 2011-12-21
5: e 1 2011-09-02 2011-09-06 2011-09-02
6: e 2 2016-05-25 2016-05-31 2016-05-26
>
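One thing to watch in the transcript above: the key covers only start_date and end_date, so on larger data a test could be matched to another patient's admission whenever the dates happen to overlap. A minimal sketch of a variant that also keys on patient_id is shown below. It assumes a data.table version in which foverlaps accepts nomatch = NULL (otherwise the non-matches can be filtered afterwards, as in the transcript); the name res is introduced here for illustration only.

```r
library(data.table)

# Example data from the question, written compactly
admission <- data.table(
  patient_id   = rep(c("a", "b", "c", "d", "e"), each = 2),
  admission_id = rep(c(1, 2), times = 5),
  start_date   = as.Date(c("2010-10-22", "2013-04-30", "2009-02-08", "2015-12-12",
                           "2013-01-08", "2015-02-27", "2009-08-02", "2011-12-19",
                           "2011-09-02", "2016-05-25")),
  end_date     = as.Date(c("2010-10-23", "2013-05-03", "2009-02-12", "2015-12-12",
                           "2013-01-15", "2015-02-27", "2009-08-06", "2011-12-26",
                           "2011-09-06", "2016-05-31"))
)
test <- data.table(
  patient_id = rep(c("a", "b", "c", "d", "e"), each = 2),
  test_date  = as.Date(c("2010-10-23", "2013-04-01", "2009-02-08", "2015-12-12",
                         "2013-06-01", "2015-02-28", "2009-10-08", "2011-12-21",
                         "2011-09-02", "2016-05-26"))
)

# Represent each test date as a zero-length interval
test[, `:=`(start_date = test_date, end_date = test_date)]

# Key on patient_id first, so overlaps are only searched within the same patient;
# the last two key columns are the interval bounds
setkey(admission, patient_id, start_date, end_date)

# Keep only tests that fall inside an admission of the same patient
res <- foverlaps(test, admission, type = "within", nomatch = NULL)

# Drop the helper interval columns that came from 'test'
res[, c("i.start_date", "i.end_date") := NULL]
print(res)
```

On the example data this keeps the same six tests as the transcript; the difference only shows up when different patients have overlapping admission periods.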
Alternatively, a non-equi join with data.table may be worth trying.
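The non-equi join suggestion comes without code, so here is a minimal sketch of what it might look like on the example data. It assumes a data.table version that supports non-equi conditions in on and nomatch = NULL; the name res and the explicit column selection in j are choices made for this illustration, not part of the original answer.

```r
library(data.table)

# Example data from the question, written compactly
admission <- data.table(
  patient_id   = rep(c("a", "b", "c", "d", "e"), each = 2),
  admission_id = rep(c(1, 2), times = 5),
  start_date   = as.Date(c("2010-10-22", "2013-04-30", "2009-02-08", "2015-12-12",
                           "2013-01-08", "2015-02-27", "2009-08-02", "2011-12-19",
                           "2011-09-02", "2016-05-25")),
  end_date     = as.Date(c("2010-10-23", "2013-05-03", "2009-02-12", "2015-12-12",
                           "2013-01-15", "2015-02-27", "2009-08-06", "2011-12-26",
                           "2011-09-06", "2016-05-31"))
)
test <- data.table(
  patient_id = rep(c("a", "b", "c", "d", "e"), each = 2),
  test_date  = as.Date(c("2010-10-23", "2013-04-01", "2009-02-08", "2015-12-12",
                         "2013-06-01", "2015-02-28", "2009-10-08", "2011-12-21",
                         "2011-09-02", "2016-05-26"))
)

# For every test, look up admissions of the same patient whose interval
# contains the test date: start_date <= test_date <= end_date
res <- admission[
  test,
  on = .(patient_id, start_date <= test_date, end_date >= test_date),
  nomatch = NULL,                        # drop tests outside any admission
  .(patient_id,
    test_date    = i.test_date,          # value from 'test'
    admission_id,
    start_date   = x.start_date,         # original bounds from 'admission'
    end_date     = x.end_date)
]
print(res)
```

The x. and i. prefixes are used in j because, in a non-equi join, the joined columns would otherwise show the test date rather than the original admission bounds.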