Speeding up time-interval filtering in R


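I have two datasets: hospital stays with admission dates (admission) and lab results with test dates (test). Each patient has an individual ID (patient_id), and each admission has its own admission ID (admission_id). The lab test dataset contains only the patient ID. Some reproducible example data: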

admission <- data.frame(
  patient_id = c("a", "a", "b", "b", "c", "c", "d", "d", "e", "e"),
  admission_id = c(1, 2, 1, 2, 1, 2, 1, 2, 1, 2),
  start_date = as.Date(
    c(
      "2010-10-22",
      "2013-04-30",
      "2009-02-08", 
      "2015-12-12",
      "2013-01-08", 
      "2015-02-27",
      "2009-08-02",
      "2011-12-19",
      "2011-09-02",
      "2016-05-25"
    )
  ),
  end_date = as.Date(
    c(
      "2010-10-23", 
      "2013-05-03",
      "2009-02-12",
      "2015-12-12",
      "2013-01-15",
      "2015-02-27",
      "2009-08-06",
      "2011-12-26",
      "2011-09-06",
      "2016-05-31"
    )
  )
)

test <- data.frame(
  patient_id = c("a", "a", "b", "b", "c", "c", "d", "d", "e", "e"),
  test_date = as.Date(
    c(
      "2010-10-23",
      "2013-04-01",
      "2009-02-08",
      "2015-12-12",
      "2013-06-01",
      "2015-02-28",
      "2009-10-08",
      "2011-12-21",
      "2011-09-02",
      "2016-05-26"
    )
  )
)

Result:

  patient  test_date admission_id start_date   end_date
1       a 2010-10-23            1 2010-10-22 2010-10-23
2       b 2009-02-08            1 2009-02-08 2009-02-12
3       b 2015-12-12            2 2015-12-12 2015-12-12
4       d 2011-12-21            2 2011-12-19 2011-12-26
5       e 2011-09-02            1 2011-09-02 2011-09-06
6       e 2016-05-26            2 2016-05-25 2016-05-31

This works for this small example, but it is very slow for larger datasets (more than 100,000 rows/observations).

Is there another approach that would speed this up?
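As a reference point, the rule behind the result above is to keep a test when its date falls inside one of the same patient's admissions, endpoints included. That rule can be written in base R as a merge followed by a filter. This is a hypothetical baseline, not necessarily the approach the question refers to. merge() first builds every test/admission pair for each patient, so the intermediate table grows with tests times admissions per patient.

```r
# Hypothetical base-R baseline (not taken from the question): merge on
# patient_id, then keep pairs whose test_date lies in [start_date, end_date].
admission <- data.frame(
  patient_id   = rep(c("a", "b", "c", "d", "e"), each = 2),
  admission_id = rep(c(1, 2), times = 5),
  start_date   = as.Date(c("2010-10-22", "2013-04-30", "2009-02-08", "2015-12-12",
                           "2013-01-08", "2015-02-27", "2009-08-02", "2011-12-19",
                           "2011-09-02", "2016-05-25")),
  end_date     = as.Date(c("2010-10-23", "2013-05-03", "2009-02-12", "2015-12-12",
                           "2013-01-15", "2015-02-27", "2009-08-06", "2011-12-26",
                           "2011-09-06", "2016-05-31"))
)
test <- data.frame(
  patient_id = rep(c("a", "b", "c", "d", "e"), each = 2),
  test_date  = as.Date(c("2010-10-23", "2013-04-01", "2009-02-08", "2015-12-12",
                         "2013-06-01", "2015-02-28", "2009-10-08", "2011-12-21",
                         "2011-09-02", "2016-05-26"))
)

# All test/admission pairs for the same patient (the expensive step)
pairs <- merge(test, admission, by = "patient_id")

# Keep only pairs where the test falls inside the admission (inclusive)
in_stay <- pairs$test_date >= pairs$start_date & pairs$test_date <= pairs$end_date
result <- pairs[in_stay, ]
print(result)
```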

Use foverlaps from data.table; it is fast for large data objects:

> # here is a solution using the 'foverlaps' function in 'data.table'
> library(data.table)

> admission <- data.frame(
+   patient_id = c("a", "a", "b", "b", "c", "c", "d", "d", "e", "e"),
+   admission_id = c(1, 2, 1, 2, 1, 2, 1, 2, 1, 2),
+ .... [TRUNCATED] 

> test <- data.frame(
+   patient_id = c("a", "a", "b", "b", "c", "c", "d", "d", "e", "e"),
+   test_date = as.Date(
+     c(
+       "2010-10-23",
+  .... [TRUNCATED] 

> # add dummy dates to test after making data.tables
> setDT(admission)

> setDT(test)

> test[, `:=`(start_date = test_date, end_date = test_date)]

> setkey(admission, start_date, end_date)  # set the key that is required

> foverlaps(test, admission)[
+   !is.na(patient_id)][,  # remove non-matches
+     `:=`(i.patient_id = NULL, i.start_date = NULL, i.end_date = NULL)] .... [TRUNCATED] 
   patient_id admission_id start_date   end_date  test_date
1:          a            1 2010-10-22 2010-10-23 2010-10-23
2:          b            1 2009-02-08 2009-02-12 2009-02-08
3:          b            2 2015-12-12 2015-12-12 2015-12-12
4:          d            2 2011-12-19 2011-12-26 2011-12-21
5:          e            1 2011-09-02 2011-09-06 2011-09-02
6:          e            2 2016-05-25 2016-05-31 2016-05-26
>
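The transcript above keys admission on the dates only. As a result, a test is matched against every admission whose interval covers it, including other patients' admissions. In the example data no test date falls inside another patient's stay, so the output is unaffected. The sketch below adds patient_id to the key and otherwise follows the answer. It also uses nomatch = 0L instead of the !is.na() filter, and type = "within" because each test is a single day.

```r
library(data.table)

admission <- data.table(
  patient_id   = rep(c("a", "b", "c", "d", "e"), each = 2),
  admission_id = rep(c(1, 2), times = 5),
  start_date   = as.Date(c("2010-10-22", "2013-04-30", "2009-02-08", "2015-12-12",
                           "2013-01-08", "2015-02-27", "2009-08-02", "2011-12-19",
                           "2011-09-02", "2016-05-25")),
  end_date     = as.Date(c("2010-10-23", "2013-05-03", "2009-02-12", "2015-12-12",
                           "2013-01-15", "2015-02-27", "2009-08-06", "2011-12-26",
                           "2011-09-06", "2016-05-31"))
)
test <- data.table(
  patient_id = rep(c("a", "b", "c", "d", "e"), each = 2),
  test_date  = as.Date(c("2010-10-23", "2013-04-01", "2009-02-08", "2015-12-12",
                         "2013-06-01", "2015-02-28", "2009-10-08", "2011-12-21",
                         "2011-09-02", "2016-05-26"))
)

# foverlaps() joins interval to interval, so give each test a
# zero-length interval [test_date, test_date]
test[, `:=`(start_date = test_date, end_date = test_date)]

# Key y on patient_id first so only the same patient's admissions can match;
# the last two key columns must be the interval start and end
setkey(admission, patient_id, start_date, end_date)

result <- foverlaps(
  test, admission,
  by.x    = c("patient_id", "start_date", "end_date"),
  type    = "within",  # test interval must lie inside the admission interval
  nomatch = 0L         # drop tests that fall in no admission
)

# Remove the helper interval columns that came from test
result[, c("i.start_date", "i.end_date") := NULL]
print(result)
```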


Alternatively, try a non-equi join with data.table.
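A minimal sketch of that idea on the question's example data is shown below. It assumes data.table 1.9.8 or later, where non-equi joins were introduced. Each test is joined to admissions of the same patient whose start_date is on or before, and whose end_date is on or after, the test date. No dummy interval columns are needed. In a non-equi join, the x-side join columns in the result carry the i-side values, so the x. prefix is used to return the admission's own dates.

```r
library(data.table)

admission <- data.table(
  patient_id   = rep(c("a", "b", "c", "d", "e"), each = 2),
  admission_id = rep(c(1, 2), times = 5),
  start_date   = as.Date(c("2010-10-22", "2013-04-30", "2009-02-08", "2015-12-12",
                           "2013-01-08", "2015-02-27", "2009-08-02", "2011-12-19",
                           "2011-09-02", "2016-05-25")),
  end_date     = as.Date(c("2010-10-23", "2013-05-03", "2009-02-12", "2015-12-12",
                           "2013-01-15", "2015-02-27", "2009-08-06", "2011-12-26",
                           "2011-09-06", "2016-05-31"))
)
test <- data.table(
  patient_id = rep(c("a", "b", "c", "d", "e"), each = 2),
  test_date  = as.Date(c("2010-10-23", "2013-04-01", "2009-02-08", "2015-12-12",
                         "2013-06-01", "2015-02-28", "2009-10-08", "2011-12-21",
                         "2011-09-02", "2016-05-26"))
)

# For each test (i), find admissions (x) of the same patient with
# start_date <= test_date <= end_date; nomatch = 0L drops unmatched tests.
# x.start_date / x.end_date return the admission's own dates, because plain
# start_date / end_date would show the i-side value (test_date) here.
result <- admission[test,
  .(patient_id, admission_id,
    start_date = x.start_date, end_date = x.end_date, test_date),
  on = .(patient_id, start_date <= test_date, end_date >= test_date),
  nomatch = 0L]
print(result)
```

Unlike foverlaps(), this does not require setting a key or adding interval columns to test. The join conditions are stated directly in on.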
