Speeding up time-interval filtering in R

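I have two datasets: hospital admissions with admission dates (admission) and lab results with test dates (test). Each patient has an individual ID (patient_id), and each admission has its own admission ID (admission_id). The lab test dataset contains only the patient ID. Some reproducible example data: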

admission <- data.frame(
  patient_id = c("a", "a", "b", "b", "c", "c", "d", "d", "e", "e"),
  admission_id = c(1, 2, 1, 2, 1, 2, 1, 2, 1, 2),
  start_date = as.Date(
    c(
      "2010-10-22",
      "2013-04-30",
      "2009-02-08", 
      "2015-12-12",
      "2013-01-08", 
      "2015-02-27",
      "2009-08-02",
      "2011-12-19",
      "2011-09-02",
      "2016-05-25"
    )
    ),
  end_date = as.Date(
    c(
      "2010-10-23", 
      "2013-05-03",
      "2009-02-12",
      "2015-12-12",
      "2013-01-15",
      "2015-02-27",
      "2009-08-06",
      "2011-12-26",
      "2011-09-06",
      "2016-05-31"
    )
  )
  )

test <- data.frame(
  patient_id = c("a", "a", "b", "b", "c", "c", "d", "d", "e", "e"),
  test_date = as.Date(
    c(
      "2010-10-23",
      "2013-04-01",
      "2009-02-08",
      "2015-12-12",
      "2013-06-01",
      "2015-02-28",
      "2009-10-08",
      "2011-12-21",
      "2011-09-02",
      "2016-05-26"
    )
  )
)

Result:

  patient  test_date admission_id start_date   end_date
1       a 2010-10-23            1 2010-10-22 2010-10-23
2       b 2009-02-08            1 2009-02-08 2009-02-12
3       b 2015-12-12            2 2015-12-12 2015-12-12
4       d 2011-12-21            2 2011-12-19 2011-12-26
5       e 2011-09-02            1 2011-09-02 2011-09-06
6       e 2016-05-26            2 2016-05-25 2016-05-31
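
This works well for this small example, but it is very slow for larger datasets (>100,000 rows/observations).

Do you know how to speed this up with another approach?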

Use foverlaps from data.table; it is fast for large data objects:

> # here is a solution using the 'foverlaps' function in 'data.table'
> library(data.table)

> admission <- data.frame(
+   patient_id = c("a", "a", "b", "b", "c", "c", "d", "d", "e", "e"),
+   admission_id = c(1, 2, 1, 2, 1, 2, 1, 2, 1, 2),
+ .... [TRUNCATED] 

> test <- data.frame(
+   patient_id = c("a", "a", "b", "b", "c", "c", "d", "d", "e", "e"),
+   test_date = as.Date(
+     c(
+       "2010-10-23",
+  .... [TRUNCATED] 

> # add dummy dates to test after making data.tables
> setDT(admission)

> setDT(test)

> test[, `:=`(start_date = test_date, end_date = test_date)]

> setkey(admission, start_date, end_date)  # set the key that is required

> foverlaps(test, admission)[
+   !is.na(patient_id)][,  # remove non-matches
+     `:=`(i.patient_id = NULL, i.start_date = NULL, i.end_date = NULL)] .... [TRUNCATED] 
   patient_id admission_id start_date   end_date  test_date
1:          a            1 2010-10-22 2010-10-23 2010-10-23
2:          b            1 2009-02-08 2009-02-12 2009-02-08
3:          b            2 2015-12-12 2015-12-12 2015-12-12
4:          d            2 2011-12-19 2011-12-26 2011-12-21
5:          e            1 2011-09-02 2011-09-06 2011-09-02
6:          e            2 2016-05-25 2016-05-31 2016-05-26
>
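
Note that the transcript above keys admission on the two date columns only, so a test is matched to any admission that covers its date, regardless of patient. That happens to give the right rows for the sample data. A minimal sketch of a patient-aware variant follows, assuming that matching within the same patient is what is wanted; it uses a four-row subset of the sample data, and the name matched is only illustrative:

```r
library(data.table)

# Subset of the sample data from the question
admission <- data.table(
  patient_id   = c("a", "a", "b", "b"),
  admission_id = c(1, 2, 1, 2),
  start_date   = as.Date(c("2010-10-22", "2013-04-30", "2009-02-08", "2015-12-12")),
  end_date     = as.Date(c("2010-10-23", "2013-05-03", "2009-02-12", "2015-12-12"))
)
test <- data.table(
  patient_id = c("a", "a", "b", "b"),
  test_date  = as.Date(c("2010-10-23", "2013-04-01", "2009-02-08", "2015-12-12"))
)

# foverlaps needs an interval on both sides, so treat each test as a one-day interval
test[, `:=`(start_date = test_date, end_date = test_date)]

# Key on patient_id first; the last two key columns are the interval
setkey(admission, patient_id, start_date, end_date)

matched <- foverlaps(
  test, admission,
  by.x = c("patient_id", "start_date", "end_date"),
  type = "within"
)[!is.na(admission_id)]  # drop tests that fall outside every admission

# Remove the dummy interval columns that came from test
matched[, c("i.start_date", "i.end_date") := NULL]
print(matched)
```

Putting patient_id in front of the interval columns makes foverlaps require an exact match on the patient before it checks the date overlap.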

Maybe try a non-equi join with data.table.
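
The code for this suggestion is not preserved here, so the following is only a sketch of what a non-equi join could look like, not the answerer's original code. It assumes a four-row subset of the sample data from the question and a data.table version that accepts nomatch = NULL; the name result is illustrative.

```r
library(data.table)

# Subset of the sample data from the question
admission <- data.table(
  patient_id   = c("a", "a", "b", "b"),
  admission_id = c(1, 2, 1, 2),
  start_date   = as.Date(c("2010-10-22", "2013-04-30", "2009-02-08", "2015-12-12")),
  end_date     = as.Date(c("2010-10-23", "2013-05-03", "2009-02-12", "2015-12-12"))
)
test <- data.table(
  patient_id = c("a", "a", "b", "b"),
  test_date  = as.Date(c("2010-10-23", "2013-04-01", "2009-02-08", "2015-12-12"))
)

# For each test, find admissions of the same patient with
# start_date <= test_date <= end_date.
# In j, the x. prefix recovers the original admission dates, because the
# join columns themselves take the value of test_date after a non-equi join.
result <- admission[
  test,
  .(patient_id,
    test_date    = i.test_date,
    admission_id,
    start_date   = x.start_date,
    end_date     = x.end_date),
  on = .(patient_id, start_date <= test_date, end_date >= test_date),
  nomatch = NULL  # drop tests that fall outside every admission
]
print(result)
```

Unlike foverlaps, this form needs no dummy interval columns on test and no key on admission; which of the two is faster on a given dataset is worth measuring rather than assuming.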