C++: R code for computing overlapping intervals


I have data in the following format (excerpt):

       SW_Release deviceType     configStartDate       configEndDate
 1:   04.05.00         21 2005-11-03 19:12:36 2006-02-28 10:19:27
 2:   04.05.00         16 2005-11-04 03:59:05 2006-02-28 10:19:27
 3:   04.05.00         20 2005-11-04 03:59:06 2006-02-28 10:19:27
 4:   04.05.00         15 2005-11-04 03:59:06 2006-02-28 10:19:27
 5:   04.05.00         19 2005-11-04 03:59:06 2006-02-28 10:19:27
 6:   04.05.00         17 2005-11-04 03:59:06 2006-02-28 10:19:27
 7:   04.07.03         16 2006-02-28 10:19:27 2006-03-29 01:00:39
 8:   04.07.03         20 2006-02-28 10:19:27 2006-03-29 01:00:41
 9:   04.07.01         15 2006-02-28 10:19:27 2006-03-29 01:00:41
10:   04.07.01         19 2006-02-28 10:19:27 2006-03-29 01:00:41
11:   04.07.01         17 2006-02-28 10:19:27 2006-03-29 01:00:42
12:   04.07.01         21 2006-02-28 10:19:27 2006-03-29 01:00:42
13:   04.07.01         18 2006-02-28 10:19:27 2006-03-29 01:00:42
14:   04.07.04         16 2006-03-29 01:00:40 2006-05-01 16:07:49
15:   04.07.04         20 2006-03-29 01:00:41 2006-05-01 16:07:50
16:   04.07.02         15 2006-03-29 01:00:41 2006-05-01 16:07:50
17:   04.07.02         19 2006-03-29 01:00:41 2006-05-01 16:07:51
18:   04.07.02         17 2006-03-29 01:00:42 2006-05-01 16:07:51
19:   04.07.02         21 2006-03-29 01:00:42 2006-05-01 16:07:51
20:   04.07.02         18 2006-03-29 01:00:42 2006-06-01 09:45:36
21:   04.07.04         16 2006-05-02 09:47:57 2006-06-01 09:45:25
22:   04.07.04         20 2006-05-02 09:47:57 2006-06-01 09:45:28
23:   04.07.02         15 2006-05-02 09:47:58 2006-06-01 09:45:31
24:   04.07.02         19 2006-05-02 09:47:58 2006-06-01 09:45:32
25:   04.07.02         17 2006-05-02 09:47:58 2006-06-01 09:45:34
26:   04.07.02         21 2006-05-02 09:47:58 2006-06-01 09:45:35
27:   04.07.05         16 2006-06-01 09:45:27 2006-08-14 17:54:15
28:   04.07.05         20 2006-06-01 09:45:29 2006-08-14 17:54:15
29:   04.07.06         15 2006-06-01 09:45:31 2007-12-12 11:03:00
30:   04.07.06         19 2006-06-01 09:45:33 2007-12-12 11:03:00
31:   04.07.03         17 2006-06-01 09:45:35 2006-08-14 17:54:16
32:   04.07.03         21 2006-06-01 09:45:35 2006-08-14 17:54:16
33:   04.07.04         18 2006-06-01 09:45:37 2007-12-12 11:03:00
34:   04.07.06         16 2006-08-14 17:54:15 2007-12-12 11:02:59
35:   04.07.06         20 2006-08-14 17:54:15 2007-12-12 11:02:59
36:   04.07.04         17 2006-08-14 17:54:16 2007-12-12 11:03:00
37:   04.07.04         21 2006-08-14 17:54:16 2007-12-12 11:03:00
38:   04.05.12         14 2011-06-17 15:40:13 2012-05-24 11:43:24
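I need to sum the lengths of all intervals defined by the last two columns (configStartDate to configEndDate). However, as you can see, some rows have fully or partially overlapping intervals.

Before I can count the total number of days, I need to collapse the full data set (the excerpt above is taken from it) into something like this: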

accumulated data:
       configStartDate       configEndDate
1: 2005-11-03 19:12:36 2007-12-12 11:03:00
2: 2011-06-17 15:40:13 2012-05-24 11:43:24
total days: 934.296
This is my R code. It has to be R, although I am considering rewriting it in C++ and calling it through Rcpp:
merge_intervals <- function(interval_dt){
  interval_dt <- interval_dt[order(configStartDate), list(configStartDate, configEndDate)]

  new_dt <- interval_dt[1, list(configStartDate, configEndDate)]

  for (i in 2:dim(interval_dt)[1]) {
    buff <- interval_dt[i, list(configStartDate, configEndDate)]

    if (new_dt[dim(new_dt)[1], configEndDate] >= buff[, configStartDate]){
      if(new_dt[dim(new_dt)[1], configEndDate] >= buff[, configEndDate]){
        next
      }
      else{
        new_dt[dim(new_dt)[1], configEndDate := buff[, configEndDate]]
      }
    }
    else {
      new_dt <- rbind(new_dt, buff)
    }
  }

  return(new_dt)
}
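Currently, one run of the whole process, including the other calculations, takes about 0.16 seconds. With 3000 unique assets, however, that adds up to 8 minutes of computation time.

How can I convert the for loop into something faster to reduce the computation time? Thanks.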
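For the C++ route mentioned in the question, the loop amounts to a sort followed by a single sweep. The following is only a minimal sketch in standard C++, without any Rcpp glue code. It assumes the timestamps have already been converted to seconds since the epoch; `Interval` and this C++ `merge_intervals` are illustrative names, not part of the original post.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical type: one configuration interval, in seconds since the epoch.
struct Interval {
    std::int64_t start;
    std::int64_t end;
};

// Merge overlapping or touching intervals: sort by start, then sweep once.
// Takes the vector by value so the caller's data is left unsorted.
std::vector<Interval> merge_intervals(std::vector<Interval> v) {
    std::sort(v.begin(), v.end(),
              [](const Interval& a, const Interval& b) { return a.start < b.start; });

    std::vector<Interval> out;
    for (const Interval& cur : v) {
        if (!out.empty() && out.back().end >= cur.start) {
            // Overlaps (or touches) the last merged interval: extend it if needed.
            out.back().end = std::max(out.back().end, cur.end);
        } else {
            // There is a gap before this interval: start a new merged interval.
            out.push_back(cur);
        }
    }
    return out;
}
```

The comparison `>=` mirrors the R loop above, so intervals that merely touch are joined. The sweep appends to one vector instead of calling `rbind` per row, which is likely where most of the time in the R version goes.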

Something like this, using dplyr:

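library(dplyr)

# Use one reference time so start and end share the same base
now <- Sys.time()
df <- data.frame(
  id = 1:3,
  start = now + c(0, 1000, 3000),
  end = now + c(1500, 2000, 4000)
)

df %>%
  arrange(start) %>%
  mutate(
    # latest end time among all earlier rows
    prev_end = lag(cummax(as.numeric(end)), default = -Inf),
    # a new merged interval begins when a row starts after everything before it has ended
    interval = cumsum(as.numeric(start) > prev_end)
  ) %>%
  group_by(interval) %>%
  summarise(start = min(start), end = max(end)) %>%
  mutate(delta = difftime(end, start, units = "days")) %>%
  summarise(total = sum(delta))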
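If only the total is needed, a single pass that tracks the running maximum of the end times is enough, and the merged table never has to be built. Below is a rough sketch in standard C++, under the assumptions that each pair is (start, end) in seconds since the epoch and that start <= end for every row; `covered_days` is a hypothetical name, not something from the original post.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Total time covered by a set of intervals, counting overlapping parts once.
// Input: (start, end) pairs in seconds since the epoch (assumed, start <= end).
// Output: covered time in days.
double covered_days(std::vector<std::pair<std::int64_t, std::int64_t>> v) {
    if (v.empty()) return 0.0;
    std::sort(v.begin(), v.end());  // orders by start, then by end

    std::int64_t total = 0;
    std::int64_t run_start = v[0].first;
    std::int64_t run_end = v[0].second;  // running maximum of end times

    for (std::size_t i = 1; i < v.size(); ++i) {
        if (v[i].first > run_end) {
            // Gap: close the current run and open a new one.
            total += run_end - run_start;
            run_start = v[i].first;
            run_end = v[i].second;
        } else {
            // Overlap or touch: extend the current run if this end is later.
            run_end = std::max(run_end, v[i].second);
        }
    }
    total += run_end - run_start;  // close the final run

    return static_cast<double>(total) / 86400.0;  // 86400 seconds per day
}
```

Working in integer seconds and dividing once at the end avoids accumulating floating-point error across many intervals.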

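It should be possible to vectorize this. How do you want to handle overlapping intervals? Ignore the overlaps, or join the intervals into a new one and consider only that new interval?

Sorry, but your example does not tell me exactly what you want to do. How do you get from the 10 rows shown in the first block (all in 2006) to the two rows shown in the second block (spanning 2005-2012)? Can you describe exactly how to get from the sample input to the expected output?

I edited the sample to include all rows to make it clearer.

Based on a quick look: have you checked foverlaps in the data.table package?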