R: Creating cumulative time data for groups

Tags: r, date, dataframe, dplyr, data.table


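I have a data frame with IDs, event dates, and start and end dates, where

timeyrs = enddate - startdate

eventyr = eventdate - startdate

(both expressed in years). Note: I have rounded these numbers.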
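The question does not say how the differences were converted to years. As a minimal sketch, assuming a year length of 365.25 days and a hypothetical helper named years_between (neither is stated in the question), the two derived columns could be computed in base R like this:

```r
# Dates for the first two rows of ID 1 and the row of ID 2 in df1,
# rewritten in ISO format so that as.Date() needs no format string.
startdate <- as.Date(c("2003-06-16", "2003-06-16", "2012-04-20"))
enddate   <- as.Date(c("2017-07-21", "2017-07-21", "2017-06-16"))
eventdate <- as.Date(c("2007-10-20", "2008-11-11", "2014-05-11"))

# Hypothetical helper: elapsed time in years.
# Assumption: one year is 365.25 days.
years_between <- function(from, to) {
  as.numeric(difftime(to, from, units = "days")) / 365.25
}

# Rounded the same way as the columns shown in the question.
timeyrs <- round(years_between(startdate, enddate), 1)
eventyr <- round(years_between(startdate, eventdate), 2)

print(data.frame(startdate, enddate, eventdate, timeyrs, eventyr))
```

With this assumption the rounded values agree with the timeyrs and eventyr columns of df1 for these rows.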

df1

ID  eventdate   startdate   enddate     timeyrs eventyr
1   20-10-2007  16-06-2003  21-07-2017  14.1  4.34
1   11-11-08    16-06-2003  21-07-2017  14.1  5.41
1   26-09-2012  16-06-2003  21-07-2017  14.1  9.28
2   11-05-2014  20-04-2012  16-06-2017  5.2   2.06
3   11-04-2017  6-02-2015   21-04-2017  2.2   2.18
I would like to summarise the data for the whole data set by year of follow-up, i.e. one row per year (about 20 years), to create df2:

Year cmltime cmlevent
1   3   0
2   3   0
3   2.2 2
4   2   0
5   2   1
6   1.2 1
7   1   0
8   1   0
9   1   0
10  1   1
11  1   0
12  1   0
13  1   0
14  1   0
15  0.1 0
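For cumulative time: this is the amount of data available in that year. For example, in the first year 3 IDs each contribute a full year of data, while in the later years 6-14 only about 1 year of data is available.

For cumulative events: this is the sum of the events that occurred in that year of follow-up. IDs 2 and 3 both had an event in the third year of their data.

I have been trying out code with dplyr but have had no luck so far. Suggestions welcome!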

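If I understand correctly, the OP wants to create a separate time scale for each ID, where year 1 starts at startdate. Finally, the result is aggregated by year.

The code combines data.table syntax for grouping and aggregation with magrittr pipes for the arithmetic. (By the way, this is a nice exercise in using magrittr's rich piping facilities.)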
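A minimal sketch of how a per-ID table like cml_by_ID could be built. This is not the answer's original code: it assumes the rounded timeyrs and eventyr columns from the question as input (instead of recomputing them from the dates) and uses plain data.table without magrittr.

```r
library(data.table)

# Rounded follow-up and event times copied from df1 in the question.
df1 <- data.table(
  ID      = c(1, 1, 1, 2, 3),
  timeyrs = c(14.1, 14.1, 14.1, 5.2, 2.2),
  eventyr = c(4.34, 5.41, 9.28, 2.06, 2.18)
)

# One row per ID and follow-up year.
cml_by_ID <- df1[, {
  total   <- timeyrs[1]                  # follow-up length is constant within an ID
  n_years <- as.integer(ceiling(total))  # a partial last year still gets its own row
  year    <- seq_len(n_years)
  list(
    year     = year,
    # a full year contributes 1, the last year only the remainder
    cmltime  = round(pmin(1, total - (year - 1)), 2),
    # an event at 4.34 years falls into follow-up year 5
    cmlevent = tabulate(floor(eventyr) + 1, nbins = n_years)
  )
}, by = ID]

print(cml_by_ID)
```

Each ID gets ceiling(timeyrs) rows, so the three IDs yield 15, 6 and 3 rows. Summing cmltime and cmlevent by year then gives the yearly totals the question asks for.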

Finally, the result is aggregated by year as follows:

cml_by_ID[, lapply(.SD, sum), .SDcols = c("cmltime", "cmlevent"), by = year]
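Data: the sample data is read with fread() in the listing at the end.

Comments on the question:
What is time.sum? The sum of the raw data?
Hi @Relasta, I have updated my question with a better example.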
cml_by_ID
    ID year cmltime cmlevent
 1:  1    1     1.0        0
 2:  1    2     1.0        0
 3:  1    3     1.0        0
 4:  1    4     1.0        0
 5:  1    5     1.0        1
 6:  1    6     1.0        1
 7:  1    7     1.0        0
 8:  1    8     1.0        0
 9:  1    9     1.0        0
10:  1   10     1.0        1
11:  1   11     1.0        0
12:  1   12     1.0        0
13:  1   13     1.0        0
14:  1   14     1.0        0
15:  1   15     0.1        0
16:  2    1     1.0        0
17:  2    2     1.0        0
18:  2    3     1.0        1
19:  2    4     1.0        0
20:  2    5     1.0        0
21:  2    6     0.2        0
22:  3    1     1.0        0
23:  3    2     1.0        0
24:  3    3     0.2        1
    ID year cmltime cmlevent
cml_by_ID[, lapply(.SD, sum), .SDcols = c("cmltime", "cmlevent"), by = year]
    year cmltime cmlevent
 1:    1     3.0        0
 2:    2     3.0        0
 3:    3     2.2        2
 4:    4     2.0        0
 5:    5     2.0        1
 6:    6     1.2        1
 7:    7     1.0        0
 8:    8     1.0        0
 9:    9     1.0        0
10:   10     1.0        1
11:   11     1.0        0
12:   12     1.0        0
13:   13     1.0        0
14:   14     1.0        0
15:   15     0.1        0
library(data.table)
DT <- fread(
  "ID  eventdate   startdate   enddate     timeyrs eventyr
1   20-10-2007  16-06-2003  21-07-2017  14.1  4.34
1   11-11-08    16-06-2003  21-07-2017  14.1  5.41
1   26-09-2012  16-06-2003  21-07-2017  14.1  9.28
2   11-05-2014  20-04-2012  16-06-2017  5.2   2.06
3   11-04-2017  6-02-2015   21-04-2017  2.2   2.18",
  select = 1:4
)
# convert date strings to Date class
cols <- names(DT)[names(DT) %like% "date$"]
DT[, (cols) := lapply(.SD, lubridate::dmy), .SDcols = cols]
DT
   ID  eventdate  startdate    enddate
1:  1 2007-10-20 2003-06-16 2017-07-21
2:  1 2008-11-11 2003-06-16 2017-07-21
3:  1 2012-09-26 2003-06-16 2017-07-21
4:  2 2014-05-11 2012-04-20 2017-06-16
5:  3 2017-04-11 2015-02-06 2017-04-21