R: How do I lag a value by one month when the number of observations differs from month to month?


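I have a dataset with multiple dates. I want to lag the value of Cells by one month. I probably cannot use shift(), because each month has a different number of days (not to mention that some dates are missing).

What I currently do is create a new data table with the unique Year and Month combinations, shift/lag Cells in it, and then merge it back into the original data table (taking care not to end up with duplicate columns).

This is clearly inefficient. Is there another way?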

sapply(c('data.table', 'lubridate'), require, character.only = TRUE)

DT <- fread('DATE, ID, Cells
2000-01-01, 1, 10
2000-01-02, 1, 10
2000-01-03, 1, 10
2000-01-01, 2, 20
2000-01-02, 2, 20
2000-01-03, 2, 20
2000-01-04, 2, 20
2000-02-01, 1, 30
2000-02-02, 1, 30
2000-02-01, 2, 40
2000-02-03, 2, 40
2000-02-04, 2, 40
2000-03-01, 1, 50
2000-03-02, 1, 50
2000-03-01, 2, 60
2000-03-03, 2, 60
')


DT[, date := as.Date(DATE, format = '%Y-%m-%d')][,
           c('Year', 'Month') := .(year(date), month(date))]

setkey(DT, Year, Month, ID)

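# One row per (Year, Month, ID). Recent data.table versions compare all
# columns in duplicated()/unique() by default, so the key is passed explicitly.
DT.Months <- unique(DT, by = key(DT))[,
               .(Year, Month, ID, Cells)]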

DT.Months[, `:=`(Lagged.Cells = 
          shift(Cells, 1L, type = 'lag')), by = .(ID)]

DT <- DT[DT.Months][, `:=`(i.Cells, NULL)]

# > DT # This is what I want. 
# The Value in Cells is lagged by one month, 
# regardless of the number of observations within a month for each ID.
#          DATE ID Cells       date Year Month Lagged.Cells
# 1: 2000-01-01  1    10 2000-01-01 2000     1           NA
# 2: 2000-01-02  1    10 2000-01-02 2000     1           NA
# 3: 2000-01-03  1    10 2000-01-03 2000     1           NA
# 4: 2000-01-01  2    20 2000-01-01 2000     1           NA
# 5: 2000-01-02  2    20 2000-01-02 2000     1           NA
# 6: 2000-01-03  2    20 2000-01-03 2000     1           NA
# 7: 2000-01-04  2    20 2000-01-04 2000     1           NA
# 8: 2000-02-01  1    30 2000-02-01 2000     2           10
# 9: 2000-02-02  1    30 2000-02-02 2000     2           10
#10: 2000-02-01  2    40 2000-02-01 2000     2           10
#11: 2000-02-03  2    40 2000-02-03 2000     2           20
#12: 2000-02-04  2    40 2000-02-04 2000     2           20
#13: 2000-03-01  1    50 2000-03-01 2000     3           20
#14: 2000-03-02  1    50 2000-03-02 2000     3           20
#15: 2000-03-01  2    60 2000-03-01 2000     3           30
#16: 2000-03-03  2    60 2000-03-03 2000     3           30
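
To illustrate the concern about shift(): grouped by ID alone, it lags by row rather than by month, so every row after the first within a month picks up that same month's value. A minimal sketch on a toy table (the table and the RowLag column name are only for illustration):

```r
library(data.table)

# Toy data: three observations in month 1, two in month 2, one ID
toy <- data.table(
  ID    = 1L,
  Month = c(1L, 1L, 1L, 2L, 2L),
  Cells = c(10, 10, 10, 30, 30)
)

# Row-wise lag within ID: each row takes the value of the row above it
toy[, RowLag := shift(Cells, 1L, type = "lag"), by = ID]

print(toy)
```

Only the first row of month 2 receives the month 1 value; the second row of month 2 already sees 30, which is why the lag has to be computed at month level.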
The Date class supports "month", "quarter", "year", etc. through seq. It is not very elegant, but you could do something like this:

library(magrittr)
DT[, DATE := as.Date(DATE)]
DT[,  DATE_lag := sapply(DATE, function(x) 
  seq(x, by = "1 month", length.out = 2)[2]) %>%
    as.Date(origin = "1970-01-01")]
DT2 <- DT[, .(DATE_lag, ID, Cells)]
setnames(DT2, c("DATE_lag", "Cells"), c("DATE", "Lagged.Cells"))
merge(DT, DT2, by = c("DATE", "ID"), all.x = TRUE)

          DATE ID Cells       date month lag.cells   DATE_lag Lagged.Cells
 1: 2000-01-01  1    10 2000-01-01   Jan        NA 2000-02-01           NA
 2: 2000-01-01  2    20 2000-01-01   Jan        NA 2000-02-01           NA
 3: 2000-01-02  1    10 2000-01-02   Jan        NA 2000-02-02           NA
 4: 2000-01-02  2    20 2000-01-02   Jan        NA 2000-02-02           NA
 5: 2000-01-03  1    10 2000-01-03   Jan        NA 2000-02-03           NA
 6: 2000-01-03  2    20 2000-01-03   Jan        NA 2000-02-03           NA
 7: 2000-01-04  2    20 2000-01-04   Jan        NA 2000-02-04           NA
 8: 2000-02-01  1    30 2000-02-01   Feb        10 2000-03-01           10
 9: 2000-02-01  2    40 2000-02-01   Feb        10 2000-03-01           20
10: 2000-02-02  1    30 2000-02-02   Feb        10 2000-03-02           10
11: 2000-02-03  2    40 2000-02-03   Feb        20 2000-03-03           20
12: 2000-02-04  2    40 2000-02-04   Feb        20 2000-03-04           20
13: 2000-03-01  1    50 2000-03-01   Mar        20 2000-04-01           30
14: 2000-03-01  2    60 2000-03-01   Mar        30 2000-04-01           40
15: 2000-03-02  1    50 2000-03-02   Mar        20 2000-04-02           30
16: 2000-03-03  2    60 2000-03-03   Mar        30 2000-04-03           40
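
The sapply() over seq() can probably be replaced by a vectorised month shift. The following is a minimal sketch, assuming lubridate is available (the question already loads it): %m+% adds a period and rolls back to the last day of the month when the target day does not exist. The toy table and the names lagged and res are illustrative only.

```r
library(data.table)
library(lubridate)

# Toy data with a month-end date to show the roll-back behaviour
DT <- data.table(
  DATE  = as.Date(c("2000-01-01", "2000-01-31", "2000-02-01", "2000-02-29")),
  ID    = 1L,
  Cells = c(10, 10, 30, 30)
)

# Vectorised "same day next month"; 2000-01-31 rolls back to 2000-02-29
DT[, DATE_lag := DATE %m+% months(1)]

# Re-label the shifted dates as DATE so they can be matched to the originals
lagged <- DT[, .(DATE = DATE_lag, ID, Lagged.Cells = Cells)]
res <- merge(DT, lagged, by = c("DATE", "ID"), all.x = TRUE)

print(res)
```

As with the merge above, a row only gets a lagged value when the previous month contains the matching day for the same ID; other rows stay NA.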
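I welcome your question and will give it a try, but for the record, Code Review is the SE site dedicated to improving working code; Stack Overflow is more for broken code. Are you sure this is not efficient at run time? The code may be verbose, but I don't see many by arguments.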
# Replace your sapply usage with pacman and you'll thank me
#   pacman installs if needed, loads, and doesn't require quotation marks
pacman::p_load(data.table, lubridate) 

DT <- fread('DATE, ID, Cells
            2000-01-01, 1, 10
            2000-01-02, 1, 10
            2000-01-03, 1, 10
            2000-01-01, 2, 20
            2000-01-02, 2, 20
            2000-01-03, 2, 20
            2000-01-04, 2, 20
            2000-02-01, 1, 30
            2000-02-02, 1, 30
            2000-02-01, 2, 40
            2000-02-03, 2, 40
            2000-02-04, 2, 40
            2000-03-01, 1, 50
            2000-03-02, 1, 50
            2000-03-01, 2, 60
            2000-03-03, 2, 60
            ')
DT$date      <- ymd(DT$DATE)
DT$month     <- format(DT$date, "%b")
# Pad Cells with one NA per January row, then truncate to nrow(DT).
# This relies on the row order of DT and keeps the column's numeric type.
n.jan        <- sum(DT$month == "Jan")
DT$lag.cells <- c(rep(NA, n.jan), DT$Cells)[seq_len(nrow(DT))]
DT

          DATE ID Cells       date month lag.cells
 1: 2000-01-01  1    10 2000-01-01   Jan        NA
 2: 2000-01-02  1    10 2000-01-02   Jan        NA
 3: 2000-01-03  1    10 2000-01-03   Jan        NA
 4: 2000-01-01  2    20 2000-01-01   Jan        NA
 5: 2000-01-02  2    20 2000-01-02   Jan        NA
 6: 2000-01-03  2    20 2000-01-03   Jan        NA
 7: 2000-01-04  2    20 2000-01-04   Jan        NA
 8: 2000-02-01  1    30 2000-02-01   Feb        10
 9: 2000-02-02  1    30 2000-02-02   Feb        10
10: 2000-02-01  2    40 2000-02-01   Feb        10
11: 2000-02-03  2    40 2000-02-03   Feb        20
12: 2000-02-04  2    40 2000-02-04   Feb        20
13: 2000-03-01  1    50 2000-03-01   Mar        20
14: 2000-03-02  1    50 2000-03-02   Mar        20
15: 2000-03-01  2    60 2000-03-01   Mar        30
16: 2000-03-03  2    60 2000-03-03   Mar        30
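
Another possibility is to key the lag on the calendar month itself and attach it with an update join, so nothing depends on row order or on matching days. This is a sketch under assumptions, not a tested benchmark: it assumes Cells is constant within each ID and month (as in the sample data), and the names ym and monthly are made up for illustration.

```r
library(data.table)

DT <- fread("DATE,ID,Cells
2000-01-01,1,10
2000-01-02,1,10
2000-01-03,1,10
2000-01-01,2,20
2000-01-02,2,20
2000-01-03,2,20
2000-01-04,2,20
2000-02-01,1,30
2000-02-02,1,30
2000-02-01,2,40
2000-02-03,2,40
2000-02-04,2,40
2000-03-01,1,50
2000-03-02,1,50
2000-03-01,2,60
2000-03-03,2,60
")
DT[, DATE := as.Date(DATE)]

# Month index: consecutive calendar months map to consecutive integers
DT[, ym := year(DATE) * 12L + month(DATE)]

# One row per ID and month, re-labelled as the following month
monthly <- unique(DT, by = c("ID", "ym"))[
  , .(ID, ym = ym + 1L, Lagged.Cells = Cells)]

# Update join: adds Lagged.Cells to DT by reference, no duplicate columns
DT[monthly, on = .(ID, ym), Lagged.Cells := i.Lagged.Cells]
DT[, ym := NULL]

print(DT)
```

Because the join is on the month index, a month with no observations for an ID yields NA in the following month instead of silently borrowing a value from two months back.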