R: how to lag a value by one month when the number of observations per month differs?
I have a dataset with multiple dates. I want to lag the value of Cells by one month. I probably can't use shift(), because each month has a different number of days (not to mention that some dates are missing).

What I do now is create a new data table with the unique year and month, shift/lag Cells in it, and then merge it back into the original data table (taking care not to end up with duplicate columns).

This is obviously inefficient. Is there another way?
sapply(c('data.table', 'lubridate'), require, character.only = TRUE)
DT <- fread('DATE, ID, Cells
2000-01-01, 1, 10
2000-01-02, 1, 10
2000-01-03, 1, 10
2000-01-01, 2, 20
2000-01-02, 2, 20
2000-01-03, 2, 20
2000-01-04, 2, 20
2000-02-01, 1, 30
2000-02-02, 1, 30
2000-02-01, 2, 40
2000-02-03, 2, 40
2000-02-04, 2, 40
2000-03-01, 1, 50
2000-03-02, 1, 50
2000-03-01, 2, 60
2000-03-03, 2, 60
')
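DT[, date := as.Date(DATE, format = '%Y-%m-%d')][,
  c('Year', 'Month') := .(year(date), month(date))]
setkey(DT, Year, Month, ID)
# One row per Year/Month/ID: compare only the key columns
# (duplicated() on a data.table compares all columns by default)
DT.Months <- DT[!duplicated(DT, by = key(DT))][,
  .(Year, Month, ID, Cells)]
# Lag Cells by one row within each ID, i.e. by one month
DT.Months[, Lagged.Cells := shift(Cells, 1L, type = 'lag'), by = .(ID)]
# Join the monthly rows back onto every daily row, then drop the duplicated Cells column
DT <- DT[DT.Months][, i.Cells := NULL]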
# > DT # This is what I want.
# The Value in Cells is lagged by one month,
# regardless of the number of observations within a month for each ID.
# DATE ID Cells date Year Month Lagged.Cells
# 1: 2000-01-01 1 10 2000-01-01 2000 1 NA
# 2: 2000-01-02 1 10 2000-01-02 2000 1 NA
# 3: 2000-01-03 1 10 2000-01-03 2000 1 NA
# 4: 2000-01-01 2 20 2000-01-01 2000 1 NA
# 5: 2000-01-02 2 20 2000-01-02 2000 1 NA
# 6: 2000-01-03 2 20 2000-01-03 2000 1 NA
# 7: 2000-01-04 2 20 2000-01-04 2000 1 NA
# 8: 2000-02-01 1 30 2000-02-01 2000 2 10
# 9: 2000-02-02 1 30 2000-02-02 2000 2 10
#10: 2000-02-01 2 40 2000-02-01 2000 2 10
#11: 2000-02-03 2 40 2000-02-03 2000 2 20
#12: 2000-02-04 2 40 2000-02-04 2000 2 20
#13: 2000-03-01 1 50 2000-03-01 2000 3 20
#14: 2000-03-02 1 50 2000-03-02 2000 3 20
#15: 2000-03-01 2 60 2000-03-01 2000 3 30
#16: 2000-03-03 2 60 2000-03-03 2000 3 30
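A possibly more compact variant of the same idea (one value per ID and month, then a lag) can be sketched with an integer month index and an update join. This assumes, as in the sample data, that Cells is constant within each ID and month; the helper columns month_idx and prev_idx are illustrative names, not part of the question. Unlike shift(), this leaves NA when the previous calendar month is missing for an ID instead of silently taking an older month.

```r
library(data.table)

# Sample data from the question, built directly so the block runs on its own
DT <- data.table(
  DATE  = as.Date(c("2000-01-01", "2000-01-02", "2000-01-03",
                    "2000-01-01", "2000-01-02", "2000-01-03", "2000-01-04",
                    "2000-02-01", "2000-02-02",
                    "2000-02-01", "2000-02-03", "2000-02-04",
                    "2000-03-01", "2000-03-02",
                    "2000-03-01", "2000-03-03")),
  ID    = c(1, 1, 1, 2, 2, 2, 2, 1, 1, 2, 2, 2, 1, 1, 2, 2),
  Cells = c(10, 10, 10, 20, 20, 20, 20, 30, 30, 40, 40, 40, 50, 50, 60, 60)
)

# Integer month index: consecutive calendar months differ by exactly 1,
# including across a year boundary (December -> January)
DT[, month_idx := year(DATE) * 12L + month(DATE)]

# One value per ID and month (assumes Cells is constant within each group)
monthly <- unique(DT[, .(ID, month_idx, Cells)], by = c("ID", "month_idx"))

# For every row, look up the value of the same ID in the previous month;
# rows without a previous month for that ID keep NA
DT[, prev_idx := month_idx - 1L]
DT[monthly, on = .(ID, prev_idx = month_idx), Lagged.Cells := i.Cells]
DT[, prev_idx := NULL]

print(DT)
```

The update join adds Lagged.Cells to DT by reference, so there is no separate merge step and no duplicated Cells column to remove afterwards.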
The Date class supports seq with by = "month", "quarter", "year", etc.
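A small base-R sketch of seq() on Date objects with by = "month", "quarter" and "year". Stepping by months keeps the day of the month; for days that do not exist in the target month (for example the 31st) base R may roll over into the following month, which is worth checking before relying on exact-date matching. The start date here is an arbitrary example.

```r
start <- as.Date("2000-01-15")

# Same day of the month, one month apart
monthly <- seq(start, by = "1 month", length.out = 3)
# "quarter" steps by three months, "year" by twelve
quarterly <- seq(start, by = "quarter", length.out = 3)
yearly <- seq(start, by = "year", length.out = 2)

# The second element of a two-step sequence is the date one month after `start`
one_month_later <- seq(start, by = "1 month", length.out = 2)[2]

print(monthly)
print(quarterly)
print(yearly)
print(one_month_later)
```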
Not very elegant, but you could do this:
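library(magrittr)
DT[, DATE := as.Date(DATE)]
# For every date, compute the same day one month later
# (sapply drops the Date class, hence the as.Date(origin = ...) at the end)
DT[, DATE_lag := sapply(DATE, function(x)
  seq(x, by = "1 month", length.out = 2)[2]) %>%
  as.Date(origin = "1970-01-01")]
# Relabel the shifted dates as DATE so each row carries the Cells value of one month earlier
DT2 <- DT[, .(DATE_lag, ID, Cells)]
setnames(DT2, c("DATE_lag", "Cells"), c("DATE", "Lagged.Cells"))
# Left join: rows with no matching date one month earlier get NA
merge(DT, DT2, by = c("DATE", "ID"), all.x = TRUE)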
DATE ID Cells date month lag.cells DATE_lag Lagged.Cells
1: 2000-01-01 1 10 2000-01-01 Jan NA 2000-02-01 NA
2: 2000-01-01 2 20 2000-01-01 Jan NA 2000-02-01 NA
3: 2000-01-02 1 10 2000-01-02 Jan NA 2000-02-02 NA
4: 2000-01-02 2 20 2000-01-02 Jan NA 2000-02-02 NA
5: 2000-01-03 1 10 2000-01-03 Jan NA 2000-02-03 NA
6: 2000-01-03 2 20 2000-01-03 Jan NA 2000-02-03 NA
7: 2000-01-04 2 20 2000-01-04 Jan NA 2000-02-04 NA
8: 2000-02-01 1 30 2000-02-01 Feb 10 2000-03-01 10
9: 2000-02-01 2 40 2000-02-01 Feb 10 2000-03-01 20
10: 2000-02-02 1 30 2000-02-02 Feb 10 2000-03-02 10
11: 2000-02-03 2 40 2000-02-03 Feb 20 2000-03-03 20
12: 2000-02-04 2 40 2000-02-04 Feb 20 2000-03-04 20
13: 2000-03-01 1 50 2000-03-01 Mar 20 2000-04-01 30
14: 2000-03-01 2 60 2000-03-01 Mar 30 2000-04-01 40
15: 2000-03-02 1 50 2000-03-02 Mar 20 2000-04-02 30
16: 2000-03-03 2 60 2000-03-03 Mar 30 2000-04-03 40
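The merge() step can also be written as a data.table update join that adds Lagged.Cells to DT by reference instead of building DT2 and renaming its columns. This is a sketch on the sample data, assuming DATE is already a Date; like the answer, it only finds a value when the same day of the month exists one month earlier for that ID.

```r
library(data.table)

# Sample data from the question, built directly so the block runs on its own
DT <- data.table(
  DATE  = as.Date(c("2000-01-01", "2000-01-02", "2000-01-03",
                    "2000-01-01", "2000-01-02", "2000-01-03", "2000-01-04",
                    "2000-02-01", "2000-02-02",
                    "2000-02-01", "2000-02-03", "2000-02-04",
                    "2000-03-01", "2000-03-02",
                    "2000-03-01", "2000-03-03")),
  ID    = c(1, 1, 1, 2, 2, 2, 2, 1, 1, 2, 2, 2, 1, 1, 2, 2),
  Cells = c(10, 10, 10, 20, 20, 20, 20, 30, 30, 40, 40, 40, 50, 50, 60, 60)
)

# Same day one month later (sapply drops the Date class, so restore it)
DT[, DATE_lag := as.Date(sapply(DATE, function(x)
  seq(x, by = "1 month", length.out = 2)[2]), origin = "1970-01-01")]

# Self-join: a row whose DATE equals another row's DATE_lag (same ID)
# receives that earlier row's Cells; unmatched rows stay NA
DT[DT, on = .(DATE = DATE_lag, ID), Lagged.Cells := i.Cells]

print(DT)
```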
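I welcome your question and will give it a try, but for the record, Code Review is the SE site dedicated to improving working code; SO is more for broken code. Are you sure this is actually inefficient at runtime? The code may be verbose, but I don't see many by arguments.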
# Replace your sapply usage with pacman and you'll thank me
# pacman installs if needed, loads, and doesn't require quotation marks
pacman::p_load(data.table, lubridate)
DT <- fread('DATE, ID, Cells
2000-01-01, 1, 10
2000-01-02, 1, 10
2000-01-03, 1, 10
2000-01-01, 2, 20
2000-01-02, 2, 20
2000-01-03, 2, 20
2000-01-04, 2, 20
2000-02-01, 1, 30
2000-02-02, 1, 30
2000-02-01, 2, 40
2000-02-03, 2, 40
2000-02-04, 2, 40
2000-03-01, 1, 50
2000-03-02, 1, 50
2000-03-01, 2, 60
2000-03-03, 2, 60
')
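DT$date <- ymd(DT$DATE)
DT$month <- format((DT$date), "%b")
# Print one "NA" per January row followed by all Cells values, capture that text,
# and split it back into a vector
lag.cells <- as.vector(capture.output(cat(rep("NA", length(DT$month[DT$month == "Jan"])), DT$Cells)))
lag.cells <- strsplit(lag.cells, "\\s+")[[1]]
# Keep only as many values as DT has rows (the result is a character vector)
lag.cells <- lag.cells[1:nrow(DT)]
DT$lag.cells <- lag.cells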
DT
DATE ID Cells date month lag.cells
1: 2000-01-01 1 10 2000-01-01 Jan NA
2: 2000-01-02 1 10 2000-01-02 Jan NA
3: 2000-01-03 1 10 2000-01-03 Jan NA
4: 2000-01-01 2 20 2000-01-01 Jan NA
5: 2000-01-02 2 20 2000-01-02 Jan NA
6: 2000-01-03 2 20 2000-01-03 Jan NA
7: 2000-01-04 2 20 2000-01-04 Jan NA
8: 2000-02-01 1 30 2000-02-01 Feb 10
9: 2000-02-02 1 30 2000-02-02 Feb 10
10: 2000-02-01 2 40 2000-02-01 Feb 10
11: 2000-02-03 2 40 2000-02-03 Feb 20
12: 2000-02-04 2 40 2000-02-04 Feb 20
13: 2000-03-01 1 50 2000-03-01 Mar 20
14: 2000-03-02 1 50 2000-03-02 Mar 20
15: 2000-03-01 2 60 2000-03-01 Mar 30
16: 2000-03-03 2 60 2000-03-03 Mar 30
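The capture.output()/cat()/strsplit() round trip in this answer amounts to prepending one NA per January row and dropping the same number of values from the end. A base-R sketch of that same offset that keeps the values numeric instead of character (the variable names are illustrative). Note that this is a shift by row position, not a per-ID, per-calendar-month lag, so the result depends on the row order of the data.

```r
# Cells and month abbreviations in the row order of the sample data
Cells <- c(10, 10, 10, 20, 20, 20, 20, 30, 30, 40, 40, 40, 50, 50, 60, 60)
month <- c(rep("Jan", 7), rep("Feb", 5), rep("Mar", 4))

# Number of rows that fall in the first month
n_first <- sum(month == "Jan")

# Prepend that many NAs and drop the same number of values from the end
lag.cells <- c(rep(NA, n_first), head(Cells, -n_first))
print(lag.cells)
```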