Summing over multiple time ranges in R
I have two data frames, x and y. Data frame x holds date ranges, while data frame y holds individual dates. I want the sum of the values for the individual dates that fall within each time range in x. So id a would get the sum of all values from 2019/1/1 to 2019/3/1.
id <- c("a","b","c")
start_date <- as.Date(c("2019/1/1", "2019/2/1", "2019/3/1"))
end_date <- as.Date(c("2019/3/1", "2019/4/1", "2019/5/1"))
x <- data.frame(id, start_date, end_date)
dates <- seq(as.Date("2019/1/1"),as.Date("2019/5/1"),1)
values <- runif(121, min=0, max=7)
y <- data.frame(dates, values)
There are many ways to do this. One possibility:
library(data.table)
setDT(x)  # convert x to a data.table in place
# create a complete daily series for each id (stored separately so x is not overwritten)
x_long <- x[, .(dates = seq(start_date, end_date, 1)), by = id]
# merge the data
m <- merge(x_long, y, by = "dates")
# get the sums
m[, .(sum = sum(values)), by = id]
   id      sum
1:  a 196.0311
2:  b 185.6970
3:  c 173.6429
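A different technique, not taken from the answers in this thread: each range sum is the difference of two cumulative sums, so findInterval() on the sorted dates of y can locate both range boundaries without expanding the ranges or looping over them. This is a minimal sketch that assumes y can be sorted by date (the code sorts it) and rebuilds the question's data so it runs on its own:

```r
set.seed(1234)
x <- data.frame(id = c("a", "b", "c"),
                start_date = as.Date(c("2019/1/1", "2019/2/1", "2019/3/1")),
                end_date   = as.Date(c("2019/3/1", "2019/4/1", "2019/5/1")))
dates <- seq(as.Date("2019/1/1"), as.Date("2019/5/1"), 1)
y <- data.frame(dates, values = runif(121, min = 0, max = 7))

y <- y[order(y$dates), ]        # findInterval() requires sorted breakpoints
cs <- c(0, cumsum(y$values))    # cs[k + 1] is the sum of the first k values

# number of dates <= end_date, and number of dates strictly < start_date
n_upto_end     <- findInterval(x$end_date, y$dates)
n_before_start <- findInterval(x$start_date, y$dates, left.open = TRUE)

# sum over start_date <= dates <= end_date (both ends inclusive)
x$sum <- cs[n_upto_end + 1] - cs[n_before_start + 1]
x
```

The cumulative sum is computed once, so each additional range costs only two lookups; this may matter when x has many rows, at the price of requiring y to be sorted.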
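A base R option using apply:
# apply() turns each row of x into a character vector; the date comparisons still
# work because comparing a Date with a character string converts the string to Date
x$sum <- apply(x, 1, function(v) sum(subset(y, dates >= v["start_date"] & dates <= v["end_date"])$values))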
Data:
set.seed(1234)
id <- c("a","b","c")
start_date <- as.Date(c("2019/1/1", "2019/2/1", "2019/3/1"))
end_date <- as.Date(c("2019/3/1", "2019/4/1", "2019/5/1"))
x <- data.frame(id, start_date, end_date)
dates <- seq(as.Date("2019/1/1"),as.Date("2019/5/1"),1)
values <- runif(121, min=0, max=7)
y <- data.frame(dates, values)
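A variant of the base R option, sketched here rather than taken from the thread: looping over row indices with vapply() instead of apply() keeps start_date and end_date as Date objects, so no conversion to character is involved, and vapply() guarantees a numeric result of one value per row:

```r
set.seed(1234)
x <- data.frame(id = c("a", "b", "c"),
                start_date = as.Date(c("2019/1/1", "2019/2/1", "2019/3/1")),
                end_date   = as.Date(c("2019/3/1", "2019/4/1", "2019/5/1")))
dates <- seq(as.Date("2019/1/1"), as.Date("2019/5/1"), 1)
y <- data.frame(dates, values = runif(121, min = 0, max = 7))

# for each row i of x, sum the values of y whose date lies in [start_date, end_date]
x$sum <- vapply(seq_len(nrow(x)), function(i) {
  in_range <- y$dates >= x$start_date[i] & y$dates <= x$end_date[i]
  sum(y$values[in_range])
}, numeric(1))
x
```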
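Thank you, this really works! I used random values only to make the example reproducible.
You might want to check out non-equi joins in the data.table vignette.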
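A minimal sketch of a non-equi join for this problem, not code from the thread. It assumes a data.table version with non-equi join support and rebuilds the question's data as data.tables. The join matches every date in y with start_date <= dates <= end_date for each row of x, and by = .EACHI evaluates sum(values) once per row of x, returning the results in x's row order, so no intermediate expanded table is built:

```r
library(data.table)

set.seed(1234)
x <- data.table(id = c("a", "b", "c"),
                start_date = as.Date(c("2019/1/1", "2019/2/1", "2019/3/1")),
                end_date   = as.Date(c("2019/3/1", "2019/4/1", "2019/5/1")))
dates <- seq(as.Date("2019/1/1"), as.Date("2019/5/1"), 1)
y <- data.table(dates, values = runif(121, min = 0, max = 7))

# non-equi join: for each row of x, the rows of y with start_date <= dates <= end_date;
# the unnamed j result column is called V1
x[, sum := y[x, on = .(dates >= start_date, dates <= end_date),
             sum(values), by = .EACHI]$V1]
x
```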
Running the base R option above and printing x gives:
> x
  id start_date   end_date      sum
1  a 2019-01-01 2019-03-01 196.0311
2  b 2019-02-01 2019-04-01 185.6970
3  c 2019-03-01 2019-05-01 173.6429