R: distributing a function over a list grouped by id
I have a data frame with an id, a start date, an end date, and income and cost values:
table <- data.frame(id = c(1, 2, 3),
start = c("2018-01-01", "2018-02-05", "2018-05-30"),
end = c("2018-01-31", "2018-03-26", "2018-08-31"),
income = c(100, 225, 399),
costs = c(37, 98, 113))
table$start <- as.Date(table$start)
table$end <- as.Date(table$end)
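As shown, some of these periods span n months, and I want to aggregate income and costs by month. For amounts tied to a period that spans two, three or more months, I want to allocate them linearly across those two, three or n months, in proportion to the number of days that fall in each month.
The catch is that I also want to keep the id and operate on two variables (not just one, as in a previously asked question), which complicates the whole thing.
What I would like to get is the following table: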
id date income costs
1 2018-01 100 37
2 2018-02 108 47.04
2 2018-03 117 50.96
3 2018-05 8.489362 2.404255
3 2018-06 127.340426 36.063830
3 2018-07 131.585106 37.265957
3 2018-08 131.585106 37.265957
I tried using rbindlist on a list of data frames split by id, together with the following function:
explode <- function(start, end, income) {
dates <- seq(start, end, "day")
n <- length(dates)
rowsum(rep(income, n) / n, format(dates, "%Y-%m"))
}
Map(explode, table$start, table$end, table$income)
I would go with data.table:
library(data.table)
table_aggregated <- setDT(table)[
, .(id = id, income = income, costs = costs, day_var = seq(start, end, "day")), by = 1:nrow(table)][
, `:=` (income_day = income / .N,
costs_day = costs / .N,
date = format(day_var, "%Y-%m")), by = id][
, .(income = sum(income_day),
costs = sum(costs_day)), by = .(id, date)]
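Your solution could have worked. Simply add a new argument to Map, extend your function with cbind so that it combines income and costs, and then rbind the list that Map produces: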
explode <- function(start, end, income, costs) {
  dates <- seq(start, end, "day")
  n <- length(dates)
  # Month label for every day in the period
  months <- format(dates, "%Y-%m")
  cbind.data.frame(
    # One label per month; format(start, "%Y-%m") would repeat the start month on every row
    date = unique(months),
    income = rowsum(rep(income, n) / n, months),
    costs = rowsum(rep(costs, n) / n, months)
  )
}
data_list <- Map(explode, table$start, table$end, table$income, table$costs)
final_df <- do.call(rbind, data_list)
# Drop the month row names inherited from rowsum() so they do not duplicate the date column
rownames(final_df) <- NULL
final_df
#      date     income     costs
# 1 2018-01 100.000000 37.000000
# 2 2018-02 108.000000 47.040000
# 3 2018-03 117.000000 50.960000
# 4 2018-05   8.489362  2.404255
# 5 2018-06 127.340426 36.063830
# 6 2018-07 131.585106 37.265957
# 7 2018-08 131.585106 37.265957
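The question also asks to keep the id, which the Map-based result does not carry. One possible extension, offered as a sketch rather than as part of the original answer, passes id through Map as an extra argument. The names explode_by_id and by_id are hypothetical, and the sample data is repeated so the block runs on its own.

```r
# Sample data from the question
table <- data.frame(id = c(1, 2, 3),
                    start = as.Date(c("2018-01-01", "2018-02-05", "2018-05-30")),
                    end = as.Date(c("2018-01-31", "2018-03-26", "2018-08-31")),
                    income = c(100, 225, 399),
                    costs = c(37, 98, 113))

# Hypothetical helper: spread one row's amounts evenly over its days,
# then sum the daily shares per calendar month and attach the id.
explode_by_id <- function(id, start, end, income, costs) {
  dates <- seq(start, end, "day")
  n <- length(dates)
  months <- format(dates, "%Y-%m")
  data.frame(id = id,
             date = unique(months),
             income = as.vector(rowsum(rep(income, n) / n, months)),
             costs = as.vector(rowsum(rep(costs, n) / n, months)))
}

by_id <- do.call(rbind, Map(explode_by_id, table$id, table$start, table$end,
                            table$income, table$costs))
by_id
```

Because each amount is divided by the number of days and then summed back per month, the monthly values for an id add up to that id's original income and costs.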
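Comments:
I don't understand how the income and costs for the later months are calculated. How do you get income = 108, and so on?
It allocates a proportional share of the original value to February. In other words, you first compute the daily income between the start and end dates, then multiply it by the number of days of the period that fall in each month.
So simple! I can't believe I missed the cbind solution for building the data frame. Thanks.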
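The arithmetic behind the 108 can be checked by hand for id 2, whose period runs from 2018-02-05 to 2018-03-26 with an income of 225. The following is only an illustrative sketch; the variable names are made up for this example.

```r
start <- as.Date("2018-02-05")
end <- as.Date("2018-03-26")
days <- seq(start, end, "day")

n <- length(days)   # 50 days in the whole period
per_day <- 225 / n  # 4.5 income per day

# Number of days of the period that fall in each month: 24 in February, 26 in March
days_per_month <- tapply(days, format(days, "%Y-%m"), length)

# Monthly share = daily income * days in that month
monthly <- per_day * days_per_month
monthly  # 2018-02: 108, 2018-03: 117
```

The same steps with costs = 98 give 1.96 per day, hence 47.04 for February and 50.96 for March.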
For reference, table_aggregated from the data.table solution prints as:
   id    date     income     costs
1:  1 2018-01 100.000000 37.000000
2:  2 2018-02 108.000000 47.040000
3:  2 2018-03 117.000000 50.960000
4:  3 2018-05   8.489362  2.404255
5:  3 2018-06 127.340426 36.063830
6:  3 2018-07 131.585106 37.265957
7:  3 2018-08 131.585106 37.265957