R: Calculating the percentage of a total by multiple grouping conditions in a data frame
I have a data frame "calls1" and I would like to know how to create a new variable "PercCallsMo" that gives, for each call queue "QUEUE" in a given month "MON1_12", that queue's share of the month's total calls from the "CallsHandled" variable. My sample data is as follows:
structure(list(MON1_12 = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L,
2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), QUEUE = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L), .Label = c("APPLICATION_STATUS", "BENEFITS", "BILLING"
), class = "factor"), CallsHandled = c(9L, 3L, 10L, 27L, 64L,
17L, 10L, 58L, 8L, 29L, 32L, 12L, 2L, 6L, 1L, 3L, 2L, 2L, 2L,
2L)), .Names = c("MON1_12", "QUEUE", "CallsHandled"), class = "data.frame", row.names = c(NA,
-20L))
My desired result shows the "PercCallsMo" value for each "QUEUE" repeated on every row of the same month "MON1_12", as follows:
structure(list(MON1_12 = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L,
2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), QUEUE = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L), .Label = c("APPLICATION_STATUS", "BENEFITS", "BILLING"
), class = "factor"), CallsHandled = c(9L, 3L, 10L, 27L, 64L,
17L, 10L, 58L, 8L, 29L, 32L, 12L, 2L, 6L, 1L, 3L, 2L, 2L, 2L,
2L), PercCallsMo = c(0.362962963, 0.362962963, 0.362962963, 0.362962963,
0.554878049, 0.554878049, 0.554878049, 0.488888889, 0.488888889,
0.37195122, 0.37195122, 0.148148148, 0.148148148, 0.148148148,
0.073170732, 0.073170732, 0.073170732, 0.073170732, 0.073170732,
0.073170732)), .Names = c("MON1_12", "QUEUE", "CallsHandled",
"PercCallsMo"), class = "data.frame", row.names = c(NA, -20L))
You can do it like this:
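library(dplyr)

calls1 <- calls1 %>%
  group_by(MON1_12) %>%
  mutate(month_total = sum(CallsHandled)) %>%                # total calls in each month
  group_by(MON1_12, QUEUE) %>%
  mutate(PercCallsMo = sum(CallsHandled) / month_total) %>%  # queue's share of that month
  ungroup() %>%                                              # drop grouping so later steps are not grouped
  select(-month_total)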
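As a quick check on the numbers this should produce, the month 1 shares can be worked out by hand from the sample data. This is only a sketch of the arithmetic; the vector names (`app_status`, `benefits`, `billing`, `shares`) are made up for illustration and are not part of the question.

```r
# Month 1 call counts per queue, copied from the sample data in the question
app_status <- c(9, 3, 10, 27)  # APPLICATION_STATUS
benefits   <- c(58, 8)         # BENEFITS
billing    <- c(12, 2, 6)      # BILLING

# Total calls handled in month 1 across all queues
month1_total <- sum(app_status, benefits, billing)

# Each queue's share of the month 1 total
shares <- c(
  APPLICATION_STATUS = sum(app_status) / month1_total,
  BENEFITS           = sum(benefits)   / month1_total,
  BILLING            = sum(billing)    / month1_total
)
round(shares, 9)
```

The three values should match the month 1 entries of "PercCallsMo" in the desired output (49/135, 66/135 and 20/135), and they add up to 1.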
Using base R:
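# Month totals (CallsHandled.x) joined with month-and-queue totals (CallsHandled.y)
percent <- merge(aggregate(calls1["CallsHandled"], calls1["MON1_12"], sum),
                 aggregate(calls1["CallsHandled"], calls1[c("MON1_12", "QUEUE")], sum),
                 by = "MON1_12")
# Each queue's share of its month's calls
percent[["PercCallsMo"]] <- percent[["CallsHandled.y"]] / percent[["CallsHandled.x"]]
# Attach the share to every row of the original data (printed, not stored)
merge(calls1, percent[c("MON1_12", "QUEUE", "PercCallsMo")])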
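This produces the percentages, but is there a way to do it without changing the size of the data frame? I need to keep all the rows of the data frame and simply repeat the "PercCallsMo" value within the same month (MON1_12) and queue.

The last line, merge(calls1, percent[c("MON1_12", "QUEUE", "PercCallsMo")]), should return all the rows of the original data frame. If it does not, add all.x = TRUE to the merge call to get all rows.

That's odd. The printout on screen shows the whole data frame, but the object it creates in the environment still holds the summary. Any idea why this happens?

I don't know how you are saving the data, but the last line only performs the merge; it does not save the result to a variable. You probably need to store the result in a variable.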
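To make that last point concrete, here is a minimal sketch that stores the merged result and compares row counts. The sample data is rebuilt in a compact form (with MON1_12 as plain numbers rather than integers), and `calls1_pct` is a made-up name for the stored result; the rest follows the base R answer above.

```r
# Sample data from the question, rebuilt compactly
calls1 <- data.frame(
  MON1_12 = c(1, 1, 1, 1, 2, 2, 2, 1, 1, 2, 2, 1, 1, 1, 2, 2, 2, 2, 2, 2),
  QUEUE = factor(rep(c("APPLICATION_STATUS", "BENEFITS", "BILLING"),
                     times = c(7, 4, 9))),
  CallsHandled = c(9, 3, 10, 27, 64, 17, 10, 58, 8, 29, 32,
                   12, 2, 6, 1, 3, 2, 2, 2, 2)
)

# Summary table: one row per month/queue combination
percent <- merge(aggregate(calls1["CallsHandled"], calls1["MON1_12"], sum),
                 aggregate(calls1["CallsHandled"], calls1[c("MON1_12", "QUEUE")], sum),
                 by = "MON1_12")
percent[["PercCallsMo"]] <- percent[["CallsHandled.y"]] / percent[["CallsHandled.x"]]

# Assign the merge result; without the assignment it is only printed
calls1_pct <- merge(calls1, percent[c("MON1_12", "QUEUE", "PercCallsMo")],
                    all.x = TRUE)

nrow(percent)     # rows in the summary table
nrow(calls1_pct)  # rows in the merged result, same as nrow(calls1)
```

Note that `percent` stays a summary table by design, so inspecting it in the environment will always show the summary; the full-length data lives in whatever variable the merge result was assigned to. Also, `merge` sorts by the key columns by default, so the row order of `calls1_pct` may differ from that of `calls1`.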