R: Calculating the percentage of a total by multiple grouping conditions in a data frame


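I have a data frame "calls1" and I would like to know how to create a new variable "PercCallsMo": the percentage of the month's total calls (from the "CallsHandled" variable) that each call queue "QUEUE" represents in a given month "MON1_12". My sample data is below: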

structure(list(MON1_12 = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 
2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), QUEUE = structure(c(1L, 
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L), .Label = c("APPLICATION_STATUS", "BENEFITS", "BILLING"
), class = "factor"), CallsHandled = c(9L, 3L, 10L, 27L, 64L, 
17L, 10L, 58L, 8L, 29L, 32L, 12L, 2L, 6L, 1L, 3L, 2L, 2L, 2L, 
2L)), .Names = c("MON1_12", "QUEUE", "CallsHandled"), class = "data.frame", row.names = c(NA, 
-20L))

My desired result shows the "PercCallsMo" for each "QUEUE" repeated on every row of that queue within each month "MON1_12", like so:

structure(list(MON1_12 = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 
2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), QUEUE = structure(c(1L, 
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L), .Label = c("APPLICATION_STATUS", "BENEFITS", "BILLING"
), class = "factor"), CallsHandled = c(9L, 3L, 10L, 27L, 64L, 
17L, 10L, 58L, 8L, 29L, 32L, 12L, 2L, 6L, 1L, 3L, 2L, 2L, 2L, 
2L), PercCallsMo = c(0.362962963, 0.362962963, 0.362962963, 0.362962963, 
0.554878049, 0.554878049, 0.554878049, 0.488888889, 0.488888889, 
0.37195122, 0.37195122, 0.148148148, 0.148148148, 0.148148148, 
0.073170732, 0.073170732, 0.073170732, 0.073170732, 0.073170732, 
0.073170732)), .Names = c("MON1_12", "QUEUE", "CallsHandled", 
"PercCallsMo"), class = "data.frame", row.names = c(NA, -20L))
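
As a check on where the expected values come from, the first one can be reproduced by hand. This is only a sketch with the month 1 figures copied from the sample data above; the variable names are made up for illustration.

```r
# Month 1 calls for every queue (rows with MON1_12 == 1 in the sample data)
month1_total <- sum(c(9, 3, 10, 27, 58, 8, 12, 2, 6))   # 135

# Month 1 calls for APPLICATION_STATUS only
app_status_month1 <- sum(c(9, 3, 10, 27))               # 49

# Share of the month's calls handled by that queue, about 0.363
perc <- app_status_month1 / month1_total
print(perc)
```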

You can do it like this:

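library(dplyr)

calls1 <- calls1 %>%
  group_by(MON1_12) %>%
  mutate(month_total = sum(CallsHandled)) %>%                # total calls in each month
  group_by(MON1_12, QUEUE) %>%
  mutate(PercCallsMo = sum(CallsHandled) / month_total) %>%  # queue's share of that month
  ungroup() %>%                                              # drop the grouping before returning
  select(-month_total)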
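
If dplyr is not available, the same two-step idea (a month total, then a queue total within the month) can be sketched with ave() from base R. ave() returns a vector as long as its input, so no rows are dropped. The sample data is rebuilt here so the block runs on its own; "month_total" and "queue_total" are illustrative names.

```r
# Rebuild the sample data from the question
calls1 <- data.frame(
  MON1_12 = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 2L,
              2L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L),
  QUEUE = factor(rep(c("APPLICATION_STATUS", "BENEFITS", "BILLING"),
                     times = c(7, 4, 9))),
  CallsHandled = c(9L, 3L, 10L, 27L, 64L, 17L, 10L, 58L, 8L, 29L,
                   32L, 12L, 2L, 6L, 1L, 3L, 2L, 2L, 2L, 2L)
)

# Total calls per month, repeated on every row of that month
month_total <- ave(calls1$CallsHandled, calls1$MON1_12, FUN = sum)

# Total calls per month and queue, repeated on every row of that group
queue_total <- ave(calls1$CallsHandled, calls1$MON1_12, calls1$QUEUE, FUN = sum)

# Queue's share of the month's calls; calls1 keeps all 20 rows
calls1$PercCallsMo <- queue_total / month_total
```

Because the result is assigned straight into a column of calls1, the row order of the original data frame is unchanged.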

Using base R:

percent <- merge(aggregate(calls1["CallsHandled"],calls1["MON1_12"], sum), 
                 aggregate(calls1["CallsHandled"], calls1[c("MON1_12","QUEUE")], sum),
                 by = "MON1_12")
percent[["PercCallsMo"]] <- percent[["CallsHandled.y"]] / percent[["CallsHandled.x"]]
merge(calls1, percent[c("MON1_12", "QUEUE", "PercCallsMo")])

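percent produces the percentages, but is there a way to do this without changing the size of the data frame? I need to keep all the rows in the data frame and simply repeat the "PercCallsMo" value on rows with the same month (MON1_12) and queue.

The last line,
merge(calls1, percent[c("MON1_12", "QUEUE", "PercCallsMo")])
should return all the rows of the original data frame. If it doesn't, add all.x=TRUE to the merge call to get all rows.

That's odd. The printout on screen shows the whole data frame, but the object it creates in the environment still holds the summary. Any idea why that happens?

I don't know how you are saving the data, but the last line only performs the merge; it does not save the result in a variable. You probably need to assign the result to a variable.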
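
The point about assigning the result can be sketched as follows, using the base R answer's own steps on the sample data. The names "month_totals", "queue_totals", and "calls2" are hypothetical; all.x = TRUE is included only as the safeguard suggested above and should make no difference here, since every month and queue pair has a match.

```r
# Rebuild the sample data from the question
calls1 <- data.frame(
  MON1_12 = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 2L,
              2L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L),
  QUEUE = factor(rep(c("APPLICATION_STATUS", "BENEFITS", "BILLING"),
                     times = c(7, 4, 9))),
  CallsHandled = c(9L, 3L, 10L, 27L, 64L, 17L, 10L, 58L, 8L, 29L,
                   32L, 12L, 2L, 6L, 1L, 3L, 2L, 2L, 2L, 2L)
)

# Summary table: one row per month and queue (6 rows)
month_totals <- aggregate(calls1["CallsHandled"], calls1["MON1_12"], sum)
queue_totals <- aggregate(calls1["CallsHandled"], calls1[c("MON1_12", "QUEUE")], sum)
percent <- merge(month_totals, queue_totals, by = "MON1_12")
percent[["PercCallsMo"]] <- percent[["CallsHandled.y"]] / percent[["CallsHandled.x"]]

# Assign the merged result; without the assignment it is only printed
calls2 <- merge(calls1, percent[c("MON1_12", "QUEUE", "PercCallsMo")],
                all.x = TRUE)

nrow(percent)  # size of the summary table
nrow(calls2)   # size of the merged data frame
```

Note that merge() sorts the result by the key columns by default, so the rows of calls2 may come out in a different order than in calls1.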