R: Calculating proportions of counts within count categories, depending on other categorical variables
I hope this convoluted title makes sense, but my problem is not easy to explain. A toy dataset lists customer visits along with each customer's exemption status and the type of visit:
df <- structure(list(Customer = structure(c(8L, 2L, 5L, 4L, 4L, 1L,
1L, 6L, 6L, 7L, 7L, 7L, 3L, 3L, 3L), .Label = c("Aaron", "Elizabeth",
"Frank", "John", "Mary", "Pam", "Rob", "Sam"), class = "factor"),
Exemption = structure(c(2L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L,
2L, 2L, 2L, 1L, 1L, 1L), .Label = c("Exempt", "Non-exempt"
), class = "factor"), Type = structure(c(1L, 1L, 2L, 1L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 2L), .Label = c("Type 1",
"Type 2"), class = "factor")), .Names = c("Customer", "Exemption",
"Type"), class = "data.frame", row.names = c(NA, -15L))
Customer Exemption Type
1 Sam Non-exempt Type 1
2 Elizabeth Exempt Type 1
3 Mary Exempt Type 2
4 John Non-exempt Type 1
5 John Non-exempt Type 2
6 Aaron Non-exempt Type 2
7 Aaron Non-exempt Type 2
8 Pam Exempt Type 2
9 Pam Exempt Type 2
10 Rob Non-exempt Type 2
11 Rob Non-exempt Type 2
12 Rob Non-exempt Type 1
13 Frank Exempt Type 1
14 Frank Exempt Type 1
15 Frank Exempt Type 2
I tried a few approaches with dplyr, such as group_by(Customer, Type) %>% summarise(n()), but none of them seems right.

Where did Customer go in the expected output?

@mtoto: In the expected output, Number_of_visits collapses the customers into counts. For example, Number_of_visits = 1 corresponds to three customers in the list (Sam, Elizabeth, Mary). One of them is Non-exempt with Type 1, so the corresponding proportion is 1. The other two are Exempt, one with Type 1 and one with Type 2, so each of those proportions is 0.5.

You can use count from dplyr to count how often each combination of Exemption and Type occurs within each group defined by the number of visits, then divide by the total within each Number_of_visits and Exemption group:
library(dplyr)
library(tidyr)
res <- df %>% group_by(Customer) %>%
mutate(Number_of_visits=n()) %>%
group_by(Number_of_visits) %>%
count(Exemption, Type) %>%
complete(Type, fill=list(n=0)) %>%
group_by(Number_of_visits,Exemption) %>%
mutate(Proportion=n/sum(n))
print(res)
##Source: local data frame [12 x 5]
##Groups: Number_of_visits, Exemption [6]
##
## Number_of_visits Exemption Type n Proportion
## <int> <fctr> <fctr> <dbl> <dbl>
##1 1 Exempt Type 1 1 0.5000000
##2 1 Exempt Type 2 1 0.5000000
##3 1 Non-exempt Type 1 1 1.0000000
##4 1 Non-exempt Type 2 0 0.0000000
##5 2 Exempt Type 1 0 0.0000000
##6 2 Exempt Type 2 2 1.0000000
##7 2 Non-exempt Type 1 1 0.2500000
##8 2 Non-exempt Type 2 3 0.7500000
##9 3 Exempt Type 1 2 0.6666667
##10 3 Exempt Type 2 1 0.3333333
##11 3 Non-exempt Type 1 1 0.3333333
##12 3 Non-exempt Type 2 2 0.6666667
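
As a cross-check that does not depend on which dplyr or tidyr version is installed, the same counts and proportions can be sketched in base R. This is only an illustration, not part of the answer above: the data frame is rebuilt by hand from the printed table, and the names counts, props and res_base are made up for this example.

```r
# Rebuild the toy data from the question (the same 15 visits).
df <- data.frame(
  Customer  = c("Sam", "Elizabeth", "Mary", "John", "John", "Aaron", "Aaron",
                "Pam", "Pam", "Rob", "Rob", "Rob", "Frank", "Frank", "Frank"),
  Exemption = c("Non-exempt", "Exempt", "Exempt", "Non-exempt", "Non-exempt",
                "Non-exempt", "Non-exempt", "Exempt", "Exempt", "Non-exempt",
                "Non-exempt", "Non-exempt", "Exempt", "Exempt", "Exempt"),
  Type      = c("Type 1", "Type 1", "Type 2", "Type 1", "Type 2", "Type 2",
                "Type 2", "Type 2", "Type 2", "Type 2", "Type 2", "Type 1",
                "Type 1", "Type 1", "Type 2"),
  stringsAsFactors = TRUE
)

# Attach each customer's total number of visits to every one of their rows.
df$Number_of_visits <- ave(seq_len(nrow(df)), df$Customer, FUN = length)

# Three-way contingency table; combinations that never occur get a count of 0.
counts <- table(df$Number_of_visits, df$Exemption, df$Type,
                dnn = c("Number_of_visits", "Exemption", "Type"))

# Divide each count by the total of its (Number_of_visits, Exemption) group.
props <- prop.table(counts, margin = c(1, 2))

# Long format with one row per combination, similar in shape to res.
res_base <- merge(as.data.frame(counts, responseName = "n"),
                  as.data.frame(props, responseName = "Proportion"))
print(res_base)
```

Because table() fills in every combination of the three variables, no separate completion step is needed. One caveat: if some Number_of_visits and Exemption pair had no rows at all, its proportions would come out as NaN (0 divided by 0), which does not happen with this data.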