R: calculate the proportion of counts within count categories, depending on other categorical variables

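I hope the convoluted title makes sense, but my problem is not that easy to explain.

The toy dataset lists customer visits along with each customer's exemption status and the visit type: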

df <- structure(list(Customer = structure(c(8L, 2L, 5L, 4L, 4L, 1L, 
1L, 6L, 6L, 7L, 7L, 7L, 3L, 3L, 3L), .Label = c("Aaron", "Elizabeth", 
"Frank", "John", "Mary", "Pam", "Rob", "Sam"), class = "factor"), 
    Exemption = structure(c(2L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 
    2L, 2L, 2L, 1L, 1L, 1L), .Label = c("Exempt", "Non-exempt"
    ), class = "factor"), Type = structure(c(1L, 1L, 2L, 1L, 
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 2L), .Label = c("Type 1", 
    "Type 2"), class = "factor")), .Names = c("Customer", "Exemption", 
"Type"), class = "data.frame", row.names = c(NA, -15L))

    Customer  Exemption   Type
1        Sam Non-exempt Type 1
2  Elizabeth     Exempt Type 1
3       Mary     Exempt Type 2
4       John Non-exempt Type 1
5       John Non-exempt Type 2
6      Aaron Non-exempt Type 2
7      Aaron Non-exempt Type 2
8        Pam     Exempt Type 2
9        Pam     Exempt Type 2
10       Rob Non-exempt Type 2
11       Rob Non-exempt Type 2
12       Rob Non-exempt Type 1
13     Frank     Exempt Type 1
14     Frank     Exempt Type 1
15     Frank     Exempt Type 2

I tried a few approaches with dplyr, using group_by(Customer, Type) %>% summarise(n()), but the result does not seem right.

You can use count from dplyr to count the occurrences of Exemption and Type, grouped by the number of visits:
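library(dplyr)
library(tidyr)
res <- df %>% group_by(Customer) %>% 
              mutate(Number_of_visits=n()) %>%          # visits per customer
              group_by(Number_of_visits) %>% 
              count(Exemption, Type) %>%                 # rows per exemption status and type
              complete(Type, fill=list(n=0)) %>%         # add missing Type rows with n = 0
              group_by(Number_of_visits,Exemption) %>% 
              mutate(Proportion=n/sum(n))                # share of each Type within the group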

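Where did Customer go in the expected output?

@mtoto: In the expected output, Number_of_visits collapses the customers into counts. For example, Number_of_visits = 1 corresponds to three customers in the list (Sam, Elizabeth, Mary). One of them is Non-exempt with Type 1, so the corresponding proportion is 1. The other two are Exempt, one with Type 1 and one with Type 2, so each proportion is 0.5.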
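
The arithmetic described in the comment above can be checked for the one-visit customers with a minimal base R sketch. It rebuilds the toy data with data.frame() instead of structure(), and the names visits, one_visit and p are only illustrative, not part of the original post.

```r
# Toy data rebuilt from the table in the question
df <- data.frame(
  Customer  = c("Sam", "Elizabeth", "Mary", "John", "John", "Aaron", "Aaron",
                "Pam", "Pam", "Rob", "Rob", "Rob", "Frank", "Frank", "Frank"),
  Exemption = factor(c("Non-exempt", "Exempt", "Exempt", "Non-exempt",
                       "Non-exempt", "Non-exempt", "Non-exempt", "Exempt",
                       "Exempt", "Non-exempt", "Non-exempt", "Non-exempt",
                       "Exempt", "Exempt", "Exempt")),
  Type      = factor(c("Type 1", "Type 1", "Type 2", "Type 1", "Type 2",
                       "Type 2", "Type 2", "Type 2", "Type 2", "Type 2",
                       "Type 2", "Type 1", "Type 1", "Type 1", "Type 2"))
)

# Number of visits (rows) per customer, repeated on every row of that customer
visits <- ave(seq_along(df$Customer), df$Customer, FUN = length)

# Keep only the customers who visited exactly once (Sam, Elizabeth, Mary)
one_visit <- df[visits == 1, ]

# Row-wise proportions: share of each Type within each Exemption status
p <- prop.table(table(one_visit$Exemption, one_visit$Type), margin = 1)
print(p)
```

Because Exemption and Type are factors, the table keeps the empty Non-exempt / Type 2 cell, which shows up as a proportion of 0.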

print(res)
##Source: local data frame [12 x 5]
##Groups: Number_of_visits, Exemption [6]
##
##   Number_of_visits  Exemption   Type     n Proportion
##              <int>     <fctr> <fctr> <dbl>      <dbl>
##1                 1     Exempt Type 1     1  0.5000000
##2                 1     Exempt Type 2     1  0.5000000
##3                 1 Non-exempt Type 1     1  1.0000000
##4                 1 Non-exempt Type 2     0  0.0000000
##5                 2     Exempt Type 1     0  0.0000000
##6                 2     Exempt Type 2     2  1.0000000
##7                 2 Non-exempt Type 1     1  0.2500000
##8                 2 Non-exempt Type 2     3  0.7500000
##9                 3     Exempt Type 1     2  0.6666667
##10                3     Exempt Type 2     1  0.3333333
##11                3 Non-exempt Type 1     1  0.3333333
##12                3 Non-exempt Type 2     2  0.6666667
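
How count() treats an already grouped data frame has, as far as I know, changed between dplyr releases, and that can affect which zero rows complete() fills in. The variant below is a sketch that tries not to depend on this: it ungroups before counting and names all three variables in complete(). The data frame is rebuilt with data.frame() and the name res2 is just illustrative; neither comes from the original answer.

```r
library(dplyr)
library(tidyr)

# Toy data rebuilt from the table in the question
df <- data.frame(
  Customer  = c("Sam", "Elizabeth", "Mary", "John", "John", "Aaron", "Aaron",
                "Pam", "Pam", "Rob", "Rob", "Rob", "Frank", "Frank", "Frank"),
  Exemption = factor(c("Non-exempt", "Exempt", "Exempt", "Non-exempt",
                       "Non-exempt", "Non-exempt", "Non-exempt", "Exempt",
                       "Exempt", "Non-exempt", "Non-exempt", "Non-exempt",
                       "Exempt", "Exempt", "Exempt")),
  Type      = factor(c("Type 1", "Type 1", "Type 2", "Type 1", "Type 2",
                       "Type 2", "Type 2", "Type 2", "Type 2", "Type 2",
                       "Type 2", "Type 1", "Type 1", "Type 1", "Type 2"))
)

res2 <- df %>%
  group_by(Customer) %>%
  mutate(Number_of_visits = n()) %>%      # visits per customer
  ungroup() %>%                           # drop grouping before counting
  count(Number_of_visits, Exemption, Type) %>%
  # fill in every missing combination explicitly with n = 0
  complete(Number_of_visits, Exemption, Type, fill = list(n = 0)) %>%
  group_by(Number_of_visits, Exemption) %>%
  mutate(Proportion = n / sum(n)) %>%     # share of each Type within the group
  ungroup()

print(res2)
```

On the toy data this should give the same 12 rows as the output above, including the two rows with n = 0.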