Adding a new grouping variable in dplyr (R, dplyr)


The code below is what I tried, but it gives me a lot of duplicated rows. The ideal result is the tibble shown after it.

tbl %>% 
  group_by(Effective_Date) %>% 
  mutate(Gender = 'Female',Location='All',freq_all = mean(freq)) %>% 
  bind_rows(female,.) %>% 
  ungroup() %>% 
  arrange(Effective_Date)

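# A tibble: 42 x 5
  Effective_Date Gender Location     n  freq
  <date>         <chr>  <chr>    <int> <dbl>
1 2017-01-01     Female India      281 0.351
2 2017-01-01     Female US        2446 0.542
3 2017-01-01     Female All         NA 0.447
4 etc            etc    etc        etc   etc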
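
For readers wondering where the duplicated rows come from, here is a minimal sketch. It assumes a small data frame modelled on the Female rows used in this thread (the name `df` and the two result names are illustrative only). `mutate()` returns one row per input row, so each date ends up with as many "All" rows as it has locations, while `summarise()` collapses each group to a single row.

```r
library(dplyr)

# Toy data modelled on the Female rows in this thread (illustrative only)
df <- data.frame(
  Effective_Date = c("2017-01-01", "2017-01-01", "2017-02-01", "2017-02-01"),
  Gender         = "Female",
  Location       = c("India", "US", "India", "US"),
  n              = c(281L, 2446L, 285L, 2494L),
  freq           = c(0.351, 0.542, 0.349, 0.543)
)

# mutate() keeps every input row: each date gets one "All" row per location
via_mutate <- df %>%
  group_by(Effective_Date) %>%
  mutate(Location = "All", freq_all = mean(freq)) %>%
  ungroup()

# summarise() collapses each date to a single row
via_summarise <- df %>%
  group_by(Effective_Date) %>%
  summarise(freq_all = mean(freq))

nrow(via_mutate)     # 4: two "All" rows per date
nrow(via_summarise)  # 2: one row per date
```

Binding the `mutate()` result back onto the original rows is what produces the repeated "All" rows; binding the `summarise()` result adds exactly one per date.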

This will work for the specific example you provided:

df <- read.table(text = "
  Effective_Date Gender Location    n  freq
1     2017-01-01 Female    India  281 0.351
2     2017-01-01 Female       US 2446 0.542
3     2017-02-01 Female    India  285 0.349
4     2017-02-01 Female       US 2494 0.543
", header = TRUE)

library(dplyr)

df %>%
  group_by(Effective_Date) %>%
  summarise(freq = mean(freq)) %>%                          # one summary row per date
  mutate(Gender = "Female", Location = "all", n = NA) %>%   # label the summary rows
  bind_rows(df) %>%                                         # append the original rows
  arrange(Effective_Date)

# # A tibble: 6 x 5
#   Effective_Date Gender Location     n  freq
#   <fct>          <chr>  <chr>    <int> <dbl>
# 1 2017-01-01     Female all         NA 0.446
# 2 2017-01-01     Female India      281 0.351
# 3 2017-01-01     Female US        2446 0.542
# 4 2017-02-01     Female all         NA 0.446
# 5 2017-02-01     Female India      285 0.349
# 6 2017-02-01     Female US        2494 0.543
# (column types as printed in the original answer; R >= 4.0 reads strings as <chr> by default)

If the data contains more than one gender, group by Gender as well:

df <- read.table(text = "
  Effective_Date Gender Location    n  freq
1     2017-01-01 Female    India  281 0.351
2     2017-01-01 Female       US 2446 0.542
3     2017-02-01 Female    India  285 0.349
4     2017-02-01 Female       US 2494 0.543
5     2017-01-01   Male    India  556 0.386
6     2017-01-01   Male       US 1123 0.668
7     2017-02-01   Male    India  449 0.389
8     2017-02-01   Male       US 2237 0.511
", header = TRUE)

library(dplyr)

df %>%
  group_by(Effective_Date, Gender) %>%
  summarise(freq = mean(freq)) %>%         # one summary row per date and gender
  ungroup() %>%
  mutate(Location = "all", n = NA) %>%     # label the summary rows
  bind_rows(df) %>%                        # append the original rows
  arrange(Effective_Date, Gender)

# # A tibble: 12 x 5
#    Effective_Date Gender  freq Location     n
#    <fct>          <fct>  <dbl> <chr>    <int>
#  1 2017-01-01     Female 0.446 all         NA
#  2 2017-01-01     Female 0.351 India      281
#  3 2017-01-01     Female 0.542 US        2446
#  4 2017-01-01     Male   0.527 all         NA
#  5 2017-01-01     Male   0.386 India      556
#  6 2017-01-01     Male   0.668 US        1123
#  7 2017-02-01     Female 0.446 all         NA
#  8 2017-02-01     Female 0.349 India      285
#  9 2017-02-01     Female 0.543 US        2494
# 10 2017-02-01     Male   0.45  all         NA
# 11 2017-02-01     Male   0.389 India      449
# 12 2017-02-01     Male   0.511 US        2237

mutate is a command for setting values, so when you run mutate(..., Location = 'All') you are changing every Location to 'All'. In the desired result there are rows for each individual location. So, rather than updating the Location column in mutate, add it to group_by, since it seems you want the mean frequency by Effective_Date and Location. Similarly, if you have Gender values other than Female, you probably don't want to change them all to 'Female', so don't use mutate(Gender = 'Female'). Perhaps you want to group by Gender as well?

On a separate note, consider whether you want the plain mean of freq or a mean of freq weighted by n.

data.table has a function for this:

library(data.table)
setDT(df)   # df here is the Female-only example data; the output below contains Female rows only

res <- groupingsets(df,
  by = c("Effective_Date", "Gender", "Location"),
  sets = list(
    c("Effective_Date", "Gender"),               # totals across locations
    c("Effective_Date", "Gender", "Location")    # the original per-location rows
  ),
  j = .(n = sum(n), freq = mean(freq))
)[order(Effective_Date, Gender, Location, na.last = TRUE)]
res

   Effective_Date Gender Location    n   freq
1:     2017-01-01 Female    India  281 0.3510
2:     2017-01-01 Female       US 2446 0.5420
3:     2017-01-01 Female     <NA> 2727 0.4465
4:     2017-02-01 Female    India  285 0.3490
5:     2017-02-01 Female       US 2494 0.5430
6:     2017-02-01 Female     <NA> 2779 0.4460

The same call can be written more compactly by storing the grouping columns in a vector:

myby <- c("Effective_Date", "Gender", "Location")
groupingsets(df,
  j = .(n = sum(n), freq = mean(freq)),
  by = myby,
  sets = list(myby, head(myby, -1))   # all columns, then all but Location
)[, setorderv(.SD, myby, na.last = TRUE)]