Adding a new grouping variable in dplyr

I tried:

tbl %>%
  group_by(Effective_Date) %>%
  mutate(Gender = 'Female', Location = 'All', freq_all = mean(freq)) %>%
  bind_rows(female, .) %>%
  ungroup() %>%
  arrange(Effective_Date)

But this gives me a lot of duplicated rows. The desired result should be:

 # A tibble: 42 x 5
       Effective_Date Gender Location     n  freq
       <date>         <chr>  <chr>    <int> <dbl>
     1 2017-01-01     Female India      281 0.351
     2 2017-01-01     Female US        2446 0.542
     3 2017-01-01     Female All         NA 0.447
     4 etc etc etc etc

The female-only code further down will work for the specific example you provided, and data.table has a function for this as well (groupingsets, shown at the end). If the data contains more than one gender, group by Gender too:

df = read.table(text = "
Effective_Date Gender Location     n  freq
1 2017-01-01     Female India      281 0.351
2 2017-01-01     Female US        2446 0.542
3 2017-02-01     Female India      285 0.349
4 2017-02-01     Female US        2494 0.543
5 2017-01-01     Male India      556 0.386
6 2017-01-01     Male US        1123 0.668
7 2017-02-01     Male India      449 0.389
8 2017-02-01     Male US        2237 0.511
", header = TRUE)

library(dplyr)

# Build one summary row per date and gender, label it Location = "all",
# then stack it onto the original rows
df %>%
  group_by(Effective_Date, Gender) %>%
  summarise(freq = mean(freq)) %>%   # unweighted mean across locations
  ungroup() %>%
  mutate(Location = "all",
         n = NA) %>%                 # n is left as NA on the summary rows
  bind_rows(df) %>%
  arrange(Effective_Date, Gender)

# # A tibble: 12 x 5
#   Effective_Date Gender  freq Location     n
#   <fct>          <fct>  <dbl> <chr>    <int>
# 1 2017-01-01     Female 0.446 all         NA
# 2 2017-01-01     Female 0.351 India      281
# 3 2017-01-01     Female 0.542 US        2446
# 4 2017-01-01     Male   0.527 all         NA
# 5 2017-01-01     Male   0.386 India      556
# 6 2017-01-01     Male   0.668 US        1123
# 7 2017-02-01     Female 0.446 all         NA
# 8 2017-02-01     Female 0.349 India      285
# 9 2017-02-01     Female 0.543 US        2494
#10 2017-02-01     Male   0.45  all         NA
#11 2017-02-01     Male   0.389 India      449
#12 2017-02-01     Male   0.511 US        2237

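mutate is a command for setting values, so when you do mutate(..., Location = 'All') you are changing every Location to 'All'. In your desired result there is a row for each individual location. So rather than updating the Location column in mutate, add it to group_by, since it seems you want the mean freq by Effective_Date and Location. Similarly, if you have Gender values other than female, you probably don't want to change them all to 'Female', so don't use mutate(Gender = 'Female'). Perhaps you want to group by Gender as well? On another note, consider whether you want the plain mean of freq or the mean of freq weighted by n.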
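
To make the point about mutate concrete, here is a minimal sketch on toy data retyped from the question. The tibble and the names with_mutate and with_summarise are introduced here only for illustration. It contrasts a grouped mutate, which keeps every input row, with a grouped summarise, which collapses each group to one row.

```r
library(dplyr)

# Toy data modelled on the question (female rows only); retyped for illustration
df <- tibble(
  Effective_Date = c("2017-01-01", "2017-01-01", "2017-02-01", "2017-02-01"),
  Gender         = "Female",
  Location       = c("India", "US", "India", "US"),
  n              = c(281L, 2446L, 285L, 2494L),
  freq           = c(0.351, 0.542, 0.349, 0.543)
)

# mutate() keeps every input row, so each date ends up with two "All" rows
with_mutate <- df %>%
  group_by(Effective_Date) %>%
  mutate(Location = "All", freq = mean(freq)) %>%
  ungroup()

# summarise() collapses each group to a single row
with_summarise <- df %>%
  group_by(Effective_Date) %>%
  summarise(freq = mean(freq))

nrow(with_mutate)     # 4
nrow(with_summarise)  # 2
```

Binding with_mutate back onto the original rows is what produces the extra rows in the question; with_summarise contributes only one row per date.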

# Works for the specific example provided (female rows only)
df = read.table(text = "
Effective_Date Gender Location     n  freq
1 2017-01-01     Female India      281 0.351
2 2017-01-01     Female US        2446 0.542
3 2017-02-01     Female India      285 0.349
4 2017-02-01     Female US        2494 0.543
", header = TRUE)

library(dplyr)

df %>%
  group_by(Effective_Date) %>%
  summarise(freq = mean(freq)) %>%   # one row per date: unweighted mean over locations
  mutate(Gender = "Female",          # safe only because every input row is female
         Location = "all",
         n = NA) %>%
  bind_rows(df) %>%
  arrange(Effective_Date)

# # A tibble: 6 x 5
#   Effective_Date Gender Location     n  freq
#   <fct>          <chr>  <chr>    <int> <dbl>
# 1 2017-01-01     Female all         NA 0.446
# 2 2017-01-01     Female India      281 0.351
# 3 2017-01-01     Female US        2446 0.542
# 4 2017-02-01     Female all         NA 0.446
# 5 2017-02-01     Female India      285 0.349
# 6 2017-02-01     Female US        2494 0.543
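
If the weighted version is what you are after, a sketch along the following lines may help. It assumes that freq is a proportion observed over n cases, so that weighting by n is meaningful; that is a guess about the data, not something the question states. The tibble is retyped from the example, and all_rows and result are names chosen here for illustration.

```r
library(dplyr)

# Toy data retyped from the question (female rows only)
df <- tibble(
  Effective_Date = c("2017-01-01", "2017-01-01", "2017-02-01", "2017-02-01"),
  Gender         = "Female",
  Location       = c("India", "US", "India", "US"),
  n              = c(281L, 2446L, 285L, 2494L),
  freq           = c(0.351, 0.542, 0.349, 0.543)
)

all_rows <- df %>%
  group_by(Effective_Date, Gender) %>%
  summarise(
    freq = weighted.mean(freq, w = n),  # computed first, while n still holds the per-location counts
    n    = sum(n)                       # total count for the "All" row
  ) %>%
  ungroup() %>%
  mutate(Location = "All")

# Stack the summary rows onto the detail rows
result <- bind_rows(df, all_rows) %>%
  arrange(Effective_Date, Gender)
```

The order inside summarise matters: if n = sum(n) came first, weighted.mean would see the already summed n rather than the per-location counts.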
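
# groupingsets() aggregates at several grouping levels in one call
library(data.table)
setDT(df)  # df here is the four-row, female-only data frame, matching the output below

res = groupingsets(df, by=c("Effective_Date", "Gender", "Location"),
  sets=list(
    c("Effective_Date", "Gender"),              # subtotal per date and gender (Location becomes NA)
    c("Effective_Date", "Gender", "Location")   # the original detail rows
  ), j = .(n = sum(n), freq = mean(freq))
)[order(Effective_Date, Gender, Location, na.last=TRUE)]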

   Effective_Date Gender Location    n   freq
1:     2017-01-01 Female    India  281 0.3510
2:     2017-01-01 Female       US 2446 0.5420
3:     2017-01-01 Female     <NA> 2727 0.4465
4:     2017-02-01 Female    India  285 0.3490
5:     2017-02-01 Female       US 2494 0.5430
6:     2017-02-01 Female     <NA> 2779 0.4460
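
# The same call with the grouping columns listed once; head(myby, -1) drops Location
myby = c("Effective_Date", "Gender", "Location")
res = groupingsets(df,
  j = .(n = sum(n), freq = mean(freq)),
  by=myby, sets=list(myby, head(myby, -1))
)
setorderv(res, myby, na.last=TRUE)  # sort by reference, subtotal (NA) rows last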
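
The subtotal rows above come back with Location set to NA, whereas the desired output labels them 'All'. A possible follow-up step is sketched below. It assumes Location has no genuinely missing values in the input, since those would be relabelled too. The data.table dt and the name by_cols are made up for this example.

```r
library(data.table)

# Toy data retyped from the question (female rows only)
dt <- data.table(
  Effective_Date = c("2017-01-01", "2017-01-01", "2017-02-01", "2017-02-01"),
  Gender         = "Female",
  Location       = c("India", "US", "India", "US"),
  n              = c(281L, 2446L, 285L, 2494L),
  freq           = c(0.351, 0.542, 0.349, 0.543)
)

by_cols <- c("Effective_Date", "Gender", "Location")

# Detail rows plus one subtotal row per date and gender
res <- groupingsets(
  dt,
  j    = .(n = sum(n), freq = mean(freq)),
  by   = by_cols,
  sets = list(by_cols, head(by_cols, -1))
)

# Subtotal rows have Location = NA; relabel them by reference.
# Assumption: the input has no real NA in Location.
res[is.na(Location), Location := "All"]

# Sort by reference on the grouping columns
setorderv(res, by_cols)
```

Unlike the dplyr versions, this fills n on the subtotal rows with the summed counts rather than NA, because j computes sum(n) at every grouping level.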