Adding a new grouping variable in dplyr
Tags: r, dplyr
I tried the following, but it gives me a lot of duplicate rows:
tbl %>%
  group_by(Effective_Date) %>%
  mutate(Gender = 'Female', Location = 'All', freq_all = mean(freq)) %>%
  bind_rows(female, .) %>%
  ungroup() %>%
  arrange(Effective_Date)
The ideal result would be:
# A tibble: 42 x 5
  Effective_Date Gender Location     n  freq
  <date>         <chr>  <chr>    <int> <dbl>
1 2017-01-01     Female India      281 0.351
2 2017-01-01     Female US        2446 0.542
3 2017-01-01     Female All         NA 0.447
4 etc            etc    etc        etc
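mutate is a command for setting values, so when you call mutate(..., Location = 'All') you are changing every Location to 'All'. In your desired result there is still a row for each individual location. So instead of overwriting the Location column in mutate, add it to the group_by, since it seems you want the mean freq by Effective_Date and Location. Similarly, if you have Gender values other than Female, you probably do not want to change them all to 'Female', so do not use mutate(Gender = 'Female'). Maybe you want to group by Gender as well? Also consider whether you want the plain mean of freq or a mean of freq weighted by n.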
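To make the difference concrete, here is a minimal sketch of why mutate leaves duplicates while summarise does not. The data frame copies the Female rows from the question; the names via_mutate and via_summarise are only illustrative.

```r
library(dplyr)

# Toy data: the Female rows from the question's table
df <- data.frame(
  Effective_Date = c("2017-01-01", "2017-01-01", "2017-02-01", "2017-02-01"),
  Gender = "Female",
  Location = c("India", "US", "India", "US"),
  n = c(281L, 2446L, 285L, 2494L),
  freq = c(0.351, 0.542, 0.349, 0.543),
  stringsAsFactors = FALSE
)

# mutate() keeps every input row: each date still has two rows,
# both relabelled "All" and both carrying the same group mean
via_mutate <- df %>%
  group_by(Effective_Date) %>%
  mutate(Location = "All", freq_all = mean(freq)) %>%
  ungroup()

# summarise() collapses each group to a single row
via_summarise <- df %>%
  group_by(Effective_Date) %>%
  summarise(freq = mean(freq))

nrow(via_mutate)     # 4 rows: one per original row
nrow(via_summarise)  # 2 rows: one per Effective_Date
```

Binding the mutate result back onto the original rows is what produces the repeated "All" rows; binding the summarise result adds exactly one "All" row per group.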
This will work for the specific example you provided:
df = read.table(text = "
Effective_Date Gender Location n freq
1 2017-01-01 Female India 281 0.351
2 2017-01-01 Female US 2446 0.542
3 2017-02-01 Female India 285 0.349
4 2017-02-01 Female US 2494 0.543
", header = TRUE)
library(dplyr)
df %>%
  group_by(Effective_Date) %>%
  summarise(freq = mean(freq)) %>%   # one summary row per date
  mutate(Gender = "Female",
         Location = "all",
         n = NA) %>%
  bind_rows(df) %>%                  # append the original per-location rows
  arrange(Effective_Date)
# # A tibble: 6 x 5
#   Effective_Date Gender Location     n  freq
#   <fct>          <chr>  <chr>    <int> <dbl>
# 1 2017-01-01     Female all         NA 0.446
# 2 2017-01-01     Female India      281 0.351
# 3 2017-01-01     Female US        2446 0.542
# 4 2017-02-01     Female all         NA 0.446
# 5 2017-02-01     Female India      285 0.349
# 6 2017-02-01     Female US        2494 0.543
If the data also contains other genders, group by Gender as well:
df = read.table(text = "
Effective_Date Gender Location n freq
1 2017-01-01 Female India 281 0.351
2 2017-01-01 Female US 2446 0.542
3 2017-02-01 Female India 285 0.349
4 2017-02-01 Female US 2494 0.543
5 2017-01-01 Male India 556 0.386
6 2017-01-01 Male US 1123 0.668
7 2017-02-01 Male India 449 0.389
8 2017-02-01 Male US 2237 0.511
", header = TRUE)
library(dplyr)
df %>%
  group_by(Effective_Date, Gender) %>%
  summarise(freq = mean(freq)) %>%   # one summary row per date and gender
  ungroup() %>%
  mutate(Location = "all",
         n = NA) %>%
  bind_rows(df) %>%                  # append the original per-location rows
  arrange(Effective_Date, Gender)
# # A tibble: 12 x 5
#    Effective_Date Gender  freq Location     n
#    <fct>          <fct>  <dbl> <chr>    <int>
#  1 2017-01-01     Female 0.446 all         NA
#  2 2017-01-01     Female 0.351 India      281
#  3 2017-01-01     Female 0.542 US        2446
#  4 2017-01-01     Male   0.527 all         NA
#  5 2017-01-01     Male   0.386 India      556
#  6 2017-01-01     Male   0.668 US        1123
#  7 2017-02-01     Female 0.446 all         NA
#  8 2017-02-01     Female 0.349 India      285
#  9 2017-02-01     Female 0.543 US        2494
# 10 2017-02-01     Male   0.45  all         NA
# 11 2017-02-01     Male   0.389 India      449
# 12 2017-02-01     Male   0.511 US        2237
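On the question of a plain versus a weighted mean, the following is a minimal sketch of the weighted variant, assuming n is the right weight for freq (the original post does not confirm this). It uses base R's weighted.mean inside summarise and fills n with the group total instead of NA; the name totals is only illustrative.

```r
library(dplyr)

# Same eight rows as the example data, built directly as a data frame
df <- data.frame(
  Effective_Date = rep(c("2017-01-01", "2017-01-01", "2017-02-01", "2017-02-01"), 2),
  Gender = rep(c("Female", "Male"), each = 4),
  Location = rep(c("India", "US"), 4),
  n = c(281L, 2446L, 285L, 2494L, 556L, 1123L, 449L, 2237L),
  freq = c(0.351, 0.542, 0.349, 0.543, 0.386, 0.668, 0.389, 0.511),
  stringsAsFactors = FALSE
)

totals <- df %>%
  group_by(Effective_Date, Gender) %>%
  summarise(
    # freq must be computed before n is overwritten below,
    # because summarise() evaluates its arguments in order
    freq = weighted.mean(freq, n),
    n = sum(n)
  ) %>%
  ungroup() %>%
  mutate(Location = "All")

result <- bind_rows(df, totals) %>%
  arrange(Effective_Date, Gender)

result
```

With weights, the "All" rows are pulled towards the location with the larger n, so they will generally differ from the unweighted values shown above.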
There is a function for this in data.table:
library(data.table)
setDT(df)
res = groupingsets(df, by = c("Effective_Date", "Gender", "Location"),
  sets = list(
    c("Effective_Date", "Gender"),
    c("Effective_Date", "Gender", "Location")
  ), j = .(n = sum(n), freq = mean(freq))
)[order(Effective_Date, Gender, Location, na.last = TRUE)]
Output (shown for a df that contains only the Female rows):
   Effective_Date Gender Location    n   freq
1:     2017-01-01 Female    India  281 0.3510
2:     2017-01-01 Female       US 2446 0.5420
3:     2017-01-01 Female     <NA> 2727 0.4465
4:     2017-02-01 Female    India  285 0.3490
5:     2017-02-01 Female       US 2494 0.5430
6:     2017-02-01 Female     <NA> 2779 0.4460
The same call can be written more compactly:
myby = c("Effective_Date", "Gender", "Location")
groupingsets(df,
  j = .(n = sum(n), freq = mean(freq)),
  by = myby, sets = list(myby, head(myby, -1))
)[, setorderv(.SD, myby, na.last = TRUE)]
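In the groupingsets result the aggregated rows carry NA in Location, whereas the desired output labels them. A minimal sketch of one way to label them, assuming Location is a character column (with a factor column the new level would have to be added first); the table is rebuilt here with data.table() so the example is self-contained.

```r
library(data.table)

# Same eight rows as the example data, with character columns
dt <- data.table(
  Effective_Date = rep(c("2017-01-01", "2017-01-01", "2017-02-01", "2017-02-01"), 2),
  Gender = rep(c("Female", "Male"), each = 4),
  Location = rep(c("India", "US"), 4),
  n = c(281L, 2446L, 285L, 2494L, 556L, 1123L, 449L, 2237L),
  freq = c(0.351, 0.542, 0.349, 0.543, 0.386, 0.668, 0.389, 0.511)
)

myby <- c("Effective_Date", "Gender", "Location")
res <- groupingsets(
  dt,
  j = .(n = sum(n), freq = mean(freq)),
  by = myby,
  sets = list(myby, head(myby, -1))
)

# Rows from the coarser grouping set have NA in Location; label them
res[is.na(Location), Location := "All"]

# Sort by the grouping columns
setorderv(res, myby)
res
```

Replacing mean(freq) with weighted.mean(freq, n) in j would give the weighted version of the same table.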