Why does my dplyr statement create extra rows?
I expect "temp" to contain 40 rows: males aged 1-20 and females aged 1-20. Instead, it creates those 40 rows, then duplicates and appends them, so "temp" ends up with 80 rows. Why does it do this, and how do I stop it? I know I could delete rows 41-80 myself, but that is painful when working with large datasets.
library(dplyr)
library(tidyr)
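gender <- sample(c("male","female"), 100, replace = TRUE)
# size is omitted here, so sample() draws length(1:20) = 20 values;
# data.frame() then recycles them to match the 100 gender values
age <- sample(1:20, , replace = TRUE)
df <- data.frame(gender, age)
temp <- df %>% group_by(gender, age) %>%
  summarise(count = n()) %>%
  complete(gender = c("male", "female"), age = 1:20, fill = list(count = 0))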
From the dplyr documentation (emphasis added):
"When you group by multiple variables, each summary peels off one level of the grouping."
Here is the data frame that your code pipes into complete(). Note that it is still grouped by gender:
> df %>% group_by(gender, age) %>% summarise(count = n())
# A tibble: 24 x 3
# Groups: gender [?]
gender age count
<fct> <int> <int>
1 female 2 4
2 female 3 2
3 female 7 6
4 female 9 5
5 female 10 4
6 female 11 2
7 female 12 3
8 female 13 4
9 female 15 1
10 female 18 1
# ... with 14 more rows
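The leftover grouping can also be checked directly rather than read off the printed header. The following is a minimal sketch, assuming a data frame shaped like the one in the question; the variable name `summarised` is made up for illustration, and `group_vars()` is the dplyr helper that lists the current grouping columns.

```r
library(dplyr)

# Hypothetical data in the same shape as the question's df
df <- data.frame(
  gender = sample(c("male", "female"), 100, replace = TRUE),
  age = sample(1:20, 100, replace = TRUE),
  stringsAsFactors = FALSE
)

# Group by two variables, then summarise once
summarised <- df %>%
  group_by(gender, age) %>%
  summarise(count = n())

# summarise() removed only the last grouping level (age),
# so the result is still grouped by gender
group_vars(summarised)
```

Anything piped after this point, including complete(), therefore runs once per gender group unless ungroup() is called first.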
Does this give you the desired output? df %>% group_by(gender, age) %>% summarise(count = n()) %>% group_by(gender) %>% complete(age = 1:20, fill = list(count = 0))
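Grouping by gender works well too. I had assumed that if gender was not specified in complete(), and, say, neither females nor males had age 2, then age 2 would be created with a count of 0 only for females and not for males, because I had told it age 2 must be present (but only once). I think I understand now: having gender in the grouping means every age must exist for both males and females, in this case 1:20, because that is what I specified in complete(). I still don't know why using gender in complete() produces the extra rows.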
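The per-group behaviour described in this comment can be seen on a tiny example. This is only a sketch with made-up data (the names `small` and `filled` are hypothetical): when the input is grouped by gender, complete(age = ...) fills in the missing ages separately inside each gender group.

```r
library(dplyr)
library(tidyr)

# Made-up summary table: females lack age 2, males lack ages 1 and 3
small <- tibble(
  gender = c("female", "female", "male"),
  age = c(1L, 3L, 2L),
  count = c(2L, 1L, 4L)
)

# complete() is applied within each gender group
filled <- small %>%
  group_by(gender) %>%
  complete(age = 1:3, fill = list(count = 0L))

filled
```

Each gender ends up with one row for every age in 1:3, and the combinations that were missing get a count of 0.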
# explicitly remove all grouping
t1 <- df %>%
  group_by(gender, age) %>%
  summarise(count = n()) %>%
  ungroup() %>%
  complete(gender = c("male", "female"),
           age = 1:20,
           fill = list(count = 0))
# retain gender grouping, & only complete for different ages within each gender group
t2 <- df %>%
  group_by(gender, age) %>%
  summarise(count = n()) %>%
  complete(age = 1:20,
           fill = list(count = 0))
# use count(), which is a wrapper for group_by(), summarise(n = n()), & ungroup() in one line
# note: in the dplyr version used here the output variable name is hard-coded to n, so it is
# renamed in a separate step (newer dplyr releases accept a name argument in count())
t3 <- df %>%
  count(gender, age) %>%
  rename(count = n) %>%
  complete(gender = c("male", "female"),
           age = 1:20,
           fill = list(count = 0))
> all.equal(t1, t2)
[1] TRUE
> all.equal(t1, t3)
[1] TRUE
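Beyond comparing the three results with each other, it may be worth checking the property the question actually asks for: exactly one row per gender/age combination. Below is a hedged sketch along the lines of the count() approach, using freshly generated data; the seed and the name `result` are illustrative assumptions, not part of the original answer.

```r
library(dplyr)
library(tidyr)

set.seed(42)  # arbitrary seed, only so the sketch is reproducible
df <- data.frame(
  gender = sample(c("male", "female"), 100, replace = TRUE),
  age = sample(1:20, 100, replace = TRUE),
  stringsAsFactors = FALSE
)

# count() returns an ungrouped result, so complete() sees the whole table
result <- df %>%
  count(gender, age) %>%
  complete(gender = c("male", "female"), age = 1:20, fill = list(n = 0L))

nrow(result)                                 # 2 genders x 20 ages
anyDuplicated(result[c("gender", "age")])    # 0 means no repeated combinations
sum(result$n)                                # total equals the 100 input rows
```

If the row count came out as 80 instead of 40, that would point back to a grouping that was still in place when complete() ran.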