Warning: file_get_contents(/data/phpspider/zhask/data//catemap/4/r/81.json): failed to open stream: No such file or directory in /data/phpspider/zhask/libs/function.php on line 167

Warning: Invalid argument supplied for foreach() in /data/phpspider/zhask/libs/tag.function.php on line 1116

Notice: Undefined index: in /data/phpspider/zhask/libs/function.php on line 180

Warning: array_chunk() expects parameter 1 to be array, null given in /data/phpspider/zhask/libs/function.php on line 181
R 使用data.table聚合时保留零计数组合_R_Data.table - Fatal编程技术网

R 使用data.table聚合时保留零计数组合

R 使用data.table聚合时保留零计数组合,r,data.table,R,Data.table,假设我有以下数据。表: dt <- data.table(id = c(rep(1, 5), rep(2, 4)), sex = c(rep("H", 5), rep("F", 4)), fruit = c("apple", "tomato", "apple", "apple", "orange", "apple", "apple", "tomato", "tomato"), key =

假设我有以下
数据。表

dt <- data.table(id = c(rep(1, 5), rep(2, 4)),
                 sex = c(rep("H", 5), rep("F", 4)), 
                 fruit = c("apple", "tomato", "apple", "apple", "orange", "apple", "apple", "tomato", "tomato"),
                 key = "id")

   id sex  fruit
1:  1   H  apple
2:  1   H tomato
3:  1   H  apple
4:  1   H  apple
5:  1   H orange
6:  2   F  apple
7:  2   F  apple
8:  2   F tomato
9:  2   F tomato
其中:

    fruit sex N
1:  apple   H 3
2: tomato   H 1
3: orange   H 1
4:  apple   F 2
5: tomato   F 2
问题是,这样做会丢失
sex==“F”
orange
计数,因为该计数为0。有没有一种方法可以在不丢失零计数组合的情况下进行聚合

非常清楚的是,预期结果如下:

   fruit sex N
1:  apple   H 3
2: tomato   H 1
3: orange   H 1
4:  apple   F 2
5: tomato   F 2
6: orange   F 0

非常感谢

一种方法是将
sex
id
更改为factor(
id
在这里是多余的?)


或者您可以将
水果
更改为按
性别进行因子和分组

dt[, fruit := factor(fruit)]
dt[, .(fruit = levels(fruit), N=c(table(fruit))),by=sex]
#    sex  fruit N
# 1:   H  apple 3
# 2:   H orange 1
# 3:   H tomato 1
# 4:   F  apple 2
# 5:   F orange 0
# 6:   F tomato 2

编辑: 但是我怀疑如果你的
数据.table
很大,那么依赖
table
可能不是个好主意。在这种情况下,可能是一条路要走。也就是说,首先进行聚合,然后进行连接

out <- setkey(dt, sex, fruit)[, .N, 
             by="sex,fruit"][CJ(c("H","F"), 
             c("apple","tomato","orange")), 
             allow.cartesian=TRUE][is.na(N), N := 0L]
#    sex  fruit N
# 1:   F  apple 2
# 2:   F orange 0
# 3:   F tomato 2
# 4:   H  apple 3
# 5:   H orange 1
# 6:   H tomato 1

out似乎最简单的方法是显式提供传递给
i=
的data.table中的所有category组合,设置
by=.EACHI
对它们进行迭代:

setkey(dt, sex, fruit)
dt[CJ(sex, fruit, unique = TRUE), .N, by = .EACHI]
#    sex  fruit N
# 1:   F  apple 2
# 2:   F orange 0
# 3:   F tomato 2
# 4:   H  apple 3
# 5:   H orange 1
# 6:   H tomato 1

我在测试你的答案,并想“它工作得很好,但似乎有点慢”。然后我看到了你的编辑:)太棒了,非常感谢。
out <- setkey(dt, sex, fruit)[, .N, 
             by="sex,fruit"][CJ(c("H","F"), 
             c("apple","tomato","orange")), 
             allow.cartesian=TRUE][is.na(N), N := 0L]
#    sex  fruit N
# 1:   F  apple 2
# 2:   F orange 0
# 3:   F tomato 2
# 4:   H  apple 3
# 5:   H orange 1
# 6:   H tomato 1
setkey(dt, sex, fruit)
dt[CJ(sex, fruit, unique = TRUE), .N, by = .EACHI]
#    sex  fruit N
# 1:   F  apple 2
# 2:   F orange 0
# 3:   F tomato 2
# 4:   H  apple 3
# 5:   H orange 1
# 6:   H tomato 1