R: Keeping zero-count combinations when aggregating with data.table
Suppose I have the following data.table:
dt <- data.table(id = c(rep(1, 5), rep(2, 4)),
                 sex = c(rep("H", 5), rep("F", 4)),
                 fruit = c("apple", "tomato", "apple", "apple", "orange", "apple", "apple", "tomato", "tomato"),
                 key = "id")
id sex fruit
1: 1 H apple
2: 1 H tomato
3: 1 H apple
4: 1 H apple
5: 1 H orange
6: 2 F apple
7: 2 F apple
8: 2 F tomato
9: 2 F tomato
Counting rows by fruit and sex (e.g. with dt[, .N, by = .(fruit, sex)]) gives:
fruit sex N
1: apple H 3
2: tomato H 1
3: orange H 1
4: apple F 2
5: tomato F 2
The problem is that this loses the orange count for sex == "F", because that count is 0. Is there a way to aggregate without losing zero-count combinations?
To be perfectly clear, the expected result is:
fruit sex N
1: apple H 3
2: tomato H 1
3: orange H 1
4: apple F 2
5: tomato F 2
6: orange F 0
Many thanks!

One way is to convert sex or id to a factor (is id redundant here?). Alternatively, you can convert fruit to a factor and group by sex:
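# Make fruit a factor so that each group's subset still carries all three levels
dt[, fruit := factor(fruit)]
# table() counts every level, including levels with no rows in the group
dt[, .(fruit = levels(fruit), N = c(table(fruit))), by = sex]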
# sex fruit N
# 1: H apple 3
# 2: H orange 1
# 3: H tomato 1
# 4: F apple 2
# 5: F orange 0
# 6: F tomato 2
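Why this works: subsetting a factor keeps all of its levels, so inside each sex group table() still sees apple, orange and tomato and reports 0 for a level that has no rows. A minimal base-R sketch of that behaviour, using the fruit values from the question and assuming rows 6:9 are the sex == "F" rows (true for the example data only):

```r
# The fruit column from the question, as a factor (levels: apple, orange, tomato)
fruit <- factor(c("apple", "tomato", "apple", "apple", "orange",
                  "apple", "apple", "tomato", "tomato"))

# Rows 6:9 are the sex == "F" rows in the example data
f_fruit <- fruit[6:9]
levels(f_fruit)   # the subset still carries all three levels

# table() tabulates every level, so orange gets a count of 0
counts <- table(f_fruit)
n <- setNames(as.integer(counts), names(counts))
n                 # named integer vector: apple = 2, orange = 0, tomato = 2
```

With a plain character vector instead of a factor, table() would only list the values actually present, which is why the conversion to factor is the key step.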
Edit:
I suspect, though, that relying on table may not be a good idea if your data.table is large. In that case, this might be the way to go: aggregate first, then join.
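# Aggregate first, then join the counts against every sex/fruit combination;
# combinations missing from the data come back with N = NA, which is set to 0
out <- setkey(dt, sex, fruit)[, .N, by = .(sex, fruit)][
  CJ(c("H", "F"), c("apple", "tomato", "orange")),
  allow.cartesian = TRUE][is.na(N), N := 0L]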
# sex fruit N
# 1: F apple 2
# 2: F orange 0
# 3: F tomato 2
# 4: H apple 3
# 5: H orange 1
# 6: H tomato 1
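The same aggregate-then-join idea can be written without setkey() and without typing the category values by hand. This is a sketch of a variant, not the answer's code: it assumes a data.table version that supports the on= argument, and it builds the combinations from the values present in the data, so a fruit that never occurs for either sex would still be missing.

```r
library(data.table)

dt <- data.table(id = c(rep(1, 5), rep(2, 4)),
                 sex = c(rep("H", 5), rep("F", 4)),
                 fruit = c("apple", "tomato", "apple", "apple", "orange",
                           "apple", "apple", "tomato", "tomato"))

# Step 1: ordinary aggregation (this drops the F/orange combination)
counts <- dt[, .N, by = .(sex, fruit)]

# Step 2: every sex/fruit pair built from the observed values (sorted by CJ)
all_pairs <- CJ(sex = unique(dt$sex), fruit = unique(dt$fruit))

# Step 3: look up each pair in counts; unmatched pairs get N = NA, then 0
out <- counts[all_pairs, on = .(sex, fruit)]
out[is.na(N), N := 0L]
out
```

Deriving the pairs from unique() keeps the code in sync with the data; hard-coding the values, as in the answer, is the way to go when the full set of categories is known in advance and may include values absent from the data.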
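The most straightforward approach seems to be to supply all category combinations explicitly in the data.table passed to i=, and set by = .EACHI so that .N is computed for each of them: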
setkey(dt, sex, fruit)
dt[CJ(sex, fruit, unique = TRUE), .N, by = .EACHI]
# sex fruit N
# 1: F apple 2
# 2: F orange 0
# 3: F tomato 2
# 4: H apple 3
# 5: H orange 1
# 6: H tomato 1
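A variant of the by = .EACHI join that leaves dt unkeyed, naming the join columns with on= instead of calling setkey(). This is a sketch assuming a data.table version with on= support; the combination table is built from dt$sex and dt$fruit explicitly rather than relying on column scoping inside i.

```r
library(data.table)

dt <- data.table(id = c(rep(1, 5), rep(2, 4)),
                 sex = c(rep("H", 5), rep("F", 4)),
                 fruit = c("apple", "tomato", "apple", "apple", "orange",
                           "apple", "apple", "tomato", "tomato"))

# All sex/fruit pairs, sorted and with duplicates removed
pairs <- CJ(sex = dt$sex, fruit = dt$fruit, unique = TRUE)

# Join each pair against dt and count the matching rows;
# with by = .EACHI, .N is 0 for a pair that has no match
res <- dt[pairs, on = .(sex, fruit), .N, by = .EACHI]
res
```

Because the join columns are given with on=, dt keeps its original row order, whereas setkey() sorts dt by reference.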
I was testing your answer and thinking "it works nicely, but it seems a bit slow". Then I saw your edit :) Brilliant, thanks a lot.