R, data.table: group by column *number* and sum a column
Suppose I have the following data.table:
> DT
# A B C D E N
# 1: J t X D N 0.07898388
# 2: U z U L A 0.46906049
# 3: H a Z F S 0.50826435
# ---
# 9998: X b R L X 0.49879990
# 9999: Z r U J J 0.63233668
# 10000: C b M K U 0.47796539
Now I need to group by a pair of columns and compute the sum of N.
This is easy to do when you know the column names in advance:
> DT[, sum(N), by=.(A,B)]
# A B V1
# 1: J t 6.556897
# 2: U z 9.060844
# 3: H a 4.293426
# ---
# 674: V z 11.439100
# 675: M x 1.736050
# 676: U k 3.676197
But I have to do this inside a function that receives a vector of the column indices to group by:
> f <- function(columns = 1:2) {
DT[, sum(N), by=columns]
}
> f(1:2)
Error in `[.data.table`(DT, , sum(N), by = columns) :
The items in the 'by' or 'keyby' list are length (2). Each must be same
length as rows in x or number of rows returned by i (10000).
How can I make this work?
Here is my approach:
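f <- function(columns) {
  # Accept either column positions or column names
  Get <- if (!is.numeric(columns)) match(columns, names(DT)) else columns
  columns <- names(DT)[Get]
  # A character vector passed to `by` is read as the names of the grouping columns
  DT[, sum(N), by = columns]
}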
Add a line to the function that looks up the column names from the "columns" argument.
Aha, yes. Only nm is needed.
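The one-line fix described above can be sketched as follows. This is only a minimal sketch: it assumes that nm in the comment refers to a character vector of column names looked up from the positions, and the function name f_nm and the small example table are hypothetical, chosen so the block runs on its own.

```r
library(data.table)

# Small example table (made up for this sketch)
set.seed(1)
DT <- data.table(
  A = sample(letters[1:3], 20, TRUE),
  B = sample(letters[1:5], 20, TRUE),
  N = rnorm(20)
)

# Hypothetical name; takes column positions only
f_nm <- function(columns = 1:2) {
  nm <- names(DT)[columns]   # translate positions into column names
  DT[, sum(N), by = nm]      # a character vector in `by` names the grouping columns
}

res <- f_nm(1:2)
print(res)
```

Passing the positions straight to by fails because data.table reads a numeric vector there as grouping values, one per row, which is what the error message about length 2 versus 10000 rows is reporting. Converting to names first avoids that.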
## Example data for trying out f()
library(data.table)
set.seed(1)
DT <- data.table(
  A = sample(letters[1:3], 20, TRUE),
  B = sample(letters[1:5], 20, TRUE),
  C = sample(LETTERS[1:2], 20, TRUE),
  N = rnorm(20)
)
## Should work with either column number or name
f(1)
f("A")
f(c(1, 3))
f(c("A", "C"))
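If relying on a global DT is a concern, one possible variant passes the table in as an argument and keeps the summed column's own name. This is a sketch under assumptions, not part of the answer above: the helper name sum_by and its parameters dt, columns and value are hypothetical, and it uses .SDcols to restrict the sum to a single column.

```r
library(data.table)

# Hypothetical helper: `dt`, `columns` and `value` are illustrative names
sum_by <- function(dt, columns, value = "N") {
  # Translate positions into names; leave names untouched
  by_cols <- if (is.numeric(columns)) names(dt)[columns] else columns
  # Fail early on positions or names that do not exist in `dt`
  stopifnot(!anyNA(by_cols), all(by_cols %in% names(dt)))
  # Sum only the `value` column within each group
  dt[, lapply(.SD, sum), by = by_cols, .SDcols = value]
}

set.seed(1)
DT <- data.table(
  A = sample(letters[1:3], 20, TRUE),
  B = sample(letters[1:5], 20, TRUE),
  C = sample(LETTERS[1:2], 20, TRUE),
  N = rnorm(20)
)

by_position <- sum_by(DT, c(1, 3))
by_name     <- sum_by(DT, c("A", "C"))
print(by_position)
```

Both calls should give the same grouped sums, with the result column named N rather than V1.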