R: Indexing a data.table with a data.table of indices

Suppose I have two data.tables:

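library(data.table)

# indexDT: for each id, the row positions (within that id) to pick from columns V1 and V3
indexDT <- data.table(id = rep(c(1, 2, 3), c(3, 2, 1)), V1 = c(1, 3, 5, 2, 4, 4), V3 = c(3, 4, 5, 4, 5, 5))
# DT: 5 rows per id, value columns V1, V2, V3 filled with random numbers
DT <- data.table(id = rep(1:3, rep(5, 3)), data.table(sapply(1:3, function(i) rnorm(5 * 3))))

setkey(indexDT, "id")
setkey(DT, "id")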
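The values in indexDT are row indices, given per id and per column name. Now I want to do the following: for each column of indexDT (here V1 and V3) and each id (here 1, 2 and 3), select the values of DT in the same column and the same id at those row indices. I have a solution, but it is not very elegant and is hard to read, and I hope there is a faster one. Both indexDT and DT are very large (DT has nrow = 500k*26 and indexDT has roughly 10k rows).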
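To make the requested selection concrete, here is a minimal sketch of one possible alternative (not the answer's method): instead of building a matrix per id, it turns each within-id row index into a global row number of DT and then subsets each column once as a plain vector. It assumes that DT is keyed, and therefore sorted, by id so that each id's rows form one contiguous block, that the indices in indexDT are 1-based positions within that block, and that they never exceed the block size (there is no bounds check). The deterministic DT and the names offsets, res and cols are hypothetical, chosen so the result can be checked by eye. This approach has not been benchmarked on data of the question's size.

```r
library(data.table)

indexDT <- data.table(id = rep(c(1, 2, 3), c(3, 2, 1)),
                      V1 = c(1, 3, 5, 2, 4, 4),
                      V3 = c(3, 4, 5, 4, 5, 5))
# Hypothetical deterministic stand-in for DT (the question uses rnorm()):
# V1 holds the global row number, V2 and V3 the same plus 100 and 200
DT <- data.table(id = rep(1:3, each = 5),
                 V1 = 1:15, V2 = 101:115, V3 = 201:215)
setkey(indexDT, "id")
setkey(DT, "id")  # assumption: after this, each id's rows form one contiguous block

# 0-based offset of each id's first row in DT (.I gives global row numbers)
offsets <- DT[, .(offset = .I[1L] - 1L), by = id]

# Attach the offset of its id to every row of indexDT
res <- copy(indexDT)
res[offsets, offset := i.offset, on = "id"]

# For each index column: within-id index + offset = global row of DT,
# then pick all values of that column with one vector subset (no bounds check)
cols <- setdiff(names(indexDT), "id")
for (cl in cols) {
  set(res, j = cl, value = DT[[cl]][res$offset + res[[cl]]])
}
res[, offset := NULL]
res
```

Here res$V1 comes out as 1, 3, 5, 7, 9, 14 and res$V3 as 203, 204, 205, 209, 210, 215; for id 2, for example, the V1 index 2 maps to global row 5 + 2 = 7. The design choice is to do the per-id work once (one offset per id) and let every column be a single vectorised subset, rather than building a matrix for each id.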

Any better solution would be much appreciated!
Thanks

Maybe you can do this by subsetting a matrix with an index matrix:

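# Keep only id and the index columns of DT; for each id of indexDT, M[[1]] is that
# id's matrix of row indices and .SD is the matching block of DT rows
DT[, names(indexDT), with = FALSE][indexDT[, .(M = list(as.matrix(.SD))), keyby = id],
     as.data.table(matrix(as.matrix(.SD)[cbind(c(t(M[[1]])), 1:ncol(M[[1]]))],
                          ncol = ncol(M[[1]]), byrow = TRUE)), by = .EACHI]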
#   id          V1         V2
#1:  1 -0.08786187 -1.1277373
#2:  1 -0.62336535  0.5501641
#3:  1  1.09400253 -0.8152316
#4:  2 -1.01158421  2.0713417
#5:  2 -0.08669810 -0.3845776
#6:  3 -0.10041684 -0.2430609

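The inner cbind() builds the index matrix; the rest just converts the data to the right types. Note that as.data.table() names the columns of an unnamed matrix V1, V2, ..., so the V2 column in the output above holds the values selected from V3.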
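The index-matrix trick can be seen in isolation with base R. In this minimal sketch, the hypothetical matrix x stands in for one id's block of DT (.SD in the answer) and the hypothetical M for that id's index matrix (M[[1]] in the answer). Each row of cbind(c(t(M)), 1:ncol(M)) is a (row, column) pair, and subsetting a matrix with a two-column numeric matrix returns one element per pair.

```r
# x plays the role of one id's block of DT: 5 rows, 2 value columns
x <- matrix(1:10, ncol = 2)  # column 1 holds 1..5, column 2 holds 6..10

# M plays the role of that id's index matrix: one row index per column
M <- rbind(c(1, 3),
           c(3, 4),
           c(5, 5))

# Flatten M row by row and pair each entry with its column number;
# 1:ncol(M) is recycled down the rows, giving (1,1), (3,2), (3,1), (4,2), (5,1), (5,2)
idx <- cbind(c(t(M)), 1:ncol(M))

# Matrix indexing returns one element per (row, column) pair;
# byrow = TRUE folds them back into the shape of M
sel <- matrix(x[idx], ncol = ncol(M), byrow = TRUE)
sel  # rows: (1, 8), (3, 9), (5, 10)
```

Row i of sel holds, for each column j, the element x[M[i, j], j], which is exactly the per-id selection the answer performs inside by = .EACHI.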

Output of selectionDT (DT is filled with rnorm(), so the values differ from run to run):

#> selectionDT
#   id          V1         V2
#1:  1  0.30093680 -0.6158101
#2:  1  0.57746018 -1.2155334
#3:  1 -0.14585645 -0.6914313
#4:  2  0.08072223 -1.2507563
#5:  2 -0.98598985  0.1300098
#6:  3 -0.01676263  0.3053506