R: Indexing a data.table with a data.table

Suppose I have two data.tables:

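library(data.table)

# indexDT: for each id, the within-id row numbers to pick from DT's V1 and V3
indexDT <- data.table(id = rep(c(1, 2, 3), c(3, 2, 1)), V1 = c(1, 3, 5, 2, 4, 4), V3 = c(3, 4, 5, 4, 5, 5))
# DT: 5 rows per id, random value columns V1, V2, V3
DT <- data.table(id = rep(1:3, rep(5, 3)), data.table(sapply(1:3, function(i) rnorm(5 * 3))))

setkey(indexDT, "id")
setkey(DT, "id")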

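The values in indexDT are row indices, per id, into the DT column of the same name. What I want: for each column of indexDT (here V1 and V3) and each id (here 1, 2 and 3), select the values of DT from that same column and id. I have one solution, but it is not very elegant, is hard to read, and I hope there is a faster one. Both indexDT and DT are very large (DT has nrow = 500k*26, indexDT has nrow = +/-10k).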
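To pin down exactly which values are wanted, here is a minimal sketch using a swapped-in offset approach (not the asker's solution): row r of id g is mapped to the absolute DT row "first row of g's block + r - 1". It assumes DT is sorted by id with contiguous blocks, which setkey guarantees. The seed and the names cols, starts, start, picked and selection are hypothetical, chosen only for this illustration.

```r
library(data.table)

set.seed(1)  # assumption: fixed seed only so the sketch is reproducible
indexDT <- data.table(id = rep(c(1, 2, 3), c(3, 2, 1)),
                      V1 = c(1, 3, 5, 2, 4, 4),
                      V3 = c(3, 4, 5, 4, 5, 5))
DT <- data.table(id = rep(1:3, rep(5, 3)),
                 data.table(sapply(1:3, function(i) rnorm(5 * 3))))
setkey(indexDT, "id")
setkey(DT, "id")

# Value columns to select: every indexDT column except the id
cols <- setdiff(names(indexDT), "id")

# First absolute row of each id's block in DT (blocks are contiguous
# because DT is keyed, i.e. sorted, by id)
starts <- DT[, .(start = .I[1L]), by = id]
start  <- starts$start[match(indexDT$id, starts$id)]

# Within-id row r of id g  ->  absolute row start_g + r - 1
picked <- lapply(cols, function(cl) DT[[cl]][start + indexDT[[cl]] - 1])

# Reassemble: id column first, then one column per selected DT column
selection <- setnames(as.data.table(c(list(indexDT$id), picked)),
                      c("id", cols))
selection
```

The sketch does one vectorised lookup per column instead of per-id grouping; its speed on the full-size data has not been measured.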

Any better solution would be greatly appreciated!

Thanks

Perhaps you can do this with matrix indexing, i.e. subsetting a matrix with an index matrix:

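# join each id's index matrix M onto DT, then pull the (row, col) pairs
# out of that id's block with an index matrix
DT[, names(indexDT), with = FALSE][indexDT[, .(M = list(as.matrix(.SD))), keyby = id],
     as.data.table(matrix(as.matrix(.SD)[cbind(c(t(M[[1]])), 1:ncol(M[[1]]))],
                          ncol = ncol(M[[1]]), byrow = TRUE)), by = .EACHI]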
#   id          V1         V2
#1:  1 -0.08786187 -1.1277373
#2:  1 -0.62336535  0.5501641
#3:  1  1.09400253 -0.8152316
#4:  2 -1.01158421  2.0713417
#5:  2 -0.08669810 -0.3845776
#6:  3 -0.10041684 -0.2430609

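The inner cbind builds the index matrix; the rest just converts the data to the right types. Because the result is rebuilt from an unnamed matrix, its second column is named V2 even though it holds the values selected from V3.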
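A minimal sketch of the index-matrix step on its own, outside data.table. The toy matrices vals and M are made-up stand-ins for as.matrix(.SD) and one id's M[[1]]; they show how c(t(M)) plus a recycled column index produces the (row, col) pairs.

```r
# Toy 5 x 2 value matrix standing in for as.matrix(.SD):
# column 1 holds 1:5, column 2 holds 6:10
vals <- matrix(1:10, ncol = 2)

# Toy stand-in for M[[1]]: rows wanted from column 1 (1, 3, 5)
# and from column 2 (3, 4, 5)
M <- matrix(c(1, 3, 5,
              3, 4, 5), ncol = 2)

# c(t(M)) reads M row by row: 1, 3, 3, 4, 5, 5.
# cbind() recycles 1:ncol(M) to the same length, giving (row, col) pairs
# (1,1), (3,2), (3,1), (4,2), (5,1), (5,2).
idx <- cbind(c(t(M)), 1:ncol(M))

# Indexing with a two-column matrix returns one element per (row, col) pair
picked <- vals[idx]                     # 1, 8, 3, 9, 5, 10

# Refill row by row so each output column matches a column of M
res <- matrix(picked, ncol = ncol(M), byrow = TRUE)
res                                     # rows (1, 8), (3, 9), (5, 10)
```

Column 1 of res holds rows 1, 3, 5 of vals' first column and column 2 holds rows 3, 4, 5 of its second column, which is the per-id selection the answer performs.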

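For comparison, a selectionDT result with the same layout (DT is drawn with rnorm, so the values differ from run to run):

#> selectionDT 
#   id          V1         V2
#1:  1  0.30093680 -0.6158101
#2:  1  0.57746018 -1.2155334
#3:  1 -0.14585645 -0.6914313
#4:  2  0.08072223 -1.2507563
#5:  2 -0.98598985  0.1300098
#6:  3 -0.01676263  0.3053506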
