Demeaning a variable by group in R's bigmemory


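I want to demean a variable in a big.matrix (panel) structure. I have tried different approaches, but the one that works in the bigmemory setting is tapply (alongside the bigtabulate package). The following code computes the mean of the variable var1 for each group identified by panel_id: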

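library(bigmemory)

# Create a file-backed big.matrix from the CSV (column names come from the header row)
data <- read.big.matrix("data.csv", sep = ",", header = TRUE, type = "double",
                        backingfile = "backing.bin", descriptorfile = "data.desc")

# In a later session, re-attach the existing backing file through its descriptor
xdesc <- dget("data.desc")
data <- attach.big.matrix(xdesc)

# Mean of var1 per panel_id (both columns are pulled into RAM as ordinary vectors)
mean_var1 <- tapply(data[, "var1"], data[, "panel_id"], mean, na.rm = TRUE)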
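For a small-scale sanity check, the whole operation (group mean, then subtraction) can be written in base R on an ordinary matrix. This is a minimal sketch with hypothetical toy data, not bigmemory code. It swaps in ave(), which returns each row's group mean already aligned with the rows, so the result can serve as a reference to compare the big.matrix version against.

```r
# Hypothetical toy panel: column 1 = panel_id, column 2 = var1 (ordinary matrix, not a big.matrix)
m <- cbind(panel_id = c(1, 1, 2, 2, 2, 3),
           var1     = c(2, 4, 1, 2, NA, 10))

# Group means, computed the same way as in the question
mean_var1 <- tapply(m[, "var1"], m[, "panel_id"], mean, na.rm = TRUE)

# ave() returns, for every row, the mean of var1 within that row's panel_id
grp_mean <- ave(m[, "var1"], m[, "panel_id"],
                FUN = function(v) mean(v, na.rm = TRUE))

# Demeaned variable; a missing var1 stays NA
demeaned <- m[, "var1"] - grp_mean
demeaned
```

Rows with a missing var1 stay NA, while na.rm = TRUE keeps them from contaminating the mean of the other rows in the same group.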

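The simplest approach is probably to use the bigsplit function (from the bigtabulate package) together with a for loop that modifies data in place: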

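library(bigtabulate)  # provides bigsplit()

# List of row indices, one element per distinct value in column 1 (assumed to be panel_id)
idx <- bigsplit(data, 1)

# Overwrite column 2 (assumed to be var1) with its demeaned values, in place.
# mean_var1[[i]] is matched to idx[[i]] by position, so both must be ordered by panel_id.
for (i in seq_along(idx)) {
    data[idx[[i]], 2] <- data[idx[[i]], 2] - mean_var1[[i]]
}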
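# Alternative with lapply: returns the demeaned var1 values as a list, one element per group.
# It does not modify data, so run it instead of the loop above, not after it.
lapply(seq_along(idx), function(x) data[idx[[x]], 2] - mean_var1[[x]])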

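# Alternative with foreach (don't forget to register your parallel backend!)
# Like lapply, this returns a list and leaves data unchanged.
library(foreach)
desc <- describe(data)
res <- foreach(iter = seq_along(idx), .packages = "bigmemory") %dopar% {
    # Re-attach inside each worker: the big.matrix pointer itself cannot be sent to another process
    x <- attach.big.matrix(desc)
    x[idx[[iter]], 2] - mean_var1[[iter]]
}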
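The bigsplit-plus-loop pattern can be tried out on a small ordinary matrix before running it on the real data. The sketch below is a base-R analogue, not bigmemory code: split() on row numbers stands in for bigsplit(), the toy data are hypothetical, and groups are matched to their means by name (the panel_id value) rather than by position, so it does not rely on idx and mean_var1 being in the same order. It also shows how lapply-style per-group results can be written back in the original row order. Whether bigsplit() names its list elements exactly as tapply() does is an assumption to check before matching by name on a real big.matrix.

```r
# Hypothetical toy panel: column 1 = panel_id, column 2 = var1 (ordinary matrix, not a big.matrix)
m <- cbind(panel_id = c(3, 1, 2, 1, 3),
           var1     = c(5, 2, 8, 4, 7))

mean_var1 <- tapply(m[, "var1"], m[, "panel_id"], mean, na.rm = TRUE)

# split() on row numbers plays the role of bigsplit(): one index vector per group,
# named by the group value
idx <- split(seq_len(nrow(m)), m[, "panel_id"])

# lapply style: demeaned pieces per group, then scattered back into original row order
pieces <- lapply(names(idx), function(g) m[idx[[g]], "var1"] - mean_var1[[g]])
out <- numeric(nrow(m))
out[unlist(idx)] <- unlist(pieces)

# Loop style: modify the column in place, matching each group to its mean by name
for (g in names(idx)) {
  rows <- idx[[g]]
  m[rows, "var1"] <- m[rows, "var1"] - mean_var1[[g]]
}

m[, "var1"]
```

Both versions produce the same column. The loop avoids building the intermediate list, which matters more on a big.matrix than on this toy example.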