R数据表每行的快速t检验_R_Data.table

R数据表每行的快速t检验

R数据表每行的快速t检验,r,data.table,R,Data.table,我可以使用data.table的固有速度来获得更快的逐行t.test结果，并使用可变列名吗？下面是我当前的代码，每1000行需要几秒钟 slow.diffexp <- function(dt, samples1, samples2) { for (i in 1:nrow(dt)) { if (round(i/1000)==i/1000) { cat(i, "\n"); } a <- t.test(dt[i, samples1, with=FAL

我可以使用data.table的固有速度来获得更快的逐行t.test结果，并使用可变列名吗？下面是我当前的代码，每1000行需要几秒钟

slow.diffexp <- function(dt, samples1, samples2) {
  for (i in 1:nrow(dt)) {
    if (round(i/1000)==i/1000) {
      cat(i, "\n");
    }
    a <- t.test(dt[i, samples1, with=FALSE],
                dt[i, samples2, with=FALSE]);
    set(dt, i, "tt.p.value", a$p.value)
    set(dt, i, "tt.mean1", a$estimate[1])
    set(dt, i, "tt.mean2", a$estimate[2])
  }
}

test.dt <- data.table(V1=sample(1000, 100000, replace=TRUE));
for (i in 2:20) {
  colname <- paste0("V", i);
  test.dt[ , (colname):=sample(1000, 100000, replace=TRUE)];
}
samples1 <- sample(names(test.dt), size=10);
samples2 <- setdiff(names(test.dt), samples1);
slow.diffexp(test.dt, samples1, samples2);

slow.xp这并没有显式地使用data.table
，但是对于循环，它应该比快得多：
set.seed(700)
test.dt <- data.table(V1=sample(1000, 100000, replace=TRUE));
for (i in 2:20) {
  colname <- paste0("V", i);
  test.dt[ , (colname):=sample(1000, 100000, replace=TRUE)];
}
samples1 <- sample(names(test.dt), size=10);
samples2 <- setdiff(names(test.dt), samples1);

system.time(myList<-apply(test.dt, 1, function(x) t.test(x[samples1], x[samples2])))
# user  system elapsed 
# 18.44    0.00   18.47 

test.dt$tt.p.value<-sapply(myList, function(x) x[[3]])
test.dt$tt.mean1<-sapply(myList, function(x) x[[5]][[1]])
test.dt$tt.mean2<-sapply(myList, function(x) x[[5]][[2]])

test.dt[1:10, 19:23, with = F]

V19 V20 tt.p.value tt.mean1 tt.mean2
962 536    0.98203    460.8    463.9
882 767    0.06294    657.4    416.0
371 111    0.73440    463.1    502.8
173 720    0.57195    595.9    513.3
126 404    0.86948    602.8    619.5
 14  16    0.63462    315.7    377.3
870 384    0.03670    377.7    626.6
142 997    0.19836    623.2    442.8
  4 193    0.99891    628.4    628.2
250 888    0.35232    590.9    498.5

这并没有显式地使用data.table
，但它应该比for
循环快得多：
set.seed(700)
test.dt <- data.table(V1=sample(1000, 100000, replace=TRUE));
for (i in 2:20) {
  colname <- paste0("V", i);
  test.dt[ , (colname):=sample(1000, 100000, replace=TRUE)];
}
samples1 <- sample(names(test.dt), size=10);
samples2 <- setdiff(names(test.dt), samples1);

system.time(myList<-apply(test.dt, 1, function(x) t.test(x[samples1], x[samples2])))
# user  system elapsed 
# 18.44    0.00   18.47 

test.dt$tt.p.value<-sapply(myList, function(x) x[[3]])
test.dt$tt.mean1<-sapply(myList, function(x) x[[5]][[1]])
test.dt$tt.mean2<-sapply(myList, function(x) x[[5]][[2]])

test.dt[1:10, 19:23, with = F]

V19 V20 tt.p.value tt.mean1 tt.mean2
962 536    0.98203    460.8    463.9
882 767    0.06294    657.4    416.0
371 111    0.73440    463.1    502.8
173 720    0.57195    595.9    513.3
126 404    0.86948    602.8    619.5
 14  16    0.63462    315.7    377.3
870 384    0.03670    377.7    626.6
142 997    0.19836    623.2    442.8
  4 193    0.99891    628.4    628.2
250 888    0.35232    590.9    498.5

data.table
通常在按列操作时速度更快。创建自己的精简版t.test.default
只执行您需要的特定操作，可能会让您获得更多里程。或者，您可以简单地随机抽取p值样本，这几乎是瞬时的。数据。表
通常在按列操作时更快速。您可以通过创建自己的精简版本的t.test.default
来获得更多里程数，该版本只做您需要的特定事情。或者，您可以简单地绘制一个随机的p值样本，这几乎是瞬时的。