R: repeatedly sample from groups in a data frame and apply a function


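This is a combination of two previously asked questions.

The goal is to sample within the groups of a data.table, but to repeat this process n times and extract the mean of the values for each row. For example: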

library(data.table)

#generate the data
DT = data.table(a=c(1,1,1,1:15,1,1), b=sample(1:1000,20))

#sample the data as done in the second linked question
DT[,.SD[sample(.N,min(.N,3))],by = a]
     a   b
 1:  1 288
 2:  1 881
 3:  1 409
 4:  2 937
 5:  3  46
 6:  4 525
 7:  5 887
 8:  6 548
 9:  7 453
10:  8 948
11:  9 449
12: 10 670
13: 11 566
14: 12 102
15: 13 993
16: 14 243
17: 15  42

Now I tried using the answer given in the first linked question:

x <- replicate(100,{DT[,.SD[sample(.N,min(.N,3))],by = a]})

So, to get the mean for each row, I have to find the mean of x[[j]], where j comes from seq(2,200,2) and 200 is the number of replications * 2.
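The indexing described above follows from how replicate() simplifies its result. The sketch below only inspects that structure; the seed and the names n_rep, b_idx and overall_means are assumptions made for illustration and are not part of the original question.

```r
library(data.table)

set.seed(1)  # arbitrary seed, only so the sketch is repeatable
DT <- data.table(a = c(1, 1, 1, 1:15, 1, 1), b = sample(1:1000, 20))

n_rep <- 100  # hypothetical name for the number of replications
x <- replicate(n_rep, DT[, .SD[sample(.N, min(.N, 3))], by = a])

# replicate() simplifies the 100 two-column tables into a 2 x 100 list-matrix:
# row "a" holds the group column, row "b" holds the sampled values.
dim(x)

# Linear indexing walks down the columns, so even positions are the b vectors.
b_idx <- seq(2, length(x), 2)

# One overall mean per replicate (not yet grouped by a).
overall_means <- sapply(b_idx, function(j) mean(x[[j]]))
length(overall_means)
```

Passing simplify = FALSE to replicate() would instead keep each replicate as a separate data.table in an ordinary list, which avoids the odd/even bookkeeping.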

Is there an easier way? I have tried applying this solution in the following way:

y <- DT[,.SD[sample(.N,min(.N,3))],by = a]
y[,list(mean=mean(b)),by=a]
     a mean
 1:  1  550
 2:  2  849
 3:  3  603
 4:  4   77
 5:  5  973
 6:  6  746
 7:  7  919
 8:  8  655
 9:  9  883
10: 10  823
11: 11  533
12: 12  483
13: 13   53
14: 14  827
15: 15  413

Based on your comments, you want the mean by group for each replication, so 15 × 100 means in this example. Here are two ways to do it:
library(data.table)
set.seed(1) # for reproducibility
DT = data.table(a=c(1,1,1,1:15,1,1), b=sample(1:1000,20))
x <- replicate(100,{DT[,.SD[sample(.N,min(.N,3))],by = a]})

indx <- seq(1,length(x),2)
result.1 <- mapply(function(a,b)aggregate(b,list(a),mean)$x,x[indx],x[indx+1])
str(result.1)
#  num [1:15, 1:100] 569 201 894 940 657 625 62 204 175 679 ...
result.2 <- sapply(x[indx+1],function(b)aggregate(b,x[1],mean)$x)
identical(result.1,result.2)
# [1] TRUE
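The same 15 × 100 means can probably also be obtained without leaving data.table. This is only a sketch of an alternative, not the method used in the answer above; the names n_rep, samples, stacked, replicate_id, means_long and means_wide are made up for illustration.

```r
library(data.table)

set.seed(1)  # for reproducibility
DT <- data.table(a = c(1, 1, 1, 1:15, 1, 1), b = sample(1:1000, 20))

n_rep <- 100  # hypothetical name for the number of replications

# simplify = FALSE keeps each replicate as its own data.table in a plain list.
samples <- replicate(n_rep, DT[, .SD[sample(.N, min(.N, 3))], by = a],
                     simplify = FALSE)

# Stack the replicates; idcol adds a column holding the replicate number.
stacked <- rbindlist(samples, idcol = "replicate_id")

# One mean per (replicate, group) pair: 100 * 15 = 1500 rows.
means_long <- stacked[, .(mean_b = mean(b)), by = .(replicate_id, a)]

# Optional: reshape to one row per group and one column per replicate.
means_wide <- dcast(means_long, a ~ replicate_id, value.var = "mean_b")
dim(means_wide)  # 15 rows; 1 group column plus 100 replicate columns
```

Keeping the result in long form makes further grouped summaries a single by= call, while the wide form mirrors the layout of result.1 with an extra column for a.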

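Comments:

Yes, it simplifies when "a" is no longer needed. If, in the context of another problem, we wanted to keep it as row names or something similar, then use sapply(seq(2, length(x), 2), function(i) mean(x[[i]]))?

The problem is that I don't want to average the whole list. I want to average by group. In this example, a denotes the group.

So you want the mean for each group in each replication? For a total of 100*15 means in your example?

I think what I want to do is apply a "group by mean" to the list "x". After replicating the process 100 times, I want to take the mean for each unique value of "a". Basically, using this technique.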
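If the aim is a single mean per unique value of "a" across all replications, as the last comment suggests, one possible reading is sketched below. It is an assumption about what was meant, and the names n_rep and pooled are hypothetical.

```r
library(data.table)

set.seed(1)  # for reproducibility
DT <- data.table(a = c(1, 1, 1, 1:15, 1, 1), b = sample(1:1000, 20))

n_rep <- 100  # hypothetical name for the number of replications

# Draw the grouped sample n_rep times, stack all draws, then average b by group.
pooled <- rbindlist(
  lapply(seq_len(n_rep), function(i) DT[, .SD[sample(.N, min(.N, 3))], by = a])
)[, .(mean_b = mean(b)), by = a]

nrow(pooled)  # one row per unique value of a
```

Because every replication contributes the same number of rows to a given group, this pooled mean equals the mean of the per-replication group means. Groups with a single row in DT always return that row, so their pooled mean is just the original b value.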