Create 3 mutually exclusive samples in R (tags: R, Random Sample, Sample Data)


I have a dataset that I need to split into three mutually exclusive random samples of different sizes. I have used:

testdata<-sample(47959,14388,replace=FALSE,prob=NULL)

Some data:
set.seed(42)
x <- sample(20, size=100, replace=TRUE)
head(x)
## [1] 19 19  6 17 13 11

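Edit: updated implementation

Instead of lapply(1:3, function(j) x[i==j]), try split(x, i).

Yup, victim of habit and a quick hack (which is why I tend to caveat some answers with "lots of ways"). In my head I always grouped split with by, which produces somewhat different structures. I see now that was a mistake. Thanks @Thomas!

You can control the sizes like this: str(split(sample(1:sum(index… and then, of course, use the list of vectors to index the original data.

One-liners are always fun (to write, not to debug or maintain!). My updated implementation above should of course not use magic constants (1:3), a la sample(rep(indices)
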
# Draw a random group label (1, 2 or 3) for every element of x
i <- sample(1:3, size=length(x), replace=TRUE)
head(i)
## [1] 2 1 1 2 3 3

# Split x by label; each element lands in exactly one group, but group sizes are random
x.grouped <- split(x, i)
str(x.grouped)
## List of 3
##  $ 1: int [1:31] 19 6 15 20 9 5 8 9 18 20 ...
##  $ 2: int [1:30] 19 17 14 10 6 10 19 3 10 19 ...
##  $ 3: int [1:39] 13 11 15 3 15 19 12 2 8 19 ...
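
The group sizes above (31, 30, 39) are left to chance. If only approximate proportions are needed, the prob argument of sample() can weight the labels. The following is a minimal sketch: the weights in w and the names i.weighted and x.weighted are assumptions for illustration, and the weights steer the expected sizes only, not the exact ones.

```r
set.seed(42)
x <- sample(20, size = 100, replace = TRUE)

# Assumed target proportions for the three groups (hypothetical values)
w <- c(0.2, 0.5, 0.3)

# Draw one label per element; prob weights the labels, so the group
# sizes follow w only on average, not exactly
i.weighted <- sample(seq_along(w), size = length(x), replace = TRUE, prob = w)

# Every element still ends up in exactly one group
x.weighted <- split(x, i.weighted)
lengths(x.weighted)
```

When the sizes have to be exact, the labels need to be built with fixed counts instead of being drawn independently.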

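# Desired group sizes (they must sum to length(x))
indices <- c(20, 50, 30)
indices.cs <- cumsum(indices)   # end position of each group: 20, 70, 100
x.unsorted <- sample(x)         # shuffle once, then cut into consecutive blocks
# Note: this needs dplyr::lag(); stats::lag() does not shift a plain vector.
# A base R equivalent of the start positions is 1 + c(0, head(indices.cs, -1)).
xs.grouped.sized <- mapply(function(a,b) x.unsorted[a:b],
    1+dplyr::lag(indices.cs, default=0),
    indices.cs,
    SIMPLIFY=FALSE)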
str(xs.grouped.sized)
## List of 3
##  $ : int [1:20] 2 7 13 1 19 7 14 20 19 1 ...
##  $ : int [1:50] 13 6 19 4 19 20 20 11 17 3 ...
##  $ : int [1:30] 1 10 7 16 9 16 17 11 14 8 ...

# One-step variant: build a label vector with the exact group sizes, then shuffle it
indices <- sample(rep(1:3, times = c(20,50,30)))
str(split(x, indices))
## List of 3
##  $ 1: int [1:20] 6 3 10 6 10 20 17 8 5 13 ...
##  $ 2: int [1:50] 19 19 17 15 14 15 19 20 3 19 ...
##  $ 3: int [1:30] 13 11 15 19 10 12 3 11 14 1 ...
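
The comments mention using the resulting list of vectors to index the original data. One way this might look is sketched below. The helper split_rows, the data frame df, and the sizes are hypothetical and not part of the original answer. The sketch samples row numbers without replacement, so no row can appear in two groups, and it also allows the sizes to sum to less than the number of rows.

```r
# Hypothetical helper: split row numbers 1..n into mutually exclusive
# groups with the given sizes (sum(sizes) may be smaller than n)
split_rows <- function(n, sizes) {
  stopifnot(sum(sizes) <= n)
  # Sampling without replacement guarantees that no row number repeats
  picked <- sample.int(n, sum(sizes))
  # Cut the shuffled row numbers into consecutive blocks of the given sizes
  split(picked, rep(seq_along(sizes), times = sizes))
}

set.seed(42)
# Stand-in data frame (an assumption; replace with the real dataset)
df <- data.frame(id = 1:100, value = rnorm(100))

idx <- split_rows(nrow(df), c(20, 50, 30))

# Index the original data with each vector of row numbers
samples <- lapply(idx, function(r) df[r, , drop = FALSE])

sapply(samples, nrow)          # one row count per sample
anyDuplicated(unlist(idx))     # 0 means no row is shared between samples
```

Working on row numbers rather than on the values keeps the samples disjoint even when the data itself contains repeated values, as x does here.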