How to randomly sample a dataset in R so that rows with the same transaction ID stay together

My example dataset looks like this:

transactionID   desc
1   a
1   d
1   a
2   c
2   d
3   l
3   g
3   h
5   h
5   b
5   h
5   f
6   d
7   f
7   v
7   f
8   f
8   d

The sampling result should be, for example:

transactionID   desc
1   a
1   d
1   a
2   c
2   d
3   l
3   g
3   h

or any other selection of transactions. The exact sampled values don't matter; the important thing I have to preserve is that rows with the same transaction ID stay together in the sample. How can I do this?

You could try:

 n <- 2
 df[with(df, transactionID %in% 
         sample(unique(transactionID),n, replace=FALSE)),]
 #      transactionID desc
 #1              1    a
 #2              1    d
 #3              1    a
 #17             8    f
 #18             8    d
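
The same idea can be wrapped in a small helper. This is only a minimal sketch: the function name `sample_transactions`, its `id_col` argument, and the completeness check at the end are my own additions for illustration, not part of the answer above. The data is rebuilt inline so the block runs on its own.

```r
# Example data (same values as the table in the question)
df <- data.frame(
  transactionID = c(1L, 1L, 1L, 2L, 2L, 3L, 3L, 3L, 5L, 5L, 5L, 5L,
                    6L, 7L, 7L, 7L, 8L, 8L),
  desc = c("a", "d", "a", "c", "d", "l", "g", "h", "h", "b", "h", "f",
           "d", "f", "v", "f", "f", "d"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: draw n transaction IDs without replacement,
# then keep every row that belongs to one of the drawn IDs
sample_transactions <- function(data, n, id_col = "transactionID") {
  ids <- unique(data[[id_col]])
  stopifnot(n <= length(ids))
  # Index into ids instead of calling sample(ids, n): sample() treats a
  # single number x as 1:x, which would misbehave if only one ID existed
  picked <- ids[sample.int(length(ids), n)]
  data[data[[id_col]] %in% picked, , drop = FALSE]
}

set.seed(42)  # only to make the draw repeatable
res <- sample_transactions(df, 2)

# Check: each sampled transaction appears with all of its original rows
kept <- table(res$transactionID)
orig <- table(df$transactionID)[names(kept)]
stopifnot(all(as.vector(kept) == as.vector(orig)))
```

Because the sampling happens on the unique IDs rather than on the rows, a transaction is either fully in the sample or fully out of it.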

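I would suggest using data.table objects (rather than data.frames) for efficiency, especially for this task, since data.table supports binary search on keyed columns.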

 df <- structure(list(transactionID = c(1L, 1L, 1L, 2L, 2L, 3L, 3L, 
 3L, 5L, 5L, 5L, 5L, 6L, 7L, 7L, 7L, 8L, 8L), desc = c("a", "d", 
 "a", "c", "d", "l", "g", "h", "h", "b", "h", "f", "d", "f", "v", 
 "f", "f", "d")), .Names = c("transactionID", "desc"), class = "data.frame",
 row.names = c(NA,-18L))

library(data.table)
setkey(setDT(df), transactionID) # Converting to data.table and setting a key in order to enable binary search

set.seed(123) # making the example reproducible
n <- 3 # Number of samples
indx <- sample(unique(df$transactionID), n) # sampling the `transactionID`

df[J(indx)]
#    transactionID desc
# 1:             3    l
# 2:             3    g
# 3:             3    h
# 4:             6    d
# 5:             8    f
# 6:             8    d
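
As a variation, the join can also be written without setting a key first. This is a sketch under the assumption that the installed data.table version supports the `on =` argument (1.9.6 or later, if I recall correctly); the object names `dt` and `res` are just for illustration.

```r
library(data.table)

# Same example data, built directly as a data.table
dt <- data.table(
  transactionID = c(1L, 1L, 1L, 2L, 2L, 3L, 3L, 3L, 5L, 5L, 5L, 5L,
                    6L, 7L, 7L, 7L, 8L, 8L),
  desc = c("a", "d", "a", "c", "d", "l", "g", "h", "h", "b", "h", "f",
           "d", "f", "v", "f", "f", "d")
)

set.seed(123)  # only to make the draw repeatable
n <- 3         # number of transactions to sample
indx <- sample(unique(dt$transactionID), n)

# Ad hoc join on transactionID; no setkey() call is needed beforehand
res <- dt[.(indx), on = "transactionID"]

# Every drawn transaction comes back with all of its rows
stopifnot(nrow(res) == sum(dt$transactionID %in% indx))
```

This leaves the row order of `dt` untouched, whereas `setkey()` physically sorts the table by the key column.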