How to randomly sample a dataset in R so that rows with the same transaction ID stay together
My sample dataset looks like this:
transactionID desc
1 a
1 d
1 a
2 c
2 d
3 l
3 g
3 h
5 h
5 b
5 h
5 f
6 d
7 f
7 v
7 f
8 f
8 d
The sampling result should be:
1 a
1 d
1 a
2 c
2 d
3 l
3 g
3 h
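or any other set of whole transactions.
The exact sample values don't matter; they can be anything. The one thing I must preserve is that all rows with the same transaction ID end up together in a sample. How can I do this?

You can try: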
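n <- 2  # number of transactions (not rows) to sample
# Draw n distinct transaction IDs, then keep every row that belongs to them
df[with(df, transactionID %in%
     sample(unique(transactionID), n, replace = FALSE)), ]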
# transactionID desc
#1 1 a
#2 1 d
#3 1 a
#17 8 f
#18 8 d
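
The same idea can be wrapped in a small helper. This is only a sketch: `sample_transactions` and its `id_col` argument are hypothetical names, not part of the answer above. It indexes into the unique IDs with `sample.int()` rather than calling `sample()` on them directly, because `sample(x, n)` treats a single number `x` as `1:x`, which would matter if the data held only one transaction ID. The last lines check that each sampled transaction keeps all of its rows.

```r
# Hypothetical helper (not from the original answer): sample n whole transactions
sample_transactions <- function(data, n, id_col = "transactionID") {
  ids <- unique(data[[id_col]])
  stopifnot(n <= length(ids))  # cannot draw more IDs than exist
  # Sample positions, not values, to avoid sample()'s 1:x behaviour
  # when ids happens to be a single number
  picked <- ids[sample.int(length(ids), n)]
  data[data[[id_col]] %in% picked, , drop = FALSE]
}

df <- data.frame(
  transactionID = c(1, 1, 1, 2, 2, 3, 3, 3, 5, 5, 5, 5, 6, 7, 7, 7, 8, 8),
  desc = c("a", "d", "a", "c", "d", "l", "g", "h", "h", "b", "h", "f",
           "d", "f", "v", "f", "f", "d")
)

set.seed(42)
res <- sample_transactions(df, 2)

# Each sampled transaction should have as many rows as in the original data
orig_counts <- table(df$transactionID)
res_counts  <- table(res$transactionID)
all(res_counts == orig_counts[names(res_counts)])
```

The check at the end should evaluate to `TRUE` for any seed, since rows are selected by ID rather than individually.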
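I would suggest using data.table objects rather than data.frames for efficiency, especially for this task, since data.table can use binary search on a keyed column.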
df <- structure(list(transactionID = c(1L, 1L, 1L, 2L, 2L, 3L, 3L,
3L, 5L, 5L, 5L, 5L, 6L, 7L, 7L, 7L, 8L, 8L), desc = c("a", "d",
"a", "c", "d", "l", "g", "h", "h", "b", "h", "f", "d", "f", "v",
"f", "f", "d")), .Names = c("transactionID", "desc"), class = "data.frame",
row.names = c(NA,-18L))
library(data.table)
setkey(setDT(df), transactionID) # convert to data.table by reference and set a key to enable binary search
set.seed(123) # make the example reproducible
# (R >= 3.6.0 changed sample()'s default algorithm, so the draw may differ from the output below)
n <- 3 # number of transactions to sample
indx <- sample(unique(df$transactionID), n) # sample the `transactionID` values
df[J(indx)]
# transactionID desc
# 1: 3 l
# 2: 3 g
# 3: 3 h
# 4: 6 d
# 5: 8 f
# 6: 8 d
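
If reordering the table by key is undesirable, one possible variation (an assumption on my part, not part of the answer above) is to join with `on =` instead of calling `setkey()`, which leaves the table in its original row order. The sketch assumes a data.table version that supports `on =` (1.9.6 or later); `dt`, `res`, and `res2` are illustrative names.

```r
library(data.table)

dt <- data.table(
  transactionID = c(1L, 1L, 1L, 2L, 2L, 3L, 3L, 3L, 5L, 5L, 5L, 5L,
                    6L, 7L, 7L, 7L, 8L, 8L),
  desc = c("a", "d", "a", "c", "d", "l", "g", "h", "h", "b", "h", "f",
           "d", "f", "v", "f", "f", "d")
)

set.seed(123)
n <- 3                                        # number of transactions to sample
indx <- sample(unique(dt$transactionID), n)   # sampled transaction IDs

# Ad hoc join on transactionID; no key is set, so dt itself is not reordered
res <- dt[.(transactionID = indx), on = "transactionID"]

# The same rows via a plain logical filter, returned in dt's original order
res2 <- dt[transactionID %in% indx]
```

The join returns rows grouped in the order of `indx`, while the `%in%` filter keeps the order of `dt`; both contain every row of each sampled transaction.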