Random sample of a data frame in R
I have the following data frame:
id<-c(1,1,2,3,3)
date<-c("23-01-08","01-11-07","30-11-07","17-12-07","12-12-08")
df<-data.frame(id,date)
df$date2<-as.Date(as.character(df$date), format = "%d-%m-%y")
id date date2
1 23-01-08 2008-01-23
1 01-11-07 2007-11-01
2 30-11-07 2007-11-30
3 17-12-07 2007-12-17
3 12-12-08 2008-12-12
Any help would be greatly appreciated.
First, you have to draw the sample of ids:
s_ids=sample(unique(df$id),2)
Then select the records of df whose id is among the sampled ones:
new_df=df[df$id %in% s_ids,]
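The two steps above can be wrapped into one function. This is only a minimal sketch: the helper name sample_by_id is hypothetical, and it assumes the grouping column is called id, as in the question.

```r
# Rebuild the example data from the question.
df <- data.frame(
  id   = c(1, 1, 2, 3, 3),
  date = c("23-01-08", "01-11-07", "30-11-07", "17-12-07", "12-12-08")
)
df$date2 <- as.Date(df$date, format = "%d-%m-%y")

# Hypothetical helper: keep every row belonging to n randomly chosen ids.
sample_by_id <- function(data, n) {
  ids <- unique(data$id)
  # Index into ids instead of calling sample(ids, n), so that a single
  # numeric id is not interpreted by sample() as the range 1:id.
  chosen <- ids[sample.int(length(ids), n)]
  data[data$id %in% chosen, ]
}

set.seed(42)  # only to make the draw reproducible
new_df <- sample_by_id(df, 2)
new_df
```

With this data the result has 3 or 4 rows, depending on which two ids are drawn.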
You can use the sample() function:
set.seed(2)
df[match(sample(unique(df$id),2),df$id),]
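sample() draws the random ids; match() then looks them up in df$id so that you can fetch the corresponding rows of df with the rest of their data.
Note that match() returns only the first matching row for each id.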
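The difference between match() and %in% matters when an id has several rows. The sketch below uses a fixed pair of ids (the variable chosen is an assumption standing in for a random draw) so that the result is deterministic.

```r
# Rebuild the example data from the question.
df <- data.frame(
  id   = c(1, 1, 2, 3, 3),
  date = c("23-01-08", "01-11-07", "30-11-07", "17-12-07", "12-12-08")
)

chosen <- c(1, 3)  # fixed ids instead of a random draw, for illustration

# match() gives the position of the FIRST occurrence of each id only.
first_rows <- df[match(chosen, df$id), ]

# %in% flags every row whose id was chosen.
all_rows <- df[df$id %in% chosen, ]

nrow(first_rows)  # 2
nrow(all_rows)    # 4
```

Use the %in% form if every record of a sampled id should be kept.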
For more information, see ?sample. Alternatively, use dplyr:
chosen <- sample(unique(df$id), 2)
library(dplyr)
df %>%
filter(id %in% sample(unique(id),2))
# id date date2
#1 2 30-11-07 2007-11-30
#2 3 17-12-07 2007-12-17
#3 3 12-12-08 2008-12-12
Or, using sqldf:
library(sqldf)
a <- sqldf("SELECT DISTINCT id FROM df ORDER BY RANDOM(*) LIMIT 2")
sqldf("SELECT * FROM df WHERE id IN a")
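This will not work if you have duplicate id values. With the current data, you could pick 1 twice.
This does not give the expected result: you always get all 5 rows back.
Updated the answer.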
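To illustrate the first comment: sampling from the raw id column can return the same id twice, while sampling from unique(id) cannot. The sketch below is an illustration only; the drawn vector is a hand-picked example of an unlucky draw, not real sample() output.

```r
id <- c(1, 1, 2, 3, 3)

# Each id appears once per row, so ids with more rows are more likely
# to be drawn, and the same id can come up twice.
table(id)

# Hypothetical draw that picked positions 1 and 2: both are id 1.
drawn <- id[c(1, 2)]
length(unique(drawn))   # 1

# Sampling from the unique ids always yields two different ids.
set.seed(1)  # only to make the draw reproducible
s_ids <- sample(unique(id), 2)
length(unique(s_ids))   # 2
```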
# Alternative: draw two distinct ids, then keep their rows with a semi-join
df %>%
select(id) %>%
unique() %>%
sample_n(2) %>%
semi_join(df, .)
# id date date2
#1 1 23-01-08 2008-01-23
#2 1 01-11-07 2007-11-01
#3 2 30-11-07 2007-11-30
# sqldf result (one possible random draw):
# id date date2
#1 1 23-01-08 2008-01-23
#2 1 01-11-07 2007-11-01
#3 3 17-12-07 2007-12-17
#4 3 12-12-08 2008-12-12