R: How can I reduce the code for creating a reproducible data frame with set.seed() and sample()?
I want to create a fairly large, reproducible dataset named Activity so that I can ask a question here on Stack Overflow. My data frame consists of the following variables:
DateTime: date and time with millisecond resolution, recorded at 11 values per second (i.e. 11 rows per second)
ID: the individual. I want the dataset to contain data from 3 individuals (A, B and C)
x: random data ranging from -1 to +1
y: random data ranging from -1 to +1
z: random data ranging from -1 to +1
The code I originally used is:
set.seed(100)
fmt <- "%Y-%m-%d %H:%M:%OS"
DateTime = seq(from=as.POSIXct("2017-08-05 14:03:55.300", format=fmt, tz="UTC"), by=1/11, length.out=67)
ID = rep("A", each=67)
x= sample(seq(from = -1, to = 1, by = 0.01), size = 67, replace = TRUE)
y= sample(seq(from = -1, to = 1, by = 0.01), size = 67, replace = TRUE)
z= sample(seq(from = -1, to = 1, by = 0.01), size = 67, replace = TRUE)
Activity1<- data.frame(DateTime,ID, x, y, z)
DateTime = seq(from=as.POSIXct("2017-08-05 16:18:12.100", format=fmt, tz="UTC"),by=1/11, length.out=67)
ID = rep("B", each=67)
x= sample(seq(from = -1, to = 1, by = 0.01), size = 67, replace = TRUE)
y= sample(seq(from = -1, to = 1, by = 0.01), size = 67, replace = TRUE)
z= sample(seq(from = -1, to = 1, by = 0.01), size = 67, replace = TRUE)
Activity2<- data.frame(DateTime,ID, x, y, z)
DateTime = seq(from=as.POSIXct("2017-08-05 20:34:31.540", format=fmt, tz="UTC"),by=1/11, length.out=67)
ID = rep("C", each=67)
x= sample(seq(from = -1, to = 1, by = 0.01), size = 67, replace = TRUE)
y= sample(seq(from = -1, to = 1, by = 0.01), size = 67, replace = TRUE)
z= sample(seq(from = -1, to = 1, by = 0.01), size = 67, replace = TRUE)
Activity3<- data.frame(DateTime,ID, x, y, z)
Activity<- rbind(Activity1,Activity2,Activity3)
head(Activity)
DateTime ID x y z
1 2017-08-05 14:03:55.29999 A 0.01 0.82 -0.56
2 2017-08-05 14:03:55.39090 A 0.11 0.74 0.07
3 2017-08-05 14:03:55.48182 A 0.50 0.95 -0.64
4 2017-08-05 14:03:55.57273 A 0.97 -0.89 0.95
5 2017-08-05 14:03:55.66364 A -0.97 0.78 -0.01
6 2017-08-05 14:03:55.75454 A -0.46 0.20 1.00
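The repetition in the code above can be folded into one helper that is called once per individual. This is a minimal base-R sketch; `make_block()`, `starts` and `ids` are hypothetical names. Because the helper calls sample() for x, y and z in the same order as the original code, after set.seed(100) it should reproduce the same x, y and z values as the original code on the same R version.

```r
# make_block() is a hypothetical helper: it builds the rows for one individual.
make_block <- function(start, id, n = 67, hz = 11) {
  grid <- seq(from = -1, to = 1, by = 0.01)  # same value grid as the original code
  # Draw x, y, z explicitly in this order so the random stream is consumed
  # exactly as in the original, repetitive code.
  x <- sample(grid, size = n, replace = TRUE)
  y <- sample(grid, size = n, replace = TRUE)
  z <- sample(grid, size = n, replace = TRUE)
  data.frame(
    DateTime = seq(from = as.POSIXct(start, format = "%Y-%m-%d %H:%M:%OS", tz = "UTC"),
                   by = 1 / hz, length.out = n),
    ID = id, x = x, y = y, z = z,
    stringsAsFactors = FALSE
  )
}

set.seed(100)
starts <- c("2017-08-05 14:03:55.300",
            "2017-08-05 16:18:12.100",
            "2017-08-05 20:34:31.540")
ids <- c("A", "B", "C")
# One block per individual, stacked into a single data frame
Activity <- do.call(rbind, lapply(seq_along(ids), function(i) make_block(starts[i], ids[i])))
rownames(Activity) <- NULL
head(Activity)
```

Adding a fourth individual then only means adding one entry to `starts` and `ids`.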
There are many different ways to achieve the same result. This is what I did using my preferred tools:
library(data.table)
# define parameters to control the process
base_data <- fread("DateTime, ID, N
2017-08-05 14:03:55.300, A, 67
2017-08-05 16:18:12.100, B, 67
2017-08-05 20:34:31.540, C, 67")[
, DateTime := lubridate::ymd_hms(DateTime)]
# expand sequences rowwise
Activity <- base_data[, .(DateTime = seq(from = DateTime, by = 1/11, length.out = N)),
by = .(rn = seq(nrow(base_data)), ID)][
, rn := NULL][]
# create x, y, z columns by sampling
cols <- c("x", "y", "z")
set.seed(100)
Activity[, (cols) := replicate(length(cols), round(runif(.N, -1, +1), 2), simplify = FALSE)]
Activity
By default, fractions of seconds are not printed, but the 1/11-second increments can be verified with
head(diff(Activity$DateTime))
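Two base-R ways to make the fractional seconds visible are sketched below, using the first start time from the question (`stamps`, `old` and `steps` are illustrative names). Note that the `%OS3` format truncates rather than rounds, which is consistent with the first time stamp in the question printing as 14:03:55.29999 instead of 14:03:55.300: the stored double is slightly below .3.

```r
# A few time stamps 1/11 s apart, starting at the first start time of the question
stamps <- seq(from = as.POSIXct("2017-08-05 14:03:55.300",
                                format = "%Y-%m-%d %H:%M:%OS", tz = "UTC"),
              by = 1/11, length.out = 5)

# Option 1: %OS3 formats seconds with 3 decimals (R truncates, it does not round)
format(stamps, "%Y-%m-%d %H:%M:%OS3")

# Option 2: make print() show fractional seconds, then restore the old option
old <- options(digits.secs = 3)
print(stamps)
options(old)

# Step between consecutive rows in seconds; each should be close to 1/11
steps <- diff(as.numeric(stamps))
steps
```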
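Since the OP did not ask to reproduce his results exactly with the given seed value, I have replaced
sample(seq(from = -1, to = 1, by = 0.01), size = 67, replace = TRUE)
by
round(runif(.N, -1, +1), 2)
If sample() is required, the seq() part can be skipped:
sample((-100:100)/100, .N, replace = TRUE)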
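The two sample() expressions draw from the same 201-point grid. The small base-R check below (the sample sizes and the second seed are arbitrary choices for illustration) shows that sample() on either grid gives the same draws for the same seed. It also highlights one difference of the runif() replacement: after rounding to two decimals, the end points -1 and +1 are hit only about half as often as an interior value.

```r
n <- 10
# seq(-1, 1, by = 0.01) and (-100:100)/100 are the same 201-point grid
# (equal up to floating-point rounding), so with the same seed sample()
# picks the same positions from both.
set.seed(100)
a <- sample(seq(from = -1, to = 1, by = 0.01), size = n, replace = TRUE)
set.seed(100)
b <- sample((-100:100)/100, size = n, replace = TRUE)
isTRUE(all.equal(a, b))

# round(runif()) lands on the same grid but not uniformly: -1 and +1 are
# reached only from half-width intervals, so each is about half as likely
# as an interior point such as 0.5.
set.seed(1)
big <- round(runif(2e5, -1, +1), 2)
p_end <- mean(big == 1)    # expected about 0.0025
p_mid <- mean(big == 0.5)  # expected about 0.005
c(p_end = p_end, p_mid = p_mid)
```

For a test dataset this bias at the end points hardly matters; sample() on the grid avoids it if it does.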
Using data.table chaining, the code can be written more concisely as
library(data.table)
cols <- c("x", "y", "z")
set.seed(100)
Activity <- fread("DateTime, ID, N
2017-08-05 14:03:55.300, A, 67
2017-08-05 16:18:12.100, B, 67
2017-08-05 20:34:31.540, C, 67")[
, DateTime := lubridate::ymd_hms(DateTime)][
, .(DateTime = seq(from = DateTime, by = 1/11, length.out = N)),
by = .(rn = seq_along(ID), ID)][  # one group per parameter row; base_data is not defined in this version
, (cols) := replicate(length(cols), round(runif(.N, -1, +1), 2), simplify = FALSE)][
, rn := NULL][]
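The chain above follows a "parameter table, then expand each row" pattern. The same idea can be sketched in base R without extra packages; `params` and `times` are illustrative names, and the parameter values are the ones used in the answer. The last step mirrors the `(cols) := replicate(...)` trick: replicate(..., simplify = FALSE) returns a list of vectors that can be assigned to several columns at once.

```r
# Parameter table: one row per individual
params <- data.frame(
  start = as.POSIXct(c("2017-08-05 14:03:55.300",
                       "2017-08-05 16:18:12.100",
                       "2017-08-05 20:34:31.540"),
                     format = "%Y-%m-%d %H:%M:%OS", tz = "UTC"),
  ID = c("A", "B", "C"),
  N  = c(67L, 67L, 67L),
  stringsAsFactors = FALSE
)

# Expand each parameter row into N time stamps 1/11 s apart
times <- lapply(seq_len(nrow(params)), function(i)
  seq(from = params$start[i], by = 1/11, length.out = params$N[i]))

Activity <- data.frame(
  # unlist() drops the POSIXct class, so it is restored with .POSIXct()
  DateTime = .POSIXct(unlist(times), tz = "UTC"),
  ID = rep(params$ID, params$N),
  stringsAsFactors = FALSE
)

# replicate(..., simplify = FALSE) returns a list of three vectors,
# which is assigned to the three new columns in one step
cols <- c("x", "y", "z")
set.seed(100)
Activity[cols] <- replicate(length(cols), round(runif(nrow(Activity), -1, +1), 2),
                            simplify = FALSE)
head(Activity)
```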
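Are you using a different R version? Changes to sample() in R 3.6.0 reportedly make its results irreproducible across versions.
That is possible!! Do you know how to simplify the code to create the data frame I want? I am using version 1.2.5033. Could you check your data frame and tell me whether you get the same data frame as I do?
"I am using 1.2.5033": this is most likely the version of RStudio, not of R; check the output of sessionInfo() for the R version you are using. The RNGversion() function can be used to make a newer version of R use the random number generator of an older version. For example, RNGversion("3.5.1") tells R to use the random number generator of R 3.5.1.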
告诉R使用随机数生成器的3.5.1版本。谢谢@Uwe,它看起来很棒!!一个疑问是什么是cyl
,disp
和wt
?您在第一个代码选项后显示它们。@Dekike这只是一个意外。我已经更正了我的答案。(显然,我还没有好好清理我的工作空间……)