For each ID, randomly flag or select half of the values in a data frame column to create two separate variables?
I want to create 2 variables from 1 column for each unique identifier, ID. I want to randomly select half of the values as one variable and the remaining half as the other. Below is a sample data frame:
Df1 <- data.frame(ID = c(1,1,1,1,1,1,2,2,2,2,2,2,3,3,3,3,3),
                  var = c(100, 200, 250, 400, 425, 250, 80, 120, 210, 175, 50, 200, 300, 90, 70, 500, 400))
Any help would be greatly appreciated.
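Thanks
There are many sophisticated test/train splitting functions in various libraries. Here is a very simple example based on a random sample: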
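# Draw half of all row numbers at random (rounded down), ignoring ID
i = sample(1:nrow(Df1), size = floor(0.5*nrow(Df1)))
Df.set1 = Df1[i,]   # the sampled rows
Df.set2 = Df1[-i,]  # all other rows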
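The split above samples from all rows at once, so a given ID can end up mostly in one set. To split within each ID, as the question asks, one option is to sample row numbers separately per ID. A minimal base-R sketch follows; the seed value and the name `idx` are arbitrary choices, not part of the answer:

```r
# Sample data from the question
Df1 <- data.frame(ID = c(1,1,1,1,1,1,2,2,2,2,2,2,3,3,3,3,3),
                  var = c(100, 200, 250, 400, 425, 250, 80, 120, 210, 175, 50, 200, 300, 90, 70, 500, 400))

set.seed(42)  # arbitrary seed, only so the split is reproducible

# Split the row numbers by ID, then draw floor(n/2) rows within each ID.
# Indexing with sample.int() avoids sample()'s special case for length-one input.
idx <- unlist(lapply(split(seq_len(nrow(Df1)), Df1$ID), function(rows)
  rows[sample.int(length(rows), floor(length(rows) / 2))]))

Df.set1 <- Df1[idx, ]   # half of each ID's rows (rounded down)
Df.set2 <- Df1[-idx, ]  # the remaining rows; odd-sized IDs have one extra here
```

With this data, ID 3 has 5 rows, so it contributes 2 rows to `Df.set1` and 3 to `Df.set2`. If `idx` could ever be empty (every ID having a single row), `Df1[-idx, ]` would return zero rows, so a guard would be needed in that case.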
If you don't mind one column being systematically longer than the other, you can use
grp <- with(Df1, ave(ID, ID, FUN=function(x) sample(gl(2,1,length(x)))))
This will always put the extra sample in group 1. If you want to randomize which group gets the leftover, a helper function like this might help:
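markhalf <- function(x) {
  n <- floor(length(x)/2)
  z <- rep(c(1,2), each=n)        # n ones and n twos
  if (length(x) %% 2==1) {
    z <- c(z, sample(1:2, 1))     # odd group: give the extra element to a random half
  }
  z[sample.int(length(z))]        # shuffle; plain sample(z) misbehaves when length(z) == 1
}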
Because both approaches use sample(), the assignment should be random within each group.
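The answer defines markhalf() but does not show the call. A plausible usage, assuming it is plugged into ave() the same way as the gl() one-liner, is sketched below; the seed and the names `set1` and `set2` are illustrative only:

```r
# Sample data from the question
Df1 <- data.frame(ID = c(1,1,1,1,1,1,2,2,2,2,2,2,3,3,3,3,3),
                  var = c(100, 200, 250, 400, 425, 250, 80, 120, 210, 175, 50, 200, 300, 90, 70, 500, 400))

# markhalf() as in the answer, with the final shuffle written as z[sample.int(length(z))]
markhalf <- function(x) {
  n <- floor(length(x) / 2)
  z <- rep(c(1, 2), each = n)
  if (length(x) %% 2 == 1) {
    z <- c(z, sample(1:2, 1))  # odd group: the extra element goes to a random half
  }
  z[sample.int(length(z))]
}

set.seed(1)  # arbitrary seed for reproducibility
# ave() calls markhalf() once per ID and returns the labels in the original row order
grp <- with(Df1, ave(ID, ID, FUN = markhalf))

# IDs 1 and 2 get 3 rows per group; ID 3 (5 rows) gets 2 in one group and 3 in the other
print(table(Df1$ID, grp))

# The two "variables" as lists of vectors, one element per ID
set1 <- split(Df1$var[grp == 1], Df1$ID[grp == 1])
set2 <- split(Df1$var[grp == 2], Df1$ID[grp == 2])
```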
This should do what you want:
set.seed(1) # So you can reproduce my result
## Create an indicator column that will take the values of 0 and 1
## Initialize it with 0
Df1$ind <- 0
## Use `by` and `sample` to get half of the rows for each ID
## Assign "1" to the "ind" column for those rows
Df1$ind[unlist(by(1:nrow(Df1), Df1$ID,
function(x) sample(x, ceiling(length(x)/2), FALSE)))] <- 1
## Create a "time" variable based on the "ID" and "ind" columns
Df1$time <- with(Df1, ave(ind, ID, ind, FUN = seq_along))
## Reshape the data (if required) into columns based on the indicator column
## The ID and time columns would serve as your unique IDs
library(reshape2)
dcast(Df1, ID + time ~ ind, value.var="var")
# ID time 0 1
# 1 1 1 100 200
# 2 1 2 400 250
# 3 1 3 425 250
# 4 2 1 80 120
# 5 2 2 210 175
# 6 2 3 50 200
# 7 3 1 300 90
# 8 3 2 500 70
# 9 3 3 NA 400
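If reshape2 is not available, base R's reshape() can build a similar wide table. The sketch below rebuilds the `ind` and `time` columns and then reshapes. It uses split()/lapply() in place of by() for the per-ID sampling, which is a substitution and not the answer's code, so the sampled rows are not guaranteed to match the table above. The value columns come out named `var.0` and `var.1` instead of `0` and `1`:

```r
# Sample data from the question
Df1 <- data.frame(ID = c(1,1,1,1,1,1,2,2,2,2,2,2,3,3,3,3,3),
                  var = c(100, 200, 250, 400, 425, 250, 80, 120, 210, 175, 50, 200, 300, 90, 70, 500, 400))

set.seed(1)
Df1$ind <- 0
# Flag ceiling(n/2) randomly chosen rows of each ID with ind = 1
picked <- unlist(lapply(split(seq_len(nrow(Df1)), Df1$ID),
                        function(rows) rows[sample.int(length(rows), ceiling(length(rows) / 2))]))
Df1$ind[picked] <- 1
# Running count within each ID/ind combination, used as the row key after reshaping
Df1$time <- with(Df1, ave(ind, ID, ind, FUN = seq_along))

# Base-R counterpart of dcast(Df1, ID + time ~ ind, value.var = "var")
wide <- reshape(Df1, idvar = c("ID", "time"), timevar = "ind", direction = "wide")
print(wide)
```

As in the dcast() output, the odd-sized ID 3 leaves one NA in the `ind = 0` column.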
What do you want to do with groups that have an odd number of observations?
Good question. If a group has an odd number, it's fine for one column to be longer than the other, since I'll eventually aggregate them.
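Since the halves will be aggregated anyway, one option is to keep a per-row half label and aggregate directly, without reshaping. The sketch below assumes "aggregate" means a per-ID, per-half summary such as a sum; FUN = sum is an assumed choice and can be swapped for mean or anything else. The labels use the same balanced-then-shuffled idea as the gl() one-liner, so an odd-sized ID puts its extra row in half 1:

```r
# Sample data from the question
Df1 <- data.frame(ID = c(1,1,1,1,1,1,2,2,2,2,2,2,3,3,3,3,3),
                  var = c(100, 200, 250, 400, 425, 250, 80, 120, 210, 175, 50, 200, 300, 90, 70, 500, 400))

set.seed(7)  # arbitrary seed for reproducibility

# Label each row 1 or 2 within its ID: balanced labels, then shuffled.
# For an odd-sized ID, half 1 gets the extra row.
Df1$half <- ave(Df1$ID, Df1$ID, FUN = function(x) {
  lab <- rep_len(c(1, 2), length(x))
  lab[sample.int(length(lab))]
})

# One row per ID and half; FUN = sum is an assumed choice of summary
agg <- aggregate(var ~ ID + half, data = Df1, FUN = sum)
print(agg)
```

Whatever the random split, the two half-sums for each ID add back up to that ID's total (1625, 835 and 1360 for IDs 1 to 3).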