For each ID, randomly flag or select half of the values in a data frame column to create two separate variables? (R)

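I want to create 2 variables from 1 column for each unique identifier ID. I would like to randomly select half of the values as one variable and the remaining half as the other variable. Here is an example data frame: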

    Df1 <- data.frame(ID = c(1,1,1,1,1,1,2,2,2,2,2,2,3,3,3,3,3), 
                      var = c(100, 200, 250, 400,425,250,80, 120, 210, 175,50,200,300, 90, 70, 500,400))

Any help would be greatly appreciated.

Thanks

Various libraries offer many sophisticated train/test data-splitting functions. Here is a very simple example based on a random sample:

i = sample(1:nrow(Df1), size = floor(0.5*nrow(Df1)))
Df.set1 = Df1[i,]
Df.set2 = Df1[-i,]
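
Note that the snippet above samples rows from the whole data frame, so the halves are not guaranteed to be balanced within each ID. A minimal sketch of a per-ID variant, assuming the `Df1` from the question, could look like this; the seed value and the name `rows_by_id` are only illustrative:

```r
# Example data from the question
Df1 <- data.frame(ID  = c(1,1,1,1,1,1,2,2,2,2,2,2,3,3,3,3,3),
                  var = c(100, 200, 250, 400, 425, 250, 80, 120, 210,
                          175, 50, 200, 300, 90, 70, 500, 400))

set.seed(42)  # arbitrary seed, only so the split can be repeated

# Row numbers grouped by ID
rows_by_id <- split(seq_len(nrow(Df1)), Df1$ID)

# For each ID, draw floor(n/2) of its row numbers without replacement.
# Indexing with sample.int() avoids the sample(x) surprise when x has length 1.
i <- unlist(lapply(rows_by_id, function(r) {
  r[sample.int(length(r), floor(length(r) / 2))]
}), use.names = FALSE)

Df.set1 <- Df1[i, ]                              # random half of each ID
Df.set2 <- Df1[setdiff(seq_len(nrow(Df1)), i), ] # the remaining rows
```

With `floor`, an ID with an odd number of rows sends its extra row to `Df.set2`; using `ceiling` instead would send it to `Df.set1`.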

If you don't mind one column being systematically longer than the other, you can use

grp <- with(Df1, ave(ID, ID, FUN=function(x) sample(gl(2,1,length(x)))))

This will always put the extra observation in group 1. If you want to randomize where the leftover goes, a helper function like this may help:

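markhalf <- function(x) {
  n <- floor(length(x)/2)
  z <- rep(c(1,2), each=n)
  if (length(x) %% 2==1) {
     # odd group size: send the leftover to group 1 or 2 at random
     z <- c(z, sample(1:2, 1))
  }
  # shuffle by index; sample(z) would misbehave when z is a single number
  z[sample.int(length(z))]
}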

Because both approaches use sample, the assignment should be random within each group.
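
As a rough sketch of how the helper might be applied, it can be passed to `ave` in the same way as the one-liner, and the resulting labels used to pull out the two variables. `markhalf` is repeated here so the block runs on its own, with the final shuffle written via `sample.int` (my assumption about the safest form for single-row groups); the names `grp`, `var1` and `var2` are only illustrative:

```r
# Example data from the question
Df1 <- data.frame(ID  = c(1,1,1,1,1,1,2,2,2,2,2,2,3,3,3,3,3),
                  var = c(100, 200, 250, 400, 425, 250, 80, 120, 210,
                          175, 50, 200, 300, 90, 70, 500, 400))

# Label half of a group 1 and half 2; a leftover goes to a random group
markhalf <- function(x) {
  n <- floor(length(x) / 2)
  z <- rep(c(1, 2), each = n)
  if (length(x) %% 2 == 1) {
    z <- c(z, sample(1:2, 1))
  }
  z[sample.int(length(z))]  # shuffle the labels by index
}

set.seed(1)  # arbitrary seed, only for repeatability

# One group label (1 or 2) per row, assigned within each ID
Df1$grp <- with(Df1, ave(ID, ID, FUN = markhalf))

# The two variables: values labelled 1 and values labelled 2
var1 <- Df1$var[Df1$grp == 1]
var2 <- Df1$var[Df1$grp == 2]
```

For the sample data, IDs 1 and 2 split 3/3, while ID 3 (five rows) splits 2/3 or 3/2 depending on the draw.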

This seems like it should do what you want:

set.seed(1)   # So you can reproduce my result

## Create an indicator column that will take the values of 0 and 1
## Initialize it with 0
Df1$ind <- 0

## Use `by` and `sample` to get half of the rows for each ID
## Assign "1" to the "ind" column for those rows
Df1$ind[unlist(by(1:nrow(Df1), Df1$ID, 
                  function(x) sample(x, ceiling(length(x)/2), FALSE)))] <- 1

## Create a "time" variable based on the "ID" and "ind" columns
Df1$time <- with(Df1, ave(ind, ID, ind, FUN = seq_along))

## Reshape the data (if required) into columns based on the indicator column
## The ID and time columns would serve as your unique IDs
library(reshape2)
dcast(Df1, ID + time ~ ind, value.var="var")
#   ID time   0   1
# 1  1    1 100 200
# 2  1    2 400 250
# 3  1    3 425 250
# 4  2    1  80 120
# 5  2    2 210 175
# 6  2    3  50 200
# 7  3    1 300  90
# 8  3    2 500  70
# 9  3    3  NA 400
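
If the two halves are only going to be aggregated, as the asker mentions in the comments, the wide reshape may not be needed. A minimal base-R sketch, assuming the same `Df1`: it builds the indicator in the same spirit as the code above (using `split` rather than `by`, purely my choice) and then summarises `var` per ID and half. `mean` is just a placeholder for whatever aggregation is actually wanted:

```r
# Example data from the question
Df1 <- data.frame(ID  = c(1,1,1,1,1,1,2,2,2,2,2,2,3,3,3,3,3),
                  var = c(100, 200, 250, 400, 425, 250, 80, 120, 210,
                          175, 50, 200, 300, 90, 70, 500, 400))

set.seed(1)  # arbitrary seed, only for repeatability

# Indicator column: 1 for a random half (rounded up) of each ID, 0 otherwise
Df1$ind <- 0
picked <- unlist(lapply(split(seq_len(nrow(Df1)), Df1$ID), function(r) {
  r[sample.int(length(r), ceiling(length(r) / 2))]
}), use.names = FALSE)
Df1$ind[picked] <- 1

# One row per ID/indicator combination; `mean` is a placeholder aggregation
agg <- aggregate(var ~ ID + ind, data = Df1, FUN = mean)
agg
```

The result has one row per ID and indicator value, so an uneven split for odd-sized groups does not leave any `NA` padding.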

What do you want to do with groups that have an odd number of observations?

Good question. If a group has an odd number, it's fine for one column to be longer than the other, because I'll be aggregating them in the end.