R: Creating simulated data with naturally decreasing numbers
I want to create random simulated data like the following:
ID  Amount
1   20
1   14
1   9
1   3
2   11
2   5
2   2
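Each ID should start from a random number, but the second number with the same ID should be smaller than the first, and the third must be smaller than the second. The largest possible starting number should be 20.

If you want the Amount column to hold truly random values, this is a tricky problem. You can use a recursive function that calls sample repeatedly: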
## Recursively sampling from a uniform distribution
recursive.sample <- function(start, end, length, results = NA, counter = 0) {
    ## To enter the recursion, counter must be smaller than the length out
    ## and the last result must be greater than the starting point (except the first time)
    if(counter < length && ifelse(counter != 0, results[counter] > start, TRUE)){
        ## Increment the counter
        counter <- counter + 1
        ## Sample between start and the last result, or between start and end on the first draw
        results[counter] <- ifelse(counter != 1, sample(start:results[counter-1], 1), sample(start:end, 1))
        ## Recursive call
        return(recursive.sample(start = start, end = end, length = length, results = results, counter = counter))
    } else {
        ## Exit the recursion
        return(results)
    }
}
## Example
set.seed(0)
recursive.sample(start = 1, end = 20, length = 3, results = NA, counter = 0)
#[1] 18 5 2
Note that the values drop off quickly, because each later draw in the recursion has a lower probability of sampling a high value.
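The recursive function above draws from start:results[counter-1], which includes the previous value, so the same number can appear twice in a row. If strictly decreasing values are needed, one option is a small iterative variant. This is a minimal sketch and not part of the original answer; the function name strict_decreasing_sample and its argument n are hypothetical.

```r
## Hypothetical helper (an assumption, not from the original answer):
## draw up to n strictly decreasing integers between start and end
strict_decreasing_sample <- function(start, end, n) {
    results <- integer(0)
    upper <- end
    while (length(results) < n && upper >= start) {
        candidates <- start:upper
        ## Index into the candidates so that a single remaining value is
        ## handled safely (sample(x, 1) treats a lone number x as 1:x)
        draw <- candidates[sample.int(length(candidates), 1)]
        results <- c(results, draw)
        ## The next draw must be strictly smaller than this one
        upper <- draw - 1
    }
    results
}

set.seed(0)
x <- strict_decreasing_sample(start = 1, end = 20, n = 4)
x
```

As with the recursive version, the result can be shorter than n if a draw reaches start before n values have been collected.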
You can then use the function to build the table, as follows:
set.seed(123)
## The ID column
ID <- c(rep(1, 4), rep(2,3))
## The Amount column
Amount <- c(recursive.sample(1, 20, 4, NA, 0), recursive.sample(1, 11, 3, NA, 0))
## The table
cbind(ID, Amount)
# ID Amount
#[1,] 1 18
#[2,] 1 5
#[3,] 1 2
#[4,] 1 2
#[5,] 2 10
#[6,] 2 3
#[7,] 2 3
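A quick way to check a result like the one above is to look at the differences between consecutive values within each ID. The sketch below re-enters the printed values by hand; the object names check, non_increasing and strictly_decreasing are only for illustration.

```r
## Values copied from the printed table above
check <- data.frame(ID = c(1, 1, 1, 1, 2, 2, 2),
                    Amount = c(18, 5, 2, 2, 10, 3, 3))

## TRUE for an ID if its amounts never go up
non_increasing <- tapply(check$Amount, check$ID,
                         function(x) all(diff(x) <= 0))

## TRUE for an ID only if every amount is smaller than the one before it
strictly_decreasing <- tapply(check$Amount, check$ID,
                              function(x) all(diff(x) < 0))

non_increasing
strictly_decreasing
```

Both IDs pass the first check, but the repeated values (2, 2 and 3, 3) mean that neither is strictly decreasing.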
You can create the data first and then use the tidyverse to sort it as needed:
library(dplyr)
set.seed(0)
df <- data.frame(id = rep(1:3, 10), amt = sample(1:20, 30, replace = TRUE))
## Sort by id, then by amt in decreasing order within each id
df %>%
    group_by(id) %>%
    arrange(id, desc(amt))
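The same create-then-sort idea can be sketched without dplyr by using order() from base R. This is an illustrative equivalent rather than part of the original answer; the name sorted is an assumption.

```r
set.seed(0)
df <- data.frame(id = rep(1:3, 10), amt = sample(1:20, 30, replace = TRUE))

## Order rows by id (ascending), then by amt (descending) within each id
sorted <- df[order(df$id, -df$amt), ]
rownames(sorted) <- NULL
head(sorted)
```

Because the values are sampled with replace = TRUE, ties within an id are possible, so the amounts are non-increasing rather than strictly decreasing.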
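Here are two methods, one using dplyr and the other using only base R functions. They differ slightly from the two previous solutions. I used a sorted ID column, but that is not required.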
Method 1
rm(list = ls())
library(dplyr)
set.seed(1)
df <- data.frame(ID = rep(1:3, each = 5))
## Within each ID, draw n() values and sort them in decreasing order
df %>% group_by(ID) %>%
    mutate(Amount = sort(sample(1:20, n(), replace = TRUE), decreasing = TRUE))
Method 2
rm(list = ls())
set.seed(1)
df <- data.frame(ID = rep(1:3, each = 5))
df$Amount <- NA
uniq_ID <- unique(df$ID)
## Row indices belonging to each ID
index_lst <- lapply(uniq_ID, function(x) which(df$ID == x))
## Sample without replacement per ID, sorted in decreasing order
res <- lapply(index_lst, function(x) sort(sample(1:20, length(x)),
                                          decreasing = TRUE))
df$Amount[unlist(index_lst)] <- unlist(res)

## Variant of Method 2 that takes the group sizes from table()
rm(list = ls())
set.seed(1)
df <- data.frame(ID = rep(1:3, each = 5))
df$Amount <- NA
tab <- as.data.frame(table(df$ID))
## <<- assigns into df outside the anonymous function
lapply(1:nrow(tab), function(x) df$Amount[which(df$ID == tab$Var1[x])] <<-
    sort(sample(1:20, tab$Freq[x]), decreasing = TRUE))
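The base R approach can probably be condensed further with ave(), which applies a function to each group and puts the results back in the original row positions. This is a sketch of an alternative and not the original author's code; it assumes each group has at most 20 rows, since sample() draws without replacement here.

```r
set.seed(1)
df <- data.frame(ID = rep(1:3, each = 5))

## For each ID, draw as many distinct values as the group has rows
## and sort them in decreasing order
df$Amount <- ave(seq_len(nrow(df)), df$ID,
                 FUN = function(i) sort(sample(1:20, length(i)),
                                        decreasing = TRUE))
df
```

Sampling without replacement guarantees that the amounts within each ID are strictly decreasing, and ave() does not require the ID column to be sorted.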