R: difficult randomization based on frequency ranking
I have a data frame like this:
x <- data.frame(A = c("D1", "D1", "D1", "D1", "D1", "D2", "D3", "D3", "D4", "D4", "D4", "D5", "D5"), B = c("A1", "A3", "A4", "A5", "A6", "A5", "A5", "A6", "A1", "A2", "A6", "A5", "A6"))
Sorted by column B, the entities in column B have different frequencies:
A B freq(B)
D1 A1 2
D4 A1 2
D4 A2 1
D1 A3 1
D1 A4 1
D1 A5 4
D2 A5 4
D3 A5 4
D5 A5 4
D1 A6 4
D3 A6 4
D4 A6 4
D5 A6 4
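I want to generate a randomized data frame from column B of x, but entries may only be exchanged when their frequencies are the same or similar (+/- one rank). For example, A2, A3 and A4 all have frequency 1, so they can be freely swapped with one another, but not with A5, A6 or A1. Similarly, A5 and A6 both have frequency 4, so they can be randomized between themselves. A1 is the only entry with frequency 2 (rank 2 by freq(B)), so it has no partner to swap with. A1 therefore gets a special rule: it may be randomly replaced by A2, A3 or A4 (one rank lower by freq(B)) or by A5/A6 (one rank higher by freq(B)).
Can this be done easily in R?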
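As a rough illustration of the rule above, the following base-R sketch ranks the distinct frequencies and lists, for each B level, the levels it may be exchanged with. It assumes the 13-row data shown in the table; the names `rank_of` and `partners` are made up for illustration.

```r
# Example data from the table above (13 rows)
x <- data.frame(
  A = c("D1", "D1", "D1", "D1", "D1", "D2", "D3", "D3", "D4", "D4", "D4", "D5", "D5"),
  B = c("A1", "A3", "A4", "A5", "A6", "A5", "A5", "A6", "A1", "A2", "A6", "A5", "A6"),
  stringsAsFactors = FALSE
)

freq <- table(x$B)                          # counts: A1 = 2, A2-A4 = 1, A5-A6 = 4
classes <- sort(unique(as.integer(freq)))   # distinct frequencies: 1, 2, 4
# rank of each level's frequency class: A2-A4 = 1, A1 = 2, A5-A6 = 3
rank_of <- setNames(match(as.integer(freq), classes), names(freq))

# Levels each B may be exchanged with under the rule in the question
partners <- lapply(names(rank_of), function(b) {
  r <- rank_of[[b]]
  same <- setdiff(names(rank_of)[rank_of == r], b)
  if (length(same) > 0) {
    same                                    # shares its class: swap within it
  } else {
    names(rank_of)[abs(rank_of - r) == 1]   # alone in its class: use +/- one rank
  }
})
names(partners) <- names(rank_of)
partners$A1
# [1] "A2" "A3" "A4" "A5" "A6"
```

Only A1 falls through to the adjacent-rank branch, which is exactly the special case the question describes.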
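The second half of your question, about the randomization, is a bit unclear, but here is a start. When you update your question I will update the answer accordingly. The code below adds the counts of column B to x and then shuffles B within each value of the freq column it added. I think all that remains is to change which entries are available for sampling, but please confirm what you want.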
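require(plyr)
x <- merge(x, count(x, "B"))   # adds a freq column with the count of each B
# shuffle B within each freq class (sample(d) on a data frame would shuffle its columns)
ddply(x, "freq", function(d) transform(d, B = sample(B)))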
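If plyr is not available, roughly the same within-class shuffle can be sketched in base R with `ave()`. This assumes the 13-row data from the question; unlike `merge()`, it adds the freq column without reordering the rows, and the column name `Bperm` is made up for illustration.

```r
x <- data.frame(
  A = c("D1", "D1", "D1", "D1", "D1", "D2", "D3", "D3", "D4", "D4", "D4", "D5", "D5"),
  B = c("A1", "A3", "A4", "A5", "A6", "A5", "A5", "A6", "A1", "A2", "A6", "A5", "A6"),
  stringsAsFactors = FALSE
)
# Attach the count of each B to every row (like merge(x, count(x, "B")), unsorted)
x$freq <- as.vector(table(x$B)[x$B])

set.seed(1)
# Shuffle B among rows sharing the same freq value; sample.int() avoids
# sample()'s special case for a single numeric value
x$Bperm <- ave(x$B, x$freq, FUN = function(b) b[sample.int(length(b))])
x[order(x$freq, x$B), ]
```

The freq == 2 class holds only A1, so a plain within-class shuffle can never change those rows; that is the gap the special rule for A1 has to fill.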
The first part is easily handled by functions in my permute package (at the time, only available as a development version).
We can wrap this in a single statement to generate n permutations:
require(permute)
ctrl <- permControl(strata = factor(x$freq))
n <- 10
set.seed(83)
IND <- replicate(n, permuted.index(NROW(x), control = ctrl))
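For readers without permute, a base-R stand-in for permuting within strata might look like the sketch below. `permute_within()` is a hypothetical helper, not part of permute, and it only mimics the within-strata shuffle, none of permute's other designs. The `freq` vector assumes the sorted x (A1, A1, A2, A3, A4, then four A5 and four A6 rows).

```r
# Hypothetical helper (not part of permute): shuffle the indices 1..n,
# but only within the groups defined by `strata`
permute_within <- function(strata) {
  idx <- seq_along(strata)
  shuffled <- lapply(split(idx, strata), function(i) i[sample.int(length(i))])
  unsplit(shuffled, strata)   # put each group's shuffled indices back in place
}

# freq column of the sorted x
freq <- c(2, 2, 1, 1, 1, 4, 4, 4, 4, 4, 4, 4, 4)
set.seed(83)
IND <- replicate(10, permute_within(freq))
```

Each column of IND is a permutation of 1:13 in which every index stays inside its own freq class, so, for the sorted x, x$B[IND[, 1]] gives one randomized B column.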
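Now you also need some special sampling. If I understand correctly, you need to identify which frequency level consists of only a single level of B. Then, possibly at random, replace the B in that frequency level with a B chosen at random from the Bs in the adjacent frequency levels. If so, getting the right rows to replace is a bit more involved, but I think the following function does it: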
randSampleSpecial <- function(x, replace = TRUE) {
    ## have we got access to permute?
    stopifnot(require(permute))
    ## generate a random permutation within the levels of freq
    ind <- permuted.index(NROW(x),
                          control = permControl(strata = factor(x$freq)))
    ## split freq into freq classes
    ranks <- with(x, split(freq, freq))
    ## rank the freq classes
    Ranked <- rank(as.numeric(names(ranks)))
    ## split the Bs on basis of freq classes
    Bs <- with(x, split(B, freq))
    ## number of unique Bs in freq class
    uniq <- sapply(Bs, function(x) length(unique(x)))
    ## which contain only a single type of B?
    repl <- which(uniq == 1)
    ## if there are no freq classes with only one level of B, return
    if(!(length(repl) > 0))
        return(ind)
    ## if not, continue
    ## which of the freq classes are adjacent to unique class?
    other <- which(Ranked %in% (repl + c(1, -1)))
    ## generate uniform random numbers to decide if we replace
    Rand <- runif(length(ranks[[repl]]))
    ## Which are the rows in `x` that we want to change?
    candidates <- with(x, which(freq == as.numeric(names(uniq[repl]))))
    ## which are the adjacent values we can replace with
    replacements <- with(x, which(freq %in% as.numeric(names(uniq[other]))))
    ## which candidates to replace? Each one independently, with probability 0.5
    change <- which(Rand > 0.5)
    ## if we are changing a candidate, sample from the replacements and
    ## assign (index `ind` by the candidate rows themselves)
    if(length(change) > 0)
        ind[candidates[change]] <- sample(ind[replacements], length(change),
                                          replace = replace)
    ## return
    ind
}
We can wrap this in a replicate() call to generate many such randomizations:
R> IND <- replicate(10, randSampleSpecial(x))
R> IND
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 11 3 6 4 2 1 1 2 10 3
[2,] 1 11 1 12 11 11 2 1 1 13
[3,] 4 5 4 3 4 3 4 5 5 4
[4,] 5 4 5 5 5 4 5 3 3 3
[5,] 3 3 3 4 3 5 3 4 4 5
[6,] 11 7 11 12 9 6 7 8 9 9
[7,] 13 12 12 7 11 7 9 10 8 10
[8,] 10 8 9 8 12 12 8 6 13 8
[9,] 7 9 13 10 8 10 13 9 12 11
[10,] 6 11 10 11 10 13 12 13 10 13
[11,] 12 10 6 6 6 9 11 12 7 12
[12,] 9 6 7 9 7 8 10 7 6 7
[13,] 8 13 8 13 13 11 6 11 11 6
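For this data set, we know that rows 1 and 2 of the sorted x are the ones that may be replaced by values from another freq class. Had no replacement been made, the first two rows of IND would contain only the values 1 or 2 (compare the IND generated earlier). Wherever the first two rows of the new IND contain a value other than 1 or 2, we have replaced that row with a B from one of the adjacent frequency classes.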
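The check described above can be done directly on the printed output. This sketch copies the first two rows of the IND shown above (the two A1 rows of the sorted x) and counts, per replicate, how many of them were replaced; `IND12` and `replaced` are illustrative names.

```r
# First two rows of the IND printed above (the two A1 rows of the sorted x)
IND12 <- rbind(c(11,  3, 6,  4,  2,  1, 1, 2, 10,  3),
               c( 1, 11, 1, 12, 11, 11, 2, 1,  1, 13))
# An entry > 2 means that row now points at a row from another freq class
replaced <- colSums(IND12 > 2)
replaced
# [1] 1 2 1 2 1 1 0 0 1 2
```

Replicates 7 and 8 kept both A1 rows, while the others replaced one or both, which matches a per-row replacement probability of 0.5.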
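My function assumes that:
Elements of the homogeneous frequency class are only randomly replaced with one from an adjacent class. If you want them always replaced, the function can be changed accordingly.
When we do replace, the replacement can be any of the candidates, and if more than one replacement is needed, the same one may be chosen more than once. Set replace = FALSE in the call to randSampleSpecial() to sample without replacement, if you prefer.
There is only one frequency class containing a single B level. The function could easily be modified with a loop over two or more such classes, but that would complicate it, and as your description of the problem is not very clear, I kept it simple.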
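The last point mentions that the function could be extended with a loop over several single-level frequency classes. A minimal base-R sketch of that idea follows; `rand_special_all()` is a hypothetical name, it works on the B vector directly rather than on x, and it does not use permute.

```r
# Hypothetical function (not from the answer above): randomize B within
# frequency classes, then, for EVERY class that holds a single B level,
# replace each of its rows with probability p by a B drawn from the rows
# of the adjacent (+/- one rank) classes.
rand_special_all <- function(B, p = 0.5, replace = TRUE) {
  freq <- as.vector(table(B)[B])          # count of each row's B
  rk <- match(freq, sort(unique(freq)))   # dense rank of each row's class
  idx <- seq_along(B)
  # 1) permute row indices within each frequency class
  ind <- unsplit(lapply(split(idx, rk), function(i) i[sample.int(length(i))]), rk)
  # 2) ranks whose class contains only one distinct B level
  n_levels <- tapply(B, rk, function(b) length(unique(b)))
  singles <- as.integer(names(n_levels)[n_levels == 1])
  for (r in singles) {
    cand <- which(rk == r)
    pool <- idx[abs(rk - r) == 1]         # rows of the neighbouring classes
    if (length(pool) == 0) next           # a single class overall: nothing to draw from
    change <- cand[runif(length(cand)) < p]
    if (length(change) > 0) {
      ind[change] <- pool[sample.int(length(pool), length(change), replace = replace)]
    }
  }
  B[ind]
}

# Sorted B column from the question
B <- c("A1", "A1", "A2", "A3", "A4", rep("A5", 4), rep("A6", 4))
set.seed(42)
out <- replicate(200, rand_special_all(B))
```

The pool is built from the original row indices rather than from the already-permuted `ind`, so replacing rows of one singleton class cannot leak into the pool of a neighbouring singleton class. With replace = FALSE, sample.int() stops with an error if more rows need replacing than the neighbours provide.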
library(plyr)  # count(), as.quoted() and eval.quoted() come from plyr
Permdf <- function(x,v){
# some code to allow Permdf(df,var)
mc <- match.call()
v <- as.quoted(mc$v)
y <- unlist(eval.quoted(v,x))
# make bins with values in v per frequency
freqs <- count(x,v)
bins <- split(freqs[[1]],freqs[[2]])
nbins <- length(bins)
# define the output
dfid <- 1:nrow(x)
for (i in 1:nbins){
# which id's to change
id <- which(y %in% bins[[i]])
if(length(bins[[i]]) > 1){
# in case there's more than one value for that frequency
dfid[id] <- sample(dfid[id])
} else {
bid <- c(i-1,i,i+1)
# keep only bin indices that exist (1..nbins)
bid <- bid[bid > 0 & bid <=nbins]
# id values to choose from
vid <- which(y %in% unlist(bins[bid]))
# random selection
dfid[id] <- sample(vid,length(id),replace=TRUE)
}
}
#return
dfid
}
Permdf(x,B)