R: Difficult randomization based on frequency ranks

r random

I have a data frame like this:

x = data.frame(A=c("D1","D1","D1","D1","D1","D2","D3","D3","D4","D4","D4","D5","D5"), B=c("A1","A3","A4","A5","A6","A5","A5","A6","A6","A1","A2","A5","A6"))

Sorted by column B, the entities in column B occur with different frequencies:

A   B   freq(B)  
D1  A1  2  
D4  A1  2  
D4  A2  1  
D1  A3  1  
D1  A4  1  
D1  A5  4  
D2  A5  4  
D3  A5  4  
D5  A5  4  
D1  A6  4  
D3  A6  4  
D4  A6  4  
D5  A6  4
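
For reference, the freq(B) column in the table above can be reproduced in base R. This is only a minimal sketch: the `rank` column (a dense rank of the frequency classes) is an added illustration and is not part of the original table.

```r
# Rebuild the example data; the (A, B) pairs are taken from the table above
x <- data.frame(
  A = c("D1", "D1", "D1", "D1", "D1", "D2", "D3", "D3", "D4", "D4", "D4", "D5", "D5"),
  B = c("A1", "A3", "A4", "A5", "A6", "A5", "A5", "A6", "A6", "A1", "A2", "A5", "A6"),
  stringsAsFactors = FALSE
)

# freq(B): how often each B value occurs, repeated on every row that holds it
x$freq <- as.vector(table(x$B)[x$B])

# dense rank of the frequency classes: freq 1 -> rank 1, 2 -> rank 2, 4 -> rank 3
x$rank <- match(x$freq, sort(unique(x$freq)))

# sort by B (then A) so the rows follow the layout of the table
x <- x[order(x$B, x$A), ]
rownames(x) <- NULL
x
```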

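I want to generate a randomized data frame from column B of data frame x, but randomization may only happen between entries whose frequencies are the same or similar (+/- one rank). Let me explain. A2, A3 and A4 each have frequency 1, so they can freely replace one another, but they cannot be replaced by A5, A6 or A1. Similarly, since A5 and A6 both have frequency 4, they can be randomized among themselves. A1 is the only entry with frequency 2 (rank 2 by freq(B)), so it has nothing to swap with inside its own class, and a special condition applies to it: A1 can be randomly replaced by A2, A3 or A4 (the class one rank below A1 by freq(B)) or by A5/A6 (the class one rank above A1 by freq(B)).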
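
To make the rule concrete, here is a minimal base R sketch that lists, for every B value, which values it may be swapped with. `allowed_swaps()` is a hypothetical helper written for this illustration, and it assumes that a value alone in its frequency class draws only from the adjacent classes (not from itself), as described above.

```r
# B column of the example data, taken from the question's table
B <- c("A1", "A3", "A4", "A5", "A6", "A5", "A5", "A6", "A6", "A1", "A2", "A5", "A6")

# allowed_swaps() is a hypothetical helper, not part of any package
allowed_swaps <- function(B) {
  freq    <- table(B)                                     # frequency of each B value
  classes <- split(names(freq), as.vector(freq))          # B values grouped by frequency
  classes <- classes[order(as.numeric(names(classes)))]   # classes in rank order
  out <- list()
  for (i in seq_along(classes)) {
    members <- classes[[i]]
    if (length(members) > 1) {
      # several values share this frequency: swap freely within the class
      pool <- members
    } else {
      # a lone value: fall back to the classes one rank below and above
      nb   <- intersect(c(i - 1, i + 1), seq_along(classes))
      pool <- as.character(unlist(classes[nb], use.names = FALSE))
    }
    for (m in members) out[[m]] <- pool
  }
  out[sort(names(out))]
}

swaps <- allowed_swaps(B)
swaps$A1   # "A2" "A3" "A4" "A5" "A6"
swaps$A2   # "A2" "A3" "A4"
swaps$A5   # "A5" "A6"
```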

Can this be done easily in R?

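The second half of your question, about the randomization, is a bit unclear, but here is a start. When you update your question, I will update the answer accordingly. The code below adds the count information for column B and then samples rows within each value of the frequency column we added. I think all that is needed from here is to adjust which entries are available to be sampled from, but please confirm what you want.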

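library(plyr)
## add the frequency of each B value as a new column, freq
x <- merge(x, count(x, "B"))
## shuffle the rows within each frequency class
## (sample() on a whole data frame would shuffle its columns instead)
ddply(x, "freq", function(d) d[sample(nrow(d)), ])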

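The first part is easily handled by functions in my permute package (which had limited availability at the time of writing).
We can wrap the call in a single statement to generate n permutations:
require(permute)
## x must already carry the freq column, e.g. from merge(x, count(x, "B")) above
## note: later permute releases superseded permControl()/permuted.index()
## with how()/shuffle()
ctrl <- permControl(strata = factor(x$freq))
n <- 10
set.seed(83)
IND <- replicate(n, permuted.index(NROW(x), control = ctrl))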
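
If the permute package is not at hand, the same kind of within-stratum index can be sketched in base R. `strat_perm()` below is a hypothetical stand-in written for illustration, not a permute function, and it will not reproduce the random numbers that permute draws.

```r
# strat_perm() is a hypothetical base R stand-in, not part of permute
strat_perm <- function(strata) {
  idx <- seq_along(strata)
  # shuffle the row numbers within each stratum; i[sample.int(length(i))] is
  # used instead of sample(i) so a stratum holding one row is left unchanged
  ave(idx, strata, FUN = function(i) i[sample.int(length(i))])
}

# freq column of the example data once it is sorted by B
freq <- c(2, 2, 1, 1, 1, 4, 4, 4, 4, 4, 4, 4, 4)

set.seed(83)
IND <- replicate(10, strat_perm(freq))
dim(IND)   # 13 rows (one per row of x) by 10 permutations
```

Every column of `IND` is a permutation of 1:13 in which each row number stays inside its own frequency class.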

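Now you also want some special sampling. If I understand correctly, what you need is to identify which frequency class consists of only a single level of B. Then, possibly at random, you replace the Bs in that frequency class with Bs chosen at random from the adjacent frequency classes. If so, getting the right rows to replace is a little complicated, but I think the function below does it: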
randSampleSpecial <- function(x, replace = TRUE) {
    ## have we got access to permute?
    stopifnot(require(permute))
    ## generate a random permutation within the levels of freq
    ind <- permuted.index(NROW(x), 
                          control = permControl(strata = factor(x$freq)))
    ## split freq into freq classes
    ranks <- with(x, split(freq, freq))
    ## rank the freq classes
    Ranked <- rank(as.numeric(names(ranks)))
    ## split the Bs on basis of freq classes
    Bs <- with(x, split(B, freq))
    ## number of unique Bs in freq class
    uniq <- sapply(Bs, function(x) length(unique(x)))
    ## which contain only a single type of B?
    repl <- which(uniq == 1)
    ## if there are no freq classes with only one level of B, return
    if(!(length(repl) > 0))
        return(ind) 
    ## if not, continue
    ## which of the freq classes are adjacent to unique class?
    other <- which(Ranked %in% (repl + c(1,-1)))
    ## generate uniform random numbers to decide if we replace
    Rand <- runif(length(ranks[[repl]]))
    ## Which are the rows in `x` that we want to change?
    candidates <- with(x, which(freq == as.numeric(names(uniq[repl]))))
    ## which are the adjacent values we can replace with
    replacements <- with(x, which(freq %in% as.numeric(names(uniq[other]))))
    ## which candidates to replace? Decision is random
    ## (pick by position so a single candidate row is not expanded to 1:n)
    change <- candidates[sample.int(length(candidates), sum(Rand > 0.5))]
    ## if we are changing a candidate, sample from the replacements and
    ## assign; `change` holds row numbers, so it indexes `ind` directly
    if(length(change) > 0)
        ind[change] <- sample(ind[replacements], length(change), 
                              replace = replace)
    ## return
    ind
}

We can wrap this in a replicate() call to generate many such replacements:
R> IND <- replicate(10, randSampleSpecial(x))
R> IND
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]   11    3    6    4    2    1    1    2   10     3
 [2,]    1   11    1   12   11   11    2    1    1    13
 [3,]    4    5    4    3    4    3    4    5    5     4
 [4,]    5    4    5    5    5    4    5    3    3     3
 [5,]    3    3    3    4    3    5    3    4    4     5
 [6,]   11    7   11   12    9    6    7    8    9     9
 [7,]   13   12   12    7   11    7    9   10    8    10
 [8,]   10    8    9    8   12   12    8    6   13     8
 [9,]    7    9   13   10    8   10   13    9   12    11
[10,]    6   11   10   11   10   13   12   13   10    13
[11,]   12   10    6    6    6    9   11   12    7    12
[12,]    9    6    7    9    7    8   10    7    6     7
[13,]    8   13    8   13   13   11    6   11   11     6

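For this data set, we know that rows 1 and 2 of the sorted x are the ones that may be replaced with values from other freq classes. Had no replacement taken place, the first two rows of IND would contain only the values 1 or 2 (see the earlier IND). In the new IND, wherever a value in the first two rows is not 1 or 2, we have replaced it with a B from one of the adjacent frequency classes.
My function assumes that you want to:
Replace elements of the homogeneous frequency class with ones from an adjacent class only at random! If you want to always replace, we can change the function accordingly.
If we are replacing, the replacement can be any of the available ones, and if more than one replacement is needed, the same one can be chosen more than once. Set replace = FALSE in the call to sample without replacement, if required.
The function assumes that you have only a single monospecific frequency class. It could easily be modified with a loop over two or more monospecific classes, but that would complicate the function, and since your description of the problem is not very clear, I kept things simple.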
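
As a small illustration of how one column of IND might be used, the sketch below applies the first column of the IND output shown above to the sorted example data. It assumes the index is meant to reorder column B while column A stays in place; the name `rand` is made up for this example.

```r
# example data, already sorted by B as in the question's table
x <- data.frame(
  A = c("D1", "D4", "D4", "D1", "D1", "D1", "D2", "D3", "D5", "D1", "D3", "D4", "D5"),
  B = c("A1", "A1", "A2", "A3", "A4", rep("A5", 4), rep("A6", 4)),
  stringsAsFactors = FALSE
)

# first column of the IND matrix printed above
ind <- c(11, 1, 4, 5, 3, 11, 13, 10, 7, 6, 12, 9, 8)

# keep A as it is and pull B from the rows named by the index
rand <- data.frame(A = x$A, B = x$B[ind], stringsAsFactors = FALSE)
rand
# row 1 now holds A6: the A1 entry was replaced from an adjacent class,
# while row 2 keeps A1
```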

library(plyr)  # count(), as.quoted() and eval.quoted() come from plyr
Permdf <- function(x,v){
  # some code to allow Permdf(df,var)
  mc <- match.call()
  v <- as.quoted(mc$v)
  y <- unlist(eval.quoted(v,x))
  # make bins with values in v per frequency
  freqs <- count(x,v)
  bins <- split(freqs[[1]],freqs[[2]])
  nbins <- length(bins)
  # define the output
  dfid <- 1:nrow(x)

  for (i in 1:nbins){
    # which id's to change
    id <- which(y %in% bins[[i]])

    if(length(bins[[i]]) > 1){
      # in case there's more than one value for that frequency
      dfid[id] <- sample(dfid[id])
    } else {
      bid <- c(i-1,i,i+1)
      # check whether the bin ids are in range
      bid <- bid[bid > 0 & bid <=nbins]
      # id values to choose from
      vid <- which(y %in% unlist(bins[bid]))
      # random selection
      dfid[id] <- sample(vid,length(id),replace=TRUE)
    }
  }
  #return
  dfid
}

Permdf(x,B)
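
One way to sanity-check an index vector such as the one returned above is to test it against the rule from the question. The sketch below is an illustration only: `check_index()` is a hypothetical helper, and it assumes that rows in a class with several B values must stay in their class, while rows in a single-value class may move at most one rank.

```r
# check_index() is a hypothetical helper, not part of plyr or permute
check_index <- function(B, ind) {
  freq <- as.vector(table(B)[B])             # frequency of each row's B value
  rnk  <- match(freq, sort(unique(freq)))    # dense rank of each row's class
  # number of distinct B values in each frequency class
  n_levels <- tapply(B, freq, function(b) length(unique(b)))
  # TRUE for rows whose class holds a single B value
  single <- as.vector(n_levels[as.character(freq)] == 1)
  gap <- abs(rnk[ind] - rnk)                 # rank distance of each replacement
  all(ifelse(single, gap <= 1, gap == 0))
}

B <- c("A1", "A1", "A2", "A3", "A4", rep("A5", 4), rep("A6", 4))

check_index(B, seq_along(B))                                    # TRUE: nothing moved
check_index(B, c(11, 1, 4, 5, 3, 11, 13, 10, 7, 6, 12, 9, 8))   # TRUE
check_index(B, rev(seq_along(B)))                               # FALSE: A2 would get A6
```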