Randomly reorder each column within blocks of rows of a data frame in R
I want to randomly reorder each column of a data frame separately, within given blocks of rows. The row blocks are contiguous, as shown here:
mylist=list(1:50,51:52,53:102,103:128,129:154,155:180,181:206,207:232,233:258,259:284,285:310,311:336,337:362,363:388,389:414,415:440,441:466,467:492,493:518,519:544,545:570,571:596,597:622,623:648,649:674,675:700)
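Suppose I have a data.frame called dat with 700 rows and 50 columns. Basically, within each of these 26 row blocks, I want each column to be randomly reordered.
An example with a smaller data.frame might be A =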
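Rather than typing the 26 blocks by hand, the same list can be generated from the block start positions. This is a sketch that assumes the layout shown above: blocks start at rows 1, 51 and 53, then every 26 rows from 103 up to row 700. The last line checks the contiguity condition (no holes, no duplicates) that some of the answers rely on.

```r
# Block start positions, assumed from the list above: 1, 51, 53,
# then every 26 rows from 103 to 675
starts <- c(1, 51, 53, seq(103, 675, by = 26))
# Each block ends one row before the next block starts; the last one ends at row 700
ends <- c(starts[-1] - 1, 700)
mylist <- Map(seq, starts, ends)

length(mylist)                         # number of blocks
all(unlist(mylist) == seq_len(700))    # TRUE if contiguous with no holes or duplicates
```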
1 1 1 1 1
2 2 2 2 2
3 3 3 3 3
4 4 4 4 4
5 5 5 5 5
6 6 6 6 6
7 7 7 7 7
8 8 8 8 8
9 9 9 9 9
where the row blocks are:
mylist=list(1:2,3:6,7:9)
which might result in the reordered data frame B =
1 2 1 1 1
2 1 2 2 2
3 4 3 5 3
4 6 4 3 4
5 5 5 6 5
6 3 6 4 6
8 9 8 7 9
9 7 9 8 8
7 8 7 9 7
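The sketch below makes the small example reproducible by building A and a per-row block label. The names A and block match those used in some of the answers. It uses lengths() (base R, version 3.2.0 or later) as a shorthand for sapply(mylist, length).

```r
# The 9 x 5 example data frame: every column holds 1:9 (columns V1..V5)
A <- as.data.frame(replicate(5, 1:9))
mylist <- list(1:2, 3:6, 7:9)

# One block label per row: 1 1 2 2 2 2 3 3 3
block <- rep(seq_along(mylist), times = lengths(mylist))
block
```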
Thanks.

This will do it:
dat_new <- dat[unlist(mapply(function(x) sample(x), mylist)), ]  # shuffles whole rows within each block (same order for every column)
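Applying one within-block permutation of the row indices to a data frame moves whole rows, so every column receives the same order. A small base-R sketch on the example data makes this visible. The helper perm_within is a hypothetical name. It avoids sample()'s special case for a single number.

```r
A <- as.data.frame(replicate(5, 1:9))
mylist <- list(1:2, 3:6, 7:9)

# Hypothetical helper: permute a vector without sample()'s special case
# for a single number (sample(5) would return a permutation of 1:5)
perm_within <- function(x) x[sample.int(length(x))]

set.seed(1)
rows <- unlist(lapply(mylist, perm_within))  # one permutation, blocks stay in place
B_rows <- A[rows, ]

# Every column was reordered identically, so all columns are still equal
all(sapply(B_rows, identical, B_rows[[1]]))
```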
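Assuming your mylist is fully contiguous with no holes or duplicates (i.e. unlist(mylist) == 1:length(unlist(mylist))), as in the one you provided, you can do this relatively easily with any "split-apply-combine" method. Here is a data.table implementation. We first create a split index by repeating a label for each group as many times as the group has items, then split/reorder by group: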
dt[, split.idx:=unlist(
lapply(
mylist, # for each item in mylist
function(x) rep(paste0(range(x), collapse="-"), length(x)) # create "min-max" label repeated `length` times
) ) ]
dt[, lapply(.SD, sample), by=split.idx] # for each group (`.SD`), cycle through each column and `sample`
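which produces the following (note that I'm subsetting the result to something easy to display):
You can clearly see, especially in the 51-52 group, that each group only contains its own values (that group only has 51 and 52). Here is the data I used: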
library(data.table)
set.seed(1)
dt <- data.table(replicate(50, 1:700))
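For readers not using data.table, the same split-apply-combine idea can be sketched in base R with ave(). This function applies FUN within groups and puts the results back in their original positions. The group label grp plays the role of split.idx, and shuffle_vec is a hypothetical helper name.

```r
set.seed(1)
mylist <- list(1:2, 3:6, 7:9)
A <- as.data.frame(replicate(5, 1:9))

# One group label per row, analogous to split.idx in the data.table code
grp <- rep(seq_along(mylist), times = lengths(mylist))

# Permute a vector safely, even if a group has a single element
shuffle_vec <- function(x) x[sample.int(length(x))]

# ave() splits each column by grp, shuffles each piece and reassembles it
# in the original row order; every column is shuffled independently
B <- A
B[] <- lapply(A, function(col) ave(col, grp, FUN = shuffle_vec))
B
```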
You can try the following:
# create a 'blocking variable'
block <- rep(x = seq_along(mylist), times = sapply(mylist, length))
# within each block, loop over columns and 'shuffle' each column using `sample`
set.seed(1)
B <- do.call(rbind.data.frame,
by(A, block, function(dat){
sapply(dat, function(x) sample(x))
})
)
B
# V1 V2 V3 V4 V5
# 1.1 1 2 1 2 2
# 1.2 2 1 2 1 1
# 2.1 3 6 4 5 3
# 2.2 6 4 5 3 4
# 2.3 4 5 6 6 5
# 2.4 5 3 3 4 6
# 3.1 8 7 9 8 9
# 3.2 9 8 7 9 8
# 3.3 7 9 8 7 7
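Whichever method is used, it can help to verify the result. The function below is a hypothetical helper, not part of any package. For every column, it checks that each block of the shuffled data holds exactly the same values as the corresponding block of the original. Here it is applied to the example B from the question and to a reversed A that moves rows across blocks.

```r
# Hypothetical checker: TRUE if, column by column, every block of `shuffled`
# contains the same values (with multiplicity) as that block of `orig`
is_within_block_perm <- function(orig, shuffled, block) {
  orig <- as.matrix(orig)
  shuffled <- as.matrix(shuffled)
  if (!identical(dim(orig), dim(shuffled))) return(FALSE)
  all(vapply(seq_len(ncol(orig)), function(j) {
    a <- split(orig[, j], block)
    b <- split(shuffled[, j], block)
    all(mapply(function(x, y) all(sort(x) == sort(y)), a, b))
  }, logical(1)))
}

A <- as.data.frame(replicate(5, 1:9))
block <- rep(1:3, times = c(2, 4, 3))

# The reordered data frame B from the question, entered row by row
B_question <- matrix(c(1, 2, 1, 1, 1,
                       2, 1, 2, 2, 2,
                       3, 4, 3, 5, 3,
                       4, 6, 4, 3, 4,
                       5, 5, 5, 6, 5,
                       6, 3, 6, 4, 6,
                       8, 9, 8, 7, 9,
                       9, 7, 9, 8, 8,
                       7, 8, 7, 9, 7), nrow = 9, byrow = TRUE)

is_within_block_perm(A, B_question, block)  # TRUE
is_within_block_perm(A, A[9:1, ], block)    # FALSE: rows crossed block borders
```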
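Here is one approach. It does not need a data frame named "A", and, like BrodieG's answer, it assumes there are no holes or duplicates in "mylist".
It generates a matrix with the number of columns given by Ncol: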
Ncol <- 50 # Number of columns
A1 <- seq_along(unlist(mylist, use.names = FALSE))
do.call(rbind, # ^^ Generate a sequence
lapply(mylist, function(x) { # Traverse the list
replicate(Ncol, sample(A1[x])) # Use replicate with sample
}))
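Because A1 is just 1:700, the matrix produced above holds row indices rather than data. To shuffle an actual data frame such as dat, one option is to reorder column j of dat by column j of the index matrix using Map(). This is a sketch, assuming dat has exactly Ncol columns and one row per index in mylist. The toy dat below is made up for illustration. Map() is used rather than matrix indexing, which would coerce a mixed-type data frame to character.

```r
set.seed(1)
mylist <- list(1:2, 3:6, 7:9)
# Toy data standing in for `dat`: mixed column types, one row per index
dat <- data.frame(x = letters[1:9], y = 11:19, z = 21:29,
                  stringsAsFactors = FALSE)
Ncol <- ncol(dat)

# Index matrix as in the answer: each column is an independent
# within-block permutation of the row numbers (sample.int avoids the
# single-number case of sample())
A1 <- seq_along(unlist(mylist, use.names = FALSE))
idx <- do.call(rbind, lapply(mylist, function(x) {
  replicate(Ncol, A1[x][sample.int(length(x))])
}))

# Reorder column j of dat by idx[, j]
dat_new <- dat
dat_new[] <- Map(function(v, i) v[i], dat, split(idx, col(idx)))
dat_new
```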
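With the permute package (its how() and shuffle() setup appears further down), you can easily use replicate to get a matrix with multiple columns: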
set.seed(1)
replicate(5, shuffle(length(block$blocks), block))
# [,1] [,2] [,3] [,4] [,5]
# [1,] 1 1 1 1 2
# [2,] 2 2 2 2 1
# [3,] 5 3 6 4 5
# [4,] 6 5 3 6 4
# [5,] 3 6 4 5 6
# [6,] 4 4 5 3 3
# [7,] 9 8 7 7 9
# [8,] 8 9 9 8 8
# [9,] 7 7 8 9 7
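I may have misunderstood the question, but I think you need a different order for each column.
Then it must be dat[,here] rather than dat[here] :) I've made the correction.
I'm not sure this is right either. I think the columns stay where they are, but within each column you want the values to be randomly reassigned.
Given my terrible track record here I should know better than to question your answer, but shouldn't each column contain all of the values 1:9?
The only major edit I'd make here is to use plain rbind instead of rbind.data.frame; unless a data.frame is really needed, rbind alone is more efficient.
@AnandaMahto, I completely agree. I started with just rbind, but the OP mentioned a "reordered data frame". I've added your note to the answer to make it more visible. Thanks.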
The same approach applied to the small example from the question:
mylist <- list(1:2,3:6,7:9)
set.seed(1) # to be able to reproduce this answer
Ncol <- 5
A1 <- seq_along(unlist(mylist, use.names = FALSE))
do.call(rbind,
lapply(mylist, function(x) {
replicate(Ncol, sample(A1[x]))
}))
# [,1] [,2] [,3] [,4] [,5]
# [1,] 1 2 1 2 2
# [2,] 2 1 2 1 1
# [3,] 3 6 4 5 3
# [4,] 6 4 5 3 4
# [5,] 4 5 6 6 5
# [6,] 5 3 3 4 6
# [7,] 8 7 9 8 9
# [8,] 9 8 7 9 8
# [9,] 7 9 8 7 7
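Using the permute package: how() with a blocks argument restricts the permutations generated by shuffle() to within each block. This is how the block object used with replicate above is created:
library(permute)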
mylist <- list(1:2,3:6,7:9)
block <- how(blocks = rep(seq_along(mylist), sapply(mylist, length)))
shuffle(length(block$blocks), block)
# [1] 2 1 4 5 3 6 7 9 8
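If installing permute is not an option, a block-restricted permutation of 1:n can be sketched in base R by shuffling the row numbers separately inside each block. This is only a stand-in for the idea, not permute's own implementation. block_perm is a hypothetical name.

```r
# Hypothetical helper: a permutation of seq_along(blocks) in which each
# position only receives a row number from the same block
block_perm <- function(blocks) {
  idx <- seq_along(blocks)
  # split() groups the row numbers by block; shuffle each group and
  # write the shuffled numbers back into that block's positions
  for (g in split(idx, blocks)) {
    idx[g] <- g[sample.int(length(g))]
  }
  idx
}

mylist <- list(1:2, 3:6, 7:9)
blocks <- rep(seq_along(mylist), times = lengths(mylist))

set.seed(1)
block_perm(blocks)

# One independent permutation per column, analogous to replicate(5, shuffle(...))
perms <- replicate(5, block_perm(blocks))
perms
```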