R: Subsetting a dataset using combinations of factor levels
I have a dataset that I want to subset into multiple datasets using all possible combinations of the levels of a particular factor. Each combination should contain 4 levels and should appear exactly once. Here is some code that generates a very simple example:
data <- cbind(rep(1:8, each = 2), matrix(rnorm(64, mean = 0, sd = 1), nrow = 16, ncol = 4))  # 16 x 4 = 64 random values
colnames(data) <- LETTERS[1:5]
> data
      A           B          C          D           E
[1,] 1 -0.07929477 -1.2946058 -1.4072064 0.57159386
[2,] 1 1.83963909 -1.1723990 1.1232986 0.39483666
[3,] 2 -0.36423210 1.3240148 1.3274450 -0.96929628
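"A" is a factor with 8 levels. I want to select all possible combinations of 4 of the 8 levels (i.e. 1 2 3 4, 1 2 3 5, etc.) and use these combinations to split "data" into multiple datasets for further analysis.
Here you go: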
## Generate all combinations of 4 integers between 1 and 8
ii <- combn(1:8, 4, simplify=FALSE)
## Use those combinations to pick out desired rows in data
x <- lapply(ii, function(II) data[data[,"A"] %in% II, ])
## Check that it worked
x[[1]]
# A B C D E
# [1,] 1 2.7963535 -1.01141834 0.9133376 -1.3128354
# [2,] 1 1.9346950 0.85907646 -0.2222619 -0.8143439
# [3,] 2 2.2966139 -2.43140014 -0.4276004 0.4425973
# [4,] 2 0.9046734 -0.30193977 -0.1641523 1.2068400
# [5,] 3 0.8836684 2.59911207 -0.4339402 0.8922918
# [6,] 3 0.9004662 0.31611677 0.9300422 -0.4947400
# [7,] 4 1.0590443 -0.70879715 -0.2357002 1.0907113
# [8,] 4 1.6175373 -0.02734472 0.9151199 -0.8994856
x[[70]]
# A B C D E
# [1,] 5 1.2375211 -0.8635894 -0.32504939 -0.38956232
# [2,] 5 1.0631257 1.7598401 -0.36029628 1.34065065
# [3,] 6 0.4014502 -0.9167007 -0.37284132 0.90406595
# [4,] 6 1.3352802 -1.4181380 0.27940665 -0.73645846
# [5,] 7 0.3892974 1.8418089 0.39443361 0.10841747
# [6,] 7 0.2152083 -0.4404339 -1.72481747 -0.03888857
# [7,] 8 -1.8517170 0.3844379 -0.04383212 1.02553227
# [8,] 8 -0.6770360 -2.0134745 1.71437731 -0.49894527
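Below is one possible variant of the approach above. It assumes that data is the 16 x 5 matrix from the question, regenerated here with an arbitrary set.seed(1) so the block runs on its own. split() computes the row indices for each level of A once, and each subset is then assembled from the indices of its 4 levels. The list is also named by its levels. That "1-2-3-4" naming scheme is an illustrative choice and is not part of the original answer.

```r
## Sketch: the same 70 subsets, built with split() and named by their levels.
## Assumption: `data` is the 16 x 5 numeric matrix from the question.
set.seed(1)
data <- cbind(rep(1:8, each = 2), matrix(rnorm(64), nrow = 16, ncol = 4))
colnames(data) <- LETTERS[1:5]

## Row indices belonging to each level of A, computed once
rows_by_level <- split(seq_len(nrow(data)), data[, "A"])

## All combinations of 4 of the 8 levels of A
ii <- combn(sort(unique(data[, "A"])), 4, simplify = FALSE)

## For each combination, stack the rows of its 4 levels
x <- lapply(ii, function(II) {
  idx <- unlist(rows_by_level[as.character(II)], use.names = FALSE)
  data[idx, , drop = FALSE]
})

## Hypothetical naming scheme: "1-2-3-4", "1-2-3-5", ...
names(x) <- vapply(ii, paste, character(1), collapse = "-")

length(x)            # number of subsets: choose(8, 4) = 70
dim(x[["1-2-3-4"]])  # each subset has 8 rows (2 per level) and 5 columns
```

With the names in place, you can look up a subset by its levels, e.g. x[["2-4-6-8"]], instead of by its position in the list.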
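You can use combn and split to achieve this.

Josh: Thank you very much. One more question: how can I apply a function, for example mean, to each of these datasets, summarize the results, and collect all of them in a single vector?

@hema, combn has a FUN argument. Take that as a start. Then look at lapply and friends...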
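Here is a minimal sketch of both hints from the comment. It assumes "mean" refers to the mean of column B; the question does not say which column, so B is an arbitrary choice. The first route passes a function to combn's FUN argument, which applies it to each combination. The second route reduces the list built by the answer's lapply() call with sapply(). Both should give the same 70 values. The data are regenerated with set.seed(1) only to make the block self-contained.

```r
## Sketch: summarise every 4-level subset into a single number.
## Assumption: "mean" means the mean of column B; swap in any column or function.
set.seed(1)
data <- cbind(rep(1:8, each = 2), matrix(rnorm(64), nrow = 16, ncol = 4))
colnames(data) <- LETTERS[1:5]

## Route 1: combn's FUN argument is called once per combination of levels,
## and combn simplifies the 70 scalar results into one numeric object.
means_combn <- combn(1:8, 4, FUN = function(II) mean(data[data[, "A"] %in% II, "B"]))

## Route 2: build the list of subsets first, then reduce it with sapply().
ii <- combn(1:8, 4, simplify = FALSE)
x <- lapply(ii, function(II) data[data[, "A"] %in% II, ])
means_list <- sapply(x, function(d) mean(d[, "B"]))

## Means of every value column (B to E) per subset: a 70 x 4 matrix
col_means <- t(sapply(x, function(d) colMeans(d[, -1])))
```

The last line applies the same idea to all value columns at once. colMeans() returns a named vector for each subset, and t(sapply(...)) stacks those vectors into a matrix with one row per combination.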