R: Subsetting a dataset by combinations of factor levels


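I have a dataset that I would like to subset into multiple datasets using all possible combinations of the levels of one factor. Each combination should contain 4 levels and appear exactly once.

Here is some code that generates a very simple example: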

  ## 16 x 4 matrix of random values, so 16 * 4 = 64 draws are needed (not 160)
  data <- cbind(rep(1:8, each = 2), matrix(rnorm(16 * 4, mean = 0, sd = 1), nrow = 16, ncol = 4))
  colnames(data) <- LETTERS[1:5]

> data
      A           B          C          D           E
 [1,] 1 -0.07929477 -1.2946058 -1.4072064  0.57159386
 [2,] 1  1.83963909 -1.1723990  1.1232986  0.39483666
 [3,] 2 -0.36423210  1.3240148  1.3274450 -0.96929628
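"A" is a factor with 8 levels. I want to select every possible combination of 4 of those 8 levels (i.e. 1 2 3 4, 1 2 3 5, and so on) and use each combination to split "data" into a separate dataset for further analysis.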

Here you go:

## Generate all combinations of 4 integers between 1 and 8
ii <- combn(1:8, 4, simplify=FALSE)

## Use those combinations to pick out desired rows in data
x <- lapply(ii, function(II) data[data[,"A"] %in% II, ])

## Check that it worked
x[[1]]
#      A         B           C          D          E
# [1,] 1 2.7963535 -1.01141834  0.9133376 -1.3128354
# [2,] 1 1.9346950  0.85907646 -0.2222619 -0.8143439
# [3,] 2 2.2966139 -2.43140014 -0.4276004  0.4425973
# [4,] 2 0.9046734 -0.30193977 -0.1641523  1.2068400
# [5,] 3 0.8836684  2.59911207 -0.4339402  0.8922918
# [6,] 3 0.9004662  0.31611677  0.9300422 -0.4947400
# [7,] 4 1.0590443 -0.70879715 -0.2357002  1.0907113
# [8,] 4 1.6175373 -0.02734472  0.9151199 -0.8994856

x[[70]]
#      A          B          C           D           E
# [1,] 5  1.2375211 -0.8635894 -0.32504939 -0.38956232
# [2,] 5  1.0631257  1.7598401 -0.36029628  1.34065065
# [3,] 6  0.4014502 -0.9167007 -0.37284132  0.90406595
# [4,] 6  1.3352802 -1.4181380  0.27940665 -0.73645846
# [5,] 7  0.3892974  1.8418089  0.39443361  0.10841747
# [6,] 7  0.2152083 -0.4404339 -1.72481747 -0.03888857
# [7,] 8 -1.8517170  0.3844379 -0.04383212  1.02553227
# [8,] 8 -0.6770360 -2.0134745  1.71437731 -0.49894527
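
As a quick sanity check on the approach above, the sketch below rebuilds the example data (with an arbitrary seed, which is my assumption and not part of the question) and confirms that there are choose(8, 4) = 70 subsets of 8 rows each. Naming the list elements by their level combination is an optional extra, not something the original answer does.

```r
## Example data built as in the question; the seed is arbitrary
set.seed(1)
data <- cbind(rep(1:8, each = 2), matrix(rnorm(16 * 4), nrow = 16, ncol = 4))
colnames(data) <- LETTERS[1:5]

## All combinations of 4 levels, and the matching row subsets
ii <- combn(1:8, 4, simplify = FALSE)
x <- lapply(ii, function(II) data[data[, "A"] %in% II, ])

## Optional: label each subset by the levels it contains, e.g. "1-2-3-4"
names(x) <- sapply(ii, paste, collapse = "-")

## Number of subsets should equal the number of combinations
length(x) == choose(8, 4)

## Every subset should hold 4 levels x 2 rows per level = 8 rows
all(sapply(x, nrow) == 8)

## Subsets can now be looked up by name as well as by position
x[["5-6-7-8"]]
```

With names attached, later results can be traced back to the levels that produced them.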

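You can use combn and split to achieve this.

@Josh: Thank you very much. One more question: how can I apply a function, say "mean", to each dataset and collect all the results in a single vector?

@hema, combn has a FUN argument. Take that as a starting point. Then have a look at lapply and friends...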
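
To make the hint in the comments concrete, here is a minimal sketch of both routes: sapply over the list of subsets, and the FUN argument of combn. The choice of column B for the mean and the seed are assumptions for illustration only; the question does not say which column is of interest.

```r
## Example data built as in the question; the seed is arbitrary
set.seed(1)
data <- cbind(rep(1:8, each = 2), matrix(rnorm(16 * 4), nrow = 16, ncol = 4))
colnames(data) <- LETTERS[1:5]

## Subsets, one per combination of 4 levels
ii <- combn(1:8, 4, simplify = FALSE)
x <- lapply(ii, function(II) data[data[, "A"] %in% II, ])

## Route 1: one mean per subset (column B, chosen arbitrarily), as a vector
means_B <- sapply(x, function(d) mean(d[, "B"]))

## Route 2: the same values in one step via combn's FUN argument;
## as.vector() drops any dim attribute so the result is a plain vector
means_B2 <- as.vector(
  combn(1:8, 4, FUN = function(II) mean(data[data[, "A"] %in% II, "B"]))
)

## Column means of B to E for every subset: one row per combination
means_all <- t(sapply(x, function(d) colMeans(d[, c("B", "C", "D", "E")])))
```

Route 1 keeps the subsets around for further analysis, while route 2 avoids storing them when only the summary is needed.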