How to loop over all possible comparisons of factor levels in R


Consider the following data frame:

type = c('A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'D', 'D', 'D')
val1 = c(.35, .36, .35, .22, .27, .25, .88, .9, .87, .35, .35, .36)
val2 = c(.35, .35, .37, .40, .42, .46, .9, .91, .82, .36, .36, .36)

df = data.frame (type, val1, val2)

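I have four categories (called types: A, B, C, and D). The three observations of each type can be averaged to give a multivariate mean for that type (made up of the means of val1 and val2). I want to use Hotelling's test to compare every possible pair of types (AB, AC, AD, BC, BD, CD) to determine which type means, if any, are the same. I could hard-code this as: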

library(dplyr)  # provides filter()
a = filter (df, type == "A") [,2:3]
b = filter (df, type == "B") [,2:3]
c = filter (df, type == "C") [,2:3]
d = filter (df, type == "D") [,2:3]

Then run Hotelling's T2 test on each specified pair of types:

library('Hotelling')
hotelling.test(a, b, shrinkage=FALSE)
hotelling.test(b, c, shrinkage=FALSE)
hotelling.test(a, c, shrinkage=FALSE)

#And so on

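This is obviously very inefficient and impractical, because my actual dataset has 55 different types. I know the answer lies in a for loop, but I am struggling to work out how to tell hotelling.test to compare the val1/val2 multivariate means for every possible pair of types. I am very new to writing for loops and hope someone can point me in the right direction.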

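After comparing all types, I would ideally get output listing the pairs of types whose Hotelling test p-value is greater than 0.05, which would mean the two types may be duplicates. In the example data frame, types A and D return a p-value greater than 0.05, while the other comparisons have p-values below 0.05.

We can use combn to create the pairwise combinations, subset the dataset for each pair, and apply the function: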

library(Hotelling)
outlst <- combn(as.character(unique(df$type)), 2, 
    FUN = function(x) hotelling.test(subset(df, type == x[1], select = -1), 
          subset(df, type == x[2], select = -1)), simplify = FALSE)
names(outlst) <- combn(as.character(unique(df$type)), 2, FUN = paste, collapse = "_")

outlst[1]
#$A_B
#Test stat:  36.013 
#Numerator df:  2 
#Denominator df:  3 
#P-value:  0.007996
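
To get from the list of test results to the pairs with p-values above 0.05 that the question asks for, one option is to pull the p-value out of each result and filter. The following is a minimal sketch, assuming each object returned by hotelling.test stores its p-value in an element named pval; the names types, pvals, and likely_same are illustrative and not part of the original answer.

```r
library(Hotelling)

# Same example data as in the question.
df <- data.frame(
  type = rep(c("A", "B", "C", "D"), each = 3),
  val1 = c(.35, .36, .35, .22, .27, .25, .88, .9, .87, .35, .35, .36),
  val2 = c(.35, .35, .37, .40, .42, .46, .9, .91, .82, .36, .36, .36)
)

types <- as.character(unique(df$type))

# One Hotelling test per unordered pair of types, as in the answer above.
outlst <- combn(types, 2, FUN = function(x) {
  hotelling.test(subset(df, type == x[1], select = -1),
                 subset(df, type == x[2], select = -1))
}, simplify = FALSE)
names(outlst) <- combn(types, 2, FUN = paste, collapse = "_")

# Assumption: the p-value of each result is stored in its `pval` element.
pvals <- sapply(outlst, function(res) as.numeric(res$pval))

# Pairs whose means are not significantly different at the 0.05 level.
likely_same <- pvals[pvals > 0.05]
print(likely_same)
```

With 55 types there are 1485 pairs, so it may be worth adjusting the p-values for multiple testing (for example with p.adjust) before applying the 0.05 cutoff.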

If you want to use a for loop:
type = c('A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'D', 'D', 'D')
val1 = c(.35, .36, .35, .22, .27, .25, .88, .9, .87, .35, .35, .36)
val2 = c(.35, .35, .37, .40, .42, .46, .9, .91, .82, .36, .36, .36)

df = data.frame (type, val1, val2)

for (first in unique(df$type)) {
  for (second in unique(df$type)) {
    if (first != second) {
      print(c(first, second))
    }
  }
}

[1] "A" "B"
[1] "A" "C"
[1] "A" "D"
[1] "B" "A"
[1] "B" "C"
[1] "B" "D"
[1] "C" "A"
[1] "C" "B"
[1] "C" "D"
[1] "D" "A"
[1] "D" "B"
[1] "D" "C"
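
The nested loop above prints every pair twice, once in each order (A B and B A). Because the comparison is the same either way, a variant that loops over index positions can visit each unordered pair once. This is a minimal sketch in base R only; the names types and pairs are illustrative, and a comment marks where the test call would go.

```r
type <- c('A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'D', 'D', 'D')
types <- unique(type)

# Collect a label for every unordered pair of types.
pairs <- character(0)

for (i in seq_len(length(types) - 1)) {
  for (j in (i + 1):length(types)) {
    # A call such as hotelling.test(...) on the rows for types[i] and
    # types[j] would go here; this sketch only records the pair label.
    pairs <- c(pairs, paste(types[i], types[j], sep = "_"))
  }
}

print(pairs)
```

Starting the inner loop at i + 1 halves the number of comparisons, which matters more as the number of types grows.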

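Your first step with filter does the same thing as split, which would save you a lot of work and spare you from keeping track of so many objects.

I can't believe I had never come across split before. It's a great tool. Thanks!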
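
As a rough illustration of the split suggestion, the sketch below replaces the four hand-written filter calls with one call that returns a named list of per-type data frames. It uses only base R; the names groups and pair_names are illustrative.

```r
df <- data.frame(
  type = rep(c("A", "B", "C", "D"), each = 3),
  val1 = c(.35, .36, .35, .22, .27, .25, .88, .9, .87, .35, .35, .36),
  val2 = c(.35, .35, .37, .40, .42, .46, .9, .91, .82, .36, .36, .36)
)

# One data frame of val1/val2 per type, stored in a named list.
groups <- split(df[, c("val1", "val2")], df$type)
print(names(groups))

# Every unordered pair of type names, each as a length-2 character vector.
pair_names <- combn(names(groups), 2, simplify = FALSE)
print(length(pair_names))
```

Each element, such as groups$A or groups[["B"]], holds the same columns as the hard-coded a and b objects in the question, so a pair from pair_names can be used to look up the two data frames to compare.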