Running multiple t.tests to compare pairs of column values in R
I have a data frame that looks like this:
Age A1U_sweet A2F_dip A3U_bbq C1U_sweet C2F_dip C3U_bbq Comments
23 1 2 1 NA NA NA Good
54 NA NA NA 4 1 2 ABCD
43 2 4 7 NA NA NA HiHi
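I am trying to run a series of t.tests that compare each column starting with A against the corresponding column starting with C. So far I have been typing the following by hand for every pair of columns: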
t.test(df$A1U_sweet, df$C1U_sweet)
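Is there a way to run the t.tests for A1U_sweet and C1U_sweet, A2F_dip and C2F_dip, and A3U_bbq and C3U_bbq without writing each call out? I tried the apply functions and for loops, but could not work out how to make them fit this case.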
df <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
Age A1U_sweet A2F_dip A3U_bbq C1U_sweet C2F_dip C3U_bbq Comments
23 1 2 1 2 5 5 Good
54 1 3 1 4 1 2 ABCD
43 2 4 7 1 1 1 HiHi")
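df
If we need a t.test on the corresponding "1s", "2s" and "3s" of "A" and "C", split the dataset by the substring of the column names that contains only the digits, then apply t.test to the two columns in each group:
# Each list element is a two-column data frame (the A column first, then the C column)
lapply(split.default(df[2:7], gsub("\\D+", "", names(df)[2:7])), function(d) t.test(d[[1]], d[[2]]))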
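A variant that pairs the columns by name rather than by position can be easier to check by eye. The following is only a sketch under the assumption that every "A" column has a partner whose name differs solely in the leading letter; the names a_cols, c_cols, tests and p_values are illustrative and not from the original answer. The pattern "^A[0-9]" is used so that the Age column is not picked up by accident.

```r
# Example data from the answer above
df <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
Age A1U_sweet A2F_dip A3U_bbq C1U_sweet C2F_dip C3U_bbq Comments
23 1 2 1 2 5 5 Good
54 1 3 1 4 1 2 ABCD
43 2 4 7 1 1 1 HiHi")

# "A" columns (A followed by a digit, so "Age" is excluded) and their "C" partners
a_cols <- grep("^A[0-9]", names(df), value = TRUE)
c_cols <- sub("^A", "C", a_cols)
stopifnot(all(c_cols %in% names(df)))  # assumption: every A column has a C partner

# One Welch two-sample t-test per pair; the result list is named after the A columns
tests <- Map(function(a_name, c_name) t.test(df[[a_name]], df[[c_name]]),
             a_cols, c_cols)

# Extract the p-value of each test into a named numeric vector
p_values <- sapply(tests, function(res) res$p.value)
p_values
```

Because t.test drops missing values in each argument separately, the same call should also work on the original data, where the A and C columns are filled in for different rows.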
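The task itself is not difficult or complicated, although the way the data are arranged makes it look that way. When you see variable names that carry more than one piece of information, it helps to ask whether the data could be arranged more simply. That simple proposition is at the heart of the popular "tidy" approach to data handling in R. I do not like everything that is done in the name of "tidy", but this core proposition is sound, and violating it (as you have here) only makes your analysis much harder than it needs to be.
A good first step is to rearrange the data so that no data are encoded in the column names: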
df <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
Age A1U_sweet A2F_dip A3U_bbq C1U_sweet C2F_dip C3U_bbq Comments
23 1 2 1 2 5 5 Good
54 1 3 1 4 1 2 ABCD
43 2 4 7 1 1 1 HiHi")
library(tidyr)
df <- data.frame(id = 1:nrow(df), df)
dfl <- gather(df, key = "key", value = "value", -id, -Age, -Comments)
dfl <- separate(dfl, key, into = c("key", "kind", "type"), sep = c(1, 4))
dfl
## id Age Comments key kind type value
## 1 1 23 Good A 1U_ sweet 1
## 2 2 54 ABCD A 1U_ sweet 1
## 3 3 43 HiHi A 1U_ sweet 2
## 4 1 23 Good A 2F_ dip 2
## 5 2 54 ABCD A 2F_ dip 3
## 6 3 43 HiHi A 2F_ dip 4
## 7 1 23 Good A 3U_ bbq 1
## 8 2 54 ABCD A 3U_ bbq 1
## 9 3 43 HiHi A 3U_ bbq 7
## 10 1 23 Good C 1U_ sweet 2
## 11 2 54 ABCD C 1U_ sweet 4
## 12 3 43 HiHi C 1U_ sweet 1
## 13 1 23 Good C 2F_ dip 5
## 14 2 54 ABCD C 2F_ dip 1
## 15 3 43 HiHi C 2F_ dip 1
## 16 1 23 Good C 3U_ bbq 5
## 17 2 54 ABCD C 3U_ bbq 2
## 18 3 43 HiHi C 3U_ bbq 1
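In recent tidyr releases, gather() is documented as superseded by pivot_longer(), which can reshape and split the column names in one step. The sketch below assumes tidyr 1.0.0 or later and assumes the column names always follow the pattern letter, digit, letter, underscore, label; the regular expression and the name dfl2 are illustrative. Note that here kind comes out as "1U" without the trailing underscore, the result is a tibble, and the rows are ordered by id rather than by column.

```r
library(tidyr)

df <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
Age A1U_sweet A2F_dip A3U_bbq C1U_sweet C2F_dip C3U_bbq Comments
23 1 2 1 2 5 5 Good
54 1 3 1 4 1 2 ABCD
43 2 4 7 1 1 1 HiHi")
df <- data.frame(id = seq_len(nrow(df)), df)

# Reshape to long form and split each column name into three parts:
# group letter (key), item code (kind) and label (type)
dfl2 <- pivot_longer(
  df,
  cols = -c(id, Age, Comments),
  names_to = c("key", "kind", "type"),
  names_pattern = "^([A-Z])(\\d[A-Z])_(.+)$",
  values_to = "value"
)
dfl2
```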
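You can get the indices of all the column pairs you want to use and loop over them idx
@jaySf, regarding the possible duplicate: I think the data frame there is structured differently. I do not quite see how the code in those answers would apply to my data frame. Please correct me if I am wrong!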
lapply(split(dfl, dfl$type), function(d) t.test(value ~ key, data = d))
## $bbq
##
## Welch Two Sample t-test
##
## data: value by key
## t = 0.14286, df = 3.2778, p-value = 0.8947
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -6.748715 7.415381
## sample estimates:
## mean in group A mean in group C
## 3.000000 2.666667
##
##
## $dip
##
## Welch Two Sample t-test
##
## data: value by key
## t = 0.45883, df = 2.7245, p-value = 0.6805
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -4.233396 5.566729
## sample estimates:
## mean in group A mean in group C
## 3.000000 2.333333
##
##
## $sweet
##
## Welch Two Sample t-test
##
## data: value by key
## t = -1.0607, df = 2.56, p-value = 0.3785
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -4.31437 2.31437
## sample estimates:
## mean in group A mean in group C
## 1.333333 2.333333
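The printed htest objects are verbose when there are many pairs, so it can help to collect the key numbers into one table. This is a sketch rather than part of the original answer: it rebuilds the long layout with base R stack() so that it runs without tidyr, and the names long, tests and summary_tbl are illustrative.

```r
df <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
Age A1U_sweet A2F_dip A3U_bbq C1U_sweet C2F_dip C3U_bbq Comments
23 1 2 1 2 5 5 Good
54 1 3 1 4 1 2 ABCD
43 2 4 7 1 1 1 HiHi")

# Long layout: one row per measurement, with columns `values` and `ind`
long <- stack(df[2:7])
long$key  <- substr(as.character(long$ind), 1, 1)         # "A" or "C"
long$type <- sub("^[^_]+_", "", as.character(long$ind))   # "sweet", "dip", "bbq"

# One Welch two-sample t-test (A vs. C) per type
tests <- lapply(split(long, long$type),
                function(d) t.test(values ~ key, data = d))

# One row per test: t statistic, degrees of freedom and p-value
summary_tbl <- data.frame(
  type    = names(tests),
  t       = sapply(tests, function(res) unname(res$statistic)),
  df      = sapply(tests, function(res) unname(res$parameter)),
  p.value = sapply(tests, function(res) res$p.value),
  row.names = NULL
)
summary_tbl
```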