How to correlate multiple subsets in R (r, regression, correlation, lm)


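How can I correlate each of 8 subsets with two different dependent variables? For two different subsets I always get the same correlation coefficient (example below). Here is the input: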

with(subset(mydata2, PARTYID_Strength = 1), cor.test(PARTYID_Strength,
                                                     mean.legit))

with(subset(mydata2, PARTYID_Strength = 1), cor.test(PARTYID_Strength,
                                                     mean.leegauthor))

with(subset(mydata2, PARTYID_Strength = 2), cor.test(PARTYID_Strength,
                                                     mean.legit))

with(subset(mydata2, PARTYID_Strength = 2), cor.test(PARTYID_Strength,
                                                     mean.leegauthor))

Output (I get exactly this for both strength = 1 and strength = 2):

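        Pearson's product-moment correlation

data:  PARTYID_Strength and mean.legit
t = 3.1005, df = 607, p-value = 0.002022
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.0458644 0.2023031
sample estimates:
      cor
0.1248597

        Pearson's product-moment correlation

data:  PARTYID_Strength and mean.leegauthor
t = 2.8474, df = 607, p-value = 0.004557
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.03568431 0.19250344
sample estimates:
      cor
0.1148091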

Sample data:

> dput(head(mydata2, 10))
structure(list(PARTYID = c(1, 3, 1, 1, 1, 4, 3, 1, 1, 1), PARTYID_Other = c("NA", 
"NA", "NA", "NA", "NA", "Green", "NA", "NA", "NA", "NA"), PARTYID_Strength = c(1, 
7, 1, 2, 1, 8, 1, 6, 1, 1), PARTYID_Strength_Other = c("NA", 
"NA", "NA", "NA", "NA", "Green", "NA", "NA", "NA", "NA"), THERM_Dem = c(80, 
65, 85, 30, 76, 15, 55, 62, 90, 95), THERM_Rep = c(1, 45, 10, 
5, 14, 14, 0, 4, 10, 3), Gender = c("Female", "Male", "Male", 
"Female", "Female", "Male", "Male", "Female", "Female", "Male"
), `MEAN Age` = c(29.5, 49.5, 29.5, 39.5, 29.5, 21, 39.5, 39.5, 
29.5, 65), Age = c("25 - 34", "45 - 54", "25 - 34", "35 - 44", 
"25 - 34", "18 - 24", "35 - 44", "35 - 44", "25 - 34", "65+"), 
Ethnicity = c("White or Caucasian", "Asian or Asian American", 
"White or Caucasian", "White or Caucasian", "Hispanic or Latino", 
"White or Caucasian", "White or Caucasian", "White or Caucasian", 
"White or Caucasian", "White or Caucasian"), Ethnicity_Other = c("NA", 
"NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA"), States = c("Texas", 
"Texas", "Ohio", "Texas", "Puerto Rico", "New Hampshire", 
"South Carolina", "Texas", "Texas", "Texas"), Education = c("Master's degree", 
"Bachelor's degree in college (4-year)", "Bachelor's degree in college (4-year)", 
"Master's degree", "Master's degree", "Less than high school degree", 
"Some college but no degree", "Master's degree", "Master's degree", 
"Some college but no degree"), `MEAN Income` = c(30000, 140000, 
150000, 60000, 80000, 30000, 30000, 120000, 150000, 60000
), Income = c("Less than $30,000", "$130,001 to $150,000", 
"More than $150,000", "$50,001 to $70,000", "$70,001 to $90,000", 
"Less than $30,000", "Less than $30,000", "$110,001 to $130,000", 
"More than $150,000", "$50,001 to $70,000"), mean.partystrength = c(3.875, 
2.875, 2.375, 3.5, 2.625, 3.125, 3.375, 3.125, 3.25, 3.625
), mean.traitrep = c(2.5, 2.625, 2.25, 2.625, 2.75, 1.875, 
2.75, 2.875, 2.75, 3), mean.traitdem = c(2.25, 2.625, 2.375, 
2.75, 2.625, 2.125, 1.875, 3, 2, 2.5), mean.leegauthor = c(1, 
2, 2, 4, 1, 4, 1, 1, 1, 1), mean.legit = c(1.71428571428571, 
3.28571428571429, 2.42857142857143, 2.42857142857143, 2.14285714285714, 
1.28571428571429, 1.42857142857143, 1.14285714285714, 2.14285714285714, 
1.28571428571429)), row.names = c(NA, -10L), class = c("tbl_df", 
"tbl", "data.frame"))

Thanks, everyone!

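To run the tests, create a vector of the columns of interest, then apply an anonymous function to each of them. The output below comes from the 10-row sample data, hence df = 8.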

fixed <- "PARTYID_Strength"
cols <- c("mean.leegauthor", "mean.legit")

cor_test_result <- sapply(cols, function(x){
  fmla <- paste(fixed, x, sep = "+")
  fmla <- as.formula(paste("~", fmla))
  cor.test(fmla, mydata2)
}, simplify = FALSE)

cor_test_result$mean.leegauthor
#
#        Pearson's product-moment correlation
#
#data:  PARTYID_Strength and mean.leegauthor
#t = 1.4804, df = 8, p-value = 0.177
#alternative hypothesis: true correlation is not equal to 0
#95 percent confidence interval:
# -0.2343269  0.8462610
#sample estimates:
#      cor 
#0.4637152
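
Since the result is a list of "htest" objects, the estimates and p-values can be collected into one small table. The following is only a sketch: the data frame toy is made-up data standing in for mydata2, and the names tests, summary_tab and same are introduced here purely for illustration.

```r
# Made-up data standing in for mydata2 (an assumption, not the asker's data)
set.seed(1)
toy <- data.frame(
  PARTYID_Strength = sample(1:8, 50, replace = TRUE),
  mean.legit       = runif(50, 1, 4),
  mean.leegauthor  = runif(50, 1, 4)
)

fixed <- "PARTYID_Strength"
cols  <- c("mean.leegauthor", "mean.legit")

# One cor.test per column, built the same way as in the answer
tests <- sapply(cols, function(x) {
  fmla <- as.formula(paste("~", fixed, "+", x))
  cor.test(fmla, data = toy)
}, simplify = FALSE)

# Pull the estimate and the p-value out of each "htest" object
summary_tab <- t(sapply(tests, function(res) {
  c(cor = unname(res$estimate), p.value = res$p.value)
}))
print(summary_tab)

# In the one-sided formula, "+" only lists the two variables to correlate,
# so the formula call matches the plain two-vector call
same <- cor.test(toy$PARTYID_Strength, toy$mean.legit)
```

Each row of summary_tab corresponds to one entry of cols, which makes it easier to compare the two dependent variables side by side.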

The logical condition needs == rather than =, so it should be PARTYID_Strength == 1.
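
The difference between = and == inside subset() can be checked on a small example. This is a sketch that reuses the first ten PARTYID_Strength values from the sample data; the mean.legit values are rounded, and the names toy, wrong and right are made up for illustration.

```r
# First ten PARTYID_Strength values from the sample data; mean.legit rounded
toy <- data.frame(
  PARTYID_Strength = c(1, 7, 1, 2, 1, 8, 1, 6, 1, 1),
  mean.legit       = c(1.7, 3.3, 2.4, 2.4, 2.1, 1.3, 1.4, 1.1, 2.1, 1.3)
)

# "=" passes an unused named argument, so no filtering takes place
wrong <- subset(toy, PARTYID_Strength = 1)

# "==" is the comparison that actually selects rows
right <- subset(toy, PARTYID_Strength == 1)

nrow(wrong)  # 10: every row is kept
nrow(right)  # 6: only the rows where PARTYID_Strength is 1
```

This is why the calls with = 1 and = 2 in the question return identical results: both run on the full data set.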
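@dcarlson Thanks! Although now I get this result: Pearson's product-moment correlation data: PARTYID_Strength and mean.legit t = NA, df = 67, p-value = NA alternative hypothesis: true correlation is not equal to 0 95 percent confidence interval: NA sample estimates: cor NA

You selected only the rows where PARTYID_Strength == 1, so that variable is constant. The correlation of a constant with any other variable is undefined because its standard deviation is zero, which is why R returns NA. If you want to subset the data, do not use the subsetting variable in the correlation.

@dcarlson Ah, I think that makes sense. So maybe I should not measure the parties separately but group them together instead? Also, if I use = instead of ==, what was the original formula measuring?

It did nothing. R did not complain and simply returned the original data.

Thank you very much! A few follow-up questions: 1. I get these results, which I assume is because they use the full data set (n = 609): data: PARTYID_Strength and mean.leegauthor t = 2.8474, df = 607, p-value = 0.004557 alternative hypothesis: true correlation is not equal to 0 95 percent confidence interval: 0.03568431 0.19250344 sample estimates: cor 0.1148091 2. Why do these calls give different results for the same measure? cor.test(PARTYID_Strength, mean.legit + mean.leegauthor, data = mydata2) cor(mydata2$PARTYID_Strength, mydata2$mean.legit + mean.leegauthor)

@LisaByers The plus sign in a formula is not the addition operator, and the calls you posted in your comment do not use the formula interface. See the help pages for cor and cor.test.

I see, thank you. I am a novice and still learning all of this! Thanks for your help.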
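
The points from these comments can be sketched in a few lines. This example rests on assumptions: toy is made-up data standing in for mydata2, and correlating mean.legit with mean.leegauthor within each strength level is just one possible choice of two variables that are not the subsetting variable. All object names are hypothetical.

```r
# Made-up data standing in for mydata2 (an assumption, not the asker's data)
set.seed(42)
toy <- data.frame(
  PARTYID_Strength = rep(1:4, each = 25),
  mean.legit       = runif(100, 1, 4),
  mean.leegauthor  = runif(100, 1, 4)
)

# Inside one subset the subsetting variable is constant, so cor() gives NA
one_group <- subset(toy, PARTYID_Strength == 1)
const_cor <- suppressWarnings(
  cor(one_group$PARTYID_Strength, one_group$mean.legit)
)
is.na(const_cor)  # TRUE

# Split by the grouping variable and correlate two OTHER variables per group
by_group <- split(toy, toy$PARTYID_Strength)
group_cors <- sapply(by_group, function(d) {
  res <- cor.test(~ mean.legit + mean.leegauthor, data = d)
  c(cor = unname(res$estimate), p.value = res$p.value, n = nrow(d))
})
print(t(group_cors))

# Outside a formula, "+" is ordinary arithmetic: this correlates
# PARTYID_Strength with the SUM of the two scores, a different quantity
sum_cor <- cor(toy$PARTYID_Strength, toy$mean.legit + toy$mean.leegauthor)
```

Using split() avoids writing one subset() call per level, and keeping the grouping variable out of the correlation avoids the NA result discussed above.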