"Variable lengths differ" error in an R ANOVA loop

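I am currently trying to run an ANOVA on my data frame, which has the following format:

ethnicity sampleID batch gender gene1 gene2 gene3 ...

...with up to several thousand genes. The table is filled with gene expression values.

Below is the code I am using to try to run an ANOVA on each gene to find differences between ethnicities: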

# here, 'merge' is the dataframe as described above
# set ethnicity to categorical
merge$ethnicity <- factor(merge$ethnicity, levels=c("Chinese","Malay","Indian"))

# parametric anova for each gene
baseformula <- " ~ ethnicity"
for (i in 5:ncol(merge))
{
  p <- anova(lm(colnames(merge)[i] ~ ethnicity, data=merge))  # variable lengths differ??
}

Using the code:

for (i in 5:ncol(merge))
{
  print(colnames(merge)[i])
  print(summary(aov(merge[,i] ~ merge$ethnicity)))

}

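seems to give me the following error:

Error in levels(x)[x] : only 0's may be mixed with negative subscripts
In addition: Warning messages:
1: In model.response(mf, "numeric") :
  using type = "numeric" with a factor response will be ignored
2: In Ops.factor(y, z$residuals) : '-' not meaningful for factors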

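I generated an example df containing a variable etnicity with 3 groups, plus two genes. etnicity is your predictor variable. The loop prints the aov summary for each gene in relation to etnicity.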

set.seed(1); df <- data.frame(etnicity=c('A', 'B', 'C','A', 'B', 'C','A', 'B', 'C'), gene1=rnorm(9), gene2=rnorm(9))

for(i in 2:ncol(df)){
  print(colnames(df)[i])
  print( summary( aov(df[,i] ~ df$etnicity) ) )
  }

[1] "gene1"
            Df Sum Sq Mean Sq F value Pr(>F)
df$etnicity  2  1.324  0.6619   1.006   0.42
Residuals    6  3.947  0.6579               
[1] "gene2"
            Df Sum Sq Mean Sq F value Pr(>F)
df$etnicity  2  2.436   1.218   0.977  0.429
Residuals    6  7.478   1.246

Applying it to data similar to the OP's:

df <- read.table(text="ethnicity sample.id Batch Gender X7896759  
1           1 H60903    B6      1  6.19649  
2           1 H61603    B2      1  6.74464  
3           2 H61608    B7      2  6.20268  
4           2 H62204    B4      1  6.71395  
5           3 H62901    B7      2  6.59963", header=T, stringsAsFactors=F)  


for(i in 5:ncol(df)){
  print(colnames(df)[i])
  print(summary(aov(df[,i]~df$ethnicity)))
}

[1] "X7896759"
             Df  Sum Sq Mean Sq F value Pr(>F)
df$ethnicity  1 0.00803 0.00803   0.084  0.791
Residuals     3 0.28767 0.09589
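
As a side note on the original "variable lengths differ" message: in lm(colnames(merge)[i] ~ ethnicity, data=merge) the left-hand side evaluates to a single character string (length 1) rather than the column itself, so its length cannot match ethnicity. A minimal sketch of one way around this is to build the formula from the column name with reformulate(), which also lets you keep the data= argument. The data frame expr and its values below are made up for illustration and are not the OP's data.

```r
# Hypothetical toy data in the same shape as the question's data frame
set.seed(1)
expr <- data.frame(
  ethnicity = factor(rep(c("Chinese", "Malay", "Indian"), each = 4),
                     levels = c("Chinese", "Malay", "Indian")),
  sampleID  = paste0("S", 1:12),
  batch     = rep(c("B1", "B2"), times = 6),
  gender    = rep(c(1, 2), times = 6),
  gene1     = rnorm(12),
  gene2     = rnorm(12)
)

# Names of the gene columns (everything from column 5 onwards)
gene_cols <- colnames(expr)[5:ncol(expr)]

# Fit one model per gene and keep the p-value of the ethnicity term
pvals <- sapply(gene_cols, function(g) {
  f <- reformulate("ethnicity", response = g)  # e.g. gene1 ~ ethnicity
  anova(lm(f, data = expr))[["Pr(>F)"]][1]
})
print(pvals)
```

Collecting the p-values in a named vector, instead of overwriting p on every iteration, makes it easier to go on to something like p.adjust() when there are thousands of genes.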

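Note that "merge" is the name of a function in R, so it is advisable not to give your object that name.

Hello, thanks for your reply! I tried adapting your code, but I got the following error: Error in levels(x)[x] : only 0's may be mixed with negative subscripts. In addition: Warning messages: 1: In model.response(mf, "numeric") : using type = "numeric" with a factor response will be ignored 2: In Ops.factor(y, z$residuals) : '-' not meaningful for factors. I have tried converting my ethnicity levels to numeric, but it still gives the error. Do you have any suggestions?

No. You can either (1) set up your data like my df example, or (2) give us a small, representative sample of your data so that we can look at its structure.

OK, I have added the first five rows and first five columns of my data frame as an edit to my post. Sorry, I don't know how to format comments. Thank you very much.

Interestingly, when I wrote it out to a table and then read it back in, it seemed to work. There may be something wrong with the way I built the data frame; I will keep looking into it. Still, thank you very much!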
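
The warnings about a "factor response" suggest that, in the data frame as originally built, at least one gene column was stored as a factor rather than as a number, which would also explain why writing the table out and reading it back in made the loop work. This is an inference from the messages, not something confirmed in the thread. A small sketch of how to check for this and fix it, using a made-up data frame bad:

```r
# Hypothetical data frame where a gene column was accidentally stored as a factor
bad <- data.frame(
  ethnicity = factor(c("A", "B", "C", "A", "B", "C")),
  gene1     = factor(c("6.19", "6.74", "6.20", "6.71", "6.59", "6.35"))
)

# Inspect the column classes: gene columns should be "numeric"
print(sapply(bad, class))

# Convert via character; as.numeric() applied directly to a factor would
# return the internal level codes rather than the printed values
bad$gene1 <- as.numeric(as.character(bad$gene1))

# With a numeric response the ANOVA runs without the factor warnings
print(summary(aov(gene1 ~ ethnicity, data = bad)))
```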
