Run pairwise Fisher tests for all column combinations between two data.frames
I have two data frames: editCounts and nonEditCounts. They have the same dimensions and the same row and column names, but the actual data differ. Here is the head of each:
> head(editCounts)
                Samp0 Samp1 Samp2
chr10_101992307     0     4     3
chr10_101992684     4     0     1
chr10_127480585     0     3     0
chr10_16479385      3     3     3
chr10_73979859      0     3     2
chr10_73979940      0     3     8
> head(nonEditCounts)
                Samp0 Samp1 Samp2
chr10_101992307     0     4     3
chr10_101992684    15     0     4
chr10_127480585     0     6     0
chr10_16479385      7     7     4
chr10_73979859      0    13     7
chr10_73979940      0    21    10
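The end goal is to use fisher.test to run a pairwise Fisher test for every pair of columns, row by row, across the two data.frames. As output, I would like a table of the resulting p-values for each pairwise comparison, with one row per row name, for example: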
                Samp0_vs_Samp1 Samp0_vs_Samp2 Samp1_vs_Samp2
chr10_101992307           pval           pval           pval
chr10_101992684           pval           pval           pval
chr10_127480585           pval           pval           pval
chr10_16479385            pval           pval           pval
chr10_73979859            pval           pval           pval
...                        ...            ...            ...
So, taking Samp0 and Samp1 as an example, the first Fisher test would be run on a matrix like this:
> tempMat=matrix(c(editCounts$ERR188028_GBR[1], nonEditCounts$ERR188028_GBR[1],
+ editCounts$ERR188035_GBR[1], nonEditCounts$ERR188035_GBR[1]), 2, 2)
> tempMat
     [,1] [,2]
[1,]    0    4
[2,]    0    4
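These values correspond to the first row, chr10_101992307. In this case, the Fisher test gives a p-value of 1.
I know I can use combn to enumerate every pair of columns, but I am not sure how to loop over each pair, build a contingency table from the 4 values, and run the Fisher test. The code I have written so far is listed below; however, it throws an error when it tries to create tempMat.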
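For reference, the same table can be built directly from the counts printed above for Samp0 and Samp1 in row chr10_101992307. This is only a minimal sketch using the printed values rather than the original input files:

```r
# Counts for row chr10_101992307, copied from the head() output above.
# matrix() fills column by column: column 1 = Samp0, column 2 = Samp1;
# row 1 = edited counts, row 2 = non-edited counts.
tempMat <- matrix(c(0, 0,   # Samp0: edit, nonEdit
                    4, 4),  # Samp1: edit, nonEdit
                  nrow = 2, ncol = 2)

# Run the two-sided Fisher exact test and keep only the p-value.
pval <- fisher.test(tempMat, alternative = "two.sided")$p.value
pval
```

Because one column of the table is all zeros, only one table is possible given the margins, which is why the p-value is 1.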
editCounts <- read.table("editCountMatrix.txt", sep="\t", header=TRUE, row.names=1)
nonEditCounts <- read.table("nonEditCountMatrix.txt", sep="\t", header=TRUE, row.names=1)
pairwiseComb <- combn(names(editCounts),2)
for (j in seq(1,length(pairwiseComb),2)){
  tempCol1 = pairwiseComb[[j]]
  tempCol2 = pairwiseComb[[j+1]]
  cat("Processing: ",tempCol1," vs. ",tempCol2, "\n", sep="") # Prints correctly
  for (i in 1:nrow(editCounts)){
    tempMat=matrix(c(editCounts$tempCol1[i], nonEditCounts$tempCol1[i],
                     editCounts$tempCol2[i], nonEditCounts$tempCol2[i]), 2, 2)
    tempFisher=fisher.test(tempMat, alternative="two.sided")
    pval=tempFisher$p.value
    pvalAdj=p.adjust(pval,method="fdr")
  }
}
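Any help would be greatly appreciated.
Thanks.
Here is a suggested solution. I have corrected a few small indexing problems in your code, and I suggest using a pre-allocated matrix to store the Fisher exact test results.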
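You can fix the error by using [[ or [ instead of $. For example: editCounts[[tempCol1]][i] or editCounts[i, tempCol1], but not editCounts$tempCol1[i].
@bdemarest editCounts[[tempCol1]] seems to work. Still, I have chosen to use the solution below. I noticed an error when trying to set the column names of the results matrix when using only two samples. Please see the error below. Any ideas? Thanks.
Thanks! This seems to work, but I noticed that with only two samples I get an error in colnames when trying to assign column names to the results matrix.
Fixed. There was a mistake in how the results matrix was created that just happened to work for n=3 samples! The results matrix needs one column for each possible pair: ncol=ncol(pairwiseComb)
Ah yes! Works great now. Thanks!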
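Both points raised in these comments can be checked on a small example. The sketch below uses a made-up two-sample data frame (the name toy and its values are assumptions for illustration only) to show why $ fails when the column name is held in a variable, and why the results matrix needs ncol(pairwiseComb) columns rather than one column per sample:

```r
# Hypothetical two-sample data frame, for illustration only.
toy <- data.frame(Samp0 = c(0, 4), Samp1 = c(4, 0),
                  row.names = c("siteA", "siteB"))
tempCol1 <- "Samp0"

toy$tempCol1        # NULL: `$` looks for a column literally named "tempCol1"
toy[[tempCol1]][1]  # 0: `[[` evaluates the variable first
toy[1, tempCol1]    # 0: same value via row/column indexing

# With two samples there is a single pair, so the results matrix
# needs one column, not two.
pairwiseComb <- combn(names(toy), 2)
ncol(toy)           # 2 samples
ncol(pairwiseComb)  # 1 pair
```

With three samples the number of pairs, choose(3, 2) = 3, happens to equal the number of samples, which is presumably why the earlier version only failed once the sample count changed.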
# Create data.frames using your sample data.
editCounts <- read.table(header=TRUE,
text=" Samp0 Samp1 Samp2
chr10_101992307 0 4 3
chr10_101992684 4 0 1
chr10_127480585 0 3 0
chr10_16479385 3 3 3
chr10_73979859 0 3 2
chr10_73979940 0 3 8")
nonEditCounts <- read.table(header=TRUE,
text=" Samp0 Samp1 Samp2
chr10_101992307 0 4 3
chr10_101992684 15 0 4
chr10_127480585 0 6 0
chr10_16479385 7 7 4
chr10_73979859 0 13 7
chr10_73979940 0 21 10")
pairwiseComb <- combn(names(editCounts), 2)
# Create a matrix to hold results.
results <- matrix(NA, ncol=ncol(pairwiseComb), nrow=nrow(editCounts))
# Create row and column names to use for indexing/assignment of results.
rownames(results) <- rownames(editCounts)
colnames(results) <- apply(pairwiseComb, 2,
                           function(x) {paste(x[1], "_vs_", x[2], sep="")})
# Loop over number of column pairs.
for (j in seq(ncol(pairwiseComb))) {
  tempCol1 <- pairwiseComb[1, j]
  tempCol2 <- pairwiseComb[2, j]
  resultsCol <- paste(tempCol1, "_vs_", tempCol2, sep="")
  cols <- c(tempCol1, tempCol2)
  # Loop over rownames.
  for (row in rownames(results)) {
    # Grab values using row and column names.
    # Use rbind to create a two-row table.
    tempMat <- rbind(editCounts[row, cols],
                     nonEditCounts[row, cols])
    tempFisher <- fisher.test(tempMat, alternative="two.sided")
    # Use row and column name indexing to assign the p-value to results.
    results[row, resultsCol] <- tempFisher$p.value
  }
}
# Compute adjusted p-values using all of the computed p-values, outside of loop.
padj <- results # First make copy of results matrix.
padj[] <- p.adjust(results, method="fdr") # Trick to retain shape and attributes.
results
#                 Samp0_vs_Samp1 Samp0_vs_Samp2 Samp1_vs_Samp2
# chr10_101992307              1      1.0000000     1.00000000
# chr10_101992684              1      1.0000000     1.00000000
# chr10_127480585              1      1.0000000     1.00000000
# chr10_16479385               1      0.6436652     0.64366516
# chr10_73979859               1      1.0000000     1.00000000
# chr10_73979940               1      1.0000000     0.03290832
padj
#                 Samp0_vs_Samp1 Samp0_vs_Samp2 Samp1_vs_Samp2
# chr10_101992307              1              1      1.0000000
# chr10_101992684              1              1      1.0000000
# chr10_127480585              1              1      1.0000000
# chr10_16479385               1              1      1.0000000
# chr10_73979859               1              1      1.0000000
# chr10_73979940               1              1      0.5923497
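If you would rather avoid the explicit nested loops, the same computation can be wrapped in a small function. This is only a sketch of one possible alternative, not a replacement for the code above: the function name pairwiseFisher is hypothetical, and the example reuses just two rows of the sample data to stay short.

```r
# Two rows of the sample data from the question.
editCounts <- read.table(header = TRUE, text = "
                 Samp0 Samp1 Samp2
chr10_16479385       3     3     3
chr10_73979940       0     3     8")
nonEditCounts <- read.table(header = TRUE, text = "
                 Samp0 Samp1 Samp2
chr10_16479385       7     7     4
chr10_73979940       0    21    10")

# Hypothetical helper: returns a matrix of Fisher p-values with one row
# per site and one column per pair of samples.
pairwiseFisher <- function(edit, nonEdit) {
  stopifnot(identical(dimnames(edit), dimnames(nonEdit)))
  pairs <- combn(names(edit), 2)
  pvals <- vapply(seq_len(ncol(pairs)), function(j) {
    cols <- pairs[, j]
    vapply(seq_len(nrow(edit)), function(i) {
      # 2 x 2 table: row 1 = edited counts, row 2 = non-edited counts.
      tab <- rbind(unlist(edit[i, cols]), unlist(nonEdit[i, cols]))
      fisher.test(tab, alternative = "two.sided")$p.value
    }, numeric(1))
  }, numeric(nrow(edit)))
  # Force a matrix even when there is a single row or a single pair.
  pvals <- matrix(pvals, nrow = nrow(edit))
  dimnames(pvals) <- list(rownames(edit),
                          paste(pairs[1, ], pairs[2, ], sep = "_vs_"))
  pvals
}

res <- pairwiseFisher(editCounts, nonEditCounts)
res
```

As in the answer above, any multiple-testing correction should be applied once to the full set of p-values, for example with padj <- res; padj[] <- p.adjust(res, method = "fdr"), rather than to each p-value separately inside the loop, where p.adjust has nothing to correct against.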