Run pairwise Fisher tests on all column combinations between two data.frames - R, Dataframe, Permutation - Fatal编程技术网

I have two data frames: editCounts and nonEditCounts. They have the same dimensions and the same column and row names, but the actual data differ. Here is the head of each:

> head(editCounts)
                        Samp0         Samp1       Samp2
chr10_101992307             0             4           3
chr10_101992684             4             0           1
chr10_127480585             0             3           0
chr10_16479385              3             3           3
chr10_73979859              0             3           2
chr10_73979940              0             3           8
> head(nonEditCounts)
                        Samp0         Samp1       Samp2
chr10_101992307             0             4           3
chr10_101992684            15             0           4
chr10_127480585             0             6           0
chr10_16479385              7             7           4
chr10_73979859              0            13           7
chr10_73979940              0            21          10
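The end goal is to use fisher.test to run a pairwise Fisher test for every pair of columns, row by row, across the two data.frames. As output, I would like a table with the p-value of each pairwise comparison for each row name, for example: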

               Samp0_vs_Samp1     Samp0_vs_Samp2     Samp1_vs_Samp2 
chr10_101992307          pval               pval               pval 
chr10_101992684          pval               pval               pval 
chr10_127480585          pval               pval               pval 
chr10_16479385           pval               pval               pval 
chr10_73979859           pval               pval               pval 
...                       ...                ...                ...
So, taking Samp0 and Samp1 as an example, the first Fisher test would use a matrix like this:

    > tempMat=matrix(c(editCounts$Samp0[1], nonEditCounts$Samp0[1],
    +                  editCounts$Samp1[1], nonEditCounts$Samp1[1]), 2, 2)
    > tempMat
         [,1] [,2]
    [1,]    0    4
    [2,]    0    4
These values correspond to the first row, chr10_101992307. In this case the Fisher test gives a p-value of 1.
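A minimal sketch of that first test follows. It uses hypothetical one-row excerpts of editCounts and nonEditCounts (only chr10_101992307 from the sample data) so that it runs on its own.

```r
# Hypothetical one-row excerpts of editCounts and nonEditCounts
# (row chr10_101992307 from the sample data).
editCounts    <- data.frame(Samp0 = 0, Samp1 = 4, Samp2 = 3,
                            row.names = "chr10_101992307")
nonEditCounts <- data.frame(Samp0 = 0, Samp1 = 4, Samp2 = 3,
                            row.names = "chr10_101992307")

# 2x2 contingency table, filled column by column:
# rows = edit / non-edit counts, columns = Samp0 / Samp1.
tempMat <- matrix(c(editCounts$Samp0[1], nonEditCounts$Samp0[1],
                    editCounts$Samp1[1], nonEditCounts$Samp1[1]),
                  nrow = 2, ncol = 2)

tempFisher <- fisher.test(tempMat, alternative = "two.sided")
print(tempMat)
print(tempFisher$p.value)  # the Samp0 column is all zeros, so p = 1
```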

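I know I can use combn to generate every pair of columns, but I'm not sure how to loop over each pair, build a contingency table from the 4 values, and run the Fisher test. Below is the code I have written so far; however, it throws an error when it tries to create tempMat: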

editCounts    <- read.table("editCountMatrix.txt", sep="\t", header=TRUE, row.names=1)
nonEditCounts <- read.table("nonEditCountMatrix.txt", sep="\t", header=TRUE, row.names=1)

pairwiseComb <- combn(names(editCounts),2)

for (j in seq(1,length(pairwiseComb),2)){
  tempCol1 = pairwiseComb[[j]]
  tempCol2 = pairwiseComb[[j+1]]
  cat("Processing: ",tempCol1," vs. ",tempCol2, "\n", sep="") # Prints correctly
  for (i in 1:nrow(editCounts)){
    tempMat=matrix(c(editCounts$tempCol1[i], nonEditCounts$tempCol1[i],
                 editCounts$tempCol2[i], nonEditCounts$tempCol2[i]), 2, 2)
    tempFisher=fisher.test(tempMat, alternative="two.sided")
    pval=tempFisher$p.value
    pvalAdj=p.adjust(pval,method="fdr")
  }
}
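For reference, combn(x, 2) returns a two-row matrix with one column per pair. A loop can therefore walk the columns of pairwiseComb directly. The sketch below uses only the sample names from the question.

```r
# Sample column names from the question.
sampleNames <- c("Samp0", "Samp1", "Samp2")

# combn(x, 2) returns a character matrix with 2 rows and choose(n, 2) columns;
# column j holds the j-th pair of names.
pairwiseComb <- combn(sampleNames, 2)
print(pairwiseComb)

# Walk the pairs column by column instead of stepping through linear indices.
for (j in seq_len(ncol(pairwiseComb))) {
  tempCol1 <- pairwiseComb[1, j]
  tempCol2 <- pairwiseComb[2, j]
  cat("Processing: ", tempCol1, " vs. ", tempCol2, "\n", sep = "")
}
```

With three samples, choose(3, 2) = 3 happens to equal the number of samples. With two samples there is only one pair, so anything sized by the sample count rather than by ncol(pairwiseComb) breaks.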
Any help would be greatly appreciated.

Thanks

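Here is a suggested solution. I have corrected a few minor indexing problems in your code, and I recommend using a preallocated matrix to store the Fisher exact test results.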

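You can fix the error by using [[ or [ instead of $. For example, editCounts[[tempCol1]][i] or editCounts[i, tempCol1] works, but editCounts$tempCol1[i] does not.

@bdemarest editCounts[[tempCol1]] seems to work. Even so, I chose to use the solution below. I noticed an error when trying to set the column names of the results matrix when using only two samples. See the error below. Any ideas? Thanks.

Thanks! This seems to work, but I noticed that with only two samples I get an error in colnames when trying to assign column names to the results matrix.

There was a bug in how the results matrix was created that happened to work only for n=3 samples! The results matrix needs one column for each possible pair: ncol=ncol(pairwiseComb)

Ah yes! Works great now. Thanks!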
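The difference between $ and [[ can be seen in a small sketch. The two-row data.frame below is a hypothetical excerpt of the sample editCounts.

```r
# Hypothetical two-row excerpt of editCounts from the sample data.
editCounts <- data.frame(Samp0 = c(0, 4), Samp1 = c(4, 0), Samp2 = c(3, 1),
                         row.names = c("chr10_101992307", "chr10_101992684"))

tempCol1 <- "Samp0"  # column name held in a variable, as in the loop
i <- 2

# `$` does not evaluate tempCol1: it looks for a column literally named
# "tempCol1", finds none, and returns NULL.
byDollar <- editCounts$tempCol1[i]

# `[[` and `[` evaluate the variable, so both look up the "Samp0" column.
byDoubleBracket <- editCounts[[tempCol1]][i]
byRowCol        <- editCounts[i, tempCol1]

print(byDollar)         # NULL
print(byDoubleBracket)  # [1] 4
print(byRowCol)         # [1] 4

# c() of NULLs is still NULL, and matrix() refuses NULL data. This is the
# error the original loop hit when building tempMat.
err <- tryCatch(matrix(c(byDollar, byDollar), 2, 2), error = function(e) e)
print(inherits(err, "error"))  # [1] TRUE
```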
# Create data.frames using your sample data.
editCounts <- read.table(header=TRUE,
text="                        Samp0         Samp1       Samp2
chr10_101992307             0             4           3
chr10_101992684             4             0           1
chr10_127480585             0             3           0
chr10_16479385              3             3           3
chr10_73979859              0             3           2
chr10_73979940              0             3           8")

nonEditCounts <- read.table(header=TRUE,
text="                        Samp0         Samp1       Samp2
chr10_101992307             0             4           3
chr10_101992684            15             0           4
chr10_127480585             0             6           0
chr10_16479385              7             7           4
chr10_73979859              0            13           7
chr10_73979940              0            21          10")
pairwiseComb <- combn(names(editCounts), 2)

# Create a matrix to hold results.
results <- matrix(NA, ncol=ncol(pairwiseComb), nrow=nrow(editCounts))

# Create row and column names to use for indexing/assignment of results.
rownames(results) <- rownames(editCounts)
colnames(results) <- apply(pairwiseComb, 2, 
                           function(x) {paste(x[1], "_vs_", x[2], sep="")})

# Loop over number of column pairs.
for (j in seq(ncol(pairwiseComb))) {
    tempCol1 <- pairwiseComb[1, j]
    tempCol2 <- pairwiseComb[2, j]
    resultsCol <- paste(tempCol1, "_vs_", tempCol2, sep="")
    cols <- c(tempCol1, tempCol2)
    # Loop over rownames.
    for (row in rownames(results)) {
        tempMat <- rbind(   editCounts[row, cols], # Grab values using row and
                         nonEditCounts[row, cols]) # column names. Use rbind to
                                                   # create two-row matrix.

        tempFisher <- fisher.test(tempMat, alternative="two.sided")
        results[row, resultsCol] <- tempFisher$p.value # Use row and column name
                                                       # indexing to assign
                                                       # p-value to results.
    }
}

# Compute adjusted p-values using all of the computed p-values, outside of loop.
padj <- results                           # First make copy of results matrix.  
padj[] <- p.adjust(results, method="fdr") # Trick to retain shape and attributes.
results
#                 Samp0_vs_Samp1 Samp0_vs_Samp2 Samp1_vs_Samp2
# chr10_101992307              1      1.0000000     1.00000000
# chr10_101992684              1      1.0000000     1.00000000
# chr10_127480585              1      1.0000000     1.00000000
# chr10_16479385               1      0.6436652     0.64366516
# chr10_73979859               1      1.0000000     1.00000000
# chr10_73979940               1      1.0000000     0.03290832

padj
#                 Samp0_vs_Samp1 Samp0_vs_Samp2 Samp1_vs_Samp2
# chr10_101992307              1              1      1.0000000
# chr10_101992684              1              1      1.0000000
# chr10_127480585              1              1      1.0000000
# chr10_16479385               1              1      1.0000000
# chr10_73979859               1              1      1.0000000
# chr10_73979940               1              1      0.5923497
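The same loop can be wrapped in a reusable function. The helper name pairwiseFisher below is hypothetical and not part of the answer above. This is a sketch that assumes both tables have identical dimensions and dimnames. It converts the data.frames to numeric matrices, sizes the result by the number of column pairs, and applies p.adjust once over all p-values.

```r
# Sample data from the question.
editCounts <- read.table(header = TRUE, text = "
                Samp0 Samp1 Samp2
chr10_101992307     0     4     3
chr10_101992684     4     0     1
chr10_127480585     0     3     0
chr10_16479385      3     3     3
chr10_73979859      0     3     2
chr10_73979940      0     3     8")
nonEditCounts <- read.table(header = TRUE, text = "
                Samp0 Samp1 Samp2
chr10_101992307     0     4     3
chr10_101992684    15     0     4
chr10_127480585     0     6     0
chr10_16479385      7     7     4
chr10_73979859      0    13     7
chr10_73979940      0    21    10")

# Hypothetical helper: pairwise Fisher tests between every pair of columns,
# row by row. Returns raw and adjusted p-value matrices.
pairwiseFisher <- function(editCounts, nonEditCounts, method = "fdr") {
  # Both tables must line up cell by cell.
  stopifnot(identical(dim(editCounts), dim(nonEditCounts)),
            identical(dimnames(editCounts), dimnames(nonEditCounts)))
  edit    <- as.matrix(editCounts)
  nonEdit <- as.matrix(nonEditCounts)
  pairwiseComb <- combn(colnames(edit), 2)  # 2 x choose(n, 2) matrix of pairs

  # One column per pair (not per sample), so two samples also work.
  pvals <- matrix(NA_real_, nrow = nrow(edit), ncol = ncol(pairwiseComb),
                  dimnames = list(rownames(edit),
                                  apply(pairwiseComb, 2, paste, collapse = "_vs_")))

  for (j in seq_len(ncol(pairwiseComb))) {
    cols <- pairwiseComb[, j]
    for (i in seq_len(nrow(edit))) {
      # 2x2 table: rows = edit / non-edit, columns = the two samples.
      tempMat <- rbind(edit[i, cols], nonEdit[i, cols])
      pvals[i, j] <- fisher.test(tempMat, alternative = "two.sided")$p.value
    }
  }

  # Adjust over all p-values at once; `[]<-` keeps the matrix shape and dimnames.
  padj <- pvals
  padj[] <- p.adjust(pvals, method = method)
  list(p = pvals, padj = padj)
}

res <- pairwiseFisher(editCounts, nonEditCounts)
print(res$p)
print(res$padj)
```

Because the result is sized by ncol(pairwiseComb), passing only two sample columns (for example editCounts[, 1:2] and nonEditCounts[, 1:2]) yields a single Samp0_vs_Samp1 column instead of failing at the column-name assignment.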