Merging data.frames in a list: how to select multiple elements

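I have a list of data frames, with differing numbers of rows, that I want to merge. For merging several data frames there is a nice method that I use and that works: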

> go.sigtop.l[c(1:3)]
$SRSF1_cyto
                                                   GoTerm       PValue Fold.Enrichment
1                                   lipid kinase activity 0.0044501957        5.378668
2 general RNA polymerase II transcription factor activity 0.0070975052        4.840801
3                      protein methyltransferase activity 0.0022675162        4.302935
4                            N-methyltransferase activity 0.0089131138        3.850638
5                          structure-specific DNA binding 0.0002666942        3.821685
6                  purine NTP-dependent helicase activity 0.0007861753        3.377303

$SRSF1_total
                                             GoTerm       PValue Fold.Enrichment
1 translation factor activity, nucleic acid binding 1.460691e-04        6.953428
2                structural constituent of ribosome 8.530549e-03        3.948718
3                                       RNA binding 3.479534e-09        3.675900
4                                nucleotide binding 9.800564e-04        1.638817

$SRSF2_cyto
                                       GoTerm      PValue Fold.Enrichment
1 protein-lysine N-methyltransferase activity 0.001722436       16.486352
2         lysine N-methyltransferase activity 0.001722436       16.486352
3 histone-lysine N-methyltransferase activity 0.001722436       16.486352
4          histone methyltransferase activity 0.003756630       12.607211
5                N-methyltransferase activity 0.007775608        9.741935
6          protein methyltransferase activity 0.008275521        9.525448

> merge.all <- function(by, ...) {
+   frames <- list(...)
+   df <- Reduce(function(x, y) { merge(x, y, by = by, all = TRUE) }, frames)
+   names(df) <- c(by, paste("V", seq(length(frames)), sep = ""))
+   
+   return(df)
+ }
> go.df <- merge.all("GoTerm", go.sigtop.l[[1]], go.sigtop.l[[2]], go.sigtop.l[[3]])
> go.df
                                                    GoTerm           V1       V2           V3       NA          NA        NA
1  general RNA polymerase II transcription factor activity 0.0070975052 4.840801           NA       NA          NA        NA
2              histone-lysine N-methyltransferase activity           NA       NA           NA       NA 0.001722436 16.486352
3                       histone methyltransferase activity           NA       NA           NA       NA 0.003756630 12.607211
4                                    lipid kinase activity 0.0044501957 5.378668           NA       NA          NA        NA
5                      lysine N-methyltransferase activity           NA       NA           NA       NA 0.001722436 16.486352
6                             N-methyltransferase activity 0.0089131138 3.850638           NA       NA 0.007775608  9.741935
7                                       nucleotide binding           NA       NA 9.800564e-04 1.638817          NA        NA
8              protein-lysine N-methyltransferase activity           NA       NA           NA       NA 0.001722436 16.486352
9                       protein methyltransferase activity 0.0022675162 4.302935           NA       NA 0.008275521  9.525448
10                  purine NTP-dependent helicase activity 0.0007861753 3.377303           NA       NA          NA        NA
11                                             RNA binding           NA       NA 3.479534e-09 3.675900          NA        NA
12                      structural constituent of ribosome           NA       NA 8.530549e-03 3.948718          NA        NA
13                          structure-specific DNA binding 0.0002666942 3.821685           NA       NA          NA        NA
14       translation factor activity, nucleic acid binding           NA       NA 1.460691e-04 6.953428          NA        NA
But this doesn't work.


I know there are many answers to similar questions, but none of the ones I've seen solves my problem. Cheers.
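The merge.all attempt in the question has two separate problems. Each data frame contributes two value columns (PValue and Fold.Enrichment), but the names vector built inside merge.all has only length(frames) + 1 entries, so the surplus columns come out named NA. Also, the frames have to be passed one by one through `...`, so a list of arbitrary length can't be handed over directly. do.call(merge.all, c(list("GoTerm"), go.sigtop.l)) would spread the list into `...`, but the naming problem would remain. Below is a minimal sketch of a list-based variant, assuming every list element is named and contains the key column. merge_list and the toy list go.toy are hypothetical names with made-up values:

```r
# Hypothetical helper (not from the original post): merge a *named* list of
# data frames on a key column, keeping every key (full outer join).
merge_list <- function(frames, by) {
  stopifnot(is.list(frames), !is.null(names(frames)))
  # tag every non-key column with its list element's name, e.g. PValue.SRSF1_cyto,
  # so no two frames share a value-column name and merge() never needs suffixes
  tagged <- Map(function(df, nm) {
    is_val <- !(names(df) %in% by)
    names(df)[is_val] <- paste(names(df)[is_val], nm, sep = ".")
    df
  }, frames, names(frames))
  # fold the frames together pairwise: ((f1 + f2) + f3) + ...
  Reduce(function(x, y) merge(x, y, by = by, all = TRUE), tagged)
}

# Toy stand-in for go.sigtop.l (made-up values; three elements, i.e. an odd count)
go.toy <- list(
  A_cyto  = data.frame(GoTerm = c("t1", "t2"), PValue = c(0.01, 0.02), Fold.Enrichment = c(2, 3)),
  A_total = data.frame(GoTerm = c("t2", "t3"), PValue = c(0.03, 0.04), Fold.Enrichment = c(4, 5)),
  B_cyto  = data.frame(GoTerm = "t1", PValue = 0.05, Fold.Enrichment = 6)
)
toy.df <- merge_list(go.toy, by = "GoTerm")

# selecting several elements works the same way, e.g.
# merge_list(go.sigtop.l[1:3], by = "GoTerm")
```

Because every value column is tagged before merging, the result doesn't rely on merge()'s suffixes, and an odd or even number of elements makes no difference. merge() sorts by the key by default, so rows come out ordered by GoTerm, with NA wherever a term is absent from a condition.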

It isn't pretty, but it can be done with a for loop. If there is a better solution, I'll accept that instead of this one:

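# start from the first data frame and merge the others in one at a time
df.m <- go.sigtop.l[[1]]
for (i in 2:length(go.sigtop.l)) {
  # full outer join on GoTerm; columns whose names clash get the two list names as suffixes
  df.m <- merge(df.m, go.sigtop.l[[i]], by = "GoTerm", all = TRUE,
                suffixes = paste0(".", names(go.sigtop.l)[c(i - 1, i)]))
}
# GO terms missing from a condition are NA after the join; set them to 0
df.m[is.na(df.m)] <- 0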

> head(df.m)
                                                   GoTerm PValue.SRSF1_cyto Fold.Enrichment.SRSF1_cyto PValue.SRSF1_total Fold.Enrichment.SRSF1_total PValue.SRSF2_cyto
1                          aminoacyl-tRNA ligase activity       0.000000000                   0.000000                  0                           0                 0
2                                    beta-catenin binding       0.000000000                   0.000000                  0                           0                 0
3                          cell adhesion molecule binding       0.000000000                   0.000000                  0                           0                 0
4                           cytochrome-c oxidase activity       0.000000000                   0.000000                  0                           0                 0
5                            cytoskeletal protein binding       0.000000000                   0.000000                  0                           0                 0
6 general RNA polymerase II transcription factor activity       0.007097505                   4.840801                  0                           0                 0
  Fold.Enrichment.SRSF2_cyto PValue.SRSF2_total Fold.Enrichment.SRSF2_total PValue.SRSF3_cyto Fold.Enrichment.SRSF3_cyto PValue.SRSF3_total Fold.Enrichment.SRSF3_total
1                          0                  0                           0       0.000000000                   0.000000                  0                           0
2                          0                  0                           0       0.000186408                   5.037574                  0                           0
3                          0                  0                           0       0.000000000                   0.000000                  0                           0
4                          0                  0                           0       0.000000000                   0.000000                  0                           0
5                          0                  0                           0       0.000000000                   0.000000                  0                           0
6                          0                  0                           0       0.000000000                   0.000000                  0                           0
  PValue.SRSF4_cyto Fold.Enrichment.SRSF4_cyto PValue.SRSF4_total Fold.Enrichment.SRSF4_total PValue.SRSF5_cyto Fold.Enrichment.SRSF5_cyto PValue.SRSF5_total
1         0.0000000                    0.00000          0.0000000                    0.000000                 0                          0                  0
2         0.0000000                    0.00000          0.0000000                    0.000000                 0                          0                  0
3         0.0000000                    0.00000          0.0000000                    0.000000                 0                          0                  0
4         0.0025874                   14.26516          0.0000000                    0.000000                 0                          0                  0
5         0.0000000                    0.00000          0.0053485                    4.239176                 0                          0                  0
6         0.0000000                    0.00000          0.0000000                    0.000000                 0                          0                  0
  Fold.Enrichment.SRSF5_total PValue.SRSF6_cyto Fold.Enrichment.SRSF6_cyto PValue.SRSF6_total Fold.Enrichment.SRSF6_total PValue.SRSF7_cyto Fold.Enrichment.SRSF7_cyto
1                           0      0.0007474458                   12.03623                  0                           0                 0                          0
2                           0      0.0000000000                    0.00000                  0                           0                 0                          0
3                           0      0.0000000000                    0.00000                  0                           0                 0                          0
4                           0      0.0000000000                    0.00000                  0                           0                 0                          0
5                           0      0.0000000000                    0.00000                  0                           0                 0                          0
6                           0      0.0000000000                    0.00000                  0                           0                 0                          0
  PValue.SRSF7_total Fold.Enrichment.SRSF7_total
1        0.000000000                     0.00000
2        0.000000000                     0.00000
3        0.009078473                    20.42213
4        0.000000000                     0.00000
5        0.000000000                     0.00000
6        0.000000000                     0.00000
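One caveat about the for loop above: merge() applies `suffixes` only to non-key columns whose names occur in both inputs. On every second iteration the incoming PValue/Fold.Enrichment columns clash with the still-untagged columns of the previous element, and both get tagged; on the other iterations nothing clashes and the new columns stay untagged. With the 14 elements here (SRSF1 to SRSF7, cyto and total) that works out, but with an odd number of elements the last frame's columns would keep their bare names. The made-up frames a, b and d below show the effect:

```r
# Made-up frames that share a value column name
a <- data.frame(GoTerm = c("t1", "t2"), PValue = c(0.1, 0.2))
b <- data.frame(GoTerm = c("t1", "t3"), PValue = c(0.3, 0.4))
d <- data.frame(GoTerm = "t2", PValue = 0.5)

# like iteration 2 of the loop: PValue clashes, so both sides get tagged
m2 <- merge(a, b, by = "GoTerm", all = TRUE, suffixes = c(".a", ".b"))
names(m2)  # "GoTerm" "PValue.a" "PValue.b"

# like iteration 3: nothing clashes any more, so d's column stays untagged
m3 <- merge(m2, d, by = "GoTerm", all = TRUE, suffixes = c(".b", ".d"))
names(m3)  # "GoTerm" "PValue.a" "PValue.b" "PValue"
```

Renaming each frame's value columns before merging (for example with paste(names(df), element_name, sep = ".")) removes the dependence on suffixes altogether.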

Have you tried the smartbind function from the gtools package?

See this code from the link:

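library(gtools)  # smartbind() comes from the gtools package

df1 <- data.frame(A = 1:10, B = LETTERS[1:10], C = rnorm(10))
df2 <- data.frame(A = 11:20, D = rnorm(10), E = letters[1:10])
df3 <- df1
df4 <- df2

# do.call() hands each data frame to smartbind() as a separate argument;
# columns missing from a frame are filled with NA
out <- do.call(smartbind, list(df1, df2, df3, df4))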
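smartbind stacks data frames row-wise and pads the columns a frame lacks; it does not match rows by a key. To illustrate that behaviour without installing gtools, here is a rough base-R stand-in. bind_fill is a hypothetical name, and this is not gtools' actual implementation:

```r
# Hypothetical base-R stand-in for gtools::smartbind (not its real code):
# pad every data frame to the union of all column names, then stack with rbind.
bind_fill <- function(frames, fill = NA) {
  all_cols <- unique(unlist(lapply(frames, names)))
  padded <- lapply(frames, function(df) {
    for (col in setdiff(all_cols, names(df))) df[[col]] <- fill  # add missing column
    df[all_cols]                                                  # common column order
  })
  do.call(rbind, padded)  # one argument per data frame, as with do.call(smartbind, ...)
}

df1 <- data.frame(A = 1:2, B = c("x", "y"))
df2 <- data.frame(A = 3:4, D = c(0.5, 0.6))
out <- bind_fill(list(df1, df2))
out  # 4 rows, columns A, B, D; NA where a frame lacked the column
```

In go.sigtop.l every element has the same columns, so this (like smartbind or do.call(rbind, ...)) simply stacks the rows, and a GoTerm that occurs in several conditions appears several times.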

Do they all have the same columns? Then just do
do.call(rbind, go.sigtop.l)

They do, @Thomas, but that creates a data frame with duplicate rows for GoTerm. I tried it before posting :)

So what do you want to do with those duplicate rows?

I don't want any. The goal is a table in which each row is a unique GoTerm and the columns hold that GO term's values for each condition. I intend to select some of these values to build a matrix and then cluster it / draw a heatmap. Anyway, I think I've found a solution using a for loop. It isn't pretty, but it does the job. I'll post it now.

Yes, please post it as an answer.

It doesn't work with a list, or with an unknown number of elements in the list.

You could loop over your list with foreach and merge the elements with the smartbind function. I think that's what you want... Should I post a possible example?

Yes. It seems to give an error because the data frames don't have the same number of rows:
smartbind(go.sigtop.l)
Error in data.frame(SRSF1_total = list(GoTerm = c(5L, 3L, 7L, 10L, 9L,  : 
  arguments imply differing number of rows: 6, 5, 1, 3
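The thread states the goal: one row per unique GoTerm, one column per value and condition, then a matrix to cluster or draw as a heatmap. The duplicate rows that do.call(rbind, go.sigtop.l) produces are in fact the long format that base R's reshape() can pivot to wide. A minimal sketch of that route, as an alternative to the merge loop, using a made-up two-element list go.toy in place of go.sigtop.l. Picking Fold.Enrichment for the matrix and filling missing values with 0 are assumptions:

```r
# Toy stand-in for go.sigtop.l (made-up values)
go.toy <- list(
  A_cyto  = data.frame(GoTerm = c("t1", "t2"), PValue = c(0.01, 0.02), Fold.Enrichment = c(2, 3)),
  A_total = data.frame(GoTerm = c("t2", "t3"), PValue = c(0.03, 0.04), Fold.Enrichment = c(4, 5))
)

# 1. stack: one row per (condition, GoTerm); the list element name becomes a column
long <- do.call(rbind, Map(function(df, nm) data.frame(Condition = nm, df),
                           go.toy, names(go.toy)))

# 2. pivot: one row per GoTerm, one PValue / Fold.Enrichment pair per condition
wide <- reshape(long, idvar = "GoTerm", timevar = "Condition", direction = "wide")

# 3. keep only the fold enrichments as a numeric matrix for clustering / a heatmap
fe_cols <- grep("^Fold\\.Enrichment\\.", names(wide), value = TRUE)
fe <- as.matrix(wide[fe_cols])
dimnames(fe) <- list(wide$GoTerm, sub("^Fold\\.Enrichment\\.", "", fe_cols))
fe[is.na(fe)] <- 0  # assumption: a term absent from a condition counts as 0, as in the loop answer

# stats::heatmap() clusters rows and columns by default; only drawn in an interactive session
if (interactive()) heatmap(fe, scale = "none")
```

reshape() names the new columns value.condition (e.g. PValue.A_cyto), matching the naming of the loop answer, and keeps GoTerms in order of first appearance.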