Merging data.frames in a list: how to select multiple elements
I have a list of data frames with different numbers of rows that I want to merge. There is a nice method for merging multiple data frames, which I use and which works:
> go.sigtop.l[c(1:3)]
$SRSF1_cyto
GoTerm PValue Fold.Enrichment
1 lipid kinase activity 0.0044501957 5.378668
2 general RNA polymerase II transcription factor activity 0.0070975052 4.840801
3 protein methyltransferase activity 0.0022675162 4.302935
4 N-methyltransferase activity 0.0089131138 3.850638
5 structure-specific DNA binding 0.0002666942 3.821685
6 purine NTP-dependent helicase activity 0.0007861753 3.377303
$SRSF1_total
GoTerm PValue Fold.Enrichment
1 translation factor activity, nucleic acid binding 1.460691e-04 6.953428
2 structural constituent of ribosome 8.530549e-03 3.948718
3 RNA binding 3.479534e-09 3.675900
4 nucleotide binding 9.800564e-04 1.638817
$SRSF2_cyto
GoTerm PValue Fold.Enrichment
1 protein-lysine N-methyltransferase activity 0.001722436 16.486352
2 lysine N-methyltransferase activity 0.001722436 16.486352
3 histone-lysine N-methyltransferase activity 0.001722436 16.486352
4 histone methyltransferase activity 0.003756630 12.607211
5 N-methyltransferase activity 0.007775608 9.741935
6 protein methyltransferase activity 0.008275521 9.525448
> merge.all <- function(by, ...) {
+ frames <- list(...)
+ df <- Reduce(function(x, y) { merge(x, y, by = by, all = TRUE) }, frames)
+ names(df) <- c(by, paste("V", seq(length(frames)), sep = ""))
+
+ return(df)
+ }
> go.df <- merge.all("GoTerm", go.sigtop.l[[1]], go.sigtop.l[[2]], go.sigtop.l[[3]])
> go.df
GoTerm V1 V2 V3 NA NA NA
1 general RNA polymerase II transcription factor activity 0.0070975052 4.840801 NA NA NA NA
2 histone-lysine N-methyltransferase activity NA NA NA NA 0.001722436 16.486352
3 histone methyltransferase activity NA NA NA NA 0.003756630 12.607211
4 lipid kinase activity 0.0044501957 5.378668 NA NA NA NA
5 lysine N-methyltransferase activity NA NA NA NA 0.001722436 16.486352
6 N-methyltransferase activity 0.0089131138 3.850638 NA NA 0.007775608 9.741935
7 nucleotide binding NA NA 9.800564e-04 1.638817 NA NA
8 protein-lysine N-methyltransferase activity NA NA NA NA 0.001722436 16.486352
9 protein methyltransferase activity 0.0022675162 4.302935 NA NA 0.008275521 9.525448
10 purine NTP-dependent helicase activity 0.0007861753 3.377303 NA NA NA NA
11 RNA binding NA NA 3.479534e-09 3.675900 NA NA
12 structural constituent of ribosome NA NA 8.530549e-03 3.948718 NA NA
13 structure-specific DNA binding 0.0002666942 3.821685 NA NA NA NA
14 translation factor activity, nucleic acid binding NA NA 1.460691e-04 6.953428 NA NA
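But this does not work.
I know there are many answers to similar questions, but none of the ones I have seen solves my problem. Cheers.
It isn't pretty, but it can be done with a for loop. If there is a better solution, I will accept it instead of this one: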
df.m <- go.sigtop.l[[1]]
for (i in 2:length(names(go.sigtop.l))){
df.m <- merge(df.m, go.sigtop.l[[i]], by ="GoTerm", all = TRUE, suffixes = c(paste(".", names(go.sigtop.l)[i-1], sep=""), paste(".", names(go.sigtop.l)[i], sep="")))
}
df.m[is.na(df.m)] <- 0
> head(df.m)
GoTerm PValue.SRSF1_cyto Fold.Enrichment.SRSF1_cyto PValue.SRSF1_total Fold.Enrichment.SRSF1_total PValue.SRSF2_cyto
1 aminoacyl-tRNA ligase activity 0.000000000 0.000000 0 0 0
2 beta-catenin binding 0.000000000 0.000000 0 0 0
3 cell adhesion molecule binding 0.000000000 0.000000 0 0 0
4 cytochrome-c oxidase activity 0.000000000 0.000000 0 0 0
5 cytoskeletal protein binding 0.000000000 0.000000 0 0 0
6 general RNA polymerase II transcription factor activity 0.007097505 4.840801 0 0 0
Fold.Enrichment.SRSF2_cyto PValue.SRSF2_total Fold.Enrichment.SRSF2_total PValue.SRSF3_cyto Fold.Enrichment.SRSF3_cyto PValue.SRSF3_total Fold.Enrichment.SRSF3_total
1 0 0 0 0.000000000 0.000000 0 0
2 0 0 0 0.000186408 5.037574 0 0
3 0 0 0 0.000000000 0.000000 0 0
4 0 0 0 0.000000000 0.000000 0 0
5 0 0 0 0.000000000 0.000000 0 0
6 0 0 0 0.000000000 0.000000 0 0
PValue.SRSF4_cyto Fold.Enrichment.SRSF4_cyto PValue.SRSF4_total Fold.Enrichment.SRSF4_total PValue.SRSF5_cyto Fold.Enrichment.SRSF5_cyto PValue.SRSF5_total
1 0.0000000 0.00000 0.0000000 0.000000 0 0 0
2 0.0000000 0.00000 0.0000000 0.000000 0 0 0
3 0.0000000 0.00000 0.0000000 0.000000 0 0 0
4 0.0025874 14.26516 0.0000000 0.000000 0 0 0
5 0.0000000 0.00000 0.0053485 4.239176 0 0 0
6 0.0000000 0.00000 0.0000000 0.000000 0 0 0
Fold.Enrichment.SRSF5_total PValue.SRSF6_cyto Fold.Enrichment.SRSF6_cyto PValue.SRSF6_total Fold.Enrichment.SRSF6_total PValue.SRSF7_cyto Fold.Enrichment.SRSF7_cyto
1 0 0.0007474458 12.03623 0 0 0 0
2 0 0.0000000000 0.00000 0 0 0 0
3 0 0.0000000000 0.00000 0 0 0 0
4 0 0.0000000000 0.00000 0 0 0 0
5 0 0.0000000000 0.00000 0 0 0 0
6 0 0.0000000000 0.00000 0 0 0 0
PValue.SRSF7_total Fold.Enrichment.SRSF7_total
1 0.000000000 0.00000
2 0.000000000 0.00000
3 0.009078473 20.42213
4 0.000000000 0.00000
5 0.000000000 0.00000
6 0.000000000 0.00000
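A more compact alternative to the loop above might look like the following sketch. It relies on the fact that merge() only applies suffixes to column names that clash, so the loop labels columns correctly only when frames arrive in pairs; with an odd number of frames the last one would keep the bare names PValue and Fold.Enrichment. Renaming the value columns up front avoids that and also lets any subset of the list be merged. The helper name merge_list and the toy data are assumptions for illustration, not part of the original thread.

```r
# Toy stand-in for go.sigtop.l (hypothetical values, not the original data)
go.list <- list(
  A_cyto  = data.frame(GoTerm = c("RNA binding", "nucleotide binding"),
                       PValue = c(0.001, 0.02),
                       Fold.Enrichment = c(3.7, 1.6)),
  A_total = data.frame(GoTerm = c("RNA binding", "lipid kinase activity"),
                       PValue = c(0.004, 0.03),
                       Fold.Enrichment = c(2.1, 5.4)),
  B_cyto  = data.frame(GoTerm = "nucleotide binding",
                       PValue = 0.008,
                       Fold.Enrichment = 9.7)
)

# merge_list is a hypothetical helper name
merge_list <- function(frames, by = "GoTerm") {
  # Tag every non-key column with the name of its list element up front,
  # so merge() never has to disambiguate clashing names itself.
  renamed <- Map(function(df, nm) {
    value.cols <- names(df) != by
    names(df)[value.cols] <- paste(names(df)[value.cols], nm, sep = ".")
    df
  }, frames, names(frames))
  # Fold a full outer join over the list, two frames at a time.
  Reduce(function(x, y) merge(x, y, by = by, all = TRUE), renamed)
}

# Any subset of the list can be passed directly
merged <- merge_list(go.list[1:3])
names(merged)
```

With the real data the call would presumably be merge_list(go.sigtop.l[c(1:3)]); terms missing from a frame come back as NA and can be replaced afterwards if desired.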
Have you tried the smartbind function?
See this code from the link:
df1 <- data.frame(list(A=1:10), B=LETTERS[1:10], C=rnorm(10) )
df2 <- data.frame(A=11:20, D=rnorm(10), E=letters[1:10] )
df3 <- df1
df4 <- df2
out <- smartbind( list(df1, df2, df3, df4))
Do they all have the same columns? Then just do do.call(rbind, go.sigtop.l).
They do, @Thomas, but this creates a data frame with duplicated rows for GoTerm. I had tried it before posting :)
So what do you want to do with those duplicated rows?
I don't want them. The goal is to create a table in which each row represents a unique GoTerm and each column holds a set of values for that GO term under each condition. I intend to select some of these values to build a matrix and then cluster it and plot a heatmap. Anyway, I think I have found a solution using a for loop. It isn't pretty, but it does the job. I'll post it now.
Yes, please post it as an answer.
It does not work with a list and an undetermined number of elements in the list.
You can do a foreach and go through your list, then merge them with the smartbind function. I think this is what you want... Should I post a possible example!?
Yes, please. It seems to give an error because the dfs don't have the same number of rows: smartbind(go.sigtop.l) Error in data.frame(SRSF1_total = list(GoTerm = c(5L, 3L, 7L, 10L, 9L, : arguments imply differing number of rows: 6, 5, 1, 3
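The goal mentioned in the comments (one row per unique GoTerm, then a matrix for clustering and a heatmap) could be sketched roughly as follows. The small df.m here is a hypothetical stand-in shaped like the merged table from the loop answer, and the choice of the Fold.Enrichment columns is an assumption about which values would be selected.

```r
# Hypothetical merged table in the shape produced by the loop answer
df.m <- data.frame(
  GoTerm = c("RNA binding", "lipid kinase activity", "nucleotide binding"),
  PValue.A_cyto = c(0.001, NA, 0.02),
  Fold.Enrichment.A_cyto = c(3.7, NA, 1.6),
  PValue.A_total = c(0.004, 0.03, NA),
  Fold.Enrichment.A_total = c(2.1, 5.4, NA)
)

# Keep only the fold-enrichment columns, one per condition
fe.cols <- grep("^Fold\\.Enrichment\\.", names(df.m), value = TRUE)
fe.mat <- as.matrix(df.m[, fe.cols, drop = FALSE])

# Rows are GO terms, columns are conditions
rownames(fe.mat) <- df.m$GoTerm
colnames(fe.mat) <- sub("^Fold\\.Enrichment\\.", "", fe.cols)

# Treat "term not enriched in this condition" as 0, as the loop answer does
fe.mat[is.na(fe.mat)] <- 0

# Base-R heatmap with hierarchical clustering of rows and columns
heatmap(fe.mat, scale = "none")
```

Replacing NA with 0 is reasonable for fold enrichment but would be misleading for the PValue columns, where 0 reads as maximal significance, so those are left out of the matrix here.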