Merging a variable number of columns in a data frame in R
I have a data frame like the one below. I want to merge the columns from V2 onward (preferably with a comma between the values) while excluding the NAs from the merged column. Each row has a different number of columns containing NAs:
V1 V2 V3 V4 V5 V6 V7
chr11:69464719-69502928 CCND1 ORAOV1 NA NA NA NA
chr7:55075808-55093954 EGFR NA NA NA NA NA
chr3:169389459-169490555 TERC ARPM1 NA NA NA NA
chr1:150496857-150678056 ENSA MCL1 ADAMTSL4 GOLPH3L HORMAD1 MIR4257
The result I want is:
V1 V2
chr11:69464719-69502928 CCND1,ORAOV1
chr7:55075808-55093954 EGFR
chr3:169389459-169490555 TERC,ARPM1
chr1:150496857-150678056 ENSA,MCL1,ADAMTSL4,GOLPH3L,HORMAD1,MIR4257
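I know how to concatenate a fixed set of columns, but leaving out the NAs when the number of columns varies has me stumped.

We can use apply with MARGIN=1 (excluding the first column) and paste together the non-NA elements of each row (toString is a wrapper for paste(..., collapse=', ')).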
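As a quick check of the wrapper relationship, here is a minimal sketch on a single row. The vector x is hypothetical, copied from the first row of the example data without its V1 value: dropping the NAs and calling toString should give the same string as paste with collapse = ", ".

```r
# Hypothetical single row (V2..V7 of the first row in the example data)
x <- c("CCND1", "ORAOV1", NA, NA, NA, NA)

# Keep only the non-NA entries
kept <- x[!is.na(x)]

# toString() joins with ", ", the same as paste(..., collapse = ", ")
a <- toString(kept)
b <- paste(kept, collapse = ", ")
print(a)
```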
Using the example data frame df1 (its dput output is reproduced at the end of this answer):
V2 <- apply(df1[-1],1, function(x) toString(x[!is.na(x)]))
res <- data.frame(V1=df1[,1], V2, stringsAsFactors=FALSE)
res
# V1 V2
#1 chr11:69464719-69502928 CCND1, ORAOV1
#2 chr7:55075808-55093954 EGFR
#3 chr3:169389459-169490555 TERC, ARPM1
#4 chr1:150496857-150678056 ENSA, MCL1, ADAMTSL4, GOLPH3L, HORMAD1, MIR4257
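Note that toString puts a space after each comma, whereas the desired output in the question has none (CCND1,ORAOV1). A sketch of the same apply approach that calls paste with collapse = "," directly should match the requested format exactly. The data frame is re-created here from the dput output at the end so the block runs on its own.

```r
# Example data, re-created from the dput output at the end of the answer
df1 <- data.frame(
  V1 = c("chr11:69464719-69502928", "chr7:55075808-55093954",
         "chr3:169389459-169490555", "chr1:150496857-150678056"),
  V2 = c("CCND1", "EGFR", "TERC", "ENSA"),
  V3 = c("ORAOV1", NA, "ARPM1", "MCL1"),
  V4 = c(NA, NA, NA, "ADAMTSL4"),
  V5 = c(NA, NA, NA, "GOLPH3L"),
  V6 = c(NA, NA, NA, "HORMAD1"),
  V7 = c(NA, NA, NA, "MIR4257"),
  stringsAsFactors = FALSE
)

# Row-wise: drop the NAs, then join with a bare comma (no space)
V2 <- apply(df1[-1], 1, function(x) paste(x[!is.na(x)], collapse = ","))
res <- data.frame(V1 = df1$V1, V2 = unname(V2), stringsAsFactors = FALSE)
print(res)
```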
Or, using data.table:
library(data.table)
melt(setDT(df1), id.var='V1', na.rm=TRUE)[, list(V2=toString(value)) , V1]
# V1 V2
#1: chr11:69464719-69502928 CCND1, ORAOV1
#2: chr7:55075808-55093954 EGFR
#3: chr3:169389459-169490555 TERC, ARPM1
#4: chr1:150496857-150678056 ENSA, MCL1, ADAMTSL4, GOLPH3L, HORMAD1, MIR4257
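The data.table line reshapes the data to long format (one row per V1/gene pair, with na.rm=TRUE dropping the NAs) and then collapses the values by V1. Assuming data.table is not available, the same reshape-then-collapse idea can be sketched in base R. This is an illustrative alternative, not part of the original answer, and the intermediate names long and groups are hypothetical.

```r
# Example data, re-created from the dput output at the end of the answer
df1 <- data.frame(
  V1 = c("chr11:69464719-69502928", "chr7:55075808-55093954",
         "chr3:169389459-169490555", "chr1:150496857-150678056"),
  V2 = c("CCND1", "EGFR", "TERC", "ENSA"),
  V3 = c("ORAOV1", NA, "ARPM1", "MCL1"),
  V4 = c(NA, NA, NA, "ADAMTSL4"),
  V5 = c(NA, NA, NA, "GOLPH3L"),
  V6 = c(NA, NA, NA, "HORMAD1"),
  V7 = c(NA, NA, NA, "MIR4257"),
  stringsAsFactors = FALSE
)

# Long format: unlist() walks columns V2..V7 in order, so V1 is
# repeated once per column to stay aligned with the values
long <- data.frame(
  V1 = rep(df1$V1, times = ncol(df1) - 1),
  value = unlist(df1[-1], use.names = FALSE),
  stringsAsFactors = FALSE
)
long <- long[!is.na(long$value), ]  # same role as na.rm = TRUE in melt()

# Collapse per V1; using df1$V1 as factor levels keeps the original row order
groups <- split(long$value, factor(long$V1, levels = df1$V1))
res <- data.frame(V1 = names(groups),
                  V2 = unname(vapply(groups, toString, character(1))),
                  stringsAsFactors = FALSE)
print(res)
```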
Data:
df1 <- structure(list(
  V1 = c("chr11:69464719-69502928", "chr7:55075808-55093954",
         "chr3:169389459-169490555", "chr1:150496857-150678056"),
  V2 = c("CCND1", "EGFR", "TERC", "ENSA"),
  V3 = c("ORAOV1", NA, "ARPM1", "MCL1"),
  V4 = c(NA, NA, NA, "ADAMTSL4"),
  V5 = c(NA, NA, NA, "GOLPH3L"),
  V6 = c(NA, NA, NA, "HORMAD1"),
  V7 = c(NA, NA, NA, "MIR4257")),
  class = "data.frame", row.names = c(NA, -4L))