Common elements in R data frames

I have three data frames containing a lot of information, with the following row names:

ENSG00000000971 ENSG00000000971 ENSG00000000971
ENSG00000004139 ENSG00000004139 ENSG00000003987
ENSG00000005001 ENSG00000004848 ENSG00000004848
ENSG00000005102 ENSG00000002330 ENSG00000002330
ENSG00000005486 ENSG00000005102 ENSG00000006047
...             ...             ...
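What I want is to find all items (row names) that are common to at least two of the data frames. That is, the final result should be a list like this: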

ENSG00000000971
ENSG00000004139
ENSG00000004848
ENSG00000005102
ENSG00000002330
How can I do this? I tried the following:

shared.DESeq2.edgeR = data.frame(row.names(res.DESeq2) %in% row.names(res.edgeR))
shared.DESeq2.limma = data.frame(row.names(res.DESeq2) %in% row.names(res.limma))
shared.edgeR.limma = data.frame(row.names(res.edgeR) %in% row.names(res.limma))
shared = merge(merge(shared.DESeq2.edgeR, shared.DESeq2.limma), shared.edgeR.limma)
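...where the three res.[DESeq2/edgeR/limma] objects are the data frames, but this takes a very long time to run (I never let it finish, so I don't know whether it actually works). I have some code that does this for elements shared by all three data frames, but I am also interested in elements shared by two or more of them, and I can't find a good way to do that. Any ideas?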

Try this example:

#dummy data, with real data we would do: res.DESeq2_rn <-row.names(res.DESeq2)
res.DESeq2_rn <- letters[1:4]
res.edgeR_rn <- letters[3:8]
res.limma_rn <- letters[c(1,3,8,10)]

#get counts
res <- table(c(res.DESeq2_rn, res.edgeR_rn, res.limma_rn))
res
# a b c d e f g h j 
# 2 1 3 2 1 1 1 2 1 

#result
names(res)[ res>=2 ]
#[1] "a" "c" "d" "h"
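
The same counting idea can be applied directly to data frames. The following is a minimal sketch, not the original poster's code: the three small data frames and their padj column are hypothetical stand-ins for the real result tables, and only their row names matter. It relies on row names being unique within each data frame, so a count of 2 means "present in two tables".

```r
# Hypothetical stand-ins for the three result tables (the padj values are made up).
res.DESeq2 <- data.frame(padj = c(0.01, 0.20, 0.03, 0.04),
                         row.names = letters[1:4])
res.edgeR  <- data.frame(padj = c(0.02, 0.01, 0.50, 0.30, 0.04, 0.01),
                         row.names = letters[3:8])
res.limma  <- data.frame(padj = c(0.03, 0.02, 0.01, 0.60),
                         row.names = letters[c(1, 3, 8, 10)])

# Collect the row names of every table into one character vector.
all_ids <- unlist(lapply(list(res.DESeq2, res.edgeR, res.limma), row.names))

# Count in how many tables each identifier occurs.
counts <- table(all_ids)

shared_2plus <- names(counts)[counts >= 2]  # present in at least two tables
shared_all   <- names(counts)[counts == 3]  # present in all three tables
```

Changing the threshold is the only thing needed to switch between "two or more" and "all three".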

Another approach, using @zx8754's sample data:

# dummy data
res.DESeq2 <- letters[ 1:4 ]
res.edgeR <- letters[ 3:8 ]
res.limma <- letters[ c( 1, 3, 8, 10 ) ]

# combine into one vector                  
res <- c( res.DESeq2, res.edgeR, res.limma )
res
[1] "a" "b" "c" "d" "c" "d" "e" "f" "g" "h" "a" "c" "h" "j"

# result
unique( res[ which( duplicated( res ) ) ] )
[1] "c" "d" "a" "h"                  
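
A third option, sketched here as an alternative and not taken from either answer, is to use base R set operations: intersect each pair of vectors and take the union of the results. Unlike the duplicated() approach, this does not assume that identifiers are unique within each input vector. The names x, y, z, dup_based and set_based are illustrative only.

```r
# Dummy data, same as in the answers above
res.DESeq2 <- letters[1:4]
res.edgeR  <- letters[3:8]
res.limma  <- letters[c(1, 3, 8, 10)]

# Union of the three pairwise intersections = items found in at least two vectors.
shared <- Reduce(union, list(
  intersect(res.DESeq2, res.edgeR),
  intersect(res.DESeq2, res.limma),
  intersect(res.edgeR,  res.limma)
))
shared_sorted <- sort(shared)

# Caveat illustrated: "a" is repeated inside a single vector only.
x <- c("a", "a"); y <- "b"; z <- "c"
combined  <- c(x, y, z)
dup_based <- unique(combined[duplicated(combined)])  # reports "a"
set_based <- Reduce(union, list(intersect(x, y), intersect(x, z), intersect(y, z)))  # empty
```

With data frame row names the caveat does not apply, since R does not allow duplicate row names, so all three approaches should return the same set.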

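Do any of the data frames contain duplicates?
No, there are no duplicate row names in any of the data frames.
Yes, benchmarking shows that your approach is the fastest. See my edited post.
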
Benchmark of the two approaches, from the edited post:

# create a large random character vector (this takes a lot of time!)
res <- rep( "x", 1000000 )
for( i in 1:1000000) 
    res[ i ] <- paste( sample( letters, 8, replace = TRUE ), collapse = "" )
head( res )
[1] "vsvkljgr" "ulxhqnas" "upqqtrdk" "pynuaihp" "srjtnvqm" "mxnlytvd"

# vaettchen:
system.time( x <- unique( res[ which( duplicated( res ) ) ] ) )
 user  system elapsed 
0.173   0.000   0.171 
x
[1] "zlzlwinb" "wielycpx"

# zx8754
system.time( { y <- table( res ); z <- names( y )[ y >= 2 ] } )
  user  system elapsed
18.945   0.020  19.058 
z
[1] "wielycpx" "zlzlwinb"
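
The benchmark's test vector is built in a million-iteration loop, which its own comment notes is slow. A possible vectorized way to build similar data is sketched below. The seed and the smaller size n are assumptions added only to keep the sketch quick and reproducible; they are not part of the original benchmark.

```r
set.seed(42)  # assumption: fixed seed, only for reproducibility
n <- 1e5      # assumption: smaller than the 1e6 used in the benchmark above

# Draw all n * 8 letters at once and arrange them as an n-by-8 matrix.
chars <- matrix(sample(letters, n * 8, replace = TRUE), ncol = 8)

# Paste the 8 columns together element-wise: one 8-letter string per row.
res <- do.call(paste0, as.data.frame(chars, stringsAsFactors = FALSE))

# Both approaches from the benchmark should agree on the set of repeated strings.
x <- unique(res[duplicated(res)])
y <- table(res)
z <- names(y)[y >= 2]
```

Wrapping the last three lines in system.time(), as in the benchmark, allows the timing comparison to be repeated on this data.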