R: How to build combinations from strings
I have a data frame with many columns, as shown below:
Column1 Column2 Column3
Q9Y6Y8 P28074 Q9Y6A4
Q9Y6W5 P28066 Q9Y623
Q9Y6H1 P27695 Q9Y5W9
Q5T1J5 P25786;Q9Y623
Q9Y6A4
Q9Y623;P27695;Q9Y623
Q9Y5W9
Q9Y6Y8
So first I want to put them all together and keep only the distinct values, as shown below:
Q9Y6Y8
Q9Y6W5
Q9Y6H1
Q5T1J5
Q9Y6A4
Q9Y623
P27695
Q9Y623
Q9Y5W9
Q9Y6Y8
P25786
P28074
P28066
Then I want every combination of those strings, two at a time, as shown below
Q9Y6Y8 Q9Y6W5
Q9Y6Y8 Q9Y6H1
Q9Y6Y8 Q9Y6A4
Q9Y6Y8 Q5T1J5
Q9Y6Y8 Q9Y6A4
Q9Y6Y8 Q9Y623
Q9Y6Y8 P27695
Q9Y6Y8 Q9Y623
.
.
.
Q9Y6W5 Q9Y6H1
Q9Y6W5 Q9Y6A4
Q9Y6W5 Q5T1J5
.
.
.
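until every string has been paired once with every other string.
We can unlist the data.frame (a data.frame is a list) into a vector, split it on ";" with strsplit, unlist the list that strsplit returns, and keep the unique elements as a vector: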
Un1 <- unique(unlist(strsplit(unlist(df1), ";")))
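The step above can be sketched end to end as follows. The question does not show how df1 was created, so this is only an assumption: the blank cells of the shorter columns are taken to be empty strings, and all columns are character. The filtering line is an added safeguard, not part of the original answer.

```r
# Hypothetical reconstruction of the question's data frame; blank cells are
# assumed to be empty strings and columns are kept as character.
df1 <- data.frame(
  Column1 = c("Q9Y6Y8", "Q9Y6W5", "Q9Y6H1", "Q5T1J5",
              "Q9Y6A4", "Q9Y623;P27695;Q9Y623", "Q9Y5W9", "Q9Y6Y8"),
  Column2 = c("P28074", "P28066", "P27695", "P25786;Q9Y623", "", "", "", ""),
  Column3 = c("Q9Y6A4", "Q9Y623", "Q9Y5W9", "", "", "", "", ""),
  stringsAsFactors = FALSE
)

# Flatten all columns into one vector, then split each cell on ";"
tokens <- unlist(strsplit(unlist(df1), ";"))

# Drop missing or empty tokens in case blank cells were read as NA or ""
tokens <- tokens[!is.na(tokens) & nzchar(tokens)]

# Keep each ID once, in order of first appearance
Un1 <- unique(tokens)
length(Un1)
```

With this reconstruction, Un1 holds 11 distinct IDs, which is consistent with the 55 pairs in the combn output.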
Then, if we only need the combinations of two elements, we can use combn:
t(combn(Un1, 2))
# [,1] [,2]
# [1,] "Q9Y6Y8" "Q9Y6W5"
# [2,] "Q9Y6Y8" "Q9Y6H1"
# [3,] "Q9Y6Y8" "Q5T1J5"
# [4,] "Q9Y6Y8" "Q9Y6A4"
# [5,] "Q9Y6Y8" "Q9Y623"
# [6,] "Q9Y6Y8" "P27695"
# [7,] "Q9Y6Y8" "Q9Y5W9"
# [8,] "Q9Y6Y8" "P28074"
# [9,] "Q9Y6Y8" "P28066"
#[10,] "Q9Y6Y8" "P25786"
#[11,] "Q9Y6W5" "Q9Y6H1"
#[12,] "Q9Y6W5" "Q5T1J5"
#[13,] "Q9Y6W5" "Q9Y6A4"
#[14,] "Q9Y6W5" "Q9Y623"
#[15,] "Q9Y6W5" "P27695"
#[16,] "Q9Y6W5" "Q9Y5W9"
#[17,] "Q9Y6W5" "P28074"
#[18,] "Q9Y6W5" "P28066"
#[19,] "Q9Y6W5" "P25786"
#[20,] "Q9Y6H1" "Q5T1J5"
#[21,] "Q9Y6H1" "Q9Y6A4"
#[22,] "Q9Y6H1" "Q9Y623"
#[23,] "Q9Y6H1" "P27695"
#[24,] "Q9Y6H1" "Q9Y5W9"
#[25,] "Q9Y6H1" "P28074"
#[26,] "Q9Y6H1" "P28066"
#[27,] "Q9Y6H1" "P25786"
#[28,] "Q5T1J5" "Q9Y6A4"
#[29,] "Q5T1J5" "Q9Y623"
#[30,] "Q5T1J5" "P27695"
#[31,] "Q5T1J5" "Q9Y5W9"
#[32,] "Q5T1J5" "P28074"
#[33,] "Q5T1J5" "P28066"
#[34,] "Q5T1J5" "P25786"
#[35,] "Q9Y6A4" "Q9Y623"
#[36,] "Q9Y6A4" "P27695"
#[37,] "Q9Y6A4" "Q9Y5W9"
#[38,] "Q9Y6A4" "P28074"
#[39,] "Q9Y6A4" "P28066"
#[40,] "Q9Y6A4" "P25786"
#[41,] "Q9Y623" "P27695"
#[42,] "Q9Y623" "Q9Y5W9"
#[43,] "Q9Y623" "P28074"
#[44,] "Q9Y623" "P28066"
#[45,] "Q9Y623" "P25786"
#[46,] "P27695" "Q9Y5W9"
#[47,] "P27695" "P28074"
#[48,] "P27695" "P28066"
#[49,] "P27695" "P25786"
#[50,] "Q9Y5W9" "P28074"
#[51,] "Q9Y5W9" "P28066"
#[52,] "Q9Y5W9" "P25786"
#[53,] "P28074" "P28066"
#[54,] "P28074" "P25786"
#[55,] "P28066" "P25786"
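As a small follow-up sketch, the pair matrix can be sanity-checked against choose() and reshaped into other forms. The vector below simply restates the 11 IDs from the output above; the names pair_mat, pairs_df and pair_strings are illustrative and do not come from the original answer.

```r
# The 11 unique IDs, in the order they appear in the output above
Un1 <- c("Q9Y6Y8", "Q9Y6W5", "Q9Y6H1", "Q5T1J5", "Q9Y6A4", "Q9Y623",
         "P27695", "Q9Y5W9", "P28074", "P28066", "P25786")

# combn() returns one pair per column, so transpose to get one pair per row
pair_mat <- t(combn(Un1, 2))

# The number of unordered pairs of n items is choose(n, 2)
stopifnot(nrow(pair_mat) == choose(length(Un1), 2))

# Same pairs as a data frame with named columns
pairs_df <- data.frame(first = pair_mat[, 1], second = pair_mat[, 2],
                       stringsAsFactors = FALSE)

# Or each pair as one space-separated string, as in the question's layout
pair_strings <- combn(Un1, 2, FUN = paste, collapse = " ")
head(pair_strings, 3)
```

combn() produces unordered pairs, so "Q9Y6Y8 Q9Y6W5" appears but "Q9Y6W5 Q9Y6Y8" does not, and no ID is paired with itself.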
Note: here I am assuming that all columns are of class character.
@nik Your columns are factors, so use strsplit(as.character(unlist(df1)), ";")
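To illustrate the factor issue, here is a minimal sketch with a small made-up data frame (df_fac and its values are hypothetical, not the asker's data). Converting each column with lapply is one option; it is an alternative to wrapping unlist(df1) in as.character.

```r
# Hypothetical two-column example whose columns are factors
df_fac <- data.frame(
  Column1 = c("Q9Y6Y8", "Q9Y623;P27695"),
  Column2 = c("P28074", "P25786;Q9Y623"),
  stringsAsFactors = TRUE
)

# strsplit() only accepts character input, so passing a factor is an error
fails <- inherits(try(strsplit(unlist(df_fac), ";"), silent = TRUE),
                  "try-error")

# Convert every column to character while keeping the data.frame structure
df_chr <- df_fac
df_chr[] <- lapply(df_fac, as.character)

# The original pipeline now works unchanged
Un1 <- unique(unlist(strsplit(unlist(df_chr), ";")))
Un1
```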
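I like your answer, but I have to wait 2 minutes before I can accept it.
@nik Added some description.
@nikres Yes, I just checked the combn function; it is the greatest function ever.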