Finding the intersection between strings in two columns of an R data frame


I am trying to find the words common to two columns, for each row of a data frame.
For example, my input is:

C1                | C2
Roy goes to Japan | Roy goes to Australia 
I go to Japan     | He goes to Japan
I need a column appended, like this:

C1                | C2                    | Result
Roy goes to Japan | Roy goes to Australia | Roy goes to
I go to Japan     | He goes to Japan      | to Japan

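I tried intersect, but it gives the intersection between C1 and C2 as whole columns, not between the corresponding elements of C1 and C2. I think I have to use something from stringr or stringi, but I'm not sure what. Also, my dataset is very large, so something fast would be good.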
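
To make the problem concrete, here is a minimal sketch in base R. It assumes the example data from the question has been loaded into a data frame named df. Called on the two columns directly, intersect compares whole strings rather than individual words:

```r
# Example data from the question (assumed to be stored in `df`)
df <- data.frame(
  C1 = c("Roy goes to Japan", "I go to Japan"),
  C2 = c("Roy goes to Australia", "He goes to Japan"),
  stringsAsFactors = FALSE
)

# intersect() treats each column as a set of whole strings, so it only
# returns sentences that appear verbatim in both columns.
whole_column <- intersect(df$C1, df$C2)

# No complete sentence is shared here, so the result is empty.
length(whole_column)
```

This is why the sentences need to be split into words first and compared row by row.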

You can split the strings on whitespace and then use intersect to find the common words for each row:

df$result <- mapply(function(x, y) paste0(intersect(x, y), collapse = " "),
                    strsplit(df$C1, '\\s'), strsplit(df$C2, '\\s'))
df
#                 C1                    C2      result
#1 Roy goes to Japan Roy goes to Australia Roy goes to
#2     I go to Japan      He goes to Japan    to Japan
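
Since the dataset is described as very large, a variant of the same idea may be worth trying. The sketch below is only an illustration, not part of the original answer: common_words is a hypothetical helper name, and it assumes words are separated by exactly one space, so the split can use fixed = TRUE (typically faster than a regular-expression split). It also uses vapply so the result is always a character vector.

```r
# Hypothetical helper (name chosen for illustration): row-wise common words.
# Assumes `a` and `b` are character vectors of equal length whose words are
# separated by single spaces.
common_words <- function(a, b) {
  # fixed = TRUE splits on a literal space and bypasses the regex engine
  wa <- strsplit(a, " ", fixed = TRUE)
  wb <- strsplit(b, " ", fixed = TRUE)
  # vapply enforces one string per row, even when the input has zero rows
  vapply(
    seq_along(wa),
    function(i) paste(intersect(wa[[i]], wb[[i]]), collapse = " "),
    character(1)
  )
}

df <- data.frame(
  C1 = c("Roy goes to Japan", "I go to Japan"),
  C2 = c("Roy goes to Australia", "He goes to Japan"),
  stringsAsFactors = FALSE
)
df$result <- common_words(df$C1, df$C2)
```

Note that intersect returns each shared word once, in the order it appears in its first argument, so repeated words in a sentence are collapsed.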
Data

df <- structure(list(C1 = c("Roy goes to Japan", "I go to Japan"), 
    C2 = c("Roy goes to Australia", "He goes to Japan")), row.names = c(NA, 
-2L), class = "data.frame")