Find the intersection between strings in two columns of an R data frame
I am trying to find the common words between two columns for each row of a data frame.
For example, my input is:
C1 | C2
Roy goes to Japan | Roy goes to Australia
I go to Japan | He goes to Japan
I need an additional column appended:
C1 | C2 | Result
Roy goes to Japan | Roy goes to Australia | Roy goes to
I go to Japan | He goes to Japan | to Japan
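I tried intersect(), but it gives the intersection between the C1 and C2 columns as a whole, not between the words of C1 and C2 in each row. I think I have to use something from stringr or stringi, but I'm not sure what. Also, my dataset is very large, so a fast solution would be good.
You can split the strings on whitespace and then use intersect() on each pair of rows to find the common words: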
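# Split both columns into words ("\\s+" treats runs of spaces as one separator),
# then intersect the two word vectors row by row and paste the result back together
df$result <- mapply(function(x, y) paste0(intersect(x, y), collapse = " "),
                    strsplit(df$C1, "\\s+"), strsplit(df$C2, "\\s+"))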
df
#                 C1                    C2      result
#1 Roy goes to Japan Roy goes to Australia Roy goes to
#2     I go to Japan      He goes to Japan    to Japan
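The question mentions a very large data set, so here is one possible variation. It is a minimal sketch that assumes both columns are plain character vectors of equal length with no NAs. Instead of calling intersect() once per row through mapply(), it tags every word with its row number and does a single hashed %in% lookup over all rows at once. The helper name common_words_vec() is hypothetical and not part of any package. No speed-up is claimed here, so compare it against the mapply() version on your own data.

```r
# Hypothetical helper: common_words_vec() is an illustrative name, not a package function.
# Assumes a and b are character vectors of equal length without NAs.
common_words_vec <- function(a, b) {
  # Tokenise each column once; trimws() avoids a leading "" token
  w1 <- strsplit(trimws(a), "\\s+")
  w2 <- strsplit(trimws(b), "\\s+")
  tok1 <- unlist(w1, use.names = FALSE)
  tok2 <- unlist(w2, use.names = FALSE)
  # Row index of every token
  id1 <- rep(seq_along(w1), lengths(w1))
  id2 <- rep(seq_along(w2), lengths(w2))
  # "row<TAB>word" keys; a tab cannot occur inside a token split on \s+
  key1 <- paste(id1, tok1, sep = "\t")
  key2 <- paste(id2, tok2, sep = "\t")
  # Keep C1 words that also occur in the same row of C2, first occurrence only
  keep <- (key1 %in% key2) & !duplicated(key1)
  # Regroup per row, keeping C1 word order; rows with no common words become ""
  groups <- split(tok1[keep], factor(id1[keep], levels = seq_along(w1)))
  unname(vapply(groups, paste, character(1), collapse = " "))
}

df <- data.frame(
  C1 = c("Roy goes to Japan", "I go to Japan"),
  C2 = c("Roy goes to Australia", "He goes to Japan"),
  stringsAsFactors = FALSE
)
df$result <- common_words_vec(df$C1, df$C2)
print(df)
```

The final regrouping still calls paste() once per row, so any gain comes from replacing the per-row intersect() calls with one vectorised %in%. The !duplicated() step mirrors intersect(), which returns unique values in the order of its first argument.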
Data
df <- structure(list(C1 = c("Roy goes to Japan", "I go to Japan"),
C2 = c("Roy goes to Australia", "He goes to Japan")), row.names = c(NA,
-2L), class = "data.frame")
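Using the data above, the short sketch below makes the problem from the question concrete. It compares intersect() applied to whole strings, to all words pooled together, and row by row. The data is rebuilt with data.frame(..., stringsAsFactors = FALSE) so the example also works in R versions before 4.0. In those versions strings default to factors, and strsplit() then fails with a "non-character argument" error.

```r
# Same example data as in the question
df <- data.frame(
  C1 = c("Roy goes to Japan", "I go to Japan"),
  C2 = c("Roy goes to Australia", "He goes to Japan"),
  stringsAsFactors = FALSE
)

# 1. intersect() on the raw columns compares whole strings: no string appears in both
whole <- intersect(df$C1, df$C2)
print(whole)

# 2. Pooling the words of all rows ignores row boundaries
pooled <- intersect(unlist(strsplit(df$C1, "\\s+")),
                    unlist(strsplit(df$C2, "\\s+")))
print(pooled)

# 3. Row-wise: split each row separately and intersect the pairs
rowwise <- mapply(function(x, y) paste(intersect(x, y), collapse = " "),
                  strsplit(df$C1, "\\s+"), strsplit(df$C2, "\\s+"))
print(rowwise)
```

The whole-string version returns character(0). The pooled version returns "Roy" "goes" "to" "Japan", which would wrongly count "Roy" and "goes" as shared in row 2. Only the row-wise version reproduces the expected Result column.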