Exact string matching between two columns in R


I have a data frame of the following form:

Column1 = c('Elephant,Starship Enterprise,Cat','Random word','Word','Some more words, Even more words')
Column2=c('Rat,Starship Enterprise,Elephant','Ocean','No','more')
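# stringsAsFactors = FALSE keeps the columns as character (needed for strsplit in R < 4.0)
d1 = data.frame(Column1, Column2, stringsAsFactors = FALSE)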

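What I want to do is find and count the exact matches between the words in Column1 and Column2. Each column can hold multiple words separated by commas.

For example, in the first row there are two common items: a) Starship Enterprise and b) Elephant. In row 4, however, even though the word "more" appears in both columns, neither of the exact strings in Column1 ("Some more words" and "Even more words") appears in Column2, so there is no match. The expected output is a per-row count of exact matches (2 for the first row and 0 for the others).

Any help would be much appreciated.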

Split each column on the commas and count the size of the intersection of the resulting items:

mapply(function(x, y) length(intersect(x, y)), 
        strsplit(d1$Column1, ","), strsplit(d1$Column2, ","))
#[1] 2 0 0 0
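
One caveat with splitting on a bare comma: an entry such as 'Some more words, Even more words' yields an item with a leading space (' Even more words'), which would not match 'Even more words' in the other column. The following is a minimal sketch, not part of the original answer, that trims whitespace before comparing. The helper name count_common is hypothetical, and the as.character() call is an assumption to guard against factor columns.

```r
# Hypothetical helper: count exact item matches between two
# comma-separated strings, ignoring whitespace around each item.
count_common <- function(x, y, sep = ",") {
  split_trim <- function(s) {
    # as.character() guards against factor input; fixed = TRUE treats sep literally
    trimws(strsplit(as.character(s), sep, fixed = TRUE)[[1]])
  }
  length(intersect(split_trim(x), split_trim(y)))
}

Column1 <- c("Elephant,Starship Enterprise,Cat", "Random word", "Word",
             "Some more words, Even more words")
Column2 <- c("Rat,Starship Enterprise,Elephant", "Ocean", "No", "more")
d1 <- data.frame(Column1, Column2, stringsAsFactors = FALSE)

# USE.NAMES = FALSE stops mapply from naming the result after Column1
d1$Common <- mapply(count_common, d1$Column1, d1$Column2, USE.NAMES = FALSE)
d1$Common
#[1] 2 0 0 0
```

Because intersect() returns unique values, an item repeated within one cell is counted once.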

Or, the tidyverse way:

library(tidyverse)
d1 %>%
  mutate(Common = map2_dbl(Column1, Column2, ~ 
      length(intersect(str_split(.x, ",")[[1]], str_split(.y, ",")[[1]]))))


#                           Column1                          Column2 Common
#1 Elephant,Starship Enterprise,Cat Rat,Starship Enterprise,Elephant      2
#2                      Random word                            Ocean      0
#3                             Word                               No      0
#4 Some more words, Even more words                             more      0
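
Since the question asks to find the matches as well as count them, it may help to keep the matched items themselves. The sketch below uses base R only (so it does not depend on the tidyverse) and is an illustration under assumptions: the helper name common_items and the Matched column are hypothetical and do not appear in the original answers.

```r
# Hypothetical helper: return, for each row, the items shared by both columns.
common_items <- function(x, y, sep = ",") {
  # Split every element on sep and trim whitespace around each item
  xs <- lapply(strsplit(as.character(x), sep, fixed = TRUE), trimws)
  ys <- lapply(strsplit(as.character(y), sep, fixed = TRUE), trimws)
  # Row-wise intersection; the result is a list with one character vector per row
  Map(intersect, xs, ys)
}

Column1 <- c("Elephant,Starship Enterprise,Cat", "Random word", "Word",
             "Some more words, Even more words")
Column2 <- c("Rat,Starship Enterprise,Elephant", "Ocean", "No", "more")
d1 <- data.frame(Column1, Column2, stringsAsFactors = FALSE)

matches <- common_items(d1$Column1, d1$Column2)
d1$Matched <- vapply(matches, paste, character(1), collapse = ", ")  # "" when nothing matches
d1$Common <- lengths(matches)                                        # number of shared items

d1$Matched[1]
#[1] "Elephant, Starship Enterprise"
d1$Common
#[1] 2 0 0 0
```

The matched items come back in the order they appear in Column1, which is how intersect() orders its result.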

We can also use cSplit from the splitstackshape package:

library(splitstackshape)
library(data.table)
v1 <- cSplit(setDT(d1, keep.rownames = TRUE), 2:3, ",", "long")[, 
    length(intersect(na.omit(Column1), na.omit(Column2))), rn]$V1
d1[, Common := v1][, rn := NULL][]
#                             Column1                          Column2 Common
#1: Elephant,Starship Enterprise,Cat Rat,Starship Enterprise,Elephant      2
#2:                      Random word                            Ocean      0
#3:                             Word                               No      0
#4: Some more words, Even more words                             more      0
