R: match strings and return the non-matching words


I want to compare the words in two columns and return the words that do not match.

Example data frame:

```r
data = data.frame(animal1 = c("cat, dog, horse, mouse", "cat, dog, horse", "mouse, frog", "cat, dog, frog, cow"),
                  animal2 = c("dog, horse, mouse", "cat, horse", "frog", "cat, dog, frog"))
```
I want to add a new column, `unique_animal`, containing the non-matching words. The resulting data frame should look like this:

```
                 animal1           animal2 unique_animal
1 cat, dog, horse, mouse dog, horse, mouse           cat
2        cat, dog, horse        cat, horse           dog
3            mouse, frog              frog         mouse
4    cat, dog, frog, cow    cat, dog, frog           cow
```
I have tried the code from this question:

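The commas are not the problem; I can easily remove them. But the code does not work when the non-matching word is at the end of the string: for some reason, in that case it does not count the total number of elements. Do you know how to modify this code so that it doesn't do this, or is there another approach?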


Thanks, everyone!

After splitting the columns at `,\\s*`, we can use `map2_chr` (the character-returning variant of `purrr::map2`) with `setdiff`:

```r
library(dplyr)
library(purrr)
library(stringr)

data %>%
  # Split both columns into word vectors, then, row by row, keep the words
  # of animal1 that are absent from animal2 and join them back together.
  mutate(unique_animal = map2_chr(strsplit(as.character(animal1), ",\\s*"),
                                  strsplit(as.character(animal2), ",\\s*"),
                                  ~ str_c(setdiff(.x, .y), collapse = ", ")))
#                 animal1           animal2 unique_animal
#1 cat, dog, horse, mouse dog, horse, mouse           cat
#2        cat, dog, horse        cat, horse           dog
#3            mouse, frog              frog         mouse
#4    cat, dog, frog, cow    cat, dog, frog           cow
```
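If you would rather avoid the tidyverse dependencies, the same split-then-`setdiff` idea can be sketched in base R with `mapply`. This is an alternative sketch, not the answer's code, and the helper name `unmatched_words` is hypothetical. Because each string is compared as a set of whole words, the position of the unmatched word (including at the end of the string, as with `cow`) does not matter. Note that `setdiff(x, y)` is one-directional: it returns words in `animal1` that are missing from `animal2`, not words that appear only in `animal2`.

```r
# Hypothetical helper: words in `a` that do not appear in `b`,
# where both are comma-separated strings such as "cat, dog, horse".
unmatched_words <- function(a, b, sep = ",\\s*") {
  # Split each string into a character vector of words (one list element per row).
  a_words <- strsplit(as.character(a), sep)
  b_words <- strsplit(as.character(b), sep)
  # For each row, keep the words of `a` missing from `b`, then rejoin them.
  mapply(function(x, y) paste(setdiff(x, y), collapse = ", "),
         a_words, b_words, USE.NAMES = FALSE)
}

data <- data.frame(
  animal1 = c("cat, dog, horse, mouse", "cat, dog, horse", "mouse, frog", "cat, dog, frog, cow"),
  animal2 = c("dog, horse, mouse", "cat, horse", "frog", "cat, dog, frog")
)
data$unique_animal <- unmatched_words(data$animal1, data$animal2)
print(data)
```

A row where every word matches yields an empty string rather than `NA`, since `paste()` of a zero-length vector with `collapse` returns `""`.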