R: match strings and return the non-matching words
I want to match a string of words between two columns and return the non-matching words. Example data frame:
data = data.frame(animal1 = c("cat, dog, horse, mouse", "cat, dog, horse", "mouse, frog", "cat, dog, frog, cow"), animal2 = c("dog, horse, mouse", "cat, horse", "frog", "cat, dog, frog"))
I want to add a new column, "unique_animal", so that the resulting data frame is:
                 animal1           animal2 unique_animal
1 cat, dog, horse, mouse dog, horse, mouse           cat
2        cat, dog, horse        cat, horse           dog
3            mouse, frog              frog         mouse
4    cat, dog, frog, cow    cat, dog, frog           cow
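I have tried the code from this question:
The commas are not the problem; I can easily remove them. But it does not work when the non-matching word is at the end of the string. For some reason, in that case it does not count the total number of elements. Do you know how to modify this code so that it stops doing that? Or is there another approach?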
Thanks, everyone!
After splitting the columns at ,\\s* we can use purrr's map2_chr with setdiff:
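library(dplyr)
library(purrr)
library(stringr)
data %>%
  # split both columns into word vectors, keep the words of animal1 that
  # are missing from animal2, and paste them back into a single string
  mutate(unique_animal = map2_chr(strsplit(as.character(animal1), ",\\s*"),
                                  strsplit(as.character(animal2), ",\\s*"),
                                  ~ str_c(setdiff(.x, .y), collapse = ", ")))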
#                 animal1           animal2 unique_animal
#1 cat, dog, horse, mouse dog, horse, mouse           cat
#2        cat, dog, horse        cat, horse           dog
#3            mouse, frog              frog         mouse
#4    cat, dog, frog, cow    cat, dog, frog           cow
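If you would rather not load any packages, the same split-then-setdiff idea can be sketched in base R. This is only an illustrative alternative, not part of the answer above: the helper names words1 and words2 are made up for the example, and stringsAsFactors = FALSE is my assumption so that the columns stay character vectors on R versions before 4.0.

```r
# Same example data as in the question.
data <- data.frame(
  animal1 = c("cat, dog, horse, mouse", "cat, dog, horse",
              "mouse, frog", "cat, dog, frog, cow"),
  animal2 = c("dog, horse, mouse", "cat, horse", "frog", "cat, dog, frog"),
  stringsAsFactors = FALSE
)

# Split every cell into a character vector of words
# (words1 and words2 are hypothetical helper names).
words1 <- strsplit(data$animal1, ",\\s*")
words2 <- strsplit(data$animal2, ",\\s*")

# Row by row, keep the words of animal1 that do not occur in animal2,
# then collapse them back into one comma-separated string.
data$unique_animal <- mapply(
  function(x, y) paste(setdiff(x, y), collapse = ", "),
  words1, words2
)

print(data)
```

Note that setdiff(x, y) is one-directional: a word that appears only in animal2 is not reported. If that case matters, setdiff(y, x) or union(setdiff(x, y), setdiff(y, x)) could be used instead. Because the comparison works on whole words after splitting, the position of the non-matching word in the string should not matter.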