R: checking for words in a data frame

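I have two data frames, A and B. I want to check whether the unique words of data frame A are present in data frame B. If a word is present, keep it; otherwise remove that word from each row of data frame B.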

A <- data.frame(name = c(
  "X-ray right leg arteries",
  "consultation of gynecologist",
  "x-ray leg arteries",
  "x-ray leg with 20km distance"
), stringsAsFactors = F)

B <- data.frame(name = c(
  "X-ray left leg arteries",
  "consultation (inspection) of gynecalogist",
  "MRI right leg arteries",
  "X-ray right leg arteries with special care"
), stringsAsFactors = F)


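# Unique words that occur anywhere in A
k <- unique(unlist(strsplit(A$name, " ")))
# For each row of B, keep the words of k that occur in that row
d <- do.call(rbind, lapply(B$name, function(z) {
  # For each word in k, the positions in the row where it matches
  xx <- lapply(lapply(k, function(x) grepl(x, unlist(strsplit(z, " ")), fixed = TRUE)), which)
  paste(k[sapply(xx, function(x) length(x) > 0)], collapse = " ")
}))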

A: Instead of multiple loops, we can use 'k' with str_extract_all to extract the unique words from 'B' and then paste those elements together.

library(stringr)
unlist(lapply(str_extract_all(B$name, paste(k, collapse="|")), 
             paste, collapse=' '))
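
Note that the pattern built with paste(k, collapse="|") is a regular expression alternation, so it matches substrings: a short vocabulary word such as "of" would also be picked up inside a longer word. If only whole words should count, a minimal base R sketch could look like the following. The helper name keep_known_words is hypothetical, and the sketch assumes that words are separated by single spaces, as in the sample data.

```r
# Sample data copied from the question
A <- data.frame(name = c(
  "X-ray right leg arteries",
  "consultation of gynecologist",
  "x-ray leg arteries",
  "x-ray leg with 20km distance"
), stringsAsFactors = FALSE)

B <- data.frame(name = c(
  "X-ray left leg arteries",
  "consultation (inspection) of gynecalogist",
  "MRI right leg arteries",
  "X-ray right leg arteries with special care"
), stringsAsFactors = FALSE)

# Unique words that occur anywhere in A
k <- unique(unlist(strsplit(A$name, " ", fixed = TRUE)))

# Hypothetical helper: split each string on spaces and keep only the
# words that appear exactly (case-sensitively) in `vocab`
keep_known_words <- function(text, vocab) {
  vapply(strsplit(text, " ", fixed = TRUE), function(words) {
    paste(words[words %in% vocab], collapse = " ")
  }, character(1))
}

d <- keep_known_words(B$name, k)
print(d)
```

Because the comparison uses %in%, the match is exact and case-sensitive, and punctuation stays attached to a word, so "(inspection)" is compared as a single token.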

Another option is to change
sapply(xx, function(x) length(x) > 0)
to
lengths(xx) > 0
which will be faster.
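
As a small check, the vectorised base R function lengths() (plural, available since R 3.2.0) returns the length of every list element in one call, so comparing it with 0 gives the same logical vector as the sapply version. The list xx below is a made-up stand-in for the list of match positions in the question, not real output.

```r
# Hypothetical match positions, shaped like `xx` in the question:
# one integer vector per vocabulary word, empty when nothing matched
xx <- list(integer(0), c(1L, 3L), 2L, integer(0))

# Element-by-element version from the question
slow <- sapply(xx, function(x) length(x) > 0)

# Vectorised version: lengths() returns one length per list element
fast <- lengths(xx) > 0

print(identical(slow, fast))

# By contrast, length(xx) (singular) is the number of list elements,
# so length(xx) > 0 is a single TRUE and would not filter anything
print(length(xx) > 0)
```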