R: Extract strings matched by a dictionary vector and split them into multiple columns


I want to extract specific strings from the favor column of my target data, namely the ones that match entries in dictionary. Here is my data:

dictionary <- c("apple", "banana", "orange", "grape")

target <- data.frame("user" = c("A", "B", "C"),
                     "favor" = c("I like apple and banana", "grape and kiwi", "orange, banana and grape are the best"))
target
  user                                 favor
1    A               I like apple and banana
2    B                        grape and kiwi
3    C orange, banana and grape are the best

Any help would be greatly appreciated.

# Remove all words from `target$favor` that are not in the dictionary
result <- lapply(strsplit(target$favor, ',| '), function(x) { x[x %in% dictionary] })
result
# [[1]]
# [1] "apple"  "banana"
# 
# [[2]]
# [1] "grape" 
# 
# [[3]]
# [1] "orange" "banana" "grape" 

# Fill in NAs when the rows have different numbers of items
result <- lapply(result, `length<-`, max(lengths(result)))

# Rebuild the data.frame using the list of words in each row
cbind(target[ , 'user', drop = FALSE], do.call(rbind, result))
#   user      1      2     3
# 1    A  apple banana  <NA>
# 2    B  grape   <NA>  <NA>
# 3    C orange banana grape
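
The three steps above (split, filter, pad) can be wrapped into one helper that also gives the new columns readable names. This is only a sketch: the function name `extract_words`, the `favor_` column prefix, and splitting on any run of non-letters (instead of `',| '`) are my own assumptions, not part of the answer above.

```r
dictionary <- c("apple", "banana", "orange", "grape")
target <- data.frame(
  user  = c("A", "B", "C"),
  favor = c("I like apple and banana", "grape and kiwi",
            "orange, banana and grape are the best"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: returns a character matrix with one row per input
# string and one column per matched word, padded with NA.
extract_words <- function(text, dict) {
  # Split on any run of non-letter characters (spaces, commas, ...)
  words <- strsplit(text, "[^[:alpha:]]+")
  # Keep only the words that appear in the dictionary
  hits <- lapply(words, function(w) w[w %in% dict])
  # Pad every row to the length of the longest one
  n <- max(lengths(hits))
  hits <- lapply(hits, `length<-`, n)
  out <- do.call(rbind, hits)
  colnames(out) <- paste0("favor_", seq_len(n))
  out
}

wide <- data.frame(user = target$user,
                   extract_words(target$favor, dictionary),
                   stringsAsFactors = FALSE)
wide
```

Because matching is done with `%in%` on whole words, an entry such as "grape" would not be picked up inside "grapefruit".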

Your best option is probably to apply str_extract_all to each row:

library(stringr)
result <- t(apply(target, 1,
                  function(x) str_extract_all(x[['favor']], dictionary, simplify = TRUE)))

     [,1]    [,2]     [,3]     [,4]   
[1,] "apple" "banana" ""       ""     
[2,] ""      ""       ""       "grape"
[3,] ""      "banana" "orange" "grape"
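
In the matrix above each column corresponds to one dictionary entry, in dictionary order. If what you actually need is a labelled "which user mentioned which word" table, a sketch along the following lines may be easier to work with. Note that it swaps in base R's `grepl` for stringr, and the name `found` is just an illustration.

```r
dictionary <- c("apple", "banana", "orange", "grape")
target <- data.frame(
  user  = c("A", "B", "C"),
  favor = c("I like apple and banana", "grape and kiwi",
            "orange, banana and grape are the best"),
  stringsAsFactors = FALSE
)

# One logical column per dictionary entry; sapply() names the columns
# after the dictionary words. fixed = TRUE treats each entry as a
# literal string rather than a regular expression.
found <- sapply(dictionary, grepl, x = target$favor, fixed = TRUE)

indicator <- data.frame(user = target$user, found)
indicator
```

This is plain substring matching, so "grape" would also be flagged for a text containing "grapefruit"; use a word-boundary regex instead if that matters for your data.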

Alternatively, the first lapply could be something like: lapply(target$favor, function(x) regmatches(x, gregexpr(paste(dictionary, collapse = "|"), x))). That way you don't have to remove words from the strings. If you like, you can also use data.frame in the assignment, e.g. target[, c("favor_1", "favor_2", 3)]
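
The regmatches idea from the comment above can be sketched roughly as follows. Two things here are my own assumptions rather than part of the comment: the `\\b` word boundaries around the pattern (so that "grape" does not match inside a longer word), and the `favor_` column names generated with `paste0`. Since `gregexpr` and `regmatches` are already vectorised over the input, the sketch calls them once on the whole column instead of inside `lapply`.

```r
dictionary <- c("apple", "banana", "orange", "grape")
target <- data.frame(
  user  = c("A", "B", "C"),
  favor = c("I like apple and banana", "grape and kiwi",
            "orange, banana and grape are the best"),
  stringsAsFactors = FALSE
)

# Build one alternation pattern: \b(apple|banana|orange|grape)\b
pattern <- paste0("\\b(", paste(dictionary, collapse = "|"), ")\\b")

# One character vector of matches per row, in order of appearance
matches <- regmatches(target$favor, gregexpr(pattern, target$favor))

# Pad with NA so every row has the same number of entries
n <- max(lengths(matches))
padded <- do.call(rbind, lapply(matches, `length<-`, n))

# Add the matches to the original data.frame as new columns
new_cols <- paste0("favor_", seq_len(n))
target[new_cols] <- as.data.frame(padded, stringsAsFactors = FALSE)
target
```

Unlike the stringr matrix, the columns here follow the order in which the words occur in each text, not the order of the dictionary.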