R: Extract matched strings and split them into multiple columns, matching against a dictionary vector
I want to extract specific strings from the `favor` column of my `target` data, keeping only the strings that match an entry in `dictionary`. Here is my data:
dictionary <- c("apple", "banana", "orange", "grape")
target <- data.frame("user" = c("A", "B", "C"),
"favor" = c("I like apple and banana", "grape and kiwi", "orange, banana and grape are the best"))
target
  user                                 favor
1    A               I like apple and banana
2    B                        grape and kiwi
3    C orange, banana and grape are the best
Any help would be greatly appreciated.
# Remove all words from `target$favor` that are not in the dictionary
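# as.character() guards against `favor` being a factor (the data.frame() default before R 4.0)
result <- lapply(strsplit(as.character(target$favor), ',| '), function(x) { x[x %in% dictionary] })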
result
# [[1]]
# [1] "apple" "banana"
#
# [[2]]
# [1] "grape"
#
# [[3]]
# [1] "orange" "banana" "grape"
# Fill in NAs when the rows have different numbers of items
result <- lapply(result, `length<-`, max(lengths(result)))
# Rebuild the data.frame using the list of words in each row
cbind(target[ , 'user', drop = FALSE], do.call(rbind, result))
#   user      1      2     3
# 1    A  apple banana  <NA>
# 2    B  grape   <NA>  <NA>
# 3    C orange banana grape
Your best option is probably to apply `str_extract_all` to each row:
library(stringr)
result <- t(apply(target, 1,
function(x) str_extract_all(x[['favor']], dictionary, simplify = TRUE)))
     [,1]    [,2]     [,3]     [,4]
[1,] "apple" "banana" ""       ""
[2,] ""      ""       ""       "grape"
[3,] ""      "banana" "orange" "grape"
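The matrix above already has one column per dictionary entry, with "" where nothing matched. As a rough sketch of the same layout with named columns and NA for misses, one could extract each dictionary word across the whole column instead of looping over rows. The names `by_word` and `out`, the `stringsAsFactors = FALSE` argument, and the use of `fixed()` are assumptions made for illustration, not part of the answer above.

```r
library(stringr)

dictionary <- c("apple", "banana", "orange", "grape")
target <- data.frame(
  user  = c("A", "B", "C"),
  favor = c("I like apple and banana", "grape and kiwi",
            "orange, banana and grape are the best"),
  stringsAsFactors = FALSE  # assumption: keep `favor` as character on any R version
)

# For each dictionary word, take its first literal match in every row.
# sapply() returns a matrix with one column per word, named after the word;
# str_extract() yields NA where the word does not occur.
by_word <- sapply(dictionary, function(w) str_extract(target$favor, fixed(w)))

# Attach the user column (hypothetical result name `out`)
out <- data.frame(user = target$user, by_word, stringsAsFactors = FALSE)
out
```

Note that `fixed()` matches substrings, so "grape" would also be found inside a longer word such as "grapefruit"; a pattern with word boundaries would be needed if that matters.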
Alternatively, the first `lapply` could be something like: `lapply(target$favor, function(x) regmatches(x, gregexpr(paste(dictionary, collapse = "|"), x)))`, so that you don't have to remove strings. If you want, you can also use the data.frame in the assignment, for example: `target[, c("favor_1", "favor_2", 3)]`.
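The regmatches idea from the comment can be sketched as follows. This is a minimal version under a few assumptions: it calls `regmatches()` directly on the whole column rather than inside `lapply` (the function is vectorised, and wrapping it per element would nest each result in an extra list), and the `favor_N` column names and `stringsAsFactors = FALSE` are chosen here for illustration.

```r
dictionary <- c("apple", "banana", "orange", "grape")
target <- data.frame(
  user  = c("A", "B", "C"),
  favor = c("I like apple and banana", "grape and kiwi",
            "orange, banana and grape are the best"),
  stringsAsFactors = FALSE  # assumption: keep `favor` as character on any R version
)

# One alternation pattern: "apple|banana|orange|grape"
pattern <- paste(dictionary, collapse = "|")

# Extract every dictionary match per row; the result is a list of character vectors
result <- regmatches(target$favor, gregexpr(pattern, target$favor))

# Pad shorter rows with NA so that all rows have the same length
n_max  <- max(lengths(result))
padded <- do.call(rbind, lapply(result, `length<-`, n_max))

# Assign the matrix columns straight into new data.frame columns
target[paste0("favor_", seq_len(n_max))] <- padded
target
```

Because nothing is split and filtered, punctuation next to a word (for example "apple.") does not prevent a match, which is the advantage the comment points to.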