R: How to identify and retrieve multiple patterns from multiple texts?

I hope this won't be flagged as a duplicate. I have seen similar Stack Overflow posts, but I couldn't get them to work for me.

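My goals are: First: I want to detect whether any value of the variable "Code" from auxiliary_df appears in main_df. Second: once detected, I want to create a column containing the identified codes. For example, for the text "School Performance" I would like a row such as "A1, A6, A7".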

main_df <- read.table(header = TRUE, 
                   stringsAsFactors = FALSE, 
                   text="Title Text
'School Performance' 'Students A1, A6 and A7 are great'
'Groceries Performance' 'Students A9, A3 are ok'
'Fruit Performance' 'A5 and A7 will be great fruit pickers'
'Jedi Performance' 'A3, A6, A5 will be great Jedis'
'Sith Performance' 'No one is very good. We should be happy.'")



auxiliary_df <- read.table(header = TRUE, 
                   stringsAsFactors = FALSE, 
                   text="FirstName Code
'Alex' 'A1'
'Figo' 'A6'
'Rui' 'A7'
'Deco' 'A5'
'Cristiano' 'A9'
'Ronaldo' 'A3'")


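We can use paste0 to collapse all the Code values into a single pattern, use str_extract_all to extract every code that appears in Text, and combine the matches into one comma-separated string: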

main_df$extract_string <- sapply(stringr::str_extract_all(main_df$Text, 
             paste0('\\b', auxiliary_df$Code, '\\b', collapse = '|')), toString)
main_df

#                  Title                                     Text extract_string
#1    School Performance         Students A1, A6 and A7 are great     A1, A6, A7
#2 Groceries Performance                   Students A9, A3 are ok         A9, A3
#3     Fruit Performance    A5 and A7 will be great fruit pickers         A5, A7
#4      Jedi Performance           A3, A6, A5 will be great Jedis     A3, A6, A5
#5      Sith Performance No one is very good. We should be happy.               
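
The '\\b' on each side of a code is a word boundary, and it matters once codes can be prefixes of longer tokens. As a small sketch of the difference (the text containing "A10" is made up for illustration and is not part of the question's data; base R is used here only so the snippet needs no packages):

```r
# Hypothetical input: "A10" is NOT one of the codes, "A6" is.
codes <- c("A1", "A6")
text  <- "Students A10 and A6 are great"

# Pattern without word boundaries: "A1|A6"
loose  <- paste(codes, collapse = "|")
# Pattern with word boundaries, built the same way as in the answer above
strict <- paste0("\\b", codes, "\\b", collapse = "|")

loose_hits  <- regmatches(text, gregexpr(loose,  text, perl = TRUE))[[1]]
strict_hits <- regmatches(text, gregexpr(strict, text, perl = TRUE))[[1]]

loose_hits   # "A1" "A6"  (the "A1" comes from inside "A10")
strict_hits  # "A6"
```

With the data in the question every code is a standalone token, so both patterns return the same matches there; the boundaries only guard against cases like the one above.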

You tried to match main_df$Title instead of main_df$Text. You can use gregexpr and regmatches to extract the hits (mostly using your code):

regmatches(main_df$Text, gregexpr(paste(auxiliary_df$Code, collapse = "|"),
 main_df$Text))
#[[1]]
#[1] "A1" "A6" "A7"
#
#[[2]]
#[1] "A9" "A3"
#
#[[3]]
#[1] "A5" "A7"
#
#[[4]]
#[1] "A3" "A6" "A5"
#
#[[5]]
#character(0)
#
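
regmatches returns a list with one character vector per row, so to get the single comma-separated column asked for in the question each element still has to be collapsed. One possible way to do that in base R is sketched below; the two-row main_df, the codes vector, and the names pattern and hits are assumptions made for the example, and the word boundaries are an addition that the regmatches call above does not use.

```r
# Trimmed-down stand-ins for the question's data (illustrative only)
main_df <- data.frame(
  Title = c("School Performance", "Sith Performance"),
  Text  = c("Students A1, A6 and A7 are great",
            "No one is very good. We should be happy."),
  stringsAsFactors = FALSE
)
codes <- c("A1", "A6", "A7", "A5", "A9", "A3")

# One alternation wrapped in word boundaries: \b(A1|A6|...)\b
pattern <- paste0("\\b(", paste(codes, collapse = "|"), ")\\b")

# List of matches, one character vector per row of main_df
hits <- regmatches(main_df$Text, gregexpr(pattern, main_df$Text, perl = TRUE))

# toString() joins each vector with ", "; rows without matches become ""
main_df$extract_string <- vapply(hits, toString, character(1))

main_df$extract_string
# "A1, A6, A7" ""
```

vapply is used instead of sapply so the result is guaranteed to be a character vector even when no row has a match.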