Stringr pattern to detect capitalised words (R, stringr)


I am trying to write a function that detects every word written entirely in capital letters.

This is my code so far:

library(tidyverse)

# One-row example; add_row() comes from tibble, which the tidyverse loads
df <- data.frame(title = character(), id = numeric()) %>%
        add_row(title = "THIS is an EXAMPLE where I DONT get the output i WAS hoping for", id = 6)

# Take the first three matches of the pattern as separate columns
df <- df %>%
        mutate(sec_code_1 = unlist(str_extract_all(title, " [A-Z]{3,5} ")[[1]][1])
               , sec_code_2 = unlist(str_extract_all(title, " [A-Z]{3,5} ")[[1]][2])
               , sec_code_3 = unlist(str_extract_all(title, " [A-Z]{3,5} ")[[1]][3]))
df


The output is:

title                                                            id  sec_code_1  sec_code_2  sec_code_3
THIS is an EXAMPLE where I DONT get the output i WAS hoping for  6   DONT        WAS         NA

If you run the regular expression on its own, you will see that "THIS" is not in the output at all:

str_extract_all(df$title," [A-Z]{3,5} ")[[1]]
#[1] " DONT " " WAS "

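This is because you are extracting words with a leading and a trailing space. "THIS" has no leading space because it starts the sentence, so it does not satisfy the regex pattern. You can use word boundaries (\\b) instead.

If you use a pattern such as "\\b[A-Z]{3,5}\\b" in your code, it will work as expected.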
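To make the difference concrete, here is a minimal sketch that runs both patterns on the title string from the question. The variable names `spaced` and `bounded` are only illustrative.

```r
library(stringr)

title <- "THIS is an EXAMPLE where I DONT get the output i WAS hoping for"

# Space-delimited pattern: needs a literal space on both sides,
# so a word at the very start of the string cannot match
spaced <- str_extract_all(title, " [A-Z]{3,5} ")[[1]]

# Word-boundary pattern: \\b also matches at the start and end of the string
bounded <- str_extract_all(title, "\\b[A-Z]{3,5}\\b")[[1]]

print(spaced)
#[1] " DONT " " WAS "
print(bounded)
#[1] "THIS" "DONT" "WAS"
```

Note that "EXAMPLE" is skipped by both patterns because it has seven letters, which falls outside the {3,5} range, and that the word-boundary matches no longer carry the surrounding spaces.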

Alternatively, you can use:

library(tidyverse)

df %>%
  mutate(code = str_extract_all(title,"\\b[A-Z]{3,5}\\b")) %>%
  unnest_wider(code) %>%
  rename_with(~paste0('sec_code_', seq_along(.)), starts_with('..'))

# title                                     id sec_code_1 sec_code_2 sec_code_3
#  <chr>                                  <dbl> <chr>      <chr>      <chr>     
#1 THIS is an EXAMPLE where I DONT get t…     6 THIS       DONT       WAS
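The rename step above relies on unnest_wider() giving the unnamed matches automatic names that start with '..'. As far as I know, more recent tidyr releases instead ask for `names_sep` when the list elements are unnamed, so a variant that may be more robust is sketched below. It assumes the list-column is called `sec_code` (a name chosen here so that the final columns come out as sec_code_1, sec_code_2, and so on), and it extracts matches row by row rather than always indexing the first row with [[1]].

```r
library(dplyr)
library(stringr)
library(tidyr)

df <- data.frame(
  title = "THIS is an EXAMPLE where I DONT get the output i WAS hoping for",
  id = 6
)

result <- df %>%
  # list-column holding every all-caps word of 3 to 5 letters in each row
  mutate(sec_code = str_extract_all(title, "\\b[A-Z]{3,5}\\b")) %>%
  # names_sep pastes the column name and the element position together
  unnest_wider(sec_code, names_sep = "_")

print(names(result))
#[1] "title"      "id"         "sec_code_1" "sec_code_2" "sec_code_3"
```

Rows with fewer matches than the widest row should simply get NA in the remaining columns.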


Legend, this is exactly what I wanted. I didn't know about word boundaries.
