R: mutate changes the whole column instead of working row by row


In a dataframe, I want to create a new column based on which of a specific set of strings (a character vector) appear in another column.

So basically, I want this:

ID  Phrases
1   some words
2   some words dec
3   some words nov may

to return this:

ID  Phrases             MonthsOccur
1   some words          NA
2   some words dec      dec
3   some words nov may  may nov

I tried the following, but I'm not sure why it gives me the result it does:

library(dplyr)
library(stringr)  # provides str_detect

vMonths <- c("jan","feb","mar","apr","may","jun","jul","aug","sept","nov","dec")

a <- c(1,2,3)
b <- c('phrase number one', 'phrase dec','phrase nov')

df <- data.frame(a,b)
names(df) <- c("ID","Phrases")
df <- df %>% mutate(MonthsOccur = paste(vMonths[str_detect(Phrases, vMonths)],collapse=" "))

One option is to apply str_detect row by row with rowwise:

library(dplyr)
library(stringr)

df %>%
  rowwise() %>%
  mutate(MonthsOccur = paste0(vMonths[str_detect(Phrases, vMonths)], 
                       collapse = " "))

However, rowwise may or may not be kept in future versions of dplyr, so a better approach is to use a map operation:

df %>%
  mutate(MonthsOccur = purrr::map_chr(Phrases,  
                      ~paste0(vMonths[str_detect(.x, vMonths)], collapse = " ")))

#  ID           Phrases MonthsOccur
#1  1 phrase number one            
#2  2        phrase dec         dec
#3  3    phrase nov may     may nov

A base R option would be to use regmatches and gregexpr:

sapply(regmatches(df$Phrases, gregexpr(paste0(vMonths, collapse = "|"),
        df$Phrases)), paste0, collapse = " ")
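To make the intermediate steps of the base R approach visible, here is a minimal sketch that uses only base R. The names phrases, pattern, hits and months_occur are illustrative and not from the original answer, and tolower(month.abb) is assumed as the month list. The last step, turning empty strings into NA, is an optional addition aimed at the NA shown in the question's desired output.

```r
# Illustrative input; these values mirror the example data in this thread.
phrases <- c("phrase number one", "phrase dec", "phrase nov may")

# One alternation pattern: "jan|feb|...|dec"
pattern <- paste0(tolower(month.abb), collapse = "|")

# gregexpr() locates every match, regmatches() extracts them.
# The result is a list with one character vector per phrase.
hits <- regmatches(phrases, gregexpr(pattern, phrases))

# Collapse each list element into a single string.
months_occur <- sapply(hits, paste0, collapse = " ")

# Optional: phrases without any month give "", which can be turned into NA.
months_occur[months_occur == ""] <- NA

print(months_occur)
```

Note that the matches come out in the order they appear in each phrase ("nov may"), whereas indexing vMonths with str_detect returns them in the order of vMonths ("may nov").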

Data

df <- structure(list(ID = c(1, 2, 3), Phrases = structure(c(3L, 1L, 
2L), .Label = c("phrase dec", "phrase nov may", "phrase number one"
), class = "factor")), class = "data.frame", row.names = c(NA, -3L))

df

Another option involving dplyr and stringr could be:
df %>%
 mutate(MonthsOccur = str_extract_all(Phrases, paste(tolower(month.abb), collapse = "|")))

  ID            Phrases MonthsOccur
1  1         some words            
2  2     some words dec         dec
3  3 some words nov may    nov, may

The output here is not a character vector but a list column. If you really want a character vector, add purrr:
df %>%
 mutate(MonthsOccur = purrr::map_chr(str_extract_all(Phrases, paste(tolower(month.abb), collapse = "|")), 
                              paste, collapse = ", "))
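The difference between the list result and the collapsed character vector can be checked outside of mutate. The following is a small sketch, assuming stringr is installed. The names phrases, found and collapsed are illustrative, and vapply is swapped in for map_chr only to avoid the extra purrr dependency.

```r
library(stringr)

phrases <- c("some words", "some words dec", "some words nov may")
pattern <- paste(tolower(month.abb), collapse = "|")

# str_extract_all() returns a list: one character vector per input string.
found <- str_extract_all(phrases, pattern)
print(is.list(found))

# Collapsing each element yields an ordinary character vector.
# vapply() also checks that every result is a single string.
collapsed <- vapply(found, paste, character(1), collapse = ", ")
print(collapsed)
```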

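Thanks, works like a charm! Any insight into why this seems to behave differently from other functions used inside mutate, which seem to work row by row? I can't figure out why it does this.

@BroQ Well, that's because str_detect is vectorized over both string and pattern. So Phrases[1] is compared with vMonths[1], Phrases[2] with vMonths[2], and so on, and you don't get all the matches you expect. By specifying rowwise or using map, we compare Phrases[1] with all of vMonths, then Phrases[2] with all of vMonths.

Side note: tolower(month.abb) gives the lowercase month abbreviations without typing them out.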
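The element-wise pairing described in the comment above can be reproduced with a small sketch, assuming stringr is installed. The vectors phrases and months are illustrative and deliberately have the same length. With different lengths, as in the question, the outcome depends on the recycling rules of the installed stringr version and may produce a warning or an error.

```r
library(stringr)

phrases <- c("phrase dec", "phrase nov may")
months  <- c("nov", "dec")

# Both arguments have length 2, so str_detect() pairs them up:
# phrases[1] with months[1], phrases[2] with months[2].
elementwise <- str_detect(phrases, months)
print(elementwise)

# Checking each phrase against every month needs an explicit loop over
# the phrases, which is what rowwise() or map_chr() provide inside mutate().
per_phrase <- lapply(phrases, function(p) months[str_detect(p, months)])
print(per_phrase)

# paste(..., collapse = " ") always returns a single string; inside mutate()
# that one value is recycled to every row of the new column.
collapsed <- paste(months[elementwise], collapse = " ")
print(length(collapsed))
```

Both phrases contain a month from the set, yet the element-wise call reports no match for either, because each phrase was only ever compared with one month.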