R mutate changes the whole column instead of working row by row
In a data frame, I want to create a new column based on which of a specific set of strings (a character vector) occur in another column. So basically, I want this:
ID Phrases
1 some words
2 some words dec
3 some words nov may
to return this:
ID Phrases MonthsOccur
1 some words NA
2 some words dec dec
3 some words nov may may nov
I tried the following, but I'm not sure why it gives me the result it does:
library(dplyr)
library(stringr)  # provides str_detect()
vMonths <- c("jan","feb","mar","apr","may","jun","jul","aug","sept","nov","dec")
a <- c(1,2,3)
b <- c('phrase number one', 'phrase dec','phrase nov')
df <- data.frame(a,b)
names(df) <- c("ID","Phrases")
df <- df %>% mutate(MonthsOccur = paste(vMonths[str_detect(Phrases, vMonths)],collapse=" "))
One option is to apply str_detect rowwise:
library(dplyr)
library(stringr)
df %>%
rowwise() %>%
mutate(MonthsOccur = paste0(vMonths[str_detect(Phrases, vMonths)],
collapse = " "))
However, rowwise may or may not be kept in the future, so a better approach is to use a map operation:
df %>%
mutate(MonthsOccur = purrr::map_chr(Phrases,
~paste0(vMonths[str_detect(.x, vMonths)], collapse = " ")))
# ID Phrases MonthsOccur
#1 1 phrase number one
#2 2 phrase dec dec
#3 3 phrase nov may may nov
A base R option would be with regmatches and gregexpr:
sapply(regmatches(df$Phrases, gregexpr(paste0(vMonths, collapse = "|"),
df$Phrases)), paste0, collapse = " ")
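The base R call above returns an empty string for phrases with no month, whereas the question asks for NA. The following is a minimal sketch of one way to get NA instead, assuming the question's sample phrases; the names phrases, found and months_occur are made up for this illustration and are not part of the original answer.

```r
# Month abbreviations as given in the question
vMonths <- c("jan", "feb", "mar", "apr", "may", "jun", "jul", "aug",
             "sept", "nov", "dec")

# Sample phrases from the question (illustrative stand-in for df$Phrases)
phrases <- c("some words", "some words dec", "some words nov may")

# One alternation pattern: "jan|feb|...|dec"
pattern <- paste0(vMonths, collapse = "|")

# List with one character vector of matches per phrase
found <- regmatches(phrases, gregexpr(pattern, phrases))

# Collapse each set of matches; use NA when nothing matched
months_occur <- vapply(
  found,
  function(m) if (length(m) == 0) NA_character_ else paste(m, collapse = " "),
  character(1)
)

months_occur
```

Matches come back in the order they appear in the phrase, so the third element is "nov may" rather than "may nov". The pattern also matches substrings, so if whole-word matching is needed, wrapping the alternation in word boundaries (for example "\\b(jan|feb)\\b") is one possible refinement.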
Data
df <- structure(list(ID = c(1, 2, 3), Phrases = structure(c(3L, 1L,
2L), .Label = c("phrase dec", "phrase nov may", "phrase number one"
), class = "factor")), class = "data.frame", row.names = c(NA, -3L))
df
Another option involving dplyr and stringr could be:
df %>%
mutate(MonthsOccur = str_extract_all(Phrases, paste(tolower(month.abb), collapse = "|")))
ID Phrases MonthsOccur
1 1 some words
2 2 some words dec dec
3 3 some words nov may nov, may
Here the output is not a character vector but a list.
If you specifically want a character vector, then add purrr:
library(purrr)  # provides map_chr()
df %>%
mutate(MonthsOccur = map_chr(str_extract_all(Phrases, paste(tolower(month.abb), collapse = "|")),
paste, collapse = ", "))
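To make the list-versus-character-vector distinction concrete, here is a small sketch outside of mutate. It assumes the sample phrases from the question and uses base vapply in place of map_chr purely for illustration; phrases, matches and collapsed are hypothetical names.

```r
library(stringr)

# Sample phrases from the question (illustrative)
phrases <- c("some words", "some words dec", "some words nov may")

# month.abb is a built-in constant: "Jan", "Feb", ..., "Dec"
pattern <- paste(tolower(month.abb), collapse = "|")

# str_extract_all() returns a list with one character vector per phrase
matches <- str_extract_all(phrases, pattern)
is.list(matches)

# Collapsing each element yields an ordinary character vector
collapsed <- vapply(matches, paste, character(1), collapse = ", ")
collapsed
```

A phrase with no match gives a zero-length vector in the list, which collapses to an empty string rather than NA.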
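Thanks, works like a charm! Any insight into why this seems to work differently from other functions used in mutate (which seem to work row by row)? I can't seem to understand why it does that.
@BroQ Well, that is because str_detect is vectorized over both string and pattern. So Phrases[1] is compared with vMonths[1], Phrases[2] is compared with vMonths[2], and so on. So you don't get all the matches you expect. By specifying rowwise or using map, we compare Phrases[1] with all of vMonths, then Phrases[2] with all of vMonths.
Side note: tolower(month.abb)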
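The vectorization described in the comment above can be seen in a small sketch. It assumes a shortened, hypothetical pattern vector (months) with the same length as the phrases so that the element-wise pairing is easy to follow; none of these names come from the original post.

```r
library(stringr)

phrases <- c("phrase number one", "phrase dec", "phrase nov may")
months  <- c("nov", "dec", "may")  # hypothetical, same length as phrases

# Element-wise: phrases[i] is tested only against months[i]
pairwise <- str_detect(phrases, months)
pairwise

# Without rowwise() or map(), paste(..., collapse = " ") then builds a
# single string, which mutate() would recycle into every row
collapsed_once <- paste(months[pairwise], collapse = " ")
collapsed_once

# Per phrase: each phrase is tested against every month
per_phrase <- lapply(phrases, function(p) months[str_detect(p, months)])
per_phrase
```

Here the second and third phrases happen to line up with their paired pattern, but "nov" in the third phrase is only found by the per-phrase version, since in the element-wise call "nov" is paired with the first phrase.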