R: How to extract strings from a column containing short comments?
I'm trying to come up with code that lets me reclassify a column of short survey comments in a dataset by doing the following:
If the comment is blank/NA/empty, assign "NO COMMENT"
If the short comment contains the word "outdated" (lowercase, uppercase or any combination), assign "OUTDATED"
Any other comment stays as it is
For example, if I have the table
NAME COMMENT
Jean "This website seems a bit outdated for me"
Dela "I didnt like it"
Nate NA
Josh "Very outdated"
You can use the grepl() and tolower() functions inside a nested ifelse(),
giving
NAME COMMENT
1 Jean OUTDATED
2 Dela I didnt like it
3 Nate NO_COMMENT
4 Josh OUTDATED
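The code for this answer is not included in the page as captured. A minimal sketch of a nested ifelse() with grepl() and tolower() that reproduces the table above, using the same sample data as this answer (redefined here so the snippet runs on its own). It also treats "" as "no comment", which the question asks for:

```r
# Sample data (same values as this answer's data)
df <- data.frame(NAME = c("Jean", "Dela", "Nate", "Josh"),
                 COMMENT = c("This website seems a bit outdated for me",
                             "I didnt like it",
                             NA,
                             "Very outdated"),
                 stringsAsFactors = FALSE)

# Outer test: lower-case the text so "outdated" matches in any case.
# grepl() returns FALSE for NA input, so NA rows fall through to the inner ifelse().
df$COMMENT <- ifelse(grepl("outdated", tolower(df$COMMENT)),
                     "OUTDATED",
                     ifelse(is.na(df$COMMENT) | df$COMMENT == "",
                            "NO_COMMENT",
                            df$COMMENT))
print(df)
```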
Data:
df <- data.frame(NAME=c("Jean","Dela","Nate","Josh"),
COMMENT=c("This website seems a bit outdated for me",
"I didnt like it",
NA,
"Very outdated"), stringsAsFactors = FALSE)
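df
Using tidyverse pipe syntax:
dplyr::case_when lets you pass several conditions when mutating a column
stringr::str_detect returns TRUE or FALSE depending on whether the given pattern is found in the string (NA for a missing value)
tolower lets the match work whether the column contains the word "outdated" in upper, lower or mixed case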
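To see why the tolower() step matters, a small base-R check (a sketch; x is a made-up vector, and grepl() stands in for str_detect() so no packages are needed). A plain pattern match is case-sensitive, while lower-casing first catches every capitalisation:

```r
x <- c("This website seems a bit OUTDATED for me", "Very Outdated", "I didnt like it", NA)

# Plain match is case-sensitive: neither capitalised variant matches
plain <- grepl("outdated", x)
# Lower-casing first makes the match work for any capitalisation
folded <- grepl("outdated", tolower(x))

print(plain)
print(folded)
```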
Data used
df <- structure(list(NAME = c("Jean", "Dela", "Nate", "Josh"), COMMENT = c("This website seems a bit OUTDATED for me",
"I didnt like it", NA, "Very Outdated")), class = "data.frame", row.names = c(NA,
-4L))
NAME COMMENT
1 Jean This website seems a bit OUTDATED for me
2 Dela I didnt like it
3 Nate <NA>
4 Josh Very Outdated
df
You can do it like this, where (?i)
makes the match case-insensitive:
df$COMMENT <- ifelse(grepl("(?i)outdated",df$COMMENT), "OUTDATED",
ifelse(is.na(df$COMMENT), "NO COMMENT", df$COMMENT))
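A quick sketch showing that the (?i) inline flag turns off case sensitivity (x is a made-up vector). Here perl = TRUE is added so the pattern is handled by PCRE, which documents this flag:

```r
x <- c("OUTDATED", "Outdated", "outdated", "up to date")

# (?i) makes the rest of the pattern case-insensitive
with_flag <- grepl("(?i)outdated", x, perl = TRUE)
print(with_flag)
```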
Upvote, but grep and grepl have an ignore.case argument (FALSE by default), so the extra call to tolower isn't needed
@ruibradas Hmm.. I didn't know about that argument. Thanks
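Following the ignore.case suggestion, a minimal sketch that drops tolower() and also treats whitespace-only comments as blank (the question mentions blank as well as empty). classify_comment is a hypothetical helper name, not from any of the answers:

```r
# Hypothetical helper applying the three rules from the question
classify_comment <- function(comment) {
  # NA, "" and whitespace-only strings all count as "no comment"
  blank <- is.na(comment) | trimws(comment) == ""
  # ignore.case = TRUE replaces the separate tolower() call; NA gives FALSE
  outdated <- grepl("outdated", comment, ignore.case = TRUE)
  ifelse(outdated, "OUTDATED",
         ifelse(blank, "NO COMMENT", comment))
}

comments <- c("This website seems a bit OUTDATED for me",
              "I didnt like it", NA, "   ", "Very Outdated")
result <- classify_comment(comments)
print(result)
```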
library(stringr)
library(dplyr)
df %>% mutate(COMMENT = case_when(
  str_detect(tolower(COMMENT), "outdated") ~ "OUTDATED",  # "outdated" in any case
  is.na(COMMENT) | COMMENT == "" ~ "NO_COMMENTS",         # missing or empty comment
  TRUE ~ COMMENT))                                         # anything else unchanged
NAME COMMENT
1 Jean OUTDATED
2 Dela I didnt like it
3 Nate NO_COMMENTS
4 Josh OUTDATED
df$COMMENT <- ifelse(grepl("(?i)outdated",df$COMMENT), "OUTDATED",
ifelse(is.na(df$COMMENT), "NO COMMENT", df$COMMENT))
df
NAME COMMENT
1 Jean OUTDATED
2 Dela I didnt like it
3 Nate NO COMMENT
4 Josh OUTDATED