How do I extract strings from a column containing short comments in R?

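I'm trying to come up with code that reclassifies a column of short survey comments in a dataset, as follows:

  • If the comment is blank/NA/empty, assign "No comment"
  • If the short comment contains the word "outdated" (lowercase, uppercase, or any mix of the two), assign "Outdated"
  • Any other comment is left as it is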
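For example, if I have the table

    NAME COMMENT
    Jean "This website seems a bit outdated for me"
    Dela "I didnt like it"
    Nate NA
    Josh "Very outdated"

    You can use the grepl() and tolower() functions inside a nested ifelse(), which gives: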

      NAME         COMMENT
    1 Jean        OUTDATED
    2 Dela I didnt like it
    3 Nate      NO_COMMENT
    4 Josh        OUTDATED
    
    Data:

    df <- data.frame(NAME = c("Jean", "Dela", "Nate", "Josh"),
        COMMENT = c("This website seems a bit outdated for me",
            "I didnt like it",
            NA,
            "Very outdated"), stringsAsFactors = FALSE)
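A minimal sketch of the nested ifelse() idea described above, using the same sample data. This is one plausible form only: the order of the conditions and the extra check for empty strings are assumptions.

```r
# Sample data, same as above
df <- data.frame(NAME = c("Jean", "Dela", "Nate", "Josh"),
                 COMMENT = c("This website seems a bit outdated for me",
                             "I didnt like it", NA, "Very outdated"),
                 stringsAsFactors = FALSE)

# Outer test: NA or empty string -> "NO_COMMENT"
# Inner test: lower-case the text, then look for the literal word "outdated"
df$COMMENT <- ifelse(is.na(df$COMMENT) | df$COMMENT == "",
                     "NO_COMMENT",
                     ifelse(grepl("outdated", tolower(df$COMMENT), fixed = TRUE),
                            "OUTDATED",
                            df$COMMENT))
```

Testing for NA first keeps the missing-value case explicit, although grepl() would also return FALSE for an NA input.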
    
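    Using tidyverse pipe syntax:

    • dplyr::case_when
      lets you pass several conditions when mutating a column
    • stringr::str_detect
      returns TRUE or FALSE depending on whether the given pattern is found in the string passed to it (and NA for an NA input)
    • tolower
      removes the problem of the word "outdated" appearing in the column in any case (upper, lower, or mixed)

    Data used: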

    df <- structure(list(NAME = c("Jean", "Dela", "Nate", "Josh"), COMMENT = c("This website seems a bit OUTDATED for me", 
    "I didnt like it", NA, "Very Outdated")), class = "data.frame", row.names = c(NA, 
    -4L))
    
      NAME                                  COMMENT
    1 Jean This website seems a bit OUTDATED for me
    2 Dela                          I didnt like it
    3 Nate                                     <NA>
    4 Josh                            Very Outdated
    

    You can do it like this, where (?i) makes the match case-insensitive:

    df$COMMENT <- ifelse(grepl("(?i)outdated",df$COMMENT), "OUTDATED",
                         ifelse(is.na(df$COMMENT), "NO COMMENT", df$COMMENT))
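One detail worth checking in the snippet above is how grepl() treats missing values: it returns FALSE for NA, which is why NA rows fall through to the inner is.na() test. A small sketch follows; the vector `comments` and the added empty-string check are illustrative assumptions, not part of the answer.

```r
# Illustrative input: a match, a missing value, an empty string, a plain comment
comments <- c("Very Outdated", NA, "", "I didnt like it")

# grepl() returns FALSE (not NA) for missing input
hits <- grepl("(?i)outdated", comments)

# Same nested ifelse(), with an added check so "" is also treated as no comment
recoded <- ifelse(hits, "OUTDATED",
                  ifelse(is.na(comments) | comments == "", "NO COMMENT", comments))
```

Without the `comments == ""` condition, an empty string would be kept as it is rather than relabelled.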
    

    Upvoted, but grep/grepl have an ignore.case argument (default FALSE), so the extra function call to tolower isn't needed.

    @ruibradas Hmm, I didn't know about that argument. Thanks.
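The suggestion in the comment can be sketched as follows, assuming the mixed-case sample data used earlier; the helper variable `is_outdated` is introduced here only for readability.

```r
# Sample data with mixed-case "outdated"
df <- data.frame(NAME = c("Jean", "Dela", "Nate", "Josh"),
                 COMMENT = c("This website seems a bit OUTDATED for me",
                             "I didnt like it", NA, "Very Outdated"),
                 stringsAsFactors = FALSE)

# ignore.case = TRUE makes the match case-insensitive without calling tolower()
is_outdated <- grepl("outdated", df$COMMENT, ignore.case = TRUE)

df$COMMENT <- ifelse(is_outdated, "OUTDATED",
                     ifelse(is.na(df$COMMENT), "NO COMMENT", df$COMMENT))
```

This keeps the pattern itself plain, with no inline (?i) flag and no copy of the column in lower case.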
    # tidyverse approach: stringr provides str_detect(), dplyr provides mutate() and case_when()
    library(stringr)
    library(dplyr)
    
    df %>% mutate(COMMENT = case_when(str_detect(tolower(COMMENT), "outdated") ~ "OUTDATED",
                                      is.na(COMMENT) | COMMENT == "" ~ "NO_COMMENTS",
                                      TRUE ~ COMMENT))
    
      NAME         COMMENT
    1 Jean        OUTDATED
    2 Dela I didnt like it
    3 Nate     NO_COMMENTS
    4 Josh        OUTDATED
    
    
    df  # result of the ifelse()/(?i) version
      NAME         COMMENT
    1 Jean        OUTDATED
    2 Dela I didnt like it
    3 Nate      NO COMMENT
    4 Josh        OUTDATED