String search with %in% (containing special characters) not working while str_detect works - R, Regex, Stringr - Fatal编程技术网



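I'm doing sentiment analysis and I want to get all the bigrams that start with a negation word, such as "not". Using %in% works fine for simple strings, but for strings that contain special characters (such as apostrophes) it doesn't work on my text.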

Bigrams in the text:

> head(sup4_bigrams_count,3)
# A tibble: 3 x 3
  word1      word2      n
  <chr>      <chr>  <int>
1 parent’s   day        8
2 mother’s   day        7
3 bachelor’s degree     6

> sup4_bigrams_count$word1 %>% unique  
 ......
 [61] "daily"          "day"            "de"             "define"        
 [65] "depth"          "developed"      "didn’t"         "differentiated"
 [69] "difunctioning"  "diploma"        "doesn’t"        "don’t" 
But using %in% doesn't work at all:

negate_words <- c("didn’t","doesn’t","don’t")

> sup4_bigrams_count %>% filter(word1 %in% negate_words)
# A tibble: 0 x 3
# ... with 3 variables: word1 <chr>, word2 <chr>, n <int>
But if I create another data frame from these same words, %in% works:

a <- tibble(word=c("didn’t","doesn’t","don’t"),ind=1:3)
n <- c("didn’t","doesn’t")

> a %>% filter(word %in% n)
# A tibble: 2 x 2
  word      ind
  <chr>   <int>
1 didn’t      1
2 doesn’t     2
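The only thing I can do is filter three times with str_detect and then rbind the results together, but that gets cumbersome if I have a long list of negation words. I hope someone can help.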
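The post does not show what the stored word1 values actually contain. One way to investigate, sketched below as an assumption rather than a diagnosis, is to compare a value from the data with the word as typed: %in% needs the two strings to be identical, whereas str_detect() only needs the pattern to occur somewhere inside the value, so an invisible extra character would let one succeed while the other fails. The trailing zero-width space (U+200B) and the helper name clean_word() are hypothetical and exist only for illustration.

```r
library(stringr)

# Hypothetical value as it might come out of the text-processing pipeline.
# A trailing zero-width space (U+200B) is simulated here purely for illustration;
# the real values in sup4_bigrams_count are not shown in the post.
from_data <- "didn\u2019t\u200b"
typed     <- "didn\u2019t"

from_data %in% typed                 # FALSE: %in% needs exact equality
str_detect(from_data, fixed(typed))  # TRUE: a substring match still succeeds
nchar(from_data)                     # 7
nchar(typed)                         # 6

# Print the Unicode code points so look-alike or invisible characters become visible
sprintf("U+%04X", utf8ToInt(from_data))
sprintf("U+%04X", utf8ToInt(typed))

# Hypothetical fix, valid only if invisible format characters (category Cf)
# are the cause: strip them before comparing with %in%
clean_word <- function(x) str_remove_all(x, "\\p{Cf}")
clean_word(from_data) %in% typed     # TRUE
```

With the real data, the same code-point check can be run on one of the word1 entries that str_detect() matches but %in% misses.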

You can construct an "OR" regular expression that searches for all the negation words at once:

library(stringr)

negate_words <- c("didn’t","doesn’t","don’t")

strings <-  c("daily",  "day", "de", "define",
              "depth", "developed", "didn’t", "differentiated",
              "difunctioning", "diploma", "doesn’t", "don’t")

str_detect(strings, "didn’t")
# FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE

pattern <- paste0("(", paste(negate_words, collapse="|"), ")")
pattern
# "(didn’t|doesn’t|don’t)"

str_detect(strings, pattern)
# FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE  TRUE
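Applied to the bigram table from the question, the same pattern lets a single filter() call replace the three str_detect() filters and the rbind(). This is a minimal sketch that assumes dplyr is loaded alongside stringr; the small bigrams tibble is made-up stand-in data, not the poster's sup4_bigrams_count.

```r
library(dplyr)
library(stringr)

# Hypothetical stand-in for sup4_bigrams_count; the real data is not reproduced here.
# "\u2019" is the right single quotation mark that appears as ’ in the post.
bigrams <- tibble(
  word1 = c("parent\u2019s", "didn\u2019t", "doesn\u2019t", "daily", "don\u2019t"),
  word2 = c("day", "like", "matter", "life", "know"),
  n     = c(8L, 3L, 2L, 2L, 1L)
)

negate_words <- c("didn\u2019t", "doesn\u2019t", "don\u2019t")

# One alternation covering every negation word, built the same way as above
pattern <- paste0("(", paste(negate_words, collapse = "|"), ")")

# A single filter() call keeps every bigram whose first word matches any negation word
negated <- bigrams %>% filter(str_detect(word1, pattern))
negated
```

Because the pattern is not anchored, it also keeps any word1 that merely contains one of the negation words. Building it as paste0("^(", ..., ")$") restricts matches to whole values, which restores %in%-style exact matching (and may therefore fail for the same reason %in% does).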