
Detecting patterns with str_detect



I have some tweets in which I want to detect emoticons. For this task, I want to use the hash_emoticons dictionary from the textclean package:

hash_emoticons[1:5]
       x                 y
1:   #-) partied all night
2:    %)             drunk
3:   %-)             drunk
4: ',:-l        scepticism
5: ',:-|        scepticism

If I use it with the standard function, I get the following error:

library(stringr)

str_detect(Tweets$text, hash_emoticons$x)


longer object length is not a multiple of shorter object length
Error in stri_detect_regex(string, pattern, opts_regex = opts(pattern)):
Incorrectly nested parentheses in regexp pattern. (U_REGEX_MISMATCHED_PAREN)

Any idea how to solve this?

Here is an approach that uses the stringi package directly. However, you will need to think more carefully about some edge cases.

library(tibble)  # provides tibble()
# Generate some data
xxx <- tibble(Text = c("asdasd", ":o)", "hej :o) :o) :-*"))

[1] 0 2 5

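Now, if you look at the input strings, you will see 4 emoticons. The element ":o)" matches two dictionary entries, ":o" and ":o)", which is why the second element is 2. In the same way, the string "hej :o) :o) :-*" returns 5, because it matches ":o" twice, ":o)" twice, and ":-*" once.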
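The counts described above can be reproduced with a sketch like the following. It assumes a three-entry stand-in vector (emoticons, a hypothetical name) in place of the full hash_emoticons$x, and it uses stri_count_fixed so that each emoticon is matched as a literal string and not as a regular expression. This is one possible way to obtain such counts, not necessarily the exact call behind the output shown.

```r
library(stringi)

# Hypothetical stand-in for hash_emoticons$x (assumption: only these three entries)
emoticons <- c(":o", ":o)", ":-*")
texts <- c("asdasd", ":o)", "hej :o) :o) :-*")

# For each text, count the literal occurrences of every emoticon and sum them.
# Overlapping entries such as ":o" and ":o)" are both counted.
counts <- vapply(
  texts,
  function(txt) sum(stri_count_fixed(txt, emoticons)),
  integer(1),
  USE.NAMES = FALSE
)
counts  # 0 2 5
```

Because ":o" is a prefix of ":o)", every ":o)" is counted twice. If that is not wanted, one option is to keep only the longest match at each position, which needs extra logic beyond this sketch.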

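Please include (part of) Tweets in your question via dput, together with the exact expected result.

I don't think you can call str_detect on a vector of strings and a vector of patterns at the same time. You need to run str_detect(Tweets$text, hash_emoticons$x[i]) for each emoticon, or you can paste all the emoticons together into one pattern, e.g. all_emots, and check whether a string matches any of them.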
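Both suggestions can be sketched as follows. The vectors emoticons and tweets are hypothetical stand-ins for hash_emoticons$x and Tweets$text, and the contents of all_emots are an assumption about what such a combined pattern could look like. Emoticons such as "%)" contain regex metacharacters (here an unbalanced parenthesis), so they must be matched literally: the first option does this with fixed(), the second wraps each emoticon in \Q...\E, which the ICU regex engine behind stringr treats as a literal quote.

```r
library(stringr)

# Hypothetical stand-ins for hash_emoticons$x and Tweets$text
emoticons <- c("#-)", "%)", ":o)", ":-*")
tweets <- c("asdasd", "party #-) all night", "hej :o) :-*")

# Option 1: test one emoticon at a time as a literal string via fixed().
# The result is a matrix with one row per tweet and one column per emoticon.
hits <- vapply(
  emoticons,
  function(e) str_detect(tweets, fixed(e)),
  logical(length(tweets))
)
any_hit <- unname(rowSums(hits) > 0)

# Option 2: quote each emoticon with \Q...\E so it is taken literally,
# then join everything into a single alternation pattern.
all_emots <- paste0("\\Q", emoticons, "\\E", collapse = "|")
any_hit_regex <- str_detect(tweets, all_emots)

any_hit        # FALSE TRUE TRUE
any_hit_regex  # FALSE TRUE TRUE
```

The first option keeps track of which emoticon matched in which tweet; the second only answers whether a tweet contains any emoticon, but does so in a single call.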