Regex: Matching Arabic text in R with regular expressions

regex r unicode

I'm trying to find the elements of a character vector that contain a specific Arabic phrase.

So far I have:

   #load list of Arabic phrases 
   list.of.phrases <- read.table("arabic_phrases.txt")

   #look for the first phrase
   phrase1 <- arabic.text.vector[grepl(list.of.phrases[1],arabic.text.vector)]

Unfortunately, neither this approach nor using the raw Arabic text directly seems to return anything. Instead, I get the following message:

   Error in `[[<-.data.frame`(`*tmp*`, qname, value = 1) : 
   replacement has 1 row, data has 0
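
One likely problem with the code above, although the error message by itself does not confirm it: read.table() returns a data frame, so list.of.phrases[1] is a one-column data frame rather than a single character string. read.table() also splits each line on whitespace by default, which would break multi-word phrases into separate columns. A minimal sketch, assuming the phrase file is UTF-8 with one phrase per line; the sample phrases, the temporary file, and the sample arabic.text.vector are hypothetical stand-ins for the real data:

```r
# Hypothetical stand-ins for arabic_phrases.txt and arabic.text.vector
phrases <- c("\u0645\u0631\u062d\u0628\u0627",  # marhaba
             "\u0634\u0643\u0631\u0627")        # shukran
arabic.text.vector <- c("\u0645\u0631\u062d\u0628\u0627 \u0628\u0643\u0645",
                        "hello world",
                        "\u0634\u0643\u0631\u0627 \u062c\u0632\u064a\u0644\u0627")

# Write the phrases to a temporary UTF-8 file, one phrase per line
tmp <- tempfile(fileext = ".txt")
writeLines(enc2utf8(phrases), tmp, useBytes = TRUE)

# read.table() returns a data frame, so [1] selects a column, not a string
df <- read.table(tmp, encoding = "UTF-8", stringsAsFactors = FALSE)
class(df[1])  # "data.frame"

# readLines() returns a plain character vector, one element per line
list.of.phrases <- readLines(tmp, encoding = "UTF-8")

# fixed = TRUE matches the phrase literally instead of as a regex
phrase1 <- arabic.text.vector[grepl(list.of.phrases[1], arabic.text.vector,
                                    fixed = TRUE)]
phrase1
```

Using fixed = TRUE is a design choice: the phrases are treated as plain text, so any regex metacharacters they happen to contain cannot change the match.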

I know I can match Arabic words with [\u0627-\u06FF]+, for example:

   #look for all cells containing arabic
   arabic <- arabic.text.vector[grepl("[\u0627-\u06FF]+", arabic.text.vector)]
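
In an R string literal, \uXXXX is a Unicode escape, so "[\u0600-\u06FF]" is a character class over the Arabic block U+0600 to U+06FF; written without the backslashes, "[U0627-U06FF]" would instead be a class of ASCII characters such as U, 0 and 6. R's ?regex documentation warns that character ranges are locale- and implementation-dependent, so the sketch also shows the PCRE script property \p{Arabic}, assuming an R build whose PCRE has Unicode support (standard builds do). The sample vector is hypothetical:

```r
arabic.text.vector <- c("\u0645\u0631\u062d\u0628\u0627 \u0628\u0643\u0645",  # Arabic only
                        "hello world",                                           # Latin only
                        "abc \u0634\u0643\u0631\u0627")                          # mixed

# \u0600 and \u06FF are Unicode escapes, so this is the range U+0600..U+06FF
has.arabic <- grepl("[\u0600-\u06FF]", arabic.text.vector)

# Alternative: match by Unicode script property instead of a hard-coded range
has.arabic.pcre <- grepl("\\p{Arabic}", arabic.text.vector, perl = TRUE)

arabic <- arabic.text.vector[has.arabic]
```

The two tests are not identical in general: \p{Arabic} also matches Arabic-script letters outside the basic block (for example, the presentation forms), while a few punctuation marks inside U+0600 to U+06FF belong to the Common script rather than Arabic.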

... So far, my approach has been to convert the Arabic text to Unicode code-point values and then use grep; however, I'm having trouble with the conversion.
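
For the conversion step itself, base R provides utf8ToInt() (string to integer code points) and intToUtf8() (code points back to a string). A short sketch; contains.arabic() is a hypothetical helper, and treating U+0600 to U+06FF as "Arabic" is an assumption you may want to widen:

```r
phrase <- "\u0634\u0643\u0631\u0627"  # shukran

# utf8ToInt() takes a single string and returns its code points as integers
cp <- utf8ToInt(phrase)
sprintf("U+%04X", cp)  # "U+0634" "U+0643" "U+0631" "U+0627"

# intToUtf8() converts the code points back into one string
intToUtf8(cp) == phrase

# Hypothetical helper: does a string contain any code point in U+0600..U+06FF?
contains.arabic <- function(s) {
  cp <- utf8ToInt(enc2utf8(s))  # utf8ToInt() expects UTF-8 input
  any(cp >= 0x0600 & cp <= 0x06FF)
}

texts <- c("\u0645\u0631\u062d\u0628\u0627 \u0628\u0643\u0645", "hello world")
vapply(texts, contains.arabic, logical(1), USE.NAMES = FALSE)
```

Because utf8ToInt() accepts only one string at a time, the helper is applied element-wise with vapply(). For matching whole phrases, this round trip is usually unnecessary, since grepl() already compares UTF-8 strings directly.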

Am I heading in the right direction, or does anyone have a different solution or approach?
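
One possible alternative, sketched under the assumption that the phrases are already a plain character vector (for example, read with readLines()): match every phrase literally against every element, with no code-point conversion. The sample data are hypothetical:

```r
list.of.phrases <- c("\u0645\u0631\u062d\u0628\u0627", "\u0634\u0643\u0631\u0627")
arabic.text.vector <- c("\u0645\u0631\u062d\u0628\u0627 \u0628\u0643\u0645",
                        "hello world",
                        "\u0634\u0643\u0631\u0627 \u062c\u0632\u064a\u0644\u0627")

# Logical matrix: one row per text element, one column per phrase
hits <- vapply(list.of.phrases,
               function(p) grepl(p, arabic.text.vector, fixed = TRUE),
               logical(length(arabic.text.vector)))

# Elements that contain at least one of the phrases
matched <- arabic.text.vector[rowSums(hits) > 0]

# Or, the matching elements for each phrase separately
by.phrase <- lapply(list.of.phrases, function(p)
  arabic.text.vector[grepl(p, arabic.text.vector, fixed = TRUE)])
```

The matrix form makes it easy to see which phrase matched which element; the list form mirrors the original phrase1 variable, repeated for every phrase.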

Please provide a reproducible example.