Hashtag extraction function in R

r function if-statement

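I am trying to create a hashtag extraction function in R. The function should extract the hashtags from a post if there are any, and otherwise return a blank. My function is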

hashtag_extract= function(text){
              match = str_extract_all(text,"#\\S+")
              if (match) { 
                 return match
                 }else{
               return ''}}
String="#letsdoit #Tonewbeginnign world is on a new#route"

But my function does not work and produces a lot of errors. For example, the first error is

Error: unexpected symbol in:
      "  if (match) { 
     return match"

So I want to call it as

hashtag_extract(String)

and the answer should look like this

#letsdoit  #Tonewbeginnign   #route

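In the end I will use sapply to apply this function to a whole column, which is why the if part matters. Please ignore my indentation, since it does not matter to R, but every suggestion will help.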
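The first error comes from writing return match: in R, return is called like a function, so the value has to go in parentheses. A second problem is that str_extract_all() returns a list, which is not a valid condition for if. Below is a minimal sketch of a corrected version, assuming stringr is installed. The name hashtag_extract_fixed is made up for illustration, and the sketch keeps the question's simple #\\S+ pattern rather than a full hashtag grammar.

```r
library(stringr)

# Sketch only: hashtag_extract_fixed is a hypothetical name, not code from the thread.
hashtag_extract_fixed <- function(text) {
  # str_extract_all() returns a list; [[1]] is the character vector for one string
  match <- str_extract_all(text, "#\\S+")[[1]]
  if (length(match) > 0) {
    return(match)  # return() needs parentheses
  } else {
    return("")
  }
}

hashtag_extract_fixed("#letsdoit #Tonewbeginnign world is on a new#route")
# [1] "#letsdoit"       "#Tonewbeginnign" "#route"
```

Testing length(match) instead of match itself is what makes the condition a single TRUE or FALSE.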

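Hashtag regexes are not that simple.

I'm not sure you are aware of the generally accepted "rules" for hashtags.

I don't believe str_extract_all() returns what you think it does.

Just use stringi, which the stringr functions are built on.

People need to stop analyzing Twitter.

This should handle most, if not all, cases: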

library(magrittr)  # provides the %>% pipe used below

get_tags <- function(x) {
  # via http://stackoverflow.com/a/5768660/1457051
  twitter_hashtag_regex <- "(^|[^&\\p{L}\\p{M}\\p{Nd}_\u200c\u200d\ua67e\u05be\u05f3\u05f4\u309b\u309c\u30a0\u30fb\u3003\u0f0b\u0f0c\u00b7])(#|\uFF03)(?!\uFE0F|\u20E3)([\\p{L}\\p{M}\\p{Nd}_\u200c\u200d\ua67e\u05be\u05f3\u05f4\u309b\u309c\u30a0\u30fb\u3003\u0f0b\u0f0c\u00b7]*[\\p{L}\\p{M}][\\p{L}\\p{M}\\p{Nd}_\u200c\u200d\ua67e\u05be\u05f3\u05f4\u309b\u309c\u30a0\u30fb\u3003\u0f0b\u0f0c\u00b7]*)"
  stringi::stri_match_all_regex(x, twitter_hashtag_regex) %>% 
    purrr::map(~.[,4]) %>% 
    purrr::flatten_chr()

}

tests <- c("#teste_teste      //underscore accepted",
           "#teste-teste      //Hyphen not accepted",
           "#leof_gfg.sdfsd   //dot not accepted",
           "#f34234@45#6fgh6  // @ not accepted",
           "#leo#leo2#asd     //followed hastag without space ",
           "#6663             // only number accepted",
           "_#asd_            // hashtag can't start or finish with underscore",
           "-#sdfsdf-         // hashtag can't start or finish with hyphen",
           ".#sdfsdf.         // hashtag can't start or finish with dot",
           "#leo_leo__leo__leo____leo // decline followed underline")


get_tags(tests)
##  [1] "teste_teste"              "teste"                   
##  [3] "leof_gfg"                 "f34234"                  
##  [5] "leo"                      NA                        
##  [7] NA                         "sdfsdf"                  
##  [9] "sdfsdf"                   "leo_leo__leo__leo____leo"

your_string <- "#letsdoit #Tonewbeginnign world is on a new#route"

get_tags(your_string)
## [1] "letsdoit"       "Tonewbeginnign"
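To illustrate the point about what str_extract_all() returns, here is a small sketch, assuming stringr is installed. The posts vector is invented for the example. The function returns a list with one character vector per input string, and a string without matches yields an empty vector rather than a blank.

```r
library(stringr)

# Hypothetical input: one post with hashtags, one without
posts <- c("#letsdoit #Tonewbeginnign world is on a new#route", "no tags here")

res <- str_extract_all(posts, "#\\S+")

class(res)    # "list"
lengths(res)  # 3 0
```

This is why a bare if (match) fails: the condition receives a list, not a single logical value.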

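@manu sharma I don't think you need an if/else inside the function you apply. Let the rows with no match take the value NA, and change them to blanks after applying the function.
Hope my code helps you: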
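library(stringr)

aaa <- readLines("C:\\MY_FOLDER\\NOI\\file2sample.txt")

# Return the first hashtag found in each line, or NA when a line has none
ttt <- function(x) {
  sapply(x, function(line) str_match(line, "#\\w+\\s+"))
}

y <- ttt(aaa)
y[is.na(y)] <- ''  # turn the unmatched rows into blanks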
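The same NA-then-blank idea can be tried without a file. The sketch below assumes stringr is installed and uses an invented posts vector. It relies on str_extract(), which is already vectorised, so no sapply is needed. Note that it keeps only the first hashtag of each row.

```r
library(stringr)

# Hypothetical posts standing in for the lines read from the file
posts <- c("#letsdoit today", "nothing to see", "going #home now")

first_tag <- str_extract(posts, "#\\w+")  # NA where a post has no hashtag
first_tag[is.na(first_tag)] <- ""         # replace NA with a blank afterwards

first_tag
# [1] "#letsdoit" ""          "#home"
```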

Thank you all for the help. I somehow got it working, and I think it is much the same as Shalini's answer.
1. Replace all NAs in message
message[is.na(message)]='abc'

2. Function to extract the hashtags
hashtag_extrac= function(text){
match = str_extract_all(text,"#\\S+")
if (match!= "") { 
match
} else {
'' }}

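3. Apply the function to the whole column
hashtags= sapply(message, hashtag_extrac)

What exactly is the question here? Does the given function not work at all, not work in all cases, or is a "feature" missing? Please add that to the question. Thanks.

@docendodiscimus has a point. Editing in a small example tweet would be helpful.

@docendodiscimus - it is there, please see "String".

Please read ?return: R calls it like a function, return(match), with parens.

Thanks a lot. str_extract_all really does work well for me, though, so please help me with my function.

You should have everything you need in this answer. If it does not meet your needs, I am happy to delete it.

Great workflow with map and flatten_chr. I thought .x rather than the . placeholder was required, as in map(~.x[,4]). Good to know that this works.

Why is there an if statement at all? It doesn't do anything: if the result is not empty, it does nothing, and if it is empty, it makes it empty. I'm confused as to why you are not using the high-quality answer above.

Thanks a lot! But I would ask you to stay calm. Even in scripts we have our own cases and uses that sometimes cannot be explained in a sentence. Of course, those are better answers.

So you are accepting @Shalini's answer. Is that how I should read it, or did I misread?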
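For the column-wide step, one possible sketch that gives a single string per row and a blank where there are no hashtags is shown below. It assumes stringr is installed. The posts vector is invented, and joining the tags with a space is an assumption about the desired output format, based on the example in the question.

```r
library(stringr)

# Hypothetical column of posts; NA stands in for a missing message
posts <- c("#letsdoit #Tonewbeginnign world is on a new#route", "no tags here", NA)

# One string per row: hashtags joined by spaces, "" when there are none
hashtags <- vapply(
  str_extract_all(posts, "#\\S+"),
  function(tags) paste(tags[!is.na(tags)], collapse = " "),
  character(1)
)

hashtags
# [1] "#letsdoit #Tonewbeginnign #route" ""  ""
```

Using vapply with character(1) guarantees a plain character vector of the same length as the column, which sapply does not.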