Sentiment analysis in R with phrases in the dictionaries

r twitter machine-learning


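I am running sentiment analysis on a set of tweets that I have collected, and I would now like to know how to add phrases to the positive and negative dictionaries.

I have read in the files containing the phrases I want to test, but when I run the sentiment analysis it does not give me a result.

Reading through the sentiment algorithm, I can see that it matches single words against the dictionaries. Is there a way to scan for phrases as well as words?

Here is the code: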

score.sentiment = function(sentences, pos.words, neg.words, .progress='none')
{
  require(plyr)  
  require(stringr)  
  # we got a vector of sentences. plyr will handle a list  
  # or a vector as an "l" for us  
  # we want a simple array ("a") of scores back, so we use  
  # "l" + "a" + "ply" = "laply":  
  scores = laply(sentences, function(sentence, pos.words, neg.words) {
    # clean up sentences with R's regex-driven global substitute, gsub():
    sentence = gsub('[[:punct:]]', '', sentence)
    sentence = gsub('[[:cntrl:]]', '', sentence)
    sentence = gsub('\\d+', '', sentence)    
    # and convert to lower case:    
    sentence = tolower(sentence)    
    # split into words. str_split is in the stringr package    
    word.list = str_split(sentence, '\\s+')    
    # sometimes a list() is one level of hierarchy too much    
    words = unlist(word.list)    
    # compare our words to the dictionaries of positive & negative terms
    # use the function arguments, not the global variables pos and neg
    pos.matches = match(words, pos.words)
    neg.matches = match(words, neg.words)
    # match() returns the position of the matched term or NA    
    # we just want a TRUE/FALSE:    
    pos.matches = !is.na(pos.matches)   
    neg.matches = !is.na(neg.matches)   
    # and conveniently enough, TRUE/FALSE will be treated as 1/0 by sum():
    score = sum(pos.matches) - sum(neg.matches)    
    return(score)    
  }, pos.words, neg.words, .progress=.progress )  
  scores.df = data.frame(score=scores, text=sentences)  
  return(scores.df)  
}
analysis=score.sentiment(Tweets, pos, neg)
table(analysis$score)

This is the result I get:

0
20

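However, what I am after is the standard table of scores that this function normally produces.

Does anyone have any ideas on how to run this on phrases?

Note: the Tweets file is a file of sentences.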

The function score.sentiment seems to work. If I try a very simple setup:
Tweets = c("this is good", "how bad it is")
neg = c("bad")
pos = c("good")
analysis=score.sentiment(Tweets, pos, neg)
table(analysis$score)

I get the expected result:
> table(analysis$score)

-1  1 
 1  1 

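How are you feeding the 20 tweets to the method? From the result you posted (a score of 0 for all 20 tweets), I would say the problem is that none of your 20 tweets contains a positive or negative word, although of course you would have noticed that. If you post more details about your list of tweets and your positive and negative words, it will be easier to help you.
Anyhow, your function seems to be working just fine.
Hope it helps.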
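
One quick way to test that hypothesis is to list which dictionary terms actually occur among the cleaned words of the tweets. The following is only a minimal base R sketch: `dictionary_hits` is a hypothetical helper name (not part of plyr or stringr), and its cleaning step mirrors the gsub() and tolower() calls in the question's function.

```r
# Hypothetical helper: which dictionary terms appear as whole words in the tweets?
dictionary_hits <- function(sentences, dictionary) {
  # same cleaning as score.sentiment: drop punctuation, control chars, digits
  cleaned <- tolower(gsub("[[:punct:]]|[[:cntrl:]]|\\d+", "", sentences))
  words <- unlist(strsplit(cleaned, "\\s+"))
  intersect(dictionary, words)
}

Tweets <- c("this is good", "how bad it is")
dictionary_hits(Tweets, c("good", "great"))  # single-word terms can be found
dictionary_hits(Tweets, c("is good"))        # a phrase never equals a single word
```

An empty result for a phrase dictionary shows the limitation directly: after splitting on whitespace, no token can ever equal a multi-word entry.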
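EDIT after clarification in the comments:
Actually, to solve your problem you need to tokenize your sentences into n-grams, where n corresponds to the maximum number of words used in your lists of positive and negative n-grams. The original answer linked to an explanation of how to do this. For completeness, and because I have tested it myself, here is an example of what you could do. I simplify it to bigrams (n=2) and use the following inputs: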
Tweets = c("rewarding hard work with raising taxes and VAT. #LabourManifesto", 
              "Ed Miliband is offering 'wrong choice' of 'more cuts' in #LabourManifesto")
pos = c("rewarding hard work")
neg = c("wrong choice")

You can create a bigram tokenizer like this:
library(tm)
library(RWeka)
BigramTokenizer <- function(x) NGramTokenizer(x, Weka_control(min=2,max=2))
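
RWeka depends on Java, so where installing it is inconvenient the same n-grams can be built in base R. This is a sketch under that assumption rather than a replacement for NGramTokenizer: `make_ngrams` is a hypothetical helper, and it expects a vector of already cleaned words.

```r
# Hypothetical base R helper: all contiguous n-word sequences from a word vector
make_ngrams <- function(words, n) {
  if (length(words) < n) return(character(0))
  starts <- seq_len(length(words) - n + 1)
  vapply(starts,
         function(i) paste(words[i:(i + n - 1)], collapse = " "),
         character(1))
}

words <- c("wrong", "choice", "of", "more", "cuts")
make_ngrams(words, 2)
# "wrong choice" "choice of" "of more" "more cuts"
```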

Then, in your method, you simply replace this line
word.list = str_split(sentence, '\\s+')

with this one:
word.list = BigramTokenizer(sentence)

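Of course, it would be better to rename word.list to ngram.list or something similar.
With the bigram tokenizer only "wrong choice" can match (the three-word phrase "rewarding hard work" is not a bigram), so the second tweet scores -1 and the first scores 0, as expected: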
> table(analysis$score)

-1  0 
 1  1

Just decide on your n-gram size and set it in Weka_control, and you should be fine.
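
When the dictionaries mix single words with phrases of different lengths (the comments mention "rewarding hard work" alongside "more cuts"), a single n is not enough. One option is to generate every n-gram from 1 up to the length of the longest dictionary entry; with RWeka that would correspond to passing min=1 and a larger max to Weka_control. The sketch below does the same in base R. `score_phrases` is a hypothetical function, not the answer's method, and it assumes the dictionary entries are already lower-case and free of punctuation.

```r
# Hypothetical scorer: counts dictionary words AND phrases in each sentence
score_phrases <- function(sentences, pos.terms, neg.terms) {
  # length, in words, of the longest dictionary entry
  max.n <- max(lengths(strsplit(c(pos.terms, neg.terms), "\\s+")))
  vapply(sentences, function(sentence) {
    # same cleaning as score.sentiment
    sentence <- tolower(gsub("[[:punct:]]|[[:cntrl:]]|\\d+", "", sentence))
    words <- unlist(strsplit(trimws(sentence), "\\s+"))
    # build every n-gram for n = 1 .. max.n
    ngrams <- unlist(lapply(seq_len(max.n), function(n) {
      if (length(words) < n) return(character(0))
      vapply(seq_len(length(words) - n + 1),
             function(i) paste(words[i:(i + n - 1)], collapse = " "),
             character(1))
    }))
    sum(ngrams %in% pos.terms) - sum(ngrams %in% neg.terms)
  }, numeric(1), USE.NAMES = FALSE)
}

Tweets <- c("rewarding hard work with raising taxes and VAT. #LabourManifesto",
            "Ed Miliband is offering 'wrong choice' of 'more cuts' in #LabourManifesto")
pos <- c("rewarding hard work")
neg <- c("wrong choice", "raising taxes", "more cuts")
score_phrases(Tweets, pos, neg)
# 0 -2
```

The first tweet contains one positive and one negative phrase, which cancel out; the second contains two negative phrases.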
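Hope it helps.

Comments:

Not sure, but I think you might mean lapply rather than laply?

@dd3 It is laply from the plyr package, not lapply from base.

I am a beginner in R. What does your ".progress" do here? It looks like you are not using it in your function?

@Irnczig I managed to get the scores. I want to work with my own positive and negative dictionaries, but do you know how to make it work if, in your example, I want to add "is good" and "how bad" rather than just "bad" and "good"? For example, take the following tweets: "rewarding hard work with raising taxes and VAT. #LabourManifesto" and "Ed Miliband is offering 'wrong choice' of 'more cuts' in #LabourManifesto". In the dictionaries I want "rewarding hard work" to be positive, and "raising taxes" and "more cuts" to be negative. When I run the sentiment function, it breaks these phrases apart.

OK, got it. Let me have a look.

Just updated my answer. One small remark: since the point about n-grams is quite important, it would be better to edit your question to make it clear. I think it would be enough to post the example you added in your second comment.

Thanks, I had not thought of n-grams. I will work through it with your help, many thanks!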