Custom sentiment analysis: scoring documents based on words and their respective scores - R NLP

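I am trying to score documents based on the words that appear in them. For every word in the corpus I have two types of scores. It is essentially like sentiment analysis, but with a custom dictionary and corresponding scores. Thanks!

This can be done with: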

  • tidytext::unnest_tokens
    to split the documents into individual words
  • dplyr::left_join
    to attach the word scores
  • dplyr::summarise
    to compute the score of each document
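
    library(dplyr)
    library(tidytext)

    # documents to be scored on 2 dimensions: score1 and score2
    # (documents, scores1 and scores2 as defined in the question, but with
    #  scores1 and scores2 kept as data frames so the scores stay numeric)
    documents %>%
      # one row per lower-cased word; drop = FALSE keeps the original text column
      unnest_tokens(words, text, drop = FALSE) %>%
      # attach both scores; lower-case the dictionary so "DHL" matches "dhl"
      left_join(mutate(scores1, words = tolower(words)), by = "words") %>%
      left_join(mutate(scores2, words = tolower(words)), by = "words",
                suffix = c("1", "2")) %>%   # clashing columns become scores1, scores2
      group_by(textID, text) %>%
      # words missing from the dictionary are NA and are ignored in the sum
      summarise(across(starts_with("scores"), ~ sum(.x, na.rm = TRUE))) %>%
      rename(scored1 = scores1, scored2 = scores2) %>%
      ungroup()
    #> `summarise()` regrouping output by 'textID' (override with `.groups` argument)
    #> # A tibble: 3 x 4
    #>   textID text                                                  scored1 scored2
    #>
    #> 1      1 "Hello everybody, pleased to see everyone together"        28      91
    #> 2      2 " DHL postmen have faced difficulties this year"           77     140
    #> 3      3 "divorcees have trouble finding jobs in this country"     128     200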
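
For comparison, the same three steps (tokenize, look up each word's score, sum per document) can be written in base R without tidytext or dplyr. This is a minimal sketch and not the answer's code. The tokenizer is a simple lower-case split on runs of non-letters, which is an assumption that only approximates `unnest_tokens()`. `score_text()` is a hypothetical helper name.

```r
# Toy data copied from the question (scores kept numeric, not in a matrix)
documents <- data.frame(
  textID = 1:3,
  text = c("Hello everybody, pleased to see everyone together",
           " DHL postmen have faced difficulties this year",
           "divorcees have trouble finding jobs in this country")
)
dict_words <- c("hello", "everybody", "pleased", "to", "see", "everyone",
                "together", "DHL", "postmen", "have", "faced", "difficulties",
                "this", "year", "divorcees", "trouble", "finding", "jobs",
                "in", "country")
scores1 <- data.frame(words = dict_words, scores = 1:20)
scores2 <- data.frame(words = dict_words, scores = 10:29)

# Hypothetical helper: sum the dictionary scores of the words in one text
score_text <- function(text, dict) {
  # Simple tokenizer (assumption): lower-case, split on runs of non-letters
  tokens <- strsplit(tolower(text), "[^a-z]+")[[1]]
  tokens <- tokens[nzchar(tokens)]            # drop the empty token left by a leading space
  idx <- match(tokens, tolower(dict$words))   # NA where a word is not in the dictionary
  sum(dict$scores[idx], na.rm = TRUE)         # unknown words contribute 0
}

documents$scored1 <- vapply(documents$text, score_text, numeric(1),
                            dict = scores1, USE.NAMES = FALSE)
documents$scored2 <- vapply(documents$text, score_text, numeric(1),
                            dict = scores2, USE.NAMES = FALSE)
print(documents)
```

Here `match()` against the lower-cased dictionary plays the role of the `left_join()`, and `na.rm = TRUE` again makes unknown words count as 0. With this data the two columns come out as 28, 77, 128 and 91, 140, 200, which matches the question's expected result.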
    
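Don't do this. For sentiment analysis, the words alone are completely useless; there is more to it than the words.

The scores I use actually refer to word embeddings. I simplified the problem.

Thanks @stefan! I get the following error: "Must rename columns with a valid subscript vector. x Subscript has the wrong type `data.frame`…"

The "wrong type `data.frame`" means that after the joins and the summarise there is no column `scores1`, so R thinks `scores1` refers to the data frame `scores1`. I'm not sure what the problem is, but I'd suggest simply dropping the last line.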
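
The naming issue behind that error can be reproduced without dplyr. `dplyr::left_join()` gives clashing non-key columns the suffixes `.x` and `.y` by default (its `suffix` argument). Unless the second join sets `suffix = c("1", "2")`, there is no `scores1` column for `rename()` to find. Whether that was the cause in this thread is an assumption. The sketch below swaps in base R's `merge()`, whose `suffixes` argument works the same way, and runs it on hypothetical two-word dictionaries.

```r
# Toy two-word dictionaries (hypothetical, for illustration only)
scores1 <- data.frame(words = c("hello", "year"), scores = c(1, 14))
scores2 <- data.frame(words = c("hello", "year"), scores = c(10, 23))

# Default suffixes: the clashing "scores" columns become scores.x and scores.y,
# so nothing is named scores1 afterwards
default_join <- merge(scores1, scores2, by = "words")
print(names(default_join))

# Explicit suffixes give the column names a later rename() would expect
suffixed_join <- merge(scores1, scores2, by = "words", suffixes = c("1", "2"))
print(names(suffixed_join))
```

When a name such as `scores1` is not a column, the lookup falls back to the object of that name in the calling environment. That fallback is the data-frame confusion the comment above describes.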
    #documents to be scored on 2 dimensions: score1 and score2
    documents <- data.frame(textID = 1:3, text = c("Hello everybody, pleased to see everyone together", " DHL postmen have faced difficulties this year", "divorcees have trouble finding jobs in this country"), scored1 = rep(NA,3), scored2=rep(NA,3) )
    
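    #first scoring dimension
    # (as.matrix() on a data frame that has a character column returns a character
    #  matrix, so these scores, and those of scores2 below, are stored as text)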
    scores1 <- as.matrix(data.frame(words = c("hello", "everybody", "pleased", "to" ,"see", "everyone","together", "DHL", "postmen", "have", "faced","difficulties","this", "year", "divorcees", "trouble", "finding", "jobs", "in", "country" ), scores = 1:20))
    
    #second scoring dimension
    scores2 <- as.matrix(data.frame(words = c("hello", "everybody", "pleased", "to" ,"see", "everyone","together", "DHL", "postmen", "have", "faced","difficulties","this", "year", "divorcees", "trouble", "finding", "jobs", "in", "country" ), scores = 10:29))
    
    #the result should look like this, where each text receives a score that represents the sum of its individual word scores:
    
    #textID                                                  text      scored1 scored2
    #1      1   Hello everybody, pleased to see everyone together       28        91
    #2      2       DHL postmen have faced difficulties this year       77        140
    #3      3 divorcees have trouble finding jobs in this country       128       200
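
The expected result is a plain per-document sum, and it can be checked by hand. The notation is introduced here for illustration:

- s_k(w) is the score of word w in dimension k.
- w_{d,1}, ..., w_{d,n_d} are the n_d words of document d.

Words are assumed to match the dictionary case-insensitively with punctuation removed; otherwise "Hello" would not match "hello". A word missing from the dictionary would count as 0, though none are missing here. Because the second dictionary is the first shifted by 9 (10:29 versus 1:20), each second score equals the first plus 9 times the word count.

```latex
\begin{aligned}
\mathrm{scored}_k(d) &= \sum_{i=1}^{n_d} s_k(w_{d,i}), \qquad k \in \{1, 2\} \\
s_2(w) = s_1(w) + 9 \;\Longrightarrow\; \mathrm{scored}_2(d) &= \mathrm{scored}_1(d) + 9\,n_d \\
d_1 \ (n_{d_1} = 7):\quad 1 + 2 + \dots + 7 &= 28, \qquad 28 + 9 \cdot 7 = 91 \\
d_2 \ (n_{d_2} = 7):\quad 8 + 9 + \dots + 14 &= 77, \qquad 77 + 9 \cdot 7 = 140 \\
d_3 \ (n_{d_3} = 8):\quad 15 + 10 + 16 + 17 + 18 + 19 + 13 + 20 &= 128, \qquad 128 + 9 \cdot 8 = 200
\end{aligned}
```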