Custom sentiment analysis: scoring documents based on words and their respective scores (R, NLP)

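I am trying to score documents based on the words that occur in them. For each word in the corpus I have two types of scores. It is essentially like sentiment analysis, but with a custom dictionary and corresponding scores. Thanks.

This can be achieved via

tidytext::unnest_tokens to split the documents into single words
dplyr::left_join to attach the scores to the words
dplyr::summarise to compute the scores per document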

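library(dplyr)
library(tidytext)

# documents to be scored on 2 dimensions: score1 and score2
# NOTE: the lines between `documents` and group_by() were lost in extraction; the
# tokenise/join steps below are reconstructed from the three steps listed above.
# Assumes `scores1` and `scores2` are data frames (columns `words`, `scores`) with
# lower-case words, because unnest_tokens() lower-cases tokens by default.
documents %>%
  unnest_tokens(word, text, drop = FALSE) %>%
  left_join(scores1, by = c("word" = "words")) %>%
  left_join(scores2, by = c("word" = "words"), suffix = c("1", "2")) %>%
  group_by(textID, text) %>%
  summarise(across(starts_with("scores"), ~ sum(.x, na.rm = TRUE))) %>%
  rename(scored1 = scores1, scored2 = scores2) %>%
  ungroup()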
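#> `summarise()` regrouping output by 'textID' (override with `.groups` argument)
#> # A tibble: 3 x 4
#>   textID text                                                  scored1 scored2
#> 1      1 "Hello everybody, pleased to see everyone together"        28      91
#> 2      2 " DHL postmen have faced difficulties this year"           77     140
#> 3      3 "divorcees have trouble finding jobs in this country"     128     200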
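As a cross-check of what the pipeline computes, here is a minimal base-R sketch of the same idea: tokenise each text, look the tokens up in the dictionary, and sum the matches. The helper `score_text` is hypothetical (it is not part of the answer), and the score tables are assumed to be plain data frames rather than matrices so that the scores stay numeric.

```r
# Documents and score tables from the question; the score tables are kept as
# plain data frames here (an assumption) so the scores stay numeric.
documents <- data.frame(
  textID = 1:3,
  text = c("Hello everybody, pleased to see everyone together",
           " DHL postmen have faced difficulties this year",
           "divorcees have trouble finding jobs in this country"),
  stringsAsFactors = FALSE
)
words <- c("hello", "everybody", "pleased", "to", "see", "everyone", "together",
           "DHL", "postmen", "have", "faced", "difficulties", "this", "year",
           "divorcees", "trouble", "finding", "jobs", "in", "country")
scores1 <- data.frame(words = words, scores = 1:20, stringsAsFactors = FALSE)
scores2 <- data.frame(words = words, scores = 10:29, stringsAsFactors = FALSE)

# Hypothetical helper: sum the dictionary scores of every word in one text.
score_text <- function(text, dict) {
  # lower-case, split on anything that is not a letter, drop empty tokens
  tokens <- strsplit(tolower(text), "[^a-z]+")[[1]]
  tokens <- tokens[nzchar(tokens)]
  # look each token up in the lower-cased dictionary; unknown words give NA
  idx <- match(tokens, tolower(dict$words))
  sum(dict$scores[idx], na.rm = TRUE)
}

documents$scored1 <- vapply(documents$text, score_text, numeric(1),
                            dict = scores1, USE.NAMES = FALSE)
documents$scored2 <- vapply(documents$text, score_text, numeric(1),
                            dict = scores2, USE.NAMES = FALSE)
print(documents)
```

For the three example texts this gives the same sums as the output above (28, 77, 128 and 91, 140, 200). Lower-casing both the tokens and the dictionary is what lets "DHL" match; a repeated word counts once per occurrence.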

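Don't do this. For sentiment analysis, words alone are absolutely useless. It's about more than just words.

The scores I'm using actually refer to word embeddings. I simplified the problem.

Thanks @stefan! I get the following error: "Must rename columns with a valid subscript vector. x Subscript has the wrong type `data.frame…`."

The wrong-type `data.frame` error means that after the joins and the summarise there is no column `scores1`, so R thinks `scores1` refers to the data frame scores1. Not sure what the problem is. But I'd suggest simply dropping the last line.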
The data from the question:

#documents to be scored on 2 dimensions: score1 and score2
documents <- data.frame(textID = 1:3, text = c("Hello everybody, pleased to see everyone together", " DHL postmen have faced difficulties this year", "divorcees have trouble finding jobs in this country"), scored1 = rep(NA,3), scored2=rep(NA,3) )

#first scoring dimension
scores1 <- as.matrix(data.frame(words = c("hello", "everybody", "pleased", "to" ,"see", "everyone","together", "DHL", "postmen", "have", "faced","difficulties","this", "year", "divorcees", "trouble", "finding", "jobs", "in", "country" ), scores = 1:20))

#second scoring dimension
scores2 <- as.matrix(data.frame(words = c("hello", "everybody", "pleased", "to" ,"see", "everyone","together", "DHL", "postmen", "have", "faced","difficulties","this", "year", "divorcees", "trouble", "finding", "jobs", "in", "country" ), scores = 10:29))

#the result should look like this, where each text receives a score that represents the sum of
#individual word scores:

#textID                                                  text      scored1 scored2
#1      1   Hello everybody, pleased to see everyone together       28        91
#2      2       DHL postmen have faced difficulties this year       77       140
#3      3 divorcees have trouble finding jobs in this country      128       200
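One detail of the setup above that may be worth checking before joining: `as.matrix()` on a data frame that mixes a character column with a numeric one returns a character matrix, so the scores are stored as strings. A small sketch with a hypothetical two-row table `toy` (not from the question) shows the difference:

```r
# Hypothetical two-row score table, used only to illustrate the type change.
toy <- data.frame(words = c("hello", "country"), scores = c(1L, 20L),
                  stringsAsFactors = FALSE)

# Converting to a matrix forces one common type for all cells.
m <- as.matrix(toy)

print(typeof(m))           # the matrix cells are character strings
print(typeof(toy$scores))  # the data frame column is still integer
```

If that applies to your data, keeping `scores1` and `scores2` as data frames (that is, leaving out `as.matrix()`) keeps the score columns numeric, so they can be summed directly after a join.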