Custom sentiment analysis: scoring documents based on words and their respective scores - R NLP
I am trying to score documents based on the words that occur in them. For every word that appears in the corpus I have two types of scores. It is essentially sentiment analysis, but with a custom dictionary and corresponding scores. Thanks!

Tags: r, nlp, tidyverse, tidyr, sentiment-analysis

This can be achieved with:
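tidytext::unnest_tokens to split the documents into single words
dplyr::left_join to attach the word scores
dplyr::summarise to compute the scores per document

library(dplyr)
library(tidytext)

# Documents are scored on two dimensions: scored1 and scored2.
# Assumes `scores1` and `scores2` are data frames (not matrices) with a
# lower-case character column `words` and a numeric column `scores`.
# The steps before group_by() were lost from this copy of the answer and are
# reconstructed here from the three steps listed above.
documents %>%
  select(-scored1, -scored2) %>%
  unnest_tokens(word, text, drop = FALSE) %>%
  left_join(scores1, by = c("word" = "words")) %>%
  left_join(scores2, by = c("word" = "words"), suffix = c("1", "2")) %>%
  group_by(textID, text) %>%
  summarise(across(starts_with("scores"), ~ sum(.x, na.rm = TRUE))) %>%
  rename(scored1 = scores1, scored2 = scores2) %>%
  ungroup()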
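#> `summarise()` regrouping output by 'textID' (override with `.groups` argument)
#> # A tibble: 3 x 4
#>   textID text                                                  scored1 scored2
#> 1      1 "Hello everybody, pleased to see everyone together"        28      91
#> 2      2 " DHL postmen have faced difficulties this year"           77     140
#> 3      3 "divorcees have trouble finding jobs in this country"     128     200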
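As a cross-check on the totals above, the same three steps (tokenise, look up, sum) can be sketched in base R without any packages. This is only a minimal sketch and not the answer's method: `score_text`, `dict1` and `dict2` are hypothetical names, the dictionaries are stored as named vectors instead of the matrices from the question, and the dictionary words are lower-cased so that "DHL" matches the lower-cased token.

```r
# Example data from the question (the NA placeholder columns are omitted).
documents <- data.frame(
  textID = 1:3,
  text = c("Hello everybody, pleased to see everyone together",
           " DHL postmen have faced difficulties this year",
           "divorcees have trouble finding jobs in this country"),
  stringsAsFactors = FALSE
)

words <- c("hello", "everybody", "pleased", "to", "see", "everyone", "together",
           "DHL", "postmen", "have", "faced", "difficulties", "this", "year",
           "divorcees", "trouble", "finding", "jobs", "in", "country")

# Named numeric vectors act as lookup tables (assumption: one score per word).
dict1 <- setNames(as.numeric(1:20), tolower(words))
dict2 <- setNames(as.numeric(10:29), tolower(words))

# Hypothetical helper: lower-case the text, split on anything that is not a
# letter, look each token up, and sum the scores. Unknown tokens give NA and
# are dropped by na.rm = TRUE.
score_text <- function(text, dict) {
  tokens <- strsplit(tolower(text), "[^a-z]+")[[1]]
  sum(dict[tokens], na.rm = TRUE)
}

documents$scored1 <- vapply(documents$text, score_text, numeric(1),
                            dict = dict1, USE.NAMES = FALSE)
documents$scored2 <- vapply(documents$text, score_text, numeric(1),
                            dict = dict2, USE.NAMES = FALSE)

print(documents[, c("textID", "scored1", "scored2")])
```

Splitting on `[^a-z]+` is a deliberately crude tokeniser; for real text, a proper tokeniser such as the one in tidytext handles punctuation and non-ASCII letters more carefully.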
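Don't do this. For sentiment analysis, words alone are of absolutely no use.

It is not just words. The scores I am using actually refer to word embeddings. I simplified the problem.

Thanks @stefan! I get the following error: "Must rename columns with a valid subscript vector. x Subscript has the wrong type `data.frame…"

With rename, the "wrong type data.frame" error means that there is no column `scores1` after the joins and the summarise, so R takes `scores1` to refer to the data frame `scores1`. Not sure what the problem is, but I would suggest simply deleting the last line.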
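One possible way to sidestep that error, assuming it comes from the suffixed columns `scores1` and `scores2` not being created by the joins (this is a guess, not a confirmed diagnosis), is to give each score column its final name before joining. The sketch below uses toy stand-ins: `tokens` and the small dictionaries are made up for illustration and are not the asker's data.

```r
library(dplyr)

# Toy stand-ins for the tokenised documents and the two dictionaries.
tokens  <- data.frame(textID = c(1, 1, 2), word = c("hello", "to", "this"))
scores1 <- data.frame(words = c("hello", "to", "this"), scores = c(1, 4, 13))
scores2 <- data.frame(words = c("hello", "to", "this"), scores = c(10, 13, 22))

# Rename the score column of each dictionary before the join, so the result
# does not rely on join suffixes and needs no rename() step afterwards.
result <- tokens %>%
  left_join(rename(scores1, scored1 = scores), by = c("word" = "words")) %>%
  left_join(rename(scores2, scored2 = scores), by = c("word" = "words")) %>%
  group_by(textID) %>%
  summarise(across(c(scored1, scored2), ~ sum(.x, na.rm = TRUE)),
            .groups = "drop")

print(result)
```

Because the columns are named explicitly, a missing column fails at the join or at `across()` with a message about the column itself, instead of `rename()` silently picking up a data frame with the same name from the global environment.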
#documents to be scored on 2 dimensions: score1 and score2
documents <- data.frame(textID = 1:3, text = c("Hello everybody, pleased to see everyone together", " DHL postmen have faced difficulties this year", "divorcees have trouble finding jobs in this country"), scored1 = rep(NA,3), scored2=rep(NA,3) )
#first scoring dimension
scores1 <- as.matrix(data.frame(words = c("hello", "everybody", "pleased", "to" ,"see", "everyone","together", "DHL", "postmen", "have", "faced","difficulties","this", "year", "divorcees", "trouble", "finding", "jobs", "in", "country" ), scores = 1:20))
#second scoring dimension
scores2 <- as.matrix(data.frame(words = c("hello", "everybody", "pleased", "to" ,"see", "everyone","together", "DHL", "postmen", "have", "faced","difficulties","this", "year", "divorcees", "trouble", "finding", "jobs", "in", "country" ), scores = 10:29))
#the result should look like this, where each text receives a score that represents the sum of
#individual word scores:
#  textID text                                                scored1 scored2
#1      1 Hello everybody, pleased to see everyone together        28      91
#2      2 DHL postmen have faced difficulties this year            77     140
#3      3 divorcees have trouble finding jobs in this country     128     200