Wrong values in a data frame column in R


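I have been stuck on this problem for three days and would really appreciate help finding a solution.

To run sentiment analysis on text, I store a list of words together with their positive and negative polarity in a data frame: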

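         word positive.polarity negative.polarity
1 interesting                 1                 0
2      boring                 0                 1

Then, for each of these words in the data frame, I want to know whether its context (the 3 words that precede the word) contains a booster word or a negation word: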

booster_words <- c("more","enough", "a lot", "as", "so")
negative_words <- c("not", "rien", "ni", "aucun", "nul", "jamais", "pas", "non plus", "sans")
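
A minimal sketch of the context check described above, in base R. The helper name `context_before` and its tokenization are assumptions made for illustration, not code from the original post. It takes the (up to) 3 words before a target word and tests them against the two word lists.

```r
booster_words <- c("more", "enough", "a lot", "as", "so")
negative_words <- c("not", "rien", "ni", "aucun", "nul", "jamais", "pas", "non plus", "sans")

# Hypothetical helper: return up to n words that precede the first
# occurrence of `target` in `sentence`.
context_before <- function(sentence, target, n = 3) {
  words <- unlist(regmatches(sentence, gregexpr("[[:alnum:]']+", sentence)))
  pos <- match(target, words)
  if (is.na(pos) || pos == 1) return(character(0))
  words[max(1, pos - n):(pos - 1)]
}

sentence <- "The course was interesting, but the professor was not so boring"
ctx <- context_before(sentence, "boring")
print(ctx)                           # "was" "not" "so"

has_booster <- any(ctx %in% booster_words)
has_negation <- any(ctx %in% negative_words)
print(c(booster = has_booster, negation = has_negation))  # both TRUE
```

Because the sentence is split into single words, multi-word entries such as "a lot" or "non plus" can never match with this approach; they would need a separate check on the joined context string.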
I get this result:

         word positive.polarity negative.polarity positive.ponderate.polarity
1 interesting                 1                 0                           5
2      boring                 0                 1                           4
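But this is not correct. The correct result is:

         word positive.polarity negative.polarity positive.ponderate.polarity
1 interesting                 1                 0                           1
2      boring                 0                 1                           4

I don't understand why I get the wrong values. Does anyone have an idea that could help me?

Thanks, everyone.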

Edit:

For example, if I have this data frame:

         word positive.polarity negative.polarity positive.ponderate.polarity negative.ponderate.polarity
1 interesting                 1                 0                           1                           1
2      boring                 0                 1                           4                           2

The result should be:
(1+4)-(1+2)
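
As a quick check of that arithmetic, the sketch below rebuilds the example data frame by hand (the values are copied from the table above, and the variable names `example_DF` and `total` are only illustrative) and computes the sum of the positive weighted column minus the sum of the negative weighted column.

```r
# Example data frame from the edit, typed in by hand
example_DF <- data.frame(
  word = c("interesting", "boring"),
  positive.polarity = c(1, 0),
  negative.polarity = c(0, 1),
  positive.ponderate.polarity = c(1, 4),
  negative.ponderate.polarity = c(1, 2)
)

# (1 + 4) - (1 + 2)
total <- sum(example_DF$positive.ponderate.polarity) -
  sum(example_DF$negative.ponderate.polarity)
print(total)  # 2
```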

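I found the error. In a case like this, it helps to debug line by line and print the initial variables, the result of each if statement, or a marker whenever an if/else branch is taken.

Here your initial subDF$positive.polarity is a vector of length 2, c(1,0); its length is the number of sentiment words in the sentence.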

When i=1, context = "The course was interesting", which contains no booster and no negation word, so subDF$positive.polarity is c(1,0) and subDF$positive.ponderate.polarity is NULL.

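When i=2, context = "was not so boring", which contains both a booster and a negation word. subDF$positive.polarity is c(1,0), and you add 4 to both elements when you only want to add 4 to the second element, the one that corresponds to "boring". As a result subDF$positive.ponderate.polarity becomes c(5,4), and that is what gets returned.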
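
The difference can be seen in isolation with a small sketch. The variable names `wrong` and `fixed` are illustrative, and it assumes, as described above, that the original assignment had no index on the weighted column.

```r
positive.polarity <- c(1, 0)   # interesting, boring
i <- 2                         # position of "boring"

# Without an index, the addition is applied to every element
wrong <- positive.polarity + 4
print(wrong)   # 5 4

# With an index, only the element for the current word changes
fixed <- positive.polarity
fixed[i] <- positive.polarity[i] + 4
print(fixed)   # 1 4
```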

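The catch here is that the lengths of subDF$positive.polarity and subDF$positive.ponderate.polarity depend on the number of sentiment words in the sentence. The corrected code, with debugging output, is below. Here are the fixes:

A. Initialize so that the lengths are equal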

 subDF$positive.ponderate.polarity <- subDF$positive.polarity

D. Check the other cases:

calcPolarity(sentiment_DF, "The course was interesting, but the professor was not so boring")
         word positive.polarity negative.polarity positive.ponderate.polarity
1 interesting                 1                 0                           1
2      boring                 0                 1                           4

calcPolarity(sentiment_DF, "The course was so interesting")
         word positive.polarity negative.polarity positive.ponderate.polarity
1 interesting                 1                 0                          10
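
Edit to correct the polarity result raised in the comments:
The output of polarity was c(0,5) because the original code was:
polarity[i]

I cannot get the same result as you. The context only extracts 3 characters, not 3 words (here context gives "o"). Maybe your regular expression is wrong?

For me, I get context = [1] "The course was interesting" "not so boring", which is correct, isn't it?

I don't know why this looks so complicated. Nobody can help me.

Shouldn't your expected result be interesting: 10 and boring: 0? interesting starts at 1 with no negation, so 1+9=10, and boring starts without any booster word.

Thank you very much for spending so much time on your answer, but I still have a question: why do I get "0 5" when I run calcPolarity(sentiment_DF, "The course was interesting, but the professor was not so boring")? I only have one sentence, so shouldn't I get only one value? Is this polarity output correct? This is your code:
polarity[i]

I edited my post. Please look at the example to see how I determine polarity.

I understand. Please see my note C. In your code you never assign negative.ponderate.polarity; this variable only appears at the end and nowhere else. That is why I asked about note C. Currently you only have (1 + 4) - 0 = 5. Please consider note C and the case I pointed out there. You cannot leave the else statement as it is; it needs another qualifier, otherwise "so boring" will be given positive.ponderate.polarity points. I will leave that to you and consider the question answered. You can send comments if needed, and please accept the answer so that it gets marked.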
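
One plausible way to arrive at the "0 5" output discussed in the comments is sketched below. It assumes that the sentence loop and the inner context loop both used i as their index; the original assignment line is truncated in this thread, so this is a guess at the mechanism, not the original code. After the inner loop finishes, i is 2, so polarity[i] writes into a second slot of a vector that was pre-allocated with a single element.

```r
sentences <- "The course was interesting, but the professor was not so boring"
polarity <- rep.int(0, length(sentences))        # one slot: 0

for (i in seq_along(sentences)) {
  # Stand-in for the two extracted contexts (assumed values)
  context <- c("The course was interesting", "was not so boring")
  for (i in seq_along(context)) {
    NULL                                         # inner loop reuses i
  }
  polarity[i] <- 5                               # i is now 2, not 1
}
print(polarity)                                  # 0 5

# Same structure with a separate inner index
polarity_fixed <- rep.int(0, length(sentences))
for (i in seq_along(sentences)) {
  context <- c("The course was interesting", "was not so boring")
  for (j in seq_along(context)) {
    NULL
  }
  polarity_fixed[i] <- 5
}
print(polarity_fixed)                            # 5
```

Giving the inner loop its own index keeps i pointing at the current sentence, so the function returns one value per sentence.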

subDF$positive.ponderate.polarity[i] <- subDF$positive.polarity[i] + 4
subDF$positive.ponderate.polarity[i] <- subDF$positive.polarity[i] + 9

calcPolarity(sentiment_DF, "The course was so boring")
    word positive.polarity negative.polarity positive.ponderate.polarity
2 boring                 0                 1                           9

# Requires the stringr package (used via stringr::str_extract)
calcPolarity <- function(sentiment_DF, sentences) {
  booster_words <- c("more", "enough", "a lot", "as", "so")
  negative_words <- c("not", "rien", "ni", "aucun", "nul", "jamais", "pas", "non plus", "sans")
  reduce_words <- c("peu", "presque", "moins", "seulement")  # defined but not used yet
  # pre-allocate the polarity result vector with size = number of sentences
  polarity <- rep.int(0, length(sentences))

  # loop per sentence
  for (i in seq_along(sentences)) {
    sentence <- sentences[i]

    # separate each sentence in words using regular expression
    wordsOfASentence <- unlist(regmatches(sentence, gregexpr("[[:word:]]+", sentence, perl = TRUE)))

    # get the rows of sentiment_DF corresponding to the words in the sentence using match
    # N.B. if a word occurs twice, there will be two equal rows
    # (but I think it's correct since in this way you count its polarity twice)
    subDF <- sentiment_DF[match(wordsOfASentence, sentiment_DF$word, nomatch = 0), ]

    # sentiment words found in the sentence, in order of appearance;
    # as.character() works whether `word` is a factor or a character column
    wordOfInterest <- wordsOfASentence[wordsOfASentence %in% as.character(sentiment_DF$word)]
    if (length(wordOfInterest) == 0) next  # no sentiment word: polarity stays 0

    # Fix A: initialize once, before the inner loop, so both columns have equal length
    subDF$positive.ponderate.polarity <- subDF$positive.polarity
    print("initial")
    print(subDF)

    # extract a context of up to 3 words before and after each sentiment word
    regexOfInterest <- paste0("([^\\s]+\\s){0,3}", wordOfInterest, "(\\s[^\\s]+){0,3}")
    context <- stringr::str_extract(sentence, regexOfInterest)
    names(context) <- wordOfInterest
    print(context)

    # the inner loop uses j so that it does not overwrite the sentence index i
    for (j in seq_along(context)) {
      print(paste("j:", j))
      contextWords <- unlist(strsplit(context[j], " "))

      if (any(contextWords %in% booster_words)) {
        print("if level 1")
        if (any(contextWords %in% negative_words)) {
          # booster and negation word together: add 4 to this word only
          subDF$positive.ponderate.polarity[j] <- subDF$positive.polarity[j] + 4
          print("if level 2A")
        } else {
          # booster without negation word: add 9 to this word only
          # (see note C: this branch also fires for negative words, e.g. "so boring")
          subDF$positive.ponderate.polarity[j] <- subDF$positive.polarity[j] + 9
          print("if level 2B")
        }
      }
      print("level 1 result")
      print(subDF$positive.ponderate.polarity)
    }

    # Debug option
    print(subDF)

    # calculate the total polarity of the sentence and store it in the vector;
    # negative.ponderate.polarity is never assigned, so its sum is currently 0
    polarity[i] <- sum(subDF$positive.ponderate.polarity) - sum(subDF$negative.ponderate.polarity)
  }

  return(polarity)
}

sentiment_DF <- data.frame(word = c('interesting', 'boring', 'pretty'),
                           positive.polarity = c(1, 0, 1),
                           negative.polarity = c(0, 1, 0))
calcPolarity(sentiment_DF, "The course was interesting, but the professor was not so boring")
calcPolarity(sentiment_DF, "The course was so interesting")
calcPolarity(sentiment_DF, "The course was so boring")