Wrong values in a data frame column with R

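I've been stuck on this problem for three days and I really hope someone can help me find a solution.

To run sentiment analysis on text, I store a list of words and their positive/negative polarity in a data frame: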

         word positive.polarity negative.polarity
1 interesting                 1                 0
2      boring                 0                 1

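Then, for each of these words in the data frame, I want to know whether its context (the context being the 3 words that precede the word) contains a booster word or a negation word: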

booster_words <- c("more", "enough", "a lot", "as", "so")
negative_words <- c("not", "rien", "ni", "aucun", "nul", "jamais", "pas", "non plus", "sans")
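The context definition above (the 3 words before each sentiment word) can be sketched by position rather than by regex. The helpers tokenize(), preceding_context() and context_has() below are hypothetical, not from the original post, and assume lower-case ASCII text. Matching against the joined context string also catches multi-word entries such as "a lot" and "non plus", which can never equal a single token produced by strsplit():

```r
# Hypothetical helpers (not from the original post).
booster_words  <- c("more", "enough", "a lot", "as", "so")
negative_words <- c("not", "rien", "ni", "aucun", "nul", "jamais", "pas", "non plus", "sans")

# Lower-case word tokens of a sentence; punctuation is dropped.
tokenize <- function(sentence) {
  s <- tolower(sentence)
  regmatches(s, gregexpr("[[:alpha:]]+", s))[[1]]
}

# Up to n tokens immediately before position idx (empty when idx is 1).
preceding_context <- function(tokens, idx, n = 3) {
  if (idx <= 1) return(character(0))
  tokens[max(1, idx - n):(idx - 1)]
}

# TRUE if any entry of `words` (single- or multi-word) occurs in the context.
# Padding with spaces makes the fixed-string match respect word boundaries.
context_has <- function(context, words) {
  padded <- paste0(" ", paste(context, collapse = " "), " ")
  any(vapply(words, function(w) grepl(paste0(" ", w, " "), padded, fixed = TRUE),
             logical(1)))
}

tokens <- tokenize("The professor was not so boring")
ctx <- preceding_context(tokens, which(tokens == "boring"))
print(ctx)                                       # "was" "not" "so"
print(context_has(ctx, booster_words))           # TRUE
print(context_has(ctx, negative_words))          # TRUE
print(context_has(c("it", "a", "lot"), booster_words))  # TRUE: "a lot" matched as a phrase
```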

I get this result:

         word positive.polarity negative.polarity positive.ponderate.polarity
1 interesting                 1                 0                           5
2      boring                 0                 1                           4

But this is not correct. The correct result is:

         word positive.polarity negative.polarity positive.ponderate.polarity
1 interesting                 1                 0                           1
2      boring                 0                 1                           4

I don't understand where these incorrect values come from. Does anyone have an idea?

Thanks, everyone.

Edit:

For example, if I have this data frame:

         word positive.polarity negative.polarity positive.ponderate.polarity negative.ponderate.polarity
1 interesting                 1                 0                           1                           1
2      boring                 0                 1                           4                           2

The result should be:

(1 + 4) - (1 + 2) = 2
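In other words, the sentence score is the sum of the weighted positive column minus the sum of the weighted negative column. A quick check with the numbers from the example above (the name example_df is only for illustration):

```r
# Word-level weighted polarities from the edited example.
example_df <- data.frame(
  word = c("interesting", "boring"),
  positive.ponderate.polarity = c(1, 4),
  negative.ponderate.polarity = c(1, 2)
)

# Sentence polarity = sum of positive weights - sum of negative weights.
sentence_polarity <- sum(example_df$positive.ponderate.polarity) -
  sum(example_df$negative.ponderate.polarity)
print(sentence_polarity)  # (1 + 4) - (1 + 2) = 2
```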

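I found the error. In cases like this it helps to debug line by line and print the initial variables, the result of each if statement, and a marker showing which if/else branch was taken.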

Here your initial subDF$positive.polarity is a vector of length 2, c(1, 0), with one element per sentiment word in the sentence.

When i = 1, context = "The course was interesting", which contains neither a booster nor a negation word, so subDF$positive.polarity is c(1, 0) and subDF$positive.ponderate.polarity is NULL.

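When i = 2, context = "was not so boring", which contains a booster and a negation word. subDF$positive.polarity is still c(1, 0), and you add 4 to both elements when you only want to add 4 to the second element, the one corresponding to "boring". So subDF$positive.ponderate.polarity becomes c(5, 4), which is what gets returned.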
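The mismatch can be reproduced with plain vectors, using the values from the two-word sentence (c(1, 0), with "boring" at index 2). The sketch shows why the update must index element i, and why assigning into an uninitialized (NULL) column pads with NA, which is what fix A avoids:

```r
# Values from the two-word sentence: "interesting" = 1, "boring" = 0.
positive.polarity <- c(1, 0)
i <- 2  # index of "boring", whose context has a booster and a negation

# Buggy: no index on the left-hand side, so 4 is added to every element.
buggy <- positive.polarity + 4
print(buggy)   # 5 4

# Assigning into an uninitialized (NULL) vector pads the gap with NA.
unset <- NULL
unset[i] <- positive.polarity[i] + 4
print(unset)   # NA 4

# Fixed: copy first so the lengths match, then update element i only.
fixed <- positive.polarity
fixed[i] <- positive.polarity[i] + 4
print(fixed)   # 1 4
```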
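The catch is that the lengths of subDF$positive.polarity and subDF$positive.ponderate.polarity depend on the number of sentiment words in the sentence. Below are the corrected code and the debugging output. Here are the fixes: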
A. Initialize the weighted column so that both columns have the same length:
 subDF$positive.ponderate.polarity <- subDF$positive.polarity

D. Checking the other cases:
calcPolarity(sentiment_DF, "The course was interesting, but the professor was not so boring")
         word positive.polarity negative.polarity positive.ponderate.polarity
1 interesting                 1                 0                           1
2      boring                 0                 1                           4

calcPolarity(sentiment_DF, "The course was so interesting")
         word positive.polarity negative.polarity positive.ponderate.polarity
1 interesting                 1                 0                          10

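Edit to correct the polarity result raised in the comments:
The output of polarity was c(0, 5) because the original code was: polarity[i]
The inner loop also uses i, so once it finishes i is 2 rather than the sentence index 1.

Comments:

I can't reproduce your result. context only extracts 3 characters, not 3 words (here context gives "o"). Maybe your regex is wrong?

For me, I get context = [1] "The course was interesting" "not so boring", which is correct, no? I don't know why it looks so complicated. Nobody can help me.

Shouldn't your expected result be interesting: 10 and boring: 0? interesting starts at 1 with no negation, so 1 + 9 = 10, and boring has no booster to begin with.

Thank you very much for spending so much time answering me, but I still have a question: why, when I run calcPolarity(sentiment_DF, "The course was interesting, but the professor was not so boring"), do I get "0 5" as the result? I only have one sentence, so shouldn't I get only one value? Is this output of polarity correct? This is your code: polarity[i]

I edited my post; please see the example to see how I determine the polarity.

I understand. Please see my comment C. In your code you never assign negative.ponderate.polarity; this variable only appears at the very end and nowhere else. That's why I asked about comment C. Right now you only have (1 + 4) - 0 = 5. Please consider comment C and the case I point out there: you cannot leave the else branch as it is. It needs another qualifier, otherwise "so boring" gets points assigned to positive.ponderate.polarity. I'll leave this to you and consider the question answered. Comment if you need to, and please accept the answer so it is marked as resolved.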
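A minimal reproduction of the "0 5" result, assuming the original function assigned to polarity[i] after the inner loop: because the inner loop reuses i, the score lands in a second slot. Giving the sentence loop its own index (s here) fixes it:

```r
# Buggy: the inner loop reuses i, the sentence index.
buggy_polarity <- rep.int(0, 1)      # one sentence -> one slot
for (i in seq_along(buggy_polarity)) {
  for (i in 1:2) {                   # two sentiment words; i ends at 2
    # ... per-word weighting would happen here ...
  }
  buggy_polarity[i] <- 5             # writes slot 2, not slot 1
}
print(buggy_polarity)                # 0 5

# Fixed: the sentence loop gets its own index s.
fixed_polarity <- rep.int(0, 1)
for (s in seq_along(fixed_polarity)) {
  for (i in 1:2) {
    # ... per-word weighting would happen here ...
  }
  fixed_polarity[s] <- 5
}
print(fixed_polarity)                # 5
```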
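The core changes in isolation: initialize the weighted column once, then update only element i in each booster branch:
 subDF$positive.ponderate.polarity <- subDF$positive.polarity

  subDF$positive.ponderate.polarity[i] <- subDF$positive.polarity[i] + 4   # booster + negation
  subDF$positive.ponderate.polarity[i] <- subDF$positive.polarity[i] + 9   # booster only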

calcPolarity(sentiment_DF, "The course was so boring")
    word positive.polarity negative.polarity positive.ponderate.polarity
2 boring                 0                 1                           9
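The last comment says the else branch needs another qualifier so that "so boring" is not scored as positive, as in the output above. One way to add it, sketched with a hypothetical helper weight_word(): the rule that a booster alone intensifies whichever polarity the word already has is an assumption, while +9 and +4 are the constants from the answer's code:

```r
# Hypothetical helper: weight one sentiment word given its context flags.
# Assumption: a booster without negation intensifies the word's own polarity;
# booster + negation keeps the answer's +4 on the positive side.
weight_word <- function(pos, neg, has_booster, has_negation) {
  pos_w <- pos
  neg_w <- neg
  if (has_booster && has_negation) {
    pos_w <- pos + 4
  } else if (has_booster) {
    if (pos > 0) pos_w <- pos + 9 else neg_w <- neg + 9
  }
  c(positive = pos_w, negative = neg_w)
}

print(weight_word(0, 1, TRUE, FALSE))  # "so boring": positive 0, negative 10
print(weight_word(1, 0, TRUE, FALSE))  # "so interesting": positive 10, negative 0
print(weight_word(0, 1, TRUE, TRUE))   # "not so boring": positive 4, negative 1
```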

calcPolarity <- function(sentiment_DF, sentences) {
  booster_words <- c("more", "enough", "a lot", "as", "so")
  negative_words <- c("not", "rien", "ni", "aucun", "nul", "jamais", "pas", "non plus", "sans")
  reduce_words <- c("peu", "presque", "moins", "seulement")  # defined but not used yet
  # pre-allocate the polarity result vector with size = number of sentences
  polarity <- rep.int(0, length(sentences))

  # loop per sentence; s is its own index so the inner loop's i cannot clobber it
  for (s in seq_along(sentences)) {
    sentence <- sentences[s]

    # separate each sentence into words using a regular expression
    wordsOfASentence <- unlist(regmatches(sentence, gregexpr("[[:word:]]+", sentence, perl = TRUE)))

    # get the rows of sentiment_DF corresponding to the words in the sentence using match
    # N.B. if a word occurs twice, there will be two equal rows
    # (but I think it's correct since in this way you count its polarity twice)
    subDF <- sentiment_DF[match(wordsOfASentence, sentiment_DF$word, nomatch = 0), ]
    print(subDF)

    # sentiment words of the sentence, in sentence order (duplicates kept, e.g. interesting).
    # Compare with the column itself: levels() is NULL for a character column,
    # which is the data.frame() default since R 4.0.0.
    wordOfInterest <- wordsOfASentence[wordsOfASentence %in% sentiment_DF$word]
    if (length(wordOfInterest) == 0) next  # no sentiment words: polarity stays 0

    # extract a context of up to 3 words before each sentiment word
    regexOfInterest <- paste0("([^\\s]+\\s){0,3}", wordOfInterest)
    context <- stringr::str_extract(sentence, regexOfInterest)
    names(context) <- wordOfInterest  # Helps in the for loop

    # Fix A: initialize once per sentence, so both columns have the same length
    # and earlier updates are not overwritten on the next iteration
    subDF$positive.ponderate.polarity <- subDF$positive.polarity

    for (i in seq_along(context)) {
      print(paste("i:", i))
      print(context)
      print("initial")
      print(subDF$positive.polarity)
      print(subDF$positive.ponderate.polarity)

      contextWords <- unlist(strsplit(context[i], " "))
      if (any(contextWords %in% booster_words)) {
        print("if level 1")
        if (any(contextWords %in% negative_words)) {
          # update only element i, not the whole column
          subDF$positive.ponderate.polarity[i] <- subDF$positive.polarity[i] + 4
          print("if level 2A")
        } else {
          print("if level 2B")
          subDF$positive.ponderate.polarity[i] <- subDF$positive.polarity[i] + 9
        }
        print("level 2 result")
        print(subDF$positive.ponderate.polarity)
      }
      print("level 1 result")
      print(subDF$positive.ponderate.polarity)
    }

    # Debug option
    print(subDF)

    # calculate the total polarity of the sentence and store it in the vector.
    # negative.ponderate.polarity is never assigned, so it is NULL and sums to 0.
    polarity[s] <- sum(subDF$positive.ponderate.polarity) -
      sum(subDF$negative.ponderate.polarity)
  }

  return(polarity)
}

sentiment_DF <- data.frame(word=c('interesting','boring','pretty'),
                       positive.polarity=c(1,0,1),
                       negative.polarity=c(0,1,0))
calcPolarity(sentiment_DF, "The course was interesting, but the professor was not so boring")
calcPolarity(sentiment_DF, "The course was so interesting")
calcPolarity(sentiment_DF, "The course was so boring")