Python: applying a function to a DataFrame to perform sentiment analysis

python pandas dataframe


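The function below performs sentiment analysis on a phrase and returns a tuple (sentiment, probability from the NB classifier), such as (sadness, 0.78). I want to apply this function to the pandas DataFrame column df.Message and then create two more columns, df.Sentiment and df.Prob.

The code is as follows: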

import nltk

def avalia(teste):
    # Stem each word of the input phrase with the Portuguese RSLP stemmer.
    testeStemming = []
    stemmer = nltk.stem.RSLPStemmer()
    for (palavras_treinamento) in teste.split():
        comStem = [p for p in palavras_treinamento.split()]
        testeStemming.append(str(stemmer.stem(comStem[0])))

    # extrator_palavras and classificador are defined elsewhere (not shown).
    novo = extrator_palavras(testeStemming)
    distribuicao = classificador.prob_classify(novo)
    classe_array = [(classe, (distribuicao.prob(classe))) for classe in distribuicao.samples()]
    # Swap to (probability, label) so that max() picks the most probable label.
    inverse = [(value, key) for key, value in classe_array]
    max_key = max(inverse)[1]
    for each in classe_array:
        if each[0] == max_key:
            a = each[0]  # the sentiment
            b = each[1]  # the probability
            return a, b
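
As a side note, the last few lines of the function (build the inverted list, take the maximum, then loop to find the matching pair) could probably be collapsed into a single max() call with a key function. The sketch below uses made-up (label, probability) pairs in place of the real classe_array, so treat the values as purely illustrative:

```python
# Hypothetical (label, probability) pairs standing in for classe_array.
classe_array = [("alegria", 0.10), ("medo", 0.56), ("surpresa", 0.34)]

# max() with a key function compares the pairs by probability only
# and returns the whole (label, probability) pair.
sentiment, prob = max(classe_array, key=lambda pair: pair[1])
print(sentiment, prob)
```

One difference to be aware of: when two labels have exactly the same probability, this version returns the first one in the list, while the original comparison of (probability, label) tuples falls back to comparing the labels.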

Example with a single string:

avalia('i am sad today!')
(sadness, 0.98)

Now I have a DataFrame with 13k rows and a single column, Message. I can apply my function to that column and get a pandas Series such as:

0       (surpresa, 0.27992165905522154)
1            (medo, 0.5632686358414051)
2        (surpresa, 0.2799216590552195)
3         (alegria, 0.5429940754962914)
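
For reference, a Series of tuples like this can be turned into a two-column DataFrame by passing the list of tuples to the DataFrame constructor. This is only a sketch: the Series below is a hand-written stand-in for the output of df.Message.apply(avalia), and the column names are the ones asked for in the question.

```python
import pandas as pd

# Hypothetical stand-in for the result of df.Message.apply(avalia).
result = pd.Series([("surpresa", 0.28), ("medo", 0.56), ("alegria", 0.54)])

# Each tuple becomes one row; reusing the index keeps the rows aligned
# with the DataFrame the Series came from.
parts = pd.DataFrame(
    result.tolist(),
    index=result.index,
    columns=["Sentiment", "Probability"],
)
print(parts)
```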

I want to use this information to create two new columns in the same DataFrame, like this:

    Message    Sentiment      Probability
0   I am sad    surpresa        0.2799
1   I am happy  medo            0.56

I can't get this last part to work. Any help?

Try returning both values at the end of the function and use apply() to save them into separate columns:

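What is the structure of your DataFrame?

A single column, Message, containing strings such as "Today I bought a car", with 13k rows. I updated the code as you suggested, and now I only need the last part: adding the results to new columns. I tried, but got this error message: ValueError: too many values to unpack (expected 2)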
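
The error in the comment above most likely comes from unpacking the Series itself: iterating over a Series yields one item per row, so Python tries to fit every row into just two names. A minimal sketch with a hand-written three-row Series (a stand-in for the real 13k-row result) reproduces it and shows one way around it:

```python
import pandas as pd

# Hypothetical stand-in for df.Message.apply(avalia): one tuple per row.
result = pd.Series([("surpresa", 0.28), ("medo", 0.56), ("alegria", 0.54)])

try:
    # Tries to unpack three rows into two names.
    sentiment, prob = result
except ValueError as err:
    message = str(err)
    print(message)

# zip(*...) transposes the row tuples into one tuple per field,
# which is exactly two items and therefore unpacks cleanly.
sentiments, probs = zip(*result)
print(sentiments)
print(probs)
```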

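import nltk

def avalia(teste):
    # Stem each word of the input phrase with the Portuguese RSLP stemmer.
    testeStemming = []
    stemmer = nltk.stem.RSLPStemmer()
    for (palavras_treinamento) in teste.split():
        comStem = [p for p in palavras_treinamento.split()]
        testeStemming.append(str(stemmer.stem(comStem[0])))

    # extrator_palavras and classificador are defined elsewhere (not shown).
    novo = extrator_palavras(testeStemming)
    distribuicao = classificador.prob_classify(novo)
    classe_array = [(classe, (distribuicao.prob(classe))) for classe in distribuicao.samples()]
    inverse = [(value, key) for key, value in classe_array]
    max_key = max(inverse)[1]
    for each in classe_array:
        if each[0] == max_key:
            a = each[0]  # the sentiment
            b = each[1]  # the probability
    return a, b

# apply() returns a single Series of (sentiment, probability) tuples, so
# unpacking it directly into two names raises
# "ValueError: too many values to unpack (expected 2)".
# zip(*...) transposes the tuples into one sequence per column.
# Bracket notation is needed because attribute assignment (df.Sentiment = ...)
# does not create a new column.
df['Sentiment'], df['Prob'] = zip(*df['Message'].apply(avalia))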
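
Another option that may be worth trying is DataFrame.apply with axis=1 and result_type="expand", which spreads each returned tuple across columns. The sketch below is self-contained, so it uses a made-up avalia_stub function and a two-row DataFrame instead of the real classifier and data; only the pandas calls are meant to carry over.

```python
import pandas as pd


def avalia_stub(text):
    """Hypothetical stand-in for avalia(): returns (label, probability)."""
    return ("sadness", 0.9) if "sad" in text.lower() else ("joy", 0.8)


df = pd.DataFrame({"Message": ["I am sad", "I am happy"]})

# With axis=1 the function receives one row at a time; result_type="expand"
# turns each returned tuple into columns (named 0 and 1 by default).
expanded = df.apply(
    lambda row: avalia_stub(row["Message"]),
    axis=1,
    result_type="expand",
)
expanded.columns = ["Sentiment", "Prob"]

# join() aligns on the index, so each result stays next to its message.
df = df.join(expanded)
print(df)
```

Row-wise apply calls the function once per row either way, so on 13k messages the running time should be dominated by the classifier rather than by which of these approaches is used.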