Python: NLTK: return dictionary - only returns one value

Sorry for dumping a whole block of code here. I have been trying to figure out what I am doing wrong, but unfortunately I cannot work it out.

For my thesis I have to classify tweets as neutral (0), negative (-1) or positive (1), and I am trying to do this with NLTK. The goal is for the code to return a dictionary of the form 'tweetA: 0', 'tweetB: -1', and so on. At the moment, if I enter multiple tweets as input, I only get the result (i.e. -1/0/1) for the first tweet back.

For example, if I pass 'I love oranges', 'I hate tomatoes' as input, I only get '1' back instead of '1', '-1'.

If anyone could help me out, I would really appreciate it.

The code I have so far:

import re, math, collections, itertools
import nltk
import nltk.classify.util, nltk.metrics
from nltk.classify import NaiveBayesClassifier
from nltk.metrics import BigramAssocMeasures
from nltk.probability import FreqDist, ConditionalFreqDist  
from nltk.util import ngrams
from nltk.tokenize import word_tokenize
from sklearn.feature_extraction.text import TfidfVectorizer
from nltk.stem.porter import *
from nltk.stem.snowball import SnowballStemmer

stemmer = SnowballStemmer("english", ignore_stopwords = True)
pos_tweets = ['I love bananas','I like pears','I eat oranges']
neg_tweets = ['I hate lettuce','I do not like tomatoes','I hate apples']
neutral_tweets = ['I buy chicken','I am boiling eggs','I am chopping vegetables']

def uni(doc):
    x = []
    y = []
    for tweet in doc:
        x.append(word_tokenize(tweet))
    for element in x:
        for word in element:
            if len(word)>2:
                word = word.lower()
                word = stemmer.stem(word)
                y.append(word)
    return y

def word_feats_uni(doc):
    return dict([(word, True) for word in uni(doc)])

def tokenizer_ngrams(document):
    all_tokens = []
    for sentence in document:
        all_tokens.append(word_tokenize(sentence))
    return all_tokens

def get_bi(document):
    x = tokenizer_ngrams(document)
    c = []
    for sentence in x:
        c.extend(nltk.bigrams(sentence))
    return c

def get_tri(document):
    x = tokenizer_ngrams(document)
    c = []
    for sentence in x:
        # Build trigrams here; bigrams are already handled by get_bi above.
        c.extend(nltk.trigrams(sentence))
    return c


def word_feats_bi(doc): 
    return dict([(word, True) for word in get_bi(doc)])

def word_feats_tri(doc):
    return dict([(word, True) for word in get_tri(doc)])

def word_feats_test(doc):
    feats_test = {}
    for tweet in doc:
        # Wrap each tweet in a list: the feature helpers expect an
        # iterable of sentences, not a bare string (otherwise their
        # inner loops iterate over characters).
        feats_test.update(word_feats_uni([tweet]))
        feats_test.update(word_feats_bi([tweet]))
        feats_test.update(word_feats_tri([tweet]))
    return feats_test


pos_feats = [(word_feats_uni(pos_tweets),'1')] + [(word_feats_bi(pos_tweets),'1')] + [(word_feats_tri(pos_tweets),'1')]

neg_feats = [(word_feats_uni(neg_tweets),'-1')] + [(word_feats_bi(neg_tweets),'-1')] + [(word_feats_tri(neg_tweets),'-1')]

neutral_feats = [(word_feats_uni(neutral_tweets),'0')] + [(word_feats_bi(neutral_tweets),'0')] + [(word_feats_tri(neutral_tweets),'0')]

trainfeats = pos_feats + neg_feats + neutral_feats
classifier = NaiveBayesClassifier.train(trainfeats)
print (classifier.classify(word_feats_test(['I love oranges'])))

Comments:

"About the last line of code: you are not actually classifying multiple tweets there. What would that look like if you submitted several? Would classifier.classify_many solve your problem?"

"@charbugs I solved it by adjusting the logic of the code. I now automatically store each tweet together with its sentiment in a list, so the classify function is called separately for each tweet."
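For reference, here is a minimal sketch of the per-tweet approach described in the comment above, reusing the classifier and feature helpers defined in the question; the names word_feats_single and classify_tweets are illustrative, not from the original code:

def word_feats_single(tweet):
    # Build one combined feature dict for a single tweet; the tweet is
    # wrapped in a list because the helpers expect an iterable of sentences.
    feats = {}
    feats.update(word_feats_uni([tweet]))
    feats.update(word_feats_bi([tweet]))
    feats.update(word_feats_tri([tweet]))
    return feats

def classify_tweets(tweets):
    # Call the classifier once per tweet and collect the results in the
    # desired {tweet: label} dictionary.
    return {tweet: classifier.classify(word_feats_single(tweet)) for tweet in tweets}

print(classify_tweets(['I love oranges', 'I hate tomatoes']))

Alternatively, classifier.classify_many takes a list of feature dicts and returns one label per dict: classifier.classify_many([word_feats_single(t) for t in tweets]).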