Looping a Python NLTK classifier over a list of tweets


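I trained a Naive Bayes classifier on the twitter_samples corpus. I was able to test the classifier on a single tweet to make sure it works correctly. However, I am now trying to loop the classifier over a list of ~4000 tweets, and my code raises an AttributeError: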

test_sample = []
for (words, sentiment) in test_tweets:
     words_filtered = [t.lower() for t in words.split() if len(t) >= 3]
     sentiment = classifier.classify(extract_features(words.split()))
     test_sample.append(words_filtered, sentiment)

AttributeError: 'list' object has no attribute 'split'
test_tweets is a list of tweets with the following structure:

('blah tweety blah', 'tbd')
I am performing sentiment analysis on the tweets; the classifier produces a pos or neg result for each tweet, giving output like this:

('blah tweety blah', 'pos')

Can anyone tell me what is wrong with my loop?
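For reference, output in that shape (each 'tbd' placeholder replaced by a predicted label) can be built with a single comprehension when every tweet is a plain string. This is a minimal sketch: the question does not show extract_features or the trained classifier, so both are hypothetical stand-ins here (StubClassifier only mimics the classify(featureset) call of a trained NLTK classifier).

```python
# Hypothetical stand-ins: extract_features and StubClassifier are placeholders,
# not the asker's real feature extractor or trained model.
def extract_features(tokens):
    """Bag-of-words feature dict, the dict form NLTK classifiers accept."""
    return {f'contains({t})': True for t in tokens}

class StubClassifier:
    """Placeholder exposing classify(featureset), like a trained NLTK model."""
    def classify(self, featureset):
        return 'pos' if 'contains(good)' in featureset else 'neg'

classifier = StubClassifier()
test_tweets = [('blah tweety blah', 'tbd'), ('good tweety day', 'tbd')]

# Keep the tweet text and replace the 'tbd' placeholder with the prediction.
labelled = [(text, classifier.classify(extract_features(text.split())))
            for text, _placeholder in test_tweets]
print(labelled)  # [('blah tweety blah', 'neg'), ('good tweety day', 'pos')]
```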

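That AttributeError means you are trying to split a list, so test_tweets does not have the format you think it has. Somewhere there is a list where you expect a string.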
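The message can be reproduced without NLTK or the corpus. In this sketch, try_split is a hypothetical helper that calls split() and returns the AttributeError text instead of raising it:

```python
def try_split(value):
    """Split value on whitespace, or return the AttributeError message."""
    try:
        return value.split()
    except AttributeError as exc:
        return str(exc)

print(try_split('blah tweety blah'))          # ['blah', 'tweety', 'blah']
print(try_split(['blah', 'tweety', 'blah']))  # 'list' object has no attribute 'split'
```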

As a troubleshooting step, you can temporarily modify the loop to find entries where words is a list rather than a string:

test_sample = []
for (words, sentiment) in test_tweets:
    if isinstance(words, list):
        print('This is a list, not a string:', words)
    # words_filtered = [t.lower() for t in words.split() if len(t) >= 3]
    # sentiment = classifier.classify(extract_features(words.split()))
    # test_sample.append((words_filtered, sentiment))
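Instead of printing every offending row, the rows can also be summarized in one pass. A sketch on made-up sample data that counts the Python type of the text field and collects the indices of rows that are not strings:

```python
from collections import Counter

# Hypothetical sample: the second row was stored as a token list.
test_tweets = [
    ('blah tweety blah', 'tbd'),
    (['blah', 'tweety', 'blah'], 'tbd'),
    ('another tweet', 'tbd'),
]

# Count the type name of the text field in every row.
type_counts = Counter(type(words).__name__ for words, _label in test_tweets)
print(type_counts)  # Counter({'str': 2, 'list': 1})

# Indices of rows that would break words.split().
bad_rows = [i for i, (words, _label) in enumerate(test_tweets)
            if not isinstance(words, str)]
print(bad_rows)  # [1]
```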
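Once you have identified which entries are lists, you have a few options. You can use the same if statement to either skip those entries or clean them up: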

test_sample = []
for (words, sentiment) in test_tweets:
    if isinstance(words, list):
        # continue  # uncomment to skip list entries and go to the next tweet
        words_filtered = [t.lower() for t in words if len(t) >= 3]  # already tokenized, so no split()
        sentiment = classifier.classify(extract_features(words))
    else:
        words_filtered = [t.lower() for t in words.split() if len(t) >= 3]
        sentiment = classifier.classify(extract_features(words.split()))
    test_sample.append((words_filtered, sentiment))  # list.append() takes one argument, so pass a tuple

split() is a string method. Are you sure words is a string and not a list? You can check with the type() function.
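Besides inspecting type() interactively, a small guard can fail fast with a clearer message than the AttributeError. A sketch; require_text is a hypothetical helper name and the sample value is made up:

```python
def require_text(words):
    """Return words unchanged if it is a str; otherwise raise a descriptive TypeError."""
    if not isinstance(words, str):
        raise TypeError(f'expected str, got {type(words).__name__}: {words!r}')
    return words

words = ['blah', 'tweety', 'blah']  # hypothetical value taken from test_tweets
print(type(words))             # <class 'list'>
print(isinstance(words, str))  # False
```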
words is a list, not a string, so that is the problem. Sorry for not checking that first.
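Since words turned out to be a list, another option is to convert the data back to plain strings once, before the loop, so the original split() calls work unchanged. The question does not show how test_tweets was built; one possible source of token lists (an assumption) is NLTK's twitter_samples.tokenized(), which returns each tweet as a list of tokens, whereas twitter_samples.strings() returns plain strings. A sketch, with as_text as a hypothetical helper; note that ' '.join only approximates the original spacing:

```python
def as_text(words):
    """Join a token list back into one string; leave strings unchanged."""
    return words if isinstance(words, str) else ' '.join(words)

test_tweets = [('blah tweety blah', 'tbd'), (['blah', 'tweety', 'blah'], 'tbd')]
test_tweets = [(as_text(words), label) for words, label in test_tweets]
print(test_tweets)  # [('blah tweety blah', 'tbd'), ('blah tweety blah', 'tbd')]
```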