Looping a Python NLTK classifier through a list of tweets

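I trained a NaiveBayes classifier on the twitter_samples corpus. I tested the classifier on a single tweet to make sure it works. However, when I try to loop the classifier through a list of ~4000 tweets, my code raises an AttributeError: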

test_sample = []
for (words, sentiment) in test_tweets:
    words_filtered = [t.lower() for t in words.split() if len(t) >= 3]
    sentiment = classifier.classify(extract_features(words.split()))
    test_sample.append((words_filtered, sentiment))  # list.append() takes one argument, so append a tuple

AttributeError: 'list' object has no attribute 'split'

test_tweets is a list of tweets with the following structure:

('blah tweety blah', 'tbd')

I'm running sentiment analysis on the tweets. The classifier produces a pos or neg result for each tweet, giving output like this:

('blah tweety blah', 'pos')

Can anyone tell me what's wrong with my loop?

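The AttributeError means you are trying to split a list, so test_tweets is not in the format you think it is. Somewhere there is a list where you expect a string.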
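
To see why, here is a minimal reproduction using the example tweet from the question: split() works on a string, but a tweet that has already been tokenized into a list has no split() method. The variable names are illustrative only.

```python
# A tweet stored as a string can be split into tokens.
tweet_as_string = "blah tweety blah"
print(tweet_as_string.split())  # ['blah', 'tweety', 'blah']

# A tweet that is already tokenized (a list) has no split() method.
tweet_as_list = ["blah", "tweety", "blah"]
try:
    tweet_as_list.split()
    message = "no error"
except AttributeError as err:
    message = str(err)  # keep the text; `err` is unbound after the except block
print(message)  # 'list' object has no attribute 'split'
```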

As a troubleshooting step, you can temporarily modify the loop to find the entries where words is a list instead of a string:

test_sample = []
for (words, sentiment) in test_tweets:
    if isinstance(words, list):
        # Report every entry whose text is a list instead of a string
        print('This is a list, not a string ', end='')
        print(words)
    # words_filtered = [t.lower() for t in words.split() if len(t) >= 3]
    # sentiment = classifier.classify(extract_features(words.split()))
    # test_sample.append((words_filtered, sentiment))
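
With ~4000 tweets, printing every offender can get noisy. A variant of the same check could collect the positions instead, so you can inspect or count them afterwards. This is a sketch: find_non_string_tweets is a hypothetical helper, and the two-item test_tweets below is made-up sample data in the question's (text, label) shape.

```python
def find_non_string_tweets(tweets):
    """Return (index, value) pairs whose text field is not a str."""
    problems = []
    for i, (words, _label) in enumerate(tweets):
        if not isinstance(words, str):
            problems.append((i, words))
    return problems


# Hypothetical sample data mixing both shapes
test_tweets = [
    ("blah tweety blah", "tbd"),
    (["already", "tokenized", "tweet"], "tbd"),
]

for index, value in find_non_string_tweets(test_tweets):
    print(f"tweet #{index} is a {type(value).__name__}, not a str: {value!r}")
```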

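Once you've identified which entries are lists, you have a few options: you can use the same if statement either to skip those entries or to clean them up: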

test_sample = []
for (words, sentiment) in test_tweets:
    if isinstance(words, list):
        # Already tokenized: just skip the split() call
        # (or use `continue` here to skip list entries and go to the next iteration)
        words_filtered = [t.lower() for t in words if len(t) >= 3]
        sentiment = classifier.classify(extract_features(words))
    else:
        words_filtered = [t.lower() for t in words.split() if len(t) >= 3]
        sentiment = classifier.classify(extract_features(words.split()))
    test_sample.append((words_filtered, sentiment))
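
The two branches above differ only in how the tokens are obtained, so one way to avoid duplicating the classify call is to normalize the input first. The sketch below assumes that structure. to_tokens, extract_features and StubClassifier are hypothetical stand-ins: the question's real extract_features and trained NLTK classifier are not shown, so the stub only mimics a classify(featureset) call on a {token: True} feature dict.

```python
def to_tokens(words):
    """Accept either a raw string or an already-tokenized list."""
    if isinstance(words, str):
        return words.split()
    return list(words)


def extract_features(tokens):
    # Hypothetical bag-of-words features in the {token: True} style
    return {token: True for token in tokens}


class StubClassifier:
    """Hypothetical stand-in for a trained classifier."""

    def classify(self, featureset):
        return "pos" if "happy" in featureset else "neg"


classifier = StubClassifier()
test_tweets = [("so happy today", "tbd"), (["bad", "day"], "tbd")]

test_sample = []
for words, _placeholder in test_tweets:
    tokens = to_tokens(words)  # works for both shapes
    words_filtered = [t.lower() for t in tokens if len(t) >= 3]
    sentiment = classifier.classify(extract_features(tokens))
    test_sample.append((words_filtered, sentiment))

print(test_sample)
```

If you want output in the question's ('blah tweety blah', 'pos') shape instead, append (words, sentiment) rather than (words_filtered, sentiment).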

split() is a method of string objects. Are you sure words is a string and not a list? You can check with the type() function.
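
A small illustration of that check, with one caveat: type(x) is list only matches plain lists, while isinstance(x, list) also matches list subclasses, which some tokenizers could return. TokenList here is a hypothetical subclass used purely for demonstration.

```python
class TokenList(list):
    """Hypothetical list subclass, e.g. returned by a custom tokenizer."""


words = TokenList(["blah", "tweety"])
print(type(words).__name__)      # TokenList
print(type(words) is list)       # False: exact type check
print(isinstance(words, list))   # True: subclasses count too
```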

words is a list, not a string, so that was the problem. Sorry for not checking that first.