Python 3.x: Classifying text with NaiveBayesClassifier

python-3.x machine-learning scikit-learn nlp

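I have a text file with one sentence per line, for example: "Have you registered your email ID with your bank account?"

I want to classify each sentence as a question or a non-question. FYI, these sentences come from a bank's website. I have seen this block of nltk code used: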

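import nltk

nltk.download('nps_chat')  # labelled chat posts used as training data
nltk.download('punkt')     # tokenizer data for word_tokenize; newer NLTK releases may ask for 'punkt_tab' instead
posts = nltk.corpus.nps_chat.xml_posts()[:10000]


def dialogue_act_features(post):
    """Bag-of-words features: one 'contains(word)' flag per lowercased token."""
    features = {}
    for word in nltk.word_tokenize(post):
        features['contains({})'.format(word.lower())] = True
    return features


# Pair each post's features with its dialogue-act label (e.g. 'whQuestion', 'ynQuestion', 'Statement')
featuresets = [(dialogue_act_features(post.text), post.get('class')) for post in posts]
# Hold out the first 10% for testing and train on the rest
size = int(len(featuresets) * 0.1)
train_set, test_set = featuresets[size:], featuresets[:size]
classifier = nltk.NaiveBayesClassifier.train(train_set)
print(nltk.classify.accuracy(classifier, test_set))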

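So I have done some preprocessing on my text file (stemming, removing stop words, and so on) to turn each sentence into a bag of words. The code above gives me a trained classifier. How do I apply it to my text file of sentences (raw or preprocessed)?

Update: here is a sample of my text file.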

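Assuming you have already preprocessed your document data as we discussed, you can do the following:

For your data, you can iterate over the lines, then fit and predict:

Do this for every line in the text file: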

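classifier = nltk.NaiveBayesClassifier.train(featuresets)  # train on all labelled posts
with open('sentences.txt', encoding='utf-8') as f:  # hypothetical file name
    for line in f:
        print(classifier.classify(dialogue_act_features(line.strip())))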
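
The classifier above predicts an nps_chat dialogue-act label rather than a question/non-question flag, so one more step is needed to get the two classes the question asks for. Below is a minimal sketch of that step, assuming that the nps_chat labels whQuestion and ynQuestion are the ones to treat as questions. The helper name label_lines, the QUESTION_ACTS set and the file name sentences.txt are made up for illustration and are not part of nltk.

```python
# Dialogue-act labels in nps_chat that denote questions (wh- and yes/no questions).
# Treating exactly these two as "question" is an assumption of this sketch.
QUESTION_ACTS = {"whQuestion", "ynQuestion"}


def label_lines(classifier, featurize, lines):
    """Classify each non-empty line and collapse the dialogue act to a binary label.

    classifier: any object with a classify(features) method, e.g. the trained
                nltk.NaiveBayesClassifier from the question.
    featurize:  the same feature function used at training time,
                e.g. dialogue_act_features.
    lines:      an iterable of strings, e.g. an open text file.
    """
    results = []
    for line in lines:
        sentence = line.strip()
        if not sentence:
            continue  # skip blank lines
        act = classifier.classify(featurize(sentence))
        label = "question" if act in QUESTION_ACTS else "non-question"
        results.append((sentence, act, label))
    return results


# Hypothetical usage, assuming `classifier` and `dialogue_act_features` are
# defined as in the question and the sentences live in 'sentences.txt':
#
# with open("sentences.txt", encoding="utf-8") as f:
#     for sentence, act, label in label_lines(classifier, dialogue_act_features, f):
#         print(label, act, sentence, sep="\t")
```

Passing the feature function in explicitly is a reminder that prediction should use the same features as training. The nps_chat posts are tokenized raw text, so feeding stemmed, stop-word-stripped sentences may produce features the classifier rarely saw. Stop-word removal may also drop words such as "what", "is" or "do" that help signal a question, so the raw sentences are probably the safer input here.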

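You need to transform the documents using () and then apply the classifier. Can you upload your data?

@seralouk Thanks for your answer, I'll look at the link right now! I have updated the question with an example of my data. Not sure why I was downvoted; is there any other information I should provide?

@seralouk No, they are all sentence strings. The sample I posted is the version from before processing. If you like, I can attach the processed version, where numbers are removed, words are stemmed and stop words are removed?

@seralouk Couldn't I train the classifier using nps_chat and get the sample data from that?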
