Python: Naive Bayes classifier for extractive summarization

python machine-learning scikit-learn

I am trying to train a Naive Bayes classifier, but I am having trouble with the data. I plan to use it for extractive text summarization.

Example_Input: It was a sunny day. The weather was nice and the birds were singing.
Example_Output: The weather was nice and the birds were singing.

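I have a dataset I plan to use, in which at least one sentence of each document belongs to the summary.

I decided to use sklearn, but I don't know how to represent the data I have, i.e. X and y.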

from sklearn.naive_bayes import MultinomialNB
clf = MultinomialNB().fit(X, y)

The closest I have come is this:

X = [
        'It was a sunny day. The weather was nice and the birds were singing.',
        'I like trains. Hi, again.'
    ]

y = [
        [0,1],
        [1,0]
    ]

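where a target value of 1 means the sentence is included in the summary and 0 means it is not. Unfortunately, this raises a bad-shape exception, because y is expected to be a 1-D array. I can't think of a way to represent this, so please help.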

By the way, I don't use the string values in X directly; I use sklearn's CountVectorizer and TfidfTransformer to represent them as vectors.

What you describe is a classification problem. That means you need to separate out each sentence and predict its class individually.

For example, instead of using:

X = [
        'It was a sunny day. The weather was nice and the birds were singing.',
        'I like trains. Hi, again.'
    ]

use this:

X = [
        'It was a sunny day.',
        'The weather was nice and the birds were singing.',
        'I like trains.',
        'Hi, again.'
    ]

You can use NLTK's sentence tokenizer to do the splitting.
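The splitting step might look like the sketch below. To keep it dependency-free, it uses a naive regex splitter as a stand-in for NLTK; `split_sentences` is a hypothetical helper, not part of any library. With NLTK installed and its tokenizer data downloaded, `nltk.tokenize.sent_tokenize(doc)` could take its place and would handle abbreviations and similar edge cases better.

```python
import re

def split_sentences(text):
    """Naive stand-in for a sentence tokenizer: split after '.', '!' or '?'
    when followed by whitespace. Real tokenizers handle many more cases."""
    return [s.strip() for s in re.split(r'(?<=[.!?])\s+', text) if s.strip()]

documents = [
    'It was a sunny day. The weather was nice and the birds were singing.',
    'I like trains. Hi, again.',
]

# Flatten all documents into one list of sentences: one sample per sentence.
X = [sentence for doc in documents for sentence in split_sentences(doc)]
print(X)
```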

Now, for the labels, use two classes. Let's say 1 means yes (the sentence is in the summary) and 0 means no.

# One label per sentence, as a flat 1-D list (not a list of lists)
y = [
        0,
        1,
        1,
        0
    ]

Now use this data to fit and predict however you need.

Hope that helps.
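As a minimal sketch of that fit-and-predict step, assuming the per-sentence samples above and the CountVectorizer plus TfidfTransformer representation mentioned in the question. The `new_sentences` list is made up for illustration, and with only four training sentences the predictions are not meaningful; the point is only the shape of X and y.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# One sample per sentence, one label per sample (1 = in summary, 0 = not).
X = [
    'It was a sunny day.',
    'The weather was nice and the birds were singing.',
    'I like trains.',
    'Hi, again.',
]
y = [0, 1, 1, 0]  # flat 1-D list, same length as X

# Raw text -> token counts -> TF-IDF weights -> Naive Bayes.
model = make_pipeline(CountVectorizer(), TfidfTransformer(), MultinomialNB())
model.fit(X, y)

# Hypothetical unseen sentences, used only to show the call.
new_sentences = ['The birds were singing.', 'Hi there.']
predictions = model.predict(new_sentences)
print(predictions)
```

Wrapping the vectorizer steps in a pipeline means the same fitted vocabulary is reused at prediction time, so raw strings can be passed straight to `predict`.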

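Thanks for the answer. It would work, and it is certainly better than what I had, but this way the classifier won't take the position of a sentence within its document into account, because all sentences are treated as one pool. Is there a way I could include that as well?

@nikola Take the multi-sentence text as input, split it with the NLTK sentence tokenizer, and predict for each sentence, but only print the sentences whose predicted class is 1, i.e. "yes", to the output.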
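One possible way to combine both comments is sketched below: each sentence gets an extra feature for its relative position in the document, and only sentences predicted as 1 are returned. This is an illustration under assumptions, not something proposed in the thread: `split_sentences`, `sentences_with_position`, `to_features` and `summarize` are hypothetical helpers, the regex splitter stands in for NLTK, and feeding a position value into MultinomialNB alongside TF-IDF weights is a crude choice that may need a different model or feature encoding in practice.

```python
import re
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

def split_sentences(text):
    """Naive stand-in for NLTK's sentence tokenizer."""
    return [s.strip() for s in re.split(r'(?<=[.!?])\s+', text) if s.strip()]

def sentences_with_position(doc):
    """Pair each sentence with its relative position: 0.0 = first, 1.0 = last."""
    sentences = split_sentences(doc)
    last = max(len(sentences) - 1, 1)
    return [(s, i / last) for i, s in enumerate(sentences)]

def to_features(pairs, vectorizer, fit=False):
    """Stack TF-IDF text features with the position as one extra column."""
    texts = [s for s, _ in pairs]
    tfidf = vectorizer.fit_transform(texts) if fit else vectorizer.transform(texts)
    positions = csr_matrix([[pos] for _, pos in pairs])
    return hstack([tfidf, positions]).tocsr()

train_docs = [
    'It was a sunny day. The weather was nice and the birds were singing.',
    'I like trains. Hi, again.',
]
y = [0, 1, 1, 0]  # one label per sentence, in document order

# TfidfVectorizer is equivalent to CountVectorizer followed by TfidfTransformer.
vectorizer = TfidfVectorizer()
train_pairs = [p for doc in train_docs for p in sentences_with_position(doc)]
clf = MultinomialNB().fit(to_features(train_pairs, vectorizer, fit=True), y)

def summarize(doc):
    """Return only the sentences the classifier labels 1 ("yes")."""
    pairs = sentences_with_position(doc)
    if not pairs:
        return []
    labels = clf.predict(to_features(pairs, vectorizer))
    return [s for (s, _), label in zip(pairs, labels) if label == 1]

for sentence in summarize(train_docs[0]):
    print(sentence)
```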