Python: using counts and tfidf as features for scikit-learn


I'm trying to use both counts and tfidf as features for a multinomial NB model. Here is my code:

text = ["this is spam", "this isn't spam"]
labels = [0,1]
count_vectorizer = CountVectorizer(stop_words="english", min_df=3)

tf_transformer = TfidfTransformer(use_idf=True)
combined_features = FeatureUnion([("counts", self.count_vectorizer), ("tfidf", tf_transformer)]).fit(self.text)

classifier = MultinomialNB()
classifier.fit(combined_features, labels)
But I'm getting an error with FeatureUnion and tfidf:

TypeError: no supported conversion for types: (dtype('S18413'),)

Any idea why this is happening? Is it not possible to use both counts and tfidf as features?

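The error doesn't come from FeatureUnion; it comes from TfidfTransformer.

You should use TfidfVectorizer instead of TfidfTransformer. The transformer expects a matrix of term counts (a numpy array or sparse matrix) as input, not plain text, hence the TypeError.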
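To make the difference between the two classes concrete, here is a minimal sketch on a made-up toy corpus (the docs list and the variable names are assumptions for illustration, not taken from the question). It feeds TfidfTransformer the count matrix produced by CountVectorizer, then checks whether that two-step route gives the same matrix as TfidfVectorizer applied directly to the raw text with the same settings.

```python
import numpy as np
from sklearn.feature_extraction.text import (
    CountVectorizer,
    TfidfTransformer,
    TfidfVectorizer,
)

# Hypothetical toy corpus, for illustration only.
docs = [
    "free money now",
    "meeting at noon",
    "free lunch at the meeting",
    "win money now",
]

# Step 1: raw text -> sparse matrix of term counts.
counts = CountVectorizer().fit_transform(docs)

# Step 2: TfidfTransformer reweights an existing count matrix.
# Handing it raw strings instead is what goes wrong in the question.
tfidf_two_step = TfidfTransformer(use_idf=True).fit_transform(counts)

# TfidfVectorizer performs both steps at once, starting from raw text.
tfidf_one_step = TfidfVectorizer(use_idf=True).fit_transform(docs)

# Compare the two results element by element.
same = np.allclose(tfidf_two_step.toarray(), tfidf_one_step.toarray())
print(same)
```

So TfidfTransformer is still usable, but only downstream of a CountVectorizer. Inside a FeatureUnion every branch receives the raw text, which is why the vectorizer variant is the one that fits there.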

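Also, your test sentences are too small for a tfidf test: with min_df=3 a term has to appear in at least three documents, which a two-sentence corpus cannot satisfy. Try a larger corpus; here is an example: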

from nltk.corpus import brown

from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.pipeline import FeatureUnion
from sklearn.naive_bayes import MultinomialNB

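# Build a larger corpus from NLTK's Brown corpus
# (requires a one-time nltk.download('brown')).
text = [" ".join(i) for i in brown.sents()[:100]]
# Arbitrary placeholder labels, just so the classifier has something to fit.
labels = ['yes']*50 + ['no']*50
count_vectorizer = CountVectorizer(stop_words="english", min_df=3)
# TfidfVectorizer takes raw text, unlike TfidfTransformer.
tfidf_vectorizer = TfidfVectorizer(use_idf=True)
# FeatureUnion runs both vectorizers on the raw text and stacks their outputs side by side.
combined_features = FeatureUnion([("counts", count_vectorizer), ("tfidf", tfidf_vectorizer)]).fit_transform(text)
classifier = MultinomialNB()
classifier.fit(combined_features, labels)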
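One possible extension, not part of the original answer: calling fit_transform on the FeatureUnion and keeping only the resulting matrix discards the fitted union, so new text cannot be vectorized the same way later. A sketch that wraps the union and the classifier in a Pipeline keeps them together. The toy corpus, the labels, and all variable names below are made up so that the snippet runs without NLTK.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import FeatureUnion, Pipeline

# Hypothetical toy data, for illustration only.
train_text = [
    "win free money now",
    "free prize claim now",
    "cheap money offer",
    "meeting moved to noon",
    "lunch with the team",
    "project report due friday",
]
train_labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

# Both branches receive the raw text; their outputs are stacked column-wise.
features = FeatureUnion([
    ("counts", CountVectorizer()),
    ("tfidf", TfidfVectorizer(use_idf=True)),
])

# The pipeline fits the union and the classifier together and
# reuses the fitted union whenever predict() is called.
model = Pipeline([
    ("features", features),
    ("classifier", MultinomialNB()),
])
model.fit(train_text, train_labels)

# Inspect the combined matrix: count columns followed by tfidf columns.
combined = model.named_steps["features"].transform(train_text)
print(combined.shape)

# New text goes through the same fitted vectorizers automatically.
predictions = model.predict(["free money", "team meeting friday"])
print(list(predictions))
```

The same structure should carry over to the Brown corpus example by swapping in its text and labels and restoring the stop_words and min_df settings.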