
Python: Predicting movie reviews with scikit-learn


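I am using scikit-learn's MultinomialNB and a vectorizer to build a model that predicts whether a review is good or bad.

After training on the labeled data, how do I use the model to predict a new review (or an existing one)? I get the error message below.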

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.cross_validation import train_test_split
from sklearn.naive_bayes import MultinomialNB

X = vectorizer.fit_transform(df.quote)
X = X.tocsc()
Y = (df.fresh == 'fresh').values.astype(np.int)

xtrain, xtest, ytrain, ytest = train_test_split(X, Y)

clf = MultinomialNB().fit(xtrain, ytrain)

new_review = ['this is a new review, movie was awesome']
new_review = vectorizer.fit_transform(new_review)

print df.quote[15]
print(clf.predict(df.quote[10])) #predict existing review in dataframe
print(clf.predict(new_review)) #predict new review


Technically, Toy Story is nearly flawless.
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-91-27a0698bbd1f> in <module>()
     15 
     16 print df.quote[15]
---> 17 print(clf.predict(df.quote[10])) #predict existing quote in dataframe
     18 print(clf.predict(new_review)) #predict new review

//anaconda/lib/python2.7/site-packages/sklearn/naive_bayes.pyc in predict(self, X)
     60             Predicted target values for X
     61         """
---> 62         jll = self._joint_log_likelihood(X)
     63         return self.classes_[np.argmax(jll, axis=1)]
     64 

//anaconda/lib/python2.7/site-packages/sklearn/naive_bayes.pyc in _joint_log_likelihood(self, X)
    439         """Calculate the posterior log probability of the samples X"""
    440         X = atleast2d_or_csr(X)
--> 441         return (safe_sparse_dot(X, self.feature_log_prob_.T)
    442                 + self.class_log_prior_)
    443 

//anaconda/lib/python2.7/site-packages/sklearn/utils/extmath.pyc in safe_sparse_dot(a, b, dense_output)
    178         return ret
    179     else:
--> 180         return fast_dot(a, b)
    181 
    182 

TypeError: Cannot cast array data from dtype('float64') to dtype('S32') according to the rule 'safe'

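You need to pass a bag-of-words representation to predict, not the raw text. You are almost doing this correctly with new_review; just change that line to new_review = vectorizer.transform(new_review) (see @Stergios's comment). For the existing review, pass the corresponding row of X instead of df.quote[10]. Try the following: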

print(clf.predict(X[10, :]))
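
A minimal sketch of the corrected flow, assuming a tiny made-up corpus in place of df.quote and df.fresh (train_texts and train_labels are illustrative names, not from the question). The vectorizer is fit once on the training text and then reused with transform for anything passed to predict:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy data standing in for df.quote and df.fresh
train_texts = [
    'great awesome movie',
    'awesome wonderful film',
    'terrible boring movie',
    'awful boring film',
]
train_labels = [1, 1, 0, 0]  # 1 = fresh, 0 = rotten

# Learn the vocabulary from the training text only
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_texts)

clf = MultinomialNB().fit(X_train, train_labels)

# Reuse the fitted vocabulary: transform, not fit_transform
new_review = ['this is a new review, movie was awesome']
X_new = vectorizer.transform(new_review)

pred = clf.predict(X_new)
print(pred)
```

Words in the new review that the vectorizer never saw during training are simply ignored, so X_new has the same number of columns as X_train.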

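FYI, this is called sentiment analysis. Your question has little to do with your error.

Thanks, yes, it is called sentiment analysis. I am trying to use clf.predict() to predict a new review. I may have left something out; let me know and I can clarify. @keyser

Awesome, thanks! The code clf.predict(new_review) does not work; it gives me the error "ValueError: dimension mismatch". Any ideas?

@elyasepple Please change new_review = vectorizer.fit_transform(new_review) to new_review = vectorizer.transform(new_review). The vectorizer should be fit only on the training data.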
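
The "dimension mismatch" mentioned in the comments can be seen by comparing matrix shapes. This is a rough illustration on made-up data (train_texts and the variable names are assumptions, not from the question): calling fit_transform on the new review builds a fresh vocabulary from that single sentence, so the column count no longer matches what the classifier was trained on.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical training text standing in for df.quote
train_texts = [
    'great awesome movie',
    'awesome wonderful film',
    'terrible boring movie',
    'awful boring film',
]
new_review = ['this is a new review, movie was awesome']

vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_texts)

# Correct: map the new text onto the training vocabulary
X_ok = vectorizer.transform(new_review)

# Incorrect: a refit learns a vocabulary from the new review alone
X_refit = CountVectorizer().fit_transform(new_review)

print(X_train.shape[1], X_ok.shape[1], X_refit.shape[1])
```

The first two numbers agree, while the third depends only on the words in the new review, which is why predict rejects it.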