Python: computing y_score for learning algorithms
I am very new to the ML Python environment and need to plot a precision/recall graph. As described in this post, you need to compute y_score:
# Create a simple classifier
classifier = svm.LinearSVC(random_state=random_state)
classifier.fit(X_train, y_train)
y_score = classifier.decision_function(X_test)
So the question is: how do I compute the score with MultinomialNB or a LearningTree? In my code I have:
print("MultinomialNB - countVectorizer")
xTrain, xTest, yTrain, yTest=countVectorizer(db)
classifier = MultinomialNB()
model = classifier.fit(xTrain, yTrain)
yPred = model.predict(xTest)
print("confusion Matrix of MNB/ cVectorizer:\n")
print(confusion_matrix(yTest, yPred))
print("\n")
print("classificationReport Matrix of MNB/ cVectorizer:\n")
print(classification_report(yTest, yPred))
elapsed_time = time.time() - start_time
print("elapsed Time: %.3fs" %elapsed_time)
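The plotting function:
import matplotlib.pyplot as plt
from sklearn.metrics import average_precision_score, precision_recall_curve

def plotLearningAlgorithm(yTest, yScore, algName):
    # yScore must be 1-D: one score (or positive-class probability) per sample
    precision, recall, _ = precision_recall_curve(yTest, yScore)
    average_precision = average_precision_score(yTest, yScore)
    step_kwargs = {'step': 'post'}
    plt.step(recall, precision, color='b', alpha=0.2, where='post')
    plt.fill_between(recall, precision, alpha=0.2, color='b', **step_kwargs)
    plt.xlabel('Recall')
    plt.ylabel('Precision')
    plt.ylim([0.0, 1.05])
    plt.xlim([0.0, 1.0])
    plt.title('2-class Precision-Recall ' + algName + ' curve: AP={0:0.2f}'.format(average_precision))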
The error when plotting:
<ipython-input-43-d07c3365bfc2> in MultinomialNaiveBayesOPT()
11 yPred = model.predict(xTest)
12
---> 13 plotLearningAlgorithm(yTest,model.predict_proba(xTest),"MultinomialNB - countVectorizer")
14
15 print("confusion Matrix of MNB/ cVectorizer:\n")
<ipython-input-42-260aac9918f2> in plotLearningAlgorithm(yTest, yScore, algName)
1 def plotLearningAlgorithm(yTest,yScore,algName):
2
----> 3 precision, recall, _ = precision_recall_curve(yTest, yScore)
4
5 step_kwargs = ({'step': 'post'}
/opt/anaconda3/lib/python3.7/site-packages/sklearn/metrics/ranking.py in precision_recall_curve(y_true, probas_pred, pos_label, sample_weight)
522 fps, tps, thresholds = _binary_clf_curve(y_true, probas_pred,
523 pos_label=pos_label,
--> 524 sample_weight=sample_weight)
525
526 precision = tps / (tps + fps)
/opt/anaconda3/lib/python3.7/site-packages/sklearn/metrics/ranking.py in _binary_clf_curve(y_true, y_score, pos_label, sample_weight)
398 check_consistent_length(y_true, y_score, sample_weight)
399 y_true = column_or_1d(y_true)
--> 400 y_score = column_or_1d(y_score)
401 assert_all_finite(y_true)
402 assert_all_finite(y_score)
/opt/anaconda3/lib/python3.7/site-packages/sklearn/utils/validation.py in column_or_1d(y, warn)
758 return np.ravel(y)
759
--> 760 raise ValueError("bad input shape {0}".format(shape))
761
762
ValueError: bad input shape (9000, 2)
So the output of predict_proba needs to be reduced to the positive-class column as follows:
yPred_probability = yPred_probability[:, 1]
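The fix above can be checked on a small example. The following is a minimal sketch, not the asker's code: the synthetic count data stands in for the CountVectorizer output, and the names `X`, `y`, `proba`, and `no_skill` are assumptions made for illustration. It shows that `predict_proba` returns one column per class, that column 1 is the 1-D score `precision_recall_curve` expects, and that the no-skill baseline is simply the fraction of positive samples.

```python
import numpy as np
from sklearn.metrics import average_precision_score, precision_recall_curve
from sklearn.naive_bayes import MultinomialNB

# Synthetic count features standing in for a CountVectorizer matrix (assumption).
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)
# Class 1 rows get higher counts in the first five features, class 0 in the last five.
rates = np.where(y[:, None] == 1,
                 [3, 3, 3, 3, 3, 1, 1, 1, 1, 1],
                 [1, 1, 1, 1, 1, 3, 3, 3, 3, 3])
X = rng.poisson(rates)

X_train, X_test = X[:150], X[150:]
y_train, y_test = y[:150], y[150:]

model = MultinomialNB().fit(X_train, y_train)

proba = model.predict_proba(X_test)  # shape (n_samples, 2): one column per class
y_score = proba[:, 1]                # shape (n_samples,): probability of class 1

precision, recall, thresholds = precision_recall_curve(y_test, y_score)
average_precision = average_precision_score(y_test, y_score)

# No-skill baseline: a classifier that scores at random has precision equal
# to the fraction of positive samples, drawn as a horizontal line.
no_skill = y_test.mean()
```

Passing `proba` instead of `y_score` to `precision_recall_curve` is what triggers the shape error in the traceback.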
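Many thanks to @ignoring_gravity for providing the right solution. I also plotted the no-skill line to make the chart easier to read.

What they call y_score is the predicted probability output by your ML algorithm.

With MultinomialNB and decision trees (I assume that is what you mean by LearningTree?), you can use the predict_proba method: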
classifier = MultinomialNB()
model = classifier.fit(xTrain, yTrain)
yPred = model.predict_proba(xTest)
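The same call works for a decision tree. The sketch below is illustrative only: the toy data and the "ham"/"spam" labels are hypothetical. It uses `classes_` to find which `predict_proba` column belongs to the positive class rather than assuming it is column 1, which matters when the labels are strings.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve
from sklearn.tree import DecisionTreeClassifier

# Toy data with hypothetical string labels.
X = np.array([[0, 1], [1, 1], [2, 0], [3, 0], [0, 2], [3, 1]])
y = np.array(["ham", "ham", "spam", "spam", "ham", "spam"])

tree = DecisionTreeClassifier(random_state=0).fit(X, y)

# Columns of predict_proba follow the order of tree.classes_.
positive_label = "spam"
positive_column = list(tree.classes_).index(positive_label)
y_score = tree.predict_proba(X)[:, positive_column]

# Scored on the training data only to keep the sketch short;
# use a held-out test set in practice.
precision, recall, _ = precision_recall_curve(y, y_score, pos_label=positive_label)
```

With string labels, `pos_label` tells `precision_recall_curve` which class counts as positive.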
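This question has nothing to do with Jupyter Notebook; please do not spam irrelevant tags (removed and replaced with scikit-learn and naivebayes).
Thanks for the reply, but I get the error above. Any hints?
Not unless you post a reproducible example.
The provided solution comes with a bit of a clue: the error starts from your solution ;)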
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# CountVectorizer does not split the data; use train_test_split for that.
# X is your text data and y is your target.
train_x, test_x, train_y, test_y = train_test_split(X, y, test_size=0.33, random_state=42)

vect = CountVectorizer()
countMatrix_train = vect.fit_transform(train_x)  # fit on the training data only
countMatrix_test = vect.transform(test_x)  # transform (not fit_transform) with the vocabulary learned from the training data

classifier = MultinomialNB()
classifier.fit(countMatrix_train, train_y)
# Predicted class for each test document; use this for the classification report.
ypred = classifier.predict(countMatrix_test)
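The steps above stop at the predicted classes, which are enough for a classification report but not for a precision/recall curve. A minimal end-to-end sketch follows, assuming a tiny made-up corpus (`texts` and `labels` are hypothetical) and a stratified split so both classes appear in the test set. It adds the `predict_proba` step that produces the score.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import average_precision_score, precision_recall_curve
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Hypothetical corpus: label 1 for spam-like text, 0 otherwise.
texts = [
    "cheap pills buy now", "win money now", "free prize claim now",
    "buy cheap watches", "claim your free money", "win a free prize",
    "meeting at noon tomorrow", "lunch with the team", "project report attached",
    "see you at the meeting", "team lunch tomorrow", "report for the project",
]
labels = [1] * 6 + [0] * 6

train_x, test_x, train_y, test_y = train_test_split(
    texts, labels, test_size=0.33, random_state=42, stratify=labels
)

vect = CountVectorizer()
count_train = vect.fit_transform(train_x)  # learn the vocabulary on training text only
count_test = vect.transform(test_x)        # reuse that vocabulary for the test text

classifier = MultinomialNB().fit(count_train, train_y)

y_pred = classifier.predict(count_test)               # classes, for the report
y_score = classifier.predict_proba(count_test)[:, 1]  # 1-D scores, for the curve

precision, recall, _ = precision_recall_curve(test_y, y_score)
average_precision = average_precision_score(test_y, y_score)
```

Fitting the vectorizer on the training text only keeps test-set vocabulary from leaking into the model.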