Python: guessing accurately

python scikit-learn

I am currently trying to use the following code:

import random

from sklearn import metrics
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline


def scramble(s):
    # Shuffle the characters of s into a random order.
    return "".join(random.sample(s, len(s)))


# Keep only all-lowercase dictionary entries (this skips proper nouns).
with open('/usr/share/dict/words') as f:
    words = [w.strip() for w in f if w == w.lower()]
scrambled = [scramble(w) for w in words]

# Real words are labelled 'word', their scrambled versions 'unpronounceable'.
X = words + scrambled
y = ['word'] * len(words) + ['unpronounceable'] * len(scrambled)

X_train, X_test, y_train, y_test = train_test_split(X, y)

# Character 1- to 3-gram counts fed into a multinomial naive Bayes classifier.
text_clf = Pipeline([
    ('vect', CountVectorizer(analyzer='char', ngram_range=(1, 3))),
    ('clf', MultinomialNB()),
])

text_clf = text_clf.fit(X_train, y_train)
predicted = text_clf.predict(X_test)

print(metrics.classification_report(y_test, predicted))

It produces a prediction for arbitrary words:

>>> text_clf.predict("scaroly".split())
['word']

I have been looking into this, but I still cannot work out how to make it print a score for the input word.

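Try predict_proba, passing the word in a list.

It returns the probability that the given input (here "scaroly") belongs to each of the classes the model was trained on. So there is a 99.94% chance that "scaroly" is pronounceable.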

Conversely, the Welsh word for "new" is probably unpronounceable:

>>> text_clf.predict_proba(["newydd"])
array([[ 0.99666533,  0.00333467]])
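
To see which column belongs to which class, the probabilities can be paired with the classifier's classes_ attribute, which scikit-learn keeps in sorted order. The following is a minimal sketch, not the asker's exact setup: it assumes a tiny made-up training set in place of /usr/share/dict/words, and class_scores is a hypothetical helper name introduced only for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Tiny stand-in training set (an assumption; the question trains on
# /usr/share/dict/words and randomly scrambled copies of those words).
words = ["apple", "banana", "table", "window", "garden", "paper"]
scrambled = ["lppae", "nbnaaa", "btlea", "wdwnio", "grdnea", "pprae"]
X = words + scrambled
y = ["word"] * len(words) + ["unpronounceable"] * len(scrambled)

# Same pipeline shape as in the question.
text_clf = Pipeline([
    ("vect", CountVectorizer(analyzer="char", ngram_range=(1, 3))),
    ("clf", MultinomialNB()),
]).fit(X, y)


def class_scores(model, word):
    """Return {class label: probability} for one word (hypothetical helper)."""
    probabilities = model.predict_proba([word])[0]
    # Column i of predict_proba corresponds to model.classes_[i].
    return {str(label): float(p) for label, p in zip(model.classes_, probabilities)}


# classes_ is sorted, so the column order is ['unpronounceable', 'word'].
print(list(text_clf.classes_))

scores = class_scores(text_clf, "scaroly")
print(scores)
```

With this column order, the first number in each array above is the probability of 'unpronounceable' and the second is the probability of 'word'. The actual values printed by the sketch depend on the toy training data and will not match the figures quoted in this answer.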

What exactly do you mean by "score"? Something like how confident the classifier is that a given word is pronounceable?

Yes, exactly.
