Python 3.x: text labeling in Python 3
Tags: python-3.x, scikit-learn, classification
I don't know where to start with this question, because I am only now learning neural networks. I have a large database of sentence > label pairs. For example:
i want take a photo < photo
i go to take a photo < photo
i go to use my camera < photo
i go to eat something < eat
i like my food < eat
If a user writes a new sentence, I want to check the accuracy score for every label:
"I go to bed after using my camera"
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn import metrics
sentences = ["i want take a photo", "i go to take a photo", "i go to use my camera", "i go to eat something", "i like my food"]
labels = ["photo", "photo", "photo", "eat", "eat"]
# Fit TF-IDF on the sentences, then turn them into feature vectors
tfv = TfidfVectorizer()
tfv.fit(sentences)
X = tfv.transform(sentences)
# Encode the string labels as integers 0..N-1
lbl = LabelEncoder()
y = lbl.fit_transform(labels)
# Hold out a stratified test split
xtrain, xtest, ytrain, ytest = train_test_split(X, y, stratify=y, random_state=42)
# Train the classifier and score it on the held-out sentences
clf = LogisticRegression()
clf.fit(xtrain, ytrain)
predictions = clf.predict(xtest)
print("Accuracy Score = ", metrics.accuracy_score(ytest, predictions))
For new data:
new_sentence = ["this is a new sentence"]
# Reuse the fitted vectorizer; do not refit it on the new sentence
X_Test = tfv.transform(new_sentence)
print(clf.predict_proba(X_Test))
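OK, but how can I check a new random sentence against all the labels?
Thanks a lot, but one last question: this works, but if I look up an existing sentence, for example "i go to eat something", the answer is 0.55 0.44. Why? That sentence is training data for the eat category. Isn't the first number photo and the second one eat? If not, how can I find out which number belongs to which category?
LabelEncoder encodes the target labels as values between 0 and N-1, where N is the number of distinct labels. In your case you can recover the real labels with lbl.inverse_transform([0, 1]): 0 = eat, 1 = photo. If the answer helped, please mark it as accepted.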
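To make the column order explicit, here is a minimal sketch that pairs each probability with its label name. It relies on two documented behaviors: LabelEncoder sorts its classes (so eat becomes 0 and photo becomes 1), and column i of predict_proba corresponds to clf.classes_[i]. The helper name label_scores is hypothetical, and the sketch trains on all five sentences without a hold-out split, which is an assumption made only to keep the example short; the exact probabilities will therefore differ from the 0.55 / 0.44 quoted above.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import LabelEncoder

sentences = [
    "i want take a photo",
    "i go to take a photo",
    "i go to use my camera",
    "i go to eat something",
    "i like my food",
]
labels = ["photo", "photo", "photo", "eat", "eat"]

tfv = TfidfVectorizer()
X = tfv.fit_transform(sentences)

lbl = LabelEncoder()
y = lbl.fit_transform(labels)  # classes are sorted: eat -> 0, photo -> 1

# Assumption: fit on every sentence (no train/test split) to keep the sketch small
clf = LogisticRegression()
clf.fit(X, y)


def label_scores(sentence):
    """Hypothetical helper: return {label name: probability} for one sentence."""
    probs = clf.predict_proba(tfv.transform([sentence]))[0]
    # Column i of predict_proba belongs to clf.classes_[i]
    names = lbl.inverse_transform(clf.classes_)
    return {str(name): float(p) for name, p in zip(names, probs)}


scores = label_scores("i go to eat something")
print(scores)
```

Reading the result as a dictionary avoids guessing the column order: the first number returned by predict_proba is the probability of eat and the second is the probability of photo.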