Python 3.x: text labeling in Python 3

Tags: python-3.x, scikit-learn, classification

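I don't know where to start with this problem, because I am only now learning neural networks. I have a large database of sentence > label pairs. For example: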

i want take a photo < photo
i go to take a photo < photo
i go to use my camera < photo
i go to eat something < eat
i like my food < eat
If the user writes a new sentence, I want to check the accuracy score for every label:

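"I go to bed after I finish using my camera"

So the question is: where do I start? TensorFlow and scikit-learn look good, but the document classification examples I found do not show an accuracy score.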

from sklearn.linear_model import LogisticRegression
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn import metrics

sentences = ["i want take a photo", "i go to take a photo", "i go to use my camera", "i go to eat something", "i like my food"]

labels = ["photo", "photo", "photo", "eat", "eat"]

tfv = TfidfVectorizer()

# Fit TF-IDF on the sentences and turn them into feature vectors
tfv.fit(sentences)
X = tfv.transform(sentences)

# Encode the string labels as integers
lbl = LabelEncoder()
y = lbl.fit_transform(labels)

# Hold out part of the data, keeping the label proportions in both splits
xtrain, xtest, ytrain, ytest = train_test_split(X, y, stratify=y, random_state=42)

clf = LogisticRegression()
clf.fit(xtrain, ytrain)
predictions = clf.predict(xtest)

print("Accuracy Score = ", metrics.accuracy_score(ytest, predictions))
For new data:

new_sentence = ["this is a new sentence"]
X_Test = tfv.transform(new_sentence)
print(clf.predict_proba(X_Test))

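OK, but how can I check a new, random sentence against all labels?

Thanks a lot, but one last question. This works, but if I query an existing sentence, for example "i go to eat something", the output is 0.55 0.44. Why? That sentence is training data for the eat category. Isn't the first number photo and the second eat? If not, how can I tell which number belongs to which category?

LabelEncoder encodes the targets as integers between 0 and N-1, where N is the number of distinct targets. In your case you can recover the real labels with lbl.inverse_transform([0, 1]): 0 = eat, 1 = photo. If the answer helped, please mark it as accepted.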
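To make the column order explicit, here is a minimal sketch that pairs each probability from predict_proba with its label name. It assumes the same five example sentences, trains on all of them without a held-out split (only to keep the example short), and uses a helper named label_probabilities, which is hypothetical and not part of scikit-learn. The columns of predict_proba follow clf.classes_, so passing that array to lbl.inverse_transform should give the matching label strings.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import LabelEncoder

sentences = ["i want take a photo", "i go to take a photo",
             "i go to use my camera", "i go to eat something",
             "i like my food"]
labels = ["photo", "photo", "photo", "eat", "eat"]

# Vectorize the text and encode the labels as integers
tfv = TfidfVectorizer()
X = tfv.fit_transform(sentences)
lbl = LabelEncoder()
y = lbl.fit_transform(labels)

# Assumption: train on every sentence, with no train/test split
clf = LogisticRegression()
clf.fit(X, y)


def label_probabilities(sentence):
    """Hypothetical helper: map each label name to its predicted probability."""
    proba = clf.predict_proba(tfv.transform([sentence]))[0]
    # predict_proba columns are ordered like clf.classes_ (encoded labels)
    names = lbl.inverse_transform(clf.classes_)
    return {str(name): float(p) for name, p in zip(names, proba)}


scores = label_probabilities("i go to eat something")
for name, p in scores.items():
    print(name, round(p, 2))
```

With this mapping, the first printed number belongs to eat and the second to photo, because LabelEncoder sorts the label strings before numbering them.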