Python: transforming the prediction target

python machine-learning scikit-learn

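I have a dataset in which each observation may belong to several labels (multi-label classification).

I have run SVM classification on it and it works. (I am interested in the accuracy for each class, so I applied OneVsRestClassifier to each class, as you will see in the code.)

I want to see the predicted value for each item in the test data. In other words, I want to see which label the model predicted for each observation in the test sample.

For example, this is the data passed to the model for prediction: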

,sentences,ADR,WD,EF,INF,SSI,DI,others
0,"extreme weight gain, short-term memory loss, hair loss.",1,0,0,0,0,0,0
1,I am detoxing from Lexapro now.,0,0,0,0,0,0,1
2,I slowly cut my dosage over several months and took vitamin supplements to help.,0,0,0,0,0,0,1
3,I am now 10 days completely off and OMG is it rough.,0,0,0,0,0,0,1
4,"I have flu-like symptoms, dizziness, major mood swings, lots of anxiety, tiredness.",0,1,0,0,0,0,1
5,I have no idea when this will end.,1,0,0,0,0,0,1

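My model then predicts labels for these rows, and I want to see the prediction mapped to each row.

I know we can use label binarization from the scikit-learn library. The problem is that the input parameters described for fit_transform differ from the target data I prepared and passed to the SVM classifier, so I don't know how to work it out.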
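To illustrate the mismatch described above, here is a minimal sketch. It assumes the binarizer in question is scikit-learn's MultiLabelBinarizer (the question only says "label binarization"). That class expects one collection of label names per row, whereas the CSV already stores a 0/1 indicator matrix. The three indicator rows are copied from rows 0, 4 and 5 of the sample data; the variable names are illustrative.

```python
import numpy as np
from sklearn.preprocessing import MultiLabelBinarizer

categories = ['ADR', 'WD', 'EF', 'INF', 'SSI', 'DI', 'others']

# 0/1 indicator rows in the same column order as `categories`
# (rows 0, 4 and 5 of the sample data).
Y = np.array([
    [1, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 1],
    [1, 0, 0, 0, 0, 0, 1],
])

# Passing `classes` fixes the column order to match the CSV columns.
mlb = MultiLabelBinarizer(classes=categories)
mlb.fit([categories])

# Indicator matrix -> one tuple of label names per row.
label_sets = mlb.inverse_transform(Y)
print(label_sets)  # [('ADR',), ('WD', 'others'), ('ADR', 'others')]

# Label tuples -> indicator matrix again (what fit_transform/transform expect as input).
round_trip = mlb.transform(label_sets)
```

Because the target columns are already binarized, only inverse_transform is needed to turn 0/1 rows back into label names; fit_transform matters when the raw data stores label names per row.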

Here is my code:

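import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

df = pd.read_csv("finalupdatedothers.csv")
categories = ['ADR','WD','EF','INF','SSI','DI','others']

train,test = train_test_split(df,random_state=42,test_size=0.3,shuffle=True)
X_train = train.sentences
X_test = test.sentences

# stop_words is not defined in the original snippet; 'english' is an assumed placeholder
stop_words = 'english'

SVC_pipeline = Pipeline([
                ('tfidf', TfidfVectorizer(stop_words=stop_words)),
                ('clf', OneVsRestClassifier(LinearSVC(), n_jobs=1)),
            ])

# Fit and score one binary classifier per category column
for category in categories:
    print('... Processing {} '.format(category))
    SVC_pipeline.fit(X_train,train[category])
    prediction = SVC_pipeline.predict(X_test)
    print('SVM Linear Test accuracy is {} '.format(accuracy_score(test[category], prediction)))
    print('SVM Linear f1 measurement is {} '.format(f1_score(test[category], prediction, average='weighted')))
    print("\n")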

Thank you for your time.

Here is what you want. What I did is map prediction, which is a numpy array, to class label indices in your categories list. Here is the full code:

import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC
from sklearn.metrics import accuracy_score
from sklearn.metrics import f1_score

df = pd.read_csv("finalupdatedothers.csv")
categories = ['ADR','WD','EF','INF','SSI','DI','others']

train,test = train_test_split(df,random_state=42,test_size=0.3,shuffle=True)
X_train = train.sentences
X_test = test.sentences

SVC_pipeline = Pipeline([
                ('tfidf', TfidfVectorizer(stop_words=[])),
                ('clf', OneVsRestClassifier(LinearSVC(), n_jobs=1)),
            ])


for category in categories:
    print('... Processing {} '.format(category))
    SVC_pipeline.fit(X_train,train[category])
    prediction = SVC_pipeline.predict(X_test)
    # prediction holds one 0/1 value per test row for the current category;
    # categories[prediction[i]] uses that value as an index into categories.
    # .iloc selects the sentence by position, since the split shuffles the index.
    print([{X_test.iloc[i]:categories[prediction[i]]} for i in range(len(prediction))])

    print('SVM Linear Test accuracy is {} '.format(accuracy_score(test[category], prediction)))
    print('SVM Linear f1 measurement is {} '.format(f1_score(test[category], prediction, average='weighted')))
    print("\n")

Here is the sample output:

... Processing ADR 
[{'extreme weight gain, short-term memory loss, hair loss.': 'ADR'}, {'I am detoxing from Lexapro now.': 'ADR'}]
SVM Linear Test accuracy is 0.5 
SVM Linear f1 measurement is 0.3333333333333333 


... Processing WD 
[{'extreme weight gain, short-term memory loss, hair loss.': 'ADR'}, {'I am detoxing from Lexapro now.': 'ADR'}]
SVM Linear Test accuracy is 1.0 
SVM Linear f1 measurement is 1.0

I hope this helps.
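For a per-row view across all categories at once, one option (a sketch, not taken from the code above) is to fit a single OneVsRestClassifier on the full 0/1 indicator matrix and then turn each predicted row back into label names. The toy frame below is built from the six sample rows in the question. The two new_sentences are invented for illustration, and the predicted labels depend on the training data, so no expected output is shown.

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

categories = ['ADR', 'WD', 'EF', 'INF', 'SSI', 'DI', 'others']

# Sample rows from the question (sentences plus their 0/1 label columns).
sentences = [
    "extreme weight gain, short-term memory loss, hair loss.",
    "I am detoxing from Lexapro now.",
    "I slowly cut my dosage over several months and took vitamin supplements to help.",
    "I am now 10 days completely off and OMG is it rough.",
    "I have flu-like symptoms, dizziness, major mood swings, lots of anxiety, tiredness.",
    "I have no idea when this will end.",
]
labels = np.array([
    [1, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 1],
    [0, 1, 0, 0, 0, 0, 1],
    [1, 0, 0, 0, 0, 0, 1],
])
df = pd.DataFrame(labels, columns=categories)
df.insert(0, 'sentences', sentences)

pipeline = Pipeline([
    ('tfidf', TfidfVectorizer()),
    ('clf', OneVsRestClassifier(LinearSVC())),
])

# A 2-D indicator target makes OneVsRestClassifier train one binary
# classifier per column. Columns that are all 0 in this toy data
# (EF, INF, SSI, DI) trigger a warning and are predicted as constant 0.
pipeline.fit(df['sentences'], df[categories].values)

# Hypothetical unlabeled sentences, invented for this sketch.
new_sentences = pd.Series([
    "I have dizziness and mood swings.",
    "I am off Lexapro now.",
])
pred = pipeline.predict(new_sentences)  # shape: (n_sentences, n_categories)

# Map each sentence to the names of the categories predicted as 1.
mapping = {
    sentence: [c for c, flag in zip(categories, row) if flag]
    for sentence, row in zip(new_sentences, pred)
}
for sentence, predicted in mapping.items():
    print(sentence, '->', predicted)
```

This layout does not need label columns on the data being predicted, since only the sentences are passed to predict. Per-category accuracy can still be computed afterwards by comparing one column of pred with the matching column of a labeled test set.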

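Do you mean you want to know the labels assigned in the prediction variable?

@user2906838 Thank you for your comment. I mean, if my test data has a row like "I hate this medication" and my model predicts it as ADR, I want to see that mapping for all of the test data. Am I being clear?

Oh yes. Sorry, but could you share a sample of your csv so I can reproduce your output? I may be able to help.

Sure, thank you for your help :)

Can you check now whether it works with the data?

But imagine that when we pass in the test data, it will not have labels.

Why am I getting a strange error?

tz = getattr(series.dtype, 'tz', None)
File "pandas\_libs\index.pyx", line 106, in pandas._libs.index.IndexEngine.get_value
File "pandas\_libs\index.pyx", line 114, in pandas._libs.index.IndexEngine.get_value
File "pandas\_libs\index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\hashtable_class_helper.pxi", in pandas._libs.hashtable.Int64HashTable.get_item
File "pandas\_libs\hashtable_class_helper.pxi", line 958, in pandas._libs.hashtable.Int64HashTable.get_item
KeyError: 0L

May I know your pandas version?

It is 0.22.0.

According to the error, your data does not have the key 0L.

Not having 0L in my data makes no sense. I think it is related to this link.

May I know the index type of your df?

,sentences,ADR,WD,EF,INF,SSI,DI,others

This is my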
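One plausible reading of the KeyError: 0L above (an assumption, since the failing line is not shown in the thread) is a label-based lookup such as X_test[0] on a Series whose index was shuffled by train_test_split, so the label 0 may have ended up in the training part. A minimal sketch of the difference between label-based and positional access, using made-up values:

```python
import pandas as pd

# A small Series with the default integer index 0..3.
s = pd.Series(['a', 'b', 'c', 'd'], index=[0, 1, 2, 3])

# Mimic a shuffled test split that kept only index labels 3 and 1.
test_part = s.loc[[3, 1]]

# Positional access always works: .iloc[0] is the first row, whatever its label.
first_by_position = test_part.iloc[0]
print(first_by_position)  # d

# Label-based access fails because no row carries the label 0.
try:
    test_part[0]
    label_lookup_failed = False
except KeyError:
    label_lookup_failed = True
print(label_lookup_failed)  # True
```

Using .iloc, or calling reset_index(drop=True) on the test split, avoids depending on which index labels survived the shuffle.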