Python: feature importance based on SVM coefficients

python scikit-learn


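I am working on a text classification project and am trying to use SVC(kernel='linear') to get the feature importances. Here is my code: I took the code from …

However, it raises an error, and I don't know what I did wrong. Has anyone run into this before?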

ValueError                                Traceback (most recent call last)
     13 pipeline.fit(X, y)
     14 clf = pipeline.named_steps['classifier']
---> 15 f_importances(clf.coef_[0], features_names)
     16

in f_importances(coef, names)
      5     imp = coef
      6     imp,names = zip(*sorted(zip(imp,names)))
----> 7     plt.barh(range(len(names)), imp, align='center')
      8     plt.yticks(range(len(names)), names)
      9     plt.show()

/anaconda3/lib/python3.6/site-packages/matplotlib/pyplot.py in barh(*args, **kwargs)
   2667                       mplDeprecation)
   2668     try:
-> 2669         ret = ax.barh(*args, **kwargs)
   2670     finally:
   2671         ax._hold = washold

/anaconda3/lib/python3.6/site-packages/matplotlib/axes/_axes.py in barh(self, *args, **kwargs)
   2281         kwargs.setdefault('orientation', 'horizontal')
   2282         patches = self.bar(x=left, height=height, width=width,
-> 2283                            bottom=y, **kwargs)
   2284         return patches
   2285

/anaconda3/lib/python3.6/site-packages/matplotlib/__init__.py in inner(ax, *args, **kwargs)
   1715                     warnings.warn(msg % (label_namer, func.__name__),
   1716                                   RuntimeWarning, stacklevel=2)
-> 1717             return func(ax, *args, **kwargs)
   1718         pre_doc = inner.__doc__
   1719         if pre_doc is None:

/anaconda3/lib/python3.6/site-packages/matplotlib/axes/_axes.py in bar(self, *args, **kwargs)
   2091             elif orientation == 'horizontal':
   2092                 r.sticky_edges.x.append(l)
-> 2093             self.add_patch(r)
   2094             patches.append(r)
   2095

/anaconda3/lib/python3.6/site-packages/matplotlib/axes/_base.py in add_patch(self, p)
   1852         if p.get_clip_path() is None:
   1853             p.set_clip_path(self.patch)
-> 1854         self._update_patch_limits(p)
   1855         self.patches.append(p)
   1856         p._remove_method = lambda h: self.patches.remove(h)

/anaconda3/lib/python3.6/site-packages/matplotlib/axes/_base.py in _update_patch_limits(self, patch)
   1868         # or height.
   1869         if (isinstance(patch, mpatches.Rectangle) and
-> 1870                 ((not patch.get_width()) and (not patch.get_height()))):
   1871             return
   1872         vertices = patch.get_path().vertices

/anaconda3/lib/python3.6/site-packages/scipy/sparse/base.py in __bool__(self)
    286             return self.nnz != 0
    287         else:
--> 288             raise ValueError("The truth value of an array with more than one "
    289                              "element is ambiguous. Use a.any() or a.all().")
    290     __nonzero__ = __bool__

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all().

Thanks, everyone!

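The scikit-learn documentation states that the coef_ attribute is an array of shape [n_class * (n_class - 1) / 2, n_features]. With 4 classes and 9 features, coef_ has shape 6 x 9 (six rows, nine columns). barh, on the other hand, expects one value per feature rather than six, which is why you get an error. As the example below shows, summing the coefficients along each column makes the error go away.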

import numpy as np
import matplotlib.pyplot as plt

def f_importances(coef, names):
    # Sort (coefficient, name) pairs so the bars run from smallest to largest.
    imp, names = zip(*sorted(zip(coef, names)))
    plt.barh(range(len(names)), imp, align='center')
    plt.yticks(range(len(names)), names)
    plt.show()

features_names = ['title_mainText', 'upper_title', 'upper_mainText', 'punct_title', 'punct_mainText',
                  'exclamations_title', 'exclamations_text', 'title_words_not_stopword', 'text_words_not_stopword']

n_classes = 4
n_features = len(features_names)

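# Simulated coefficients: one row per pair of classes, one column per feature.
clf_coef_ = np.random.randint(1, 30, size=(n_classes * (n_classes - 1) // 2, n_features))

# Sum down each column to get a single value per feature.
f_importances(clf_coef_.sum(axis=0), features_names)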
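
The traceback in the question ends inside scipy/sparse/base.py, which suggests that coef_ may have been a SciPy sparse matrix rather than a dense array (scikit-learn's SVC can return sparse coefficients when it is fitted on sparse input, which is what text vectorizers typically produce). A minimal sketch under that assumption follows. The helper names to_dense_1d and rank_features are hypothetical and not part of scikit-learn, and the demo row is made-up data.

```python
import numpy as np


def to_dense_1d(coef_row):
    """Return a 1-D NumPy array from a dense or sparse coefficient row."""
    # SciPy sparse matrices expose .toarray(); dense arrays skip this step.
    if hasattr(coef_row, "toarray"):
        coef_row = coef_row.toarray()
    return np.asarray(coef_row, dtype=float).ravel()


def rank_features(coef_row, names):
    """Pair each feature name with its coefficient, sorted ascending by value."""
    values = to_dense_1d(coef_row)
    if values.shape[0] != len(names):
        raise ValueError("expected one coefficient per feature name")
    order = np.argsort(values)
    return [(names[i], float(values[i])) for i in order]


# Hypothetical 1 x 3 coefficient row, shaped like clf.coef_[0] in a binary problem.
demo = rank_features(np.array([[0.5, -1.2, 0.1]]), ["a", "b", "c"])
print(demo)
```

The sorted pairs can then be unzipped and passed to plt.barh in the same way f_importances does, since every value is now a plain float.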


Please post all of your code. It would also help if you edited your post to include the full traceback of the error. By the way, I was able to get your code working on a toy binary classification dataset, but I had to pass in clf.coef_[0], because coef_ returns a nested array. That may be one of the things tripping you up.

Hi, I have now updated my post with my full code.

@G.Anderson Thanks! I tried using clf.coef_[0], but it shows the same error. I have also updated the post with the full traceback of the error.
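
The "nested array" mentioned in the comments is consistent with the shape formula quoted in the answer: with two classes it yields a single row, so coef_ is still two-dimensional and coef_[0] selects that row. A small arithmetic sketch, where n_coef_rows is a hypothetical helper name:

```python
def n_coef_rows(n_classes):
    """Number of rows given by the shape formula n_class * (n_class - 1) / 2."""
    if n_classes < 2:
        raise ValueError("need at least two classes")
    return n_classes * (n_classes - 1) // 2


for k in (2, 3, 4):
    print(k, "classes ->", n_coef_rows(k), "row(s)")
```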