
Python: Feature selection with LinearSVC


When I try to run the following code on my data (from)

I get:

"Invalid threshold: all features are discarded"

I tried specifying my own threshold:

clf = LinearSVC(C=0.01, penalty="l1", dual=False)
clf.fit(X,y)
X_new = clf.transform(X, threshold=my_threshold)
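But I get either:

  • an array X_new of the same size as X, whenever my_threshold is one of:

    • "mean"
    • "median"
  • the "Invalid threshold" error (for example, when passing a scalar value as the threshold)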

I can't post the whole matrix X, but here are some statistics of the data:

> X.shape 
Out: (29,312) 

> np.mean(X, axis=1)
Out: 
array([-0.30517191, -0.1147345 ,  0.03674294, -0.15926932, -0.05034101,
       -0.06357734, -0.08781186, -0.12865185,  0.14172452,  0.33640029,
        0.06778798, -0.00217696,  0.09097335, -0.17915627,  0.03701893,
       -0.1361117 ,  0.13132006,  0.14406628, -0.05081956,  0.20777349,
       -0.06028931,  0.03541849, -0.07100492,  0.05740661, -0.38585413,
        0.31837905,  0.14076042,  0.1182338 , -0.06903557])

> np.std(X, axis=1)                                               
Out: 
array([ 1.3267662 ,  0.75313658,  0.81796146,  0.79814621,  0.59175161,
        0.73149726,  0.8087903 ,  0.59901198,  1.13414141,  1.02433752,
        0.99884428,  1.11139231,  0.89254901,  1.92760784,  0.57181158,
        1.01322265,  0.66705546,  0.70248779,  1.17107696,  0.88254386,
        1.06930436,  0.91769016,  0.92915593,  0.84569395,  1.59371779,
        0.71257806,  0.94307434,  0.95083782,  0.88996455])

y = array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2,
           0, 0, 0, 0, 0, 0])

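This is all with scikit-learn 0.14.

You should first check whether your SVM model is trained well before trying to use it as the basis for a transformation. You may be using too small a C parameter, which causes sklearn to train a trivial model and, in turn, to discard all features. You can check this by running classification tests on your data, or at least by printing the coefficients it found (clf.coef_).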
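One way to run that check is to fit the model for several values of C and count how many features receive a non-zero weight. The sketch below is only illustrative: the real matrix was not posted, so it assumes random data with the same shape as X and the same class sizes as y, and the helper name count_nonzero_features is made up for this example.

```python
import numpy as np
from sklearn.svm import LinearSVC

# Assumption: random stand-in data with the shape and class sizes from the question.
rng = np.random.RandomState(0)
X = rng.randn(29, 312)
y = np.array([1] * 15 + [2] * 8 + [0] * 6)


def count_nonzero_features(C):
    """Fit an L1-penalized LinearSVC and count features used by at least one class."""
    clf = LinearSVC(C=C, penalty="l1", dual=False, max_iter=10000, random_state=0)
    clf.fit(X, y)
    # coef_ has shape (n_classes, n_features); a feature survives if any
    # of the one-vs-rest classifiers gives it a non-zero weight.
    return int(np.sum(np.any(clf.coef_ != 0, axis=0)))


counts = {C: count_nonzero_features(C) for C in (0.01, 0.1, 1.0, 10.0)}
for C, n_kept in counts.items():
    print("C=%g: %d features with a non-zero weight" % (C, n_kept))
```

If the count is zero for the C you are using, no threshold above zero can keep any feature, which would be consistent with the "all features are discarded" error.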


It would be best to run a grid search to find the C that generalizes best, and then use that value for the transformation.
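A minimal sketch of that workflow follows. It makes several assumptions: the data is synthetic (random noise with the question's shape, plus five columns shifted by the label so there is something to find), the grid of C values is arbitrary, and it uses GridSearchCV and SelectFromModel from newer scikit-learn releases rather than the estimator's own transform method used in the question.

```python
import numpy as np
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import LinearSVC

# Assumption: synthetic stand-in for the unposted data.
rng = np.random.RandomState(0)
X = rng.randn(29, 312)
y = np.array([1] * 15 + [2] * 8 + [0] * 6)
X[:, :5] += 1.5 * y[:, None]  # make the first five columns informative

# Cross-validated search over C; the grid values are arbitrary choices.
search = GridSearchCV(
    LinearSVC(penalty="l1", dual=False, max_iter=10000, random_state=0),
    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
    cv=StratifiedKFold(n_splits=3, shuffle=True, random_state=0),
)
search.fit(X, y)
best_C = search.best_params_["C"]

# Reuse the refitted best model as the basis for feature selection.
selector = SelectFromModel(search.best_estimator_, prefit=True)
X_new = selector.transform(X)

print("best C:", best_C)
print("shape after selection:", X_new.shape)
```

With only 29 samples the cross-validation scores are noisy, so the chosen C is best treated as a starting point and not as a definitive value.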

The threshold is based on clf.coef_. Can you post it?
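To see why the coefficients matter here, the sketch below mimics threshold-based selection with plain NumPy. It assumes the rule is "keep a feature when the sum of its absolute weights over all classes is at least the threshold", and it assumes a hypothetical model whose coefficients are all zero; neither is confirmed for the asker's actual model.

```python
import numpy as np

# Hypothetical coefficients of an over-regularized 3-class model on 312 features.
coef = np.zeros((3, 312))

# Per-feature importance: absolute weights summed over the classes.
importances = np.abs(coef).sum(axis=0)


def n_kept(threshold):
    """Count the features whose importance is at least `threshold`."""
    return int(np.sum(importances >= threshold))


kept_mean = n_kept(importances.mean())        # threshold "mean"
kept_median = n_kept(np.median(importances))  # threshold "median"
kept_scalar = n_kept(1e-5)                    # a small positive scalar

print(kept_mean, kept_median, kept_scalar)  # 312 312 0
```

Under these assumptions, "mean" and "median" both evaluate to zero and keep every feature, while any positive scalar discards all of them, which would match both symptoms described in the question.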