Python: feature selection with LinearSVC
When I try to run the following code with my data (from) I get:
"Invalid threshold: all features are discarded"
I tried specifying my own threshold:
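from sklearn.svm import LinearSVC

clf = LinearSVC(C=0.01, penalty="l1", dual=False)
clf.fit(X, y)
# transform(threshold=...) is the estimator-level selection API of scikit-learn 0.14
X_new = clf.transform(X, threshold=my_threshold)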
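But I either get:
- an array X_new of the same size as X, whenever my_threshold is "mean" or "median"
- or an "Invalid threshold" error (for example, when passing a scalar value as the threshold)
I cannot post the whole matrix X, but here are some statistics of the data: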
> X.shape
Out: (29,312)
> np.mean(X, axis=1)
Out:
array([-0.30517191, -0.1147345 , 0.03674294, -0.15926932, -0.05034101,
-0.06357734, -0.08781186, -0.12865185, 0.14172452, 0.33640029,
0.06778798, -0.00217696, 0.09097335, -0.17915627, 0.03701893,
-0.1361117 , 0.13132006, 0.14406628, -0.05081956, 0.20777349,
-0.06028931, 0.03541849, -0.07100492, 0.05740661, -0.38585413,
0.31837905, 0.14076042, 0.1182338 , -0.06903557])
> np.std(X, axis=1)
Out:
array([ 1.3267662 , 0.75313658, 0.81796146, 0.79814621, 0.59175161,
0.73149726, 0.8087903 , 0.59901198, 1.13414141, 1.02433752,
0.99884428, 1.11139231, 0.89254901, 1.92760784, 0.57181158,
1.01322265, 0.66705546, 0.70248779, 1.17107696, 0.88254386,
1.06930436, 0.91769016, 0.92915593, 0.84569395, 1.59371779,
0.71257806, 0.94307434, 0.95083782, 0.88996455])
y = array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2,
0, 0, 0, 0, 0, 0])
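All of this is with scikit-learn 0.14.

You should first check whether your SVM model is trained well before trying to use it as the basis for a transformation. You may be using too small a C parameter, which makes sklearn fit a trivial model whose coefficients are all zero, so every feature is discarded. You can check this by running a classification test on your data, or at least by printing the fitted coefficients (clf.coef_).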
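The coefficient check suggested above can be sketched as follows. The real X from the question is not available, so this uses made-up random data of the same shape, with a hypothetical signal added to the first five features; the helper name count_nonzero_features is likewise only for illustration. It fits the same L1-penalised LinearSVC for several values of C and counts how many features keep a nonzero weight.

```python
import numpy as np
from sklearn.svm import LinearSVC

# Synthetic stand-in for the question's data (29 samples, 312 features,
# three classes). The values are made up; only the shapes match.
rng = np.random.RandomState(0)
X = rng.randn(29, 312)
y = np.array([1] * 15 + [2] * 8 + [0] * 6)
X[:, :5] += y[:, None]  # hypothetical: make the first 5 features informative


def count_nonzero_features(C):
    """Fit an L1-penalised LinearSVC and count features with a nonzero weight."""
    clf = LinearSVC(C=C, penalty="l1", dual=False, max_iter=10000)
    clf.fit(X, y)
    # coef_ has shape (n_classes, n_features); a feature survives if at
    # least one class assigns it a nonzero weight.
    return int(np.sum(np.any(clf.coef_ != 0, axis=0)))


counts = {C: count_nonzero_features(C) for C in (0.001, 0.01, 0.1, 1.0)}
for C, n in counts.items():
    print(f"C={C}: {n} of {X.shape[1]} features have a nonzero coefficient")
```

If the count is zero for the C you are using, there is nothing left for any threshold to keep, which would be consistent with the "all features are discarded" error.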
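It would be best to run a grid search to find the C that generalizes best, and then use that value for the transformation.

The threshold is based on clf.coef_. Can you post it?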
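A minimal sketch of the grid-search suggestion, again on made-up data of the same shape as in the question. It assumes a newer scikit-learn release than 0.14, where GridSearchCV lives in sklearn.model_selection and SelectFromModel takes the place of calling transform on the estimator itself; the C grid and the 3-fold split are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import LinearSVC

# Synthetic stand-in for the question's data; the values are made up.
rng = np.random.RandomState(0)
X = rng.randn(29, 312)
y = np.array([1] * 15 + [2] * 8 + [0] * 6)
X[:, :5] += 2.0 * y[:, None]  # hypothetical informative features

# Pick C by cross-validated accuracy rather than fixing it at 0.01.
grid = [0.01, 0.1, 1.0, 10.0]
search = GridSearchCV(
    LinearSVC(penalty="l1", dual=False, max_iter=10000),
    param_grid={"C": grid},
    cv=StratifiedKFold(n_splits=3, shuffle=True, random_state=0),
)
search.fit(X, y)
best_C = search.best_params_["C"]

# Keep the features whose coefficients in the refitted best model are
# not (numerically) zero.
selector = SelectFromModel(search.best_estimator_, prefit=True)
X_new = selector.transform(X)
print(f"best C = {best_C}, kept {X_new.shape[1]} of {X.shape[1]} features")
```

With only 29 samples the cross-validation scores are noisy, so the selected C should be treated as a rough guide rather than a precise optimum.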