How to ensemble SVM and Logistic Regression in Python


I am working on a text classification task (7,000 texts evenly distributed across 10 labels), and I have been exploring SVM and logistic regression:

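from sklearn import svm, linear_model

# Linear SVM: fit on the training data, then score on the test set
clf1 = svm.LinearSVC()
clf1.fit(X, y)
pred1 = clf1.predict(X_test)
score1 = clf1.score(X_test, y_true)  # mean accuracy

# Logistic regression: same procedure
clf2 = linear_model.LogisticRegression()
clf2.fit(X, y)
pred2 = clf2.predict(X_test)
score2 = clf2.score(X_test, y_true)  # mean accuracy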
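I get two accuracies, score1 and score2. I wonder whether I could improve the accuracy by building an ensemble that combines the outputs of these two classifiers. I have read up on ensembles on my own, and I know there are bagging, boosting, and stacking.

However, I do not know how to use the scores predicted by my SVM and logistic regression in an ensemble. Could anyone give me some ideas or show me some example code?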

You can multiply the probabilities, or use another combination rule.
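As a rough illustration of the product rule (and the mean rule for comparison), here is a minimal NumPy sketch. The function name `combine_probabilities` and the probability values are made up for the example. In practice the matrices would come from something like `clf2.predict_proba(X_test)`. Note that `LinearSVC` has no `predict_proba`, so its scores would first need to be turned into probabilities, for example by wrapping it in scikit-learn's `CalibratedClassifierCV`.

```python
import numpy as np

def combine_probabilities(prob_list, rule="product"):
    """Combine per-classifier class-probability matrices.

    prob_list: list of arrays, each of shape (n_samples, n_classes).
    rule: 'product' or 'mean'.
    Returns (combined_probabilities, predicted_class_indices).
    """
    stacked = np.stack(prob_list, axis=0)  # (n_classifiers, n_samples, n_classes)
    if rule == "product":
        combined = np.prod(stacked, axis=0)
    elif rule == "mean":
        combined = np.mean(stacked, axis=0)
    else:
        raise ValueError(f"unknown rule: {rule}")
    # Renormalize so that each row sums to 1 again
    # (assumes no row is all zeros after combining).
    combined = combined / combined.sum(axis=1, keepdims=True)
    return combined, combined.argmax(axis=1)

# Hypothetical probabilities for 3 samples and 3 classes
p_svm = np.array([[0.6, 0.3, 0.1],
                  [0.2, 0.5, 0.3],
                  [0.4, 0.4, 0.2]])
p_logreg = np.array([[0.5, 0.4, 0.1],
                     [0.1, 0.3, 0.6],
                     [0.2, 0.7, 0.1]])

combined, labels = combine_probabilities([p_svm, p_logreg], rule="product")
print(labels)
```

With these made-up numbers the two classifiers disagree on the second and third samples, and the combined prediction follows whichever classifier is more confident.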

To do this in a more generic way (trying several rules), you can use brew.

Also, keep in mind that the classifiers should be diverse enough to give a good combined result.
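One simple way to get a feel for diversity is the disagreement rate: the fraction of test samples on which the two classifiers predict different labels. The sketch below is only an illustration. The helper name `disagreement_rate` and the label lists are hypothetical, and in practice the inputs would be `clf1.predict(X_test)` and `clf2.predict(X_test)`.

```python
import numpy as np

def disagreement_rate(pred_a, pred_b):
    """Fraction of samples on which two classifiers predict different labels."""
    pred_a = np.asarray(pred_a)
    pred_b = np.asarray(pred_b)
    return float(np.mean(pred_a != pred_b))

# Hypothetical predictions standing in for clf1.predict(X_test), clf2.predict(X_test)
pred_svm = [0, 1, 2, 2, 1, 0, 3, 3]
pred_logreg = [0, 1, 2, 1, 1, 0, 3, 2]

rate = disagreement_rate(pred_svm, pred_logreg)
print(rate)
```

A rate near zero means the two models make almost the same predictions, so combining them is unlikely to change much.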

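If you had few features, I would say you should look at some dynamic classifier/ensemble selection (also provided in brew), but since you probably have many features, Euclidean distance may not be meaningful for finding each classifier's region of competence. The best approach is to check by hand, based on the confusion matrix, which kinds of labels each classifier tends to get right.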
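The manual check can be sketched as follows: build a confusion matrix per classifier and compare per-class recall side by side. This is a minimal NumPy version with made-up labels. The helper names are hypothetical, and scikit-learn's `sklearn.metrics.confusion_matrix` could be used in place of the hand-written one.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

def per_class_recall(cm):
    """Fraction of each true class that was predicted correctly."""
    row_totals = cm.sum(axis=1)
    return np.diag(cm) / np.maximum(row_totals, 1)  # avoid division by zero

# Hypothetical labels; in practice use y_true and each classifier's predictions
y_true = [0, 0, 1, 1, 2, 2]
pred_svm = [0, 0, 1, 2, 2, 1]
pred_logreg = [0, 1, 1, 1, 2, 2]

recall_svm = per_class_recall(confusion_matrix(y_true, pred_svm, 3))
recall_logreg = per_class_recall(confusion_matrix(y_true, pred_logreg, 3))

for label in range(3):
    print(label, recall_svm[label], recall_logreg[label])
```

In this toy data the first classifier is better on class 0 and the second on classes 1 and 2, which is the kind of complementary pattern that makes a combination worthwhile.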

from brew.base import Ensemble
from brew.base import EnsembleClassifier
from brew.combination.combiner import Combiner

# create your Ensemble
clfs = [clf1, clf2]
ens = Ensemble(classifiers=clfs)

# Since you have only 2 classifiers, 'majority_vote' is not an option.
# Available rules: ['mean', 'majority_vote', 'max', 'min', 'median']
comb = Combiner(rule='mean')

# now create your ensemble classifier
ensemble_clf = EnsembleClassifier(ensemble=ens, combiner=comb)
ensemble_clf.predict(X)