Python: How do I handle a multiclass target in a decision tree?
I'm new to Python and ML, but I'm trying to build a decision tree with sklearn. I have many categorical features, which I converted to numeric variables. However, my target is multiclass, and I get the error below. How should I handle a multiclass target?
ValueError: Target is multiclass but average='binary'. Please choose another average setting, one of [None, 'micro', 'macro', 'weighted'].
from sklearn.model_selection import train_test_split
#SPLIT DATA INTO TRAIN AND TEST SET
X_train, X_test, y_train, y_test = train_test_split(X, y,
    test_size=0.30,    # default test_size is 0.25, i.e. a 75%-25% split
    # shuffle=True by default
    stratify=y,        # preserve the target class proportions in both splits
    random_state=123)  # fix the random seed for reproducibility
print(X_train.shape, X_test.shape)
from sklearn.tree import DecisionTreeClassifier
model = DecisionTreeClassifier(criterion='gini', max_depth=3, min_samples_split=4, min_samples_leaf=2)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
# criterion : "gini", "entropy"
# max_depth : The maximum depth of the tree.
# min_samples_split : The minimum number of samples required to split an internal node.
# min_samples_leaf : The minimum number of samples required to be at a leaf node.
#DEFINE YOUR CLASSIFIER and THE PARAMETERS GRID
from sklearn.tree import DecisionTreeClassifier
import numpy as np
classifier = DecisionTreeClassifier()
parameters = {'criterion': ['entropy','gini'],
'max_depth': [3,4,5],
'min_samples_split': [5,10],
'min_samples_leaf': [2]}
from sklearn.model_selection import GridSearchCV
gs = GridSearchCV(classifier, parameters, cv=3, scoring = 'f1', verbose=50, n_jobs=-1, refit=True)
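The string scorer 'f1' computes f1_score with its default average='binary', which is only defined for two classes; that is where the ValueError comes from. A minimal sketch with small made-up labels (three classes, not the asker's data) reproduces the error and shows what the multiclass-safe average settings return:

```python
from sklearn.metrics import f1_score

# Made-up labels for three classes (0, 1, 2), purely illustrative
y_true = [0, 1, 2, 0, 1, 2]
y_pred = [0, 2, 1, 0, 0, 1]

# The default average='binary' rejects a multiclass target
try:
    f1_score(y_true, y_pred)
    binary_failed = False
except ValueError as err:
    binary_failed = True
    print("ValueError:", err)

# Multiclass-safe settings
micro = f1_score(y_true, y_pred, average="micro")        # pooled TP/FP/FN over all classes
macro = f1_score(y_true, y_pred, average="macro")        # unweighted mean of per-class F1
weighted = f1_score(y_true, y_pred, average="weighted")  # per-class F1 weighted by class support
per_class = f1_score(y_true, y_pred, average=None)       # one F1 value per class

print(micro, macro, weighted)
print(per_class)
```

For single-label multiclass data, micro F1 equals accuracy (here 2 of 6 correct, about 0.333). Macro and weighted coincide here (about 0.267) only because every class has the same support; with imbalanced classes, weighted leans toward the frequent classes while macro treats all classes equally.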
You should specify the scoring function explicitly:
from sklearn.metrics import f1_score, make_scorer
f1 = make_scorer(f1_score, average='weighted')
....
gs = GridSearchCV(classifier, parameters, cv=3, scoring=f1, verbose=50, n_jobs=-1, refit=True)
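A self-contained sketch of this approach, assuming the Iris dataset (three classes) as a stand-in for the asker's data and reusing the parameter grid from the question. Setting random_state=0 on the tree and leaving out verbose and n_jobs are choices made here for a quiet, reproducible run, not part of the original code:

```python
from sklearn.datasets import load_iris
from sklearn.metrics import f1_score, make_scorer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Iris stands in for the real data: 150 samples, three classes
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=123
)

parameters = {
    "criterion": ["entropy", "gini"],
    "max_depth": [3, 4, 5],
    "min_samples_split": [5, 10],
    "min_samples_leaf": [2],
}

# Wrap f1_score so GridSearchCV calls it with average='weighted'
f1 = make_scorer(f1_score, average="weighted")

gs = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    parameters,
    cv=3,
    scoring=f1,
    refit=True,
)
gs.fit(X_train, y_train)

print(gs.best_params_)
print(gs.best_score_)  # mean cross-validated weighted F1 of the best candidate

# refit=True retrains the best tree on all of X_train, so gs.predict works directly
test_f1 = f1_score(y_test, gs.predict(X_test), average="weighted")
print(test_f1)
```

make_scorer is mainly useful when the metric needs extra keyword arguments; for the common averages, a built-in scorer string does the same job.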
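Thank you very much for your help, I figured it out. The problem was indeed in the gs line: the scoring argument needed the adjustment you mentioned, so I changed it to scoring='f1_macro'.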
gs = GridSearchCV(classifier, parameters, cv=3, scoring='f1_macro', verbose=50, n_jobs=-1, refit=True)
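'f1_macro' is one of scikit-learn's built-in scorer names (alongside 'f1_micro' and 'f1_weighted'), so it is passed as a quoted string; a bare f1_macro would be an undefined Python name. A sketch under the same Iris stand-in assumption, which also shows that GridSearchCV accepts a list of scorer names and, with refit set to one of them, selects the best model by that metric:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)  # stand-in three-class data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=123
)

parameters = {
    "criterion": ["entropy", "gini"],
    "max_depth": [3, 4, 5],
    "min_samples_split": [5, 10],
    "min_samples_leaf": [2],
}

# Track three multiclass F1 variants; choose and refit the best model by macro F1
gs = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    parameters,
    cv=3,
    scoring=["f1_macro", "f1_weighted", "f1_micro"],
    refit="f1_macro",
)
gs.fit(X_train, y_train)

print(gs.best_params_)
print(gs.best_score_)  # mean cross-validated macro F1 of the best candidate
# The other metrics for the same candidate are stored under mean_test_<scorer name>
print(gs.cv_results_["mean_test_f1_weighted"][gs.best_index_])
```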
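Thanks for the suggestion. I just tried it and got the same error. I've updated my code sample; could you try it?
You're welcome! I'd appreciate it if you could accept the answer as the solution.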