Python sklearn train_test_split: where to add average=None? "Target is multiclass but average='binary'..." error


I know I should add average=None somewhere, but I really don't know where. The target variable is a set of numbers:

from sklearn.model_selection import train_test_split

trainset, testset = train_test_split(df, test_size=0.2, random_state=0)

def preprocessing(df):
    
    
    X = df.drop('log_price', axis=1)
    y = df['log_price'] 
    
    print(y.value_counts())
    
    return X, y

X_train, y_train = preprocessing(trainset)

X_test, y_test = preprocessing(testset)



import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, classification_report
from sklearn.model_selection import learning_curve

def evaluation(model):
    
    model.fit(X_train, y_train)
    ypred = model.predict(X_test)
    
    print(confusion_matrix(y_test, ypred))
    print(classification_report(y_test, ypred))
    
    N, train_score, val_score = learning_curve(model, X_train, y_train,
                                               cv=4, scoring='f1',
                                               train_sizes=np.linspace(0.1, 1, 10))
    
    
    plt.figure(figsize=(12, 8))
    plt.plot(N, train_score.mean(axis=1), label='train score')
    plt.plot(N, val_score.mean(axis=1), label='validation score')
    plt.legend()



from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.svm import SVC 
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

preprocessor = make_pipeline(PolynomialFeatures(2, include_bias=False), SelectKBest(f_classif, k=10))

KNN = make_pipeline(preprocessor, StandardScaler(), KNeighborsClassifier())

dict_of_models = {
                  'KNN': KNN
                 }


for name, model in dict_of_models.items():
    print(name)
    evaluation(model)
I get this error:

ValueError: Target is multiclass but average='binary'. Please choose another average setting, one of [None, 'micro', 'macro', 'weighted'].

Thanks.

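The error occurs because you are using 'f1' as the scoring parameter in learning_curve. That scorer is only meant for binary targets. As the error message indicates, however, your underlying problem is a multiclass one, so you need a different scorer with an appropriate averaging strategy. The predefined values are listed in the scikit-learn documentation for the scoring parameter. An example using 'f1_macro':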

N, train_score, val_score = learning_curve(model, X_train, y_train,
                                           cv=4, 
                                           scoring='f1_macro', # <-- change here
                                           train_sizes=np.linspace(0.1, 1, 10)
)
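To see why the string 'f1' fails and what the other averaging settings return, here is a minimal sketch that calls f1_score directly. The labels are made up for illustration and are not the asker's data. Note that average=None returns one value per class, whereas a scorer passed to learning_curve has to produce a single number, which is why an averaged variant such as 'f1_macro' is the usual choice there.

```python
from sklearn.metrics import f1_score

# Toy three-class labels, invented for illustration only.
y_true = [0, 1, 2, 0, 1, 2]
y_pred = [0, 2, 1, 0, 0, 1]

# The default average='binary' is rejected for a multiclass target.
try:
    f1_score(y_true, y_pred)
    raised = False
except ValueError as exc:
    raised = True
    print(exc)

# average=None returns an array with one F1 value per class.
per_class = f1_score(y_true, y_pred, average=None)

# The averaged variants collapse the per-class values into one number.
macro = f1_score(y_true, y_pred, average='macro')        # unweighted mean
micro = f1_score(y_true, y_pred, average='micro')        # global TP/FP/FN counts
weighted = f1_score(y_true, y_pred, average='weighted')  # mean weighted by support

print(per_class, macro, micro, weighted)
```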
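Please provide the error traceback. First guess: if you want to set its options, you need to pass the learning_curve scoring parameter as a full scorer rather than the string f1.

Traceback: I couldn't add it to my original post because it is too long. Note that the code works perfectly if I remove learning_curve; I don't get an error then.

Looking at the confusion matrix in the link: is there any particular reason you are using classification rather than regression here?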
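The suggestion in the comments to pass a full scorer instead of a string can be sketched with make_scorer, which wraps f1_score together with the chosen average option. This is only a sketch under assumptions: the data below is synthetic (make_classification stands in for the asker's X_train and y_train), and a plain KNeighborsClassifier replaces the original pipeline.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import f1_score, make_scorer
from sklearn.model_selection import learning_curve
from sklearn.neighbors import KNeighborsClassifier

# Synthetic three-class data, a stand-in for the asker's X_train / y_train.
X, y = make_classification(n_samples=300, n_features=8, n_informative=5,
                           n_classes=3, random_state=0)

# A scorer object that carries the averaging option with it.
macro_f1 = make_scorer(f1_score, average='macro')

N, train_score, val_score = learning_curve(
    KNeighborsClassifier(), X, y,
    cv=4,
    scoring=macro_f1,  # equivalent in effect to scoring='f1_macro'
    train_sizes=np.linspace(0.1, 1, 10),
)

print(N)
print(val_score.mean(axis=1))
```

For the standard averaging modes the predefined strings ('f1_macro', 'f1_micro', 'f1_weighted') are shorter; make_scorer is mainly useful when the metric needs further arguments, such as a restricted set of labels.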