Python: OneHotEncoder for the categorical feature "day_of_week" raises a ValueError

python scikit-learn

I want to define a pipeline with a OneHotEncoder for the day_of_week column. I don't understand why I get a ValueError:


import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, OneHotEncoder


if __name__ == '__main__':

    data_dict = {
        'age': [1, 2, 3],
        'day_of_week': ['monday', 'tuesday', 'wednesday'],
        'y': [5, 6, 7]
    }

    data = pd.DataFrame(data_dict, columns=data_dict)

    numeric_features = ['age']
    numeric_transformer = Pipeline(steps=[
        ('scaler', StandardScaler())])

    categorical_features = ['day_of_week']
    print(categorical_features)
    categorical_transformer = Pipeline(steps=[
        ('onehot', OneHotEncoder(handle_unknown='ignore', categories='auto'))])

    preprocessor = ColumnTransformer(
        transformers=[
            ('numerical', numeric_transformer, numeric_features),
            ('categorical', categorical_transformer, categorical_features)])

    classifier = Pipeline(
        steps=[
            ('preprocessor', preprocessor),
            ('classifier', RandomForestRegressor(n_estimators=60))])

    X = data.drop(labels=['y'], axis=1)
    y = data['y']

    X_train, y_train, X_test, y_test = train_test_split(X, y, train_size=0.8, random_state=30)

    trained_model = classifier.fit(X_train, y_train)

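There is an error in this line:

X_train, y_train, X_test, y_test = train_test_split(X, y, train_size=0.8, random_state=30)

train_test_split returns the splits grouped by input: first X (train, test), then y (train, test). Because you unpack them in the wrong order, the variable named y_train actually holds the test features and X_test holds the training target, so the pipeline is fitted on mismatched data and raises errors.

Try changing it to:

X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.8, random_state=30)

With that change, your code runs without errors for me.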
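To make the return order concrete, here is a minimal sketch that reuses the question's toy data and pipeline with the corrected unpacking. The names wrong_split and mislabeled_y_train are illustrative only and are not part of the original post; the check simply shows what the original unpacking would have passed to fit as the target.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, OneHotEncoder

data = pd.DataFrame({
    'age': [1, 2, 3],
    'day_of_week': ['monday', 'tuesday', 'wednesday'],
    'y': [5, 6, 7],
})
X = data.drop(columns=['y'])
y = data['y']

# train_test_split returns, for each input array, its train part followed
# by its test part: X_train, X_test, y_train, y_test.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.8, random_state=30)

# Illustration only (hypothetical names): with the question's unpacking
# order, the second element would have been bound to "y_train", but it is
# really the test feature frame, including the string column.
wrong_split = train_test_split(X, y, train_size=0.8, random_state=30)
mislabeled_y_train = wrong_split[1]
print(type(mislabeled_y_train).__name__, list(mislabeled_y_train.columns))

# Same preprocessing as in the question: scale the numeric column and
# one-hot encode the categorical one.
preprocessor = ColumnTransformer(transformers=[
    ('numerical', StandardScaler(), ['age']),
    ('categorical', OneHotEncoder(handle_unknown='ignore'), ['day_of_week']),
])

model = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('regressor', RandomForestRegressor(n_estimators=60, random_state=0)),
])

model.fit(X_train, y_train)
predictions = model.predict(X_test)
print(X_train.shape, y_train.shape, predictions.shape)
```

Because handle_unknown='ignore' is set, a weekday that appears only in the test split is encoded as all zeros instead of raising an error, which matters with a dataset this small.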
