Python: OneHotEncoder for the categorical feature "day of week" raises a ValueError
I want to define a pipeline with a OneHotEncoder for the day_of_week column. I don't understand why I get a ValueError:
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, OneHotEncoder

if __name__ == '__main__':
    data_dict = {
        'age': [1, 2, 3],
        'day_of_week': ['monday', 'tuesday', 'wednesday'],
        'y': [5, 6, 7]
    }
    data = pd.DataFrame(data_dict, columns=data_dict)

    numeric_features = ['age']
    numeric_transformer = Pipeline(steps=[
        ('scaler', StandardScaler())])

    categorical_features = ['day_of_week']
    print(categorical_features)
    categorical_transformer = Pipeline(steps=[
        ('onehot', OneHotEncoder(handle_unknown='ignore', categories='auto'))])

    preprocessor = ColumnTransformer(
        transformers=[
            ('numerical', numeric_transformer, numeric_features),
            ('categorical', categorical_transformer, categorical_features)])

    classifier = Pipeline(
        steps=[
            ('preprocessor', preprocessor),
            ('classifier', RandomForestRegressor(n_estimators=60))])

    X = data.drop(labels=['y'], axis=1)
    y = data['y']
    X_train, y_train, X_test, y_test = train_test_split(X, y, train_size=0.8, random_state=30)
    trained_model = classifier.fit(X_train, y_train)
The error is in this line:
X_train, y_train, X_test, y_test = train_test_split(X, y, train_size=0.8, random_state=30)
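train_test_split returns the splits in the order X (train, test), then y (train, test). Because you unpack them in the wrong order, the variable named y_train actually holds the test features and X_test holds the training targets, so the classifier raises errors when you fit it.
Try changing the line to: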
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.8, random_state=30)
With that change, your code runs without errors for me.
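For reference, here is a minimal sketch of the pipeline with the unpacking order corrected. It is an illustration under some assumptions, not the asker's exact script: the data is extended to six made-up rows so that the split leaves more than one test sample, the one-step inner pipelines are replaced by the bare transformers, the step name 'regressor' is my own choice, and random_state=0 on the forest is added only to make the run repeatable.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy data (assumed for illustration; six rows instead of the original three).
data = pd.DataFrame({
    'age': [1, 2, 3, 4, 5, 6],
    'day_of_week': ['monday', 'tuesday', 'wednesday',
                    'monday', 'tuesday', 'wednesday'],
    'y': [5, 6, 7, 8, 9, 10],
})

# Scale the numeric column and one-hot encode the categorical column.
preprocessor = ColumnTransformer(transformers=[
    ('numerical', StandardScaler(), ['age']),
    ('categorical', OneHotEncoder(handle_unknown='ignore'), ['day_of_week']),
])

model = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('regressor', RandomForestRegressor(n_estimators=60, random_state=0)),
])

X = data.drop(columns=['y'])
y = data['y']

# Correct order: both X splits first, then both y splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.8, random_state=30)

model.fit(X_train, y_train)
predictions = model.predict(X_test)
print(predictions)
```

A quick sanity check after any split is to confirm that X_train and y_train have the same number of rows; with the original unpacking order they would not.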