Python xgboost: AttributeError: 'DMatrix' object has no attribute 'handle'
This problem is really strange, because the same code works fine with other datasets.
Full code:
import numpy as np
import pandas as pd
import xgboost as xgb
# sklearn.cross_validation was removed in scikit-learn 0.20;
# train_test_split now lives in sklearn.model_selection
from sklearn.model_selection import train_test_split

# Split the learning set
X_fit, X_eval, y_fit, y_eval = train_test_split(
    train, target, test_size=0.2, random_state=1
)
clf = xgb.XGBClassifier(missing=np.nan, max_depth=6,
                        n_estimators=5, learning_rate=0.15,
                        subsample=1, colsample_bytree=0.9, seed=1400)
# Fitting
clf.fit(X_fit, y_fit, early_stopping_rounds=50, eval_metric="logloss", eval_set=[(X_eval, y_eval)])
# Predicted probability of the positive class
y_pred = clf.predict_proba(test)[:, 1]
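The last line causes the following error (full output provided):
What is wrong here? I don't know how to solve this problem.
UPD1: This is actually a Kaggle problem. The issue is with the initial data: some values are float or integer, and some are objects. That is why we need to cast them: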
from sklearn import preprocessing
for f in train.columns:
    if train[f].dtype=='object':
        lbl = preprocessing.LabelEncoder()
        lbl.fit(list(train[f].values))
        train[f] = lbl.transform(list(train[f].values))
for f in test.columns:
    if test[f].dtype=='object':
        lbl = preprocessing.LabelEncoder()
        lbl.fit(list(test[f].values))
        test[f] = lbl.transform(list(test[f].values))
train.fillna((-999), inplace=True)
test.fillna((-999), inplace=True)
train=np.array(train)
test=np.array(test)
train = train.astype(float)
test = test.astype(float)
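One caveat with the loop above: it fits a separate LabelEncoder on train and on test, so the same string can end up with a different integer in each frame. A minimal sketch of one way around that is to fit a single encoder per column on the values from both frames. The tiny frames and the column names here are made up for illustration and are not the Kaggle data:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical stand-ins for the Kaggle train/test frames.
train = pd.DataFrame({
    "v1": [0.5, 1.5, 2.5],
    "v3": pd.Series(["A", "B", "C"], dtype="object"),
})
test = pd.DataFrame({
    "v1": [3.5, 4.5],
    "v3": pd.Series(["C", "A"], dtype="object"),
})

for col in train.columns:
    if train[col].dtype == "object":
        encoder = LabelEncoder()
        # Fit on both frames so each string gets one code everywhere.
        encoder.fit(pd.concat([train[col], test[col]]).astype(str))
        train[col] = encoder.transform(train[col].astype(str))
        test[col] = encoder.transform(test[col].astype(str))

print(train["v3"].tolist())
print(test["v3"].tolist())
```

With a shared encoder, "C" maps to the same integer in both frames, which is what a model trained on train and applied to test needs.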
You might also want to look at the categorical-variable solution, which looks like this:
for col in train.select_dtypes(include=['object']).columns:
    train[col] = train[col].astype('category')
    test[col] = test[col].astype('category')
# Encoding categorical features
for col in train.select_dtypes(include=['category']).columns:
    train[col] = train[col].cat.codes
    test[col] = test[col].cat.codes
train.fillna((-999), inplace=True)
test.fillna((-999), inplace=True)
train=np.array(train)
test=np.array(test)
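Two details of the category approach are worth knowing. Calling astype('category') on train and test independently lets each frame infer its own category list, so the codes may not line up. Also, cat.codes encodes missing values as -1, so a later fillna does not change those columns. A minimal sketch that builds one category list shared by both frames, using made-up frames and a made-up column name `v3`:

```python
import pandas as pd

# Hypothetical stand-ins for the Kaggle train/test frames.
train = pd.DataFrame({"v3": pd.Series(["A", "B", "C"], dtype="object")})
test = pd.DataFrame({"v3": pd.Series(["C", "D", None], dtype="object")})

for col in train.select_dtypes(include=["object"]).columns:
    # Collect the categories seen in either frame, ignoring missing values.
    categories = sorted(set(train[col].dropna()) | set(test[col].dropna()))
    shared = pd.CategoricalDtype(categories=categories)
    # Same dtype for both frames, so the integer codes agree.
    train[col] = train[col].astype(shared).cat.codes
    test[col] = test[col].astype(shared).cat.codes

print(train["v3"].tolist())
print(test["v3"].tolist())
```

Whether -1 or -999 is the better marker for missing values depends on the model settings, so treat that choice as a judgment call rather than a rule.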
What is the output of X_fit.dtypes and X_eval.dtypes?
This is for X_fit.dtypes:
target int64
v1 float64
v2 float64
v3 int64
v4 float64
test even has object types.
Wow, thanks, I didn't know pandas had such a data type.
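A quick way to spot the offending columns before calling predict_proba is to list everything that is not numeric. This is only a sketch on a made-up frame; the column names echo the dtypes listed above, but the values are invented:

```python
import pandas as pd

# Hypothetical frame with mixed column types.
frame = pd.DataFrame({
    "v1": [0.1, 0.2],
    "v3": pd.Series(["A", "B"], dtype="object"),
    "v4": [1, 2],
})

# Columns that are not int, float, or bool need encoding first.
non_numeric = frame.select_dtypes(exclude=["number", "bool"]).columns.tolist()
print(non_numeric)
```

Running the same check on both the training frame and the test frame would show whether test still carries object columns, as it did in this question.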