Python 3.x lightgbm | ValueError: Series.dtypes must be int, float or bool
The DataFrame has had its NA values filled. The schema of the dataset contains no object dtypes, as the documentation specifies.
df.info()
Output:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 429 entries, 351 to 559
Data columns (total 11 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Gender 429 non-null category
1 Married 429 non-null category
2 Dependents 429 non-null category
3 Education 429 non-null category
4 Self_Employed 429 non-null category
5 ApplicantIncome 429 non-null int64
6 CoapplicantIncome 429 non-null float64
7 LoanAmount 429 non-null float64
8 Loan_Amount_Term 429 non-null float64
9 Credit_History 429 non-null float64
10 Property_Area 429 non-null category
dtypes: category(6), float64(4), int64(1)
memory usage: 23.3 KB
Getting the following error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-178-aaa91a2d8719> in <module>
6
7
----> 8 model= lgb.train(params, train_data, 100,categorical_feature=cat_cols)
~\Anaconda3\lib\site-packages\lightgbm\engine.py in train(params, train_set, num_boost_round, valid_sets, valid_names, fobj, feval, init_model, feature_name, categorical_feature, early_stopping_rounds, evals_result, verbose_eval, learning_rates, keep_training_booster, callbacks)
229 # construct booster
230 try:
--> 231 booster = Booster(params=params, train_set=train_set)
232 if is_valid_contain_train:
233 booster.set_train_data_name(train_data_name)
~\Anaconda3\lib\site-packages\lightgbm\basic.py in __init__(self, params, train_set, model_file, model_str, silent)
1981 break
1982 # construct booster object
-> 1983 train_set.construct()
1984 # copy the parameters from train_set
1985 params.update(train_set.get_params())
~\Anaconda3\lib\site-packages\lightgbm\basic.py in construct(self)
1319 else:
1320 # create train
-> 1321 self._lazy_init(self.data, label=self.label,
1322 weight=self.weight, group=self.group,
1323 init_score=self.init_score, predictor=self._predictor,
~\Anaconda3\lib\site-packages\lightgbm\basic.py in _lazy_init(self, data, label, reference, weight, group, init_score, predictor, silent, feature_name, categorical_feature, params)
1133 raise TypeError('Cannot initialize Dataset from {}'.format(type(data).__name__))
1134 if label is not None:
-> 1135 self.set_label(label)
1136 if self.get_label() is None:
1137 raise ValueError("Label should not be None")
~\Anaconda3\lib\site-packages\lightgbm\basic.py in set_label(self, label)
1648 self.label = label
1649 if self.handle is not None:
-> 1650 label = list_to_1d_numpy(_label_from_pandas(label), name='label')
1651 self.set_field('label', label)
1652 self.label = self.get_field('label') # original values can be modified at cpp side
~\Anaconda3\lib\site-packages\lightgbm\basic.py in list_to_1d_numpy(data, dtype, name)
88 elif isinstance(data, Series):
89 if _get_bad_pandas_dtypes([data.dtypes]):
---> 90 raise ValueError('Series.dtypes must be int, float or bool')
91 return np.array(data, dtype=dtype, copy=False) # SparseArray should be supported as well
92 else:
ValueError: Series.dtypes must be int, float or bool
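Has anyone helped you yet? If not: the answer lies in transforming your variables.
Go to this link:
The creators of LightGBM once faced the same question.
In the link above, they (STRIKER) tell you that you should do both: transform your variables with astype("category") (pandas/scikit) and label encode them, because you need an INT value in your feature columns, in particular an Int32.
However, label encoding and astype("category") should normally produce the same result.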
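As a rough illustration of that last point, the sketch below encodes the same toy column both ways and compares the integers. The column values are made up for the example, and it assumes string values with no missing entries.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy column; the values are invented for illustration only.
area = pd.Series(["Urban", "Rural", "Semiurban", "Urban"])

# Option 1: scikit-learn label encoding (classes are sorted before numbering).
encoded = LabelEncoder().fit_transform(area)

# Option 2: pandas category dtype, then take the underlying integer codes.
codes = area.astype("category").cat.codes

print(encoded.tolist())  # [2, 0, 1, 2]
print(codes.tolist())    # [2, 0, 1, 2]
```

Both approaches number the sorted unique values, which is why the results agree here. One difference to keep in mind: pandas marks missing values with the code -1, so the two can diverge once the column contains NaN.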
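Another useful link is the advanced documentation on categorical features: there they tell you that object (string) dtypes in your data cannot be handled this way.
If you are still uncomfortable with this explanation, here is my code snippet from the Kaggle space_race_set. If you still have problems, just ask.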
import lightgbm as lgb
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# train_df is assumed to be a pandas DataFrame that has already been loaded.
cat_feats = ['Company Name', 'Night_and_Day', 'Rocket Type', 'Rocket Mission Type', 'State', 'Country']

# Label-encode each categorical column and make sure the result has an integer dtype.
labelencoder = LabelEncoder()
for col in cat_feats:
    train_df[col] = labelencoder.fit_transform(train_df[col])
    train_df[col] = train_df[col].astype('int')

# Split the target from the features, then into train and test sets.
y = train_df[["Status Mission"]]
X = train_df.drop(["Status Mission"], axis=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Build the LightGBM datasets, declaring the encoded columns as categorical.
train_data = lgb.Dataset(X_train,
                         label=y_train,
                         categorical_feature=cat_feats,
                         free_raw_data=False)
test_data = lgb.Dataset(X_test,
                        label=y_test,
                        categorical_feature=cat_feats,
                        free_raw_data=False)
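One more thing worth checking: the traceback in the question is raised from set_label, which suggests the label Series, not only the feature columns, has a dtype other than int, float or bool. The question does not show the label, so the sketch below uses a hypothetical string column named Loan_Status and a hypothetical helper, is_lgb_friendly, to test the dtype and convert it. It illustrates the dtype rule from the error message and is not LightGBM's own check.

```python
import pandas as pd
from pandas.api.types import is_bool_dtype, is_numeric_dtype


def is_lgb_friendly(series: pd.Series) -> bool:
    """Hypothetical helper: True if the Series dtype is int, float or bool."""
    return is_numeric_dtype(series.dtype) or is_bool_dtype(series.dtype)


# Hypothetical label column stored as strings (name and values are assumptions).
y_train = pd.Series(["Y", "N", "Y", "Y"], name="Loan_Status")
print(is_lgb_friendly(y_train))  # False

# Map the classes to integer codes; categories are sorted, so "N" -> 0 and "Y" -> 1.
y_encoded = y_train.astype("category").cat.codes
print(is_lgb_friendly(y_encoded))  # True
print(y_encoded.tolist())  # [1, 0, 1, 1]
```

If the check returns False for your label, converting it to integer codes before passing it to lgb.Dataset is one way to get past the error.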