Python 3.x LightGBM: ValueError: Series.dtypes must be int, float or bool


python-3.x scikit-learn jupyter-notebook


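The NA values in the DataFrame have already been filled.

The dataset's schema has no object dtype columns, as the documentation specifies.

df.info()

Output: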

<class 'pandas.core.frame.DataFrame'>
Int64Index: 429 entries, 351 to 559
Data columns (total 11 columns):
 #   Column             Non-Null Count  Dtype   
---  ------             --------------  -----   
 0   Gender             429 non-null    category
 1   Married            429 non-null    category
 2   Dependents         429 non-null    category
 3   Education          429 non-null    category
 4   Self_Employed      429 non-null    category
 5   ApplicantIncome    429 non-null    int64   
 6   CoapplicantIncome  429 non-null    float64 
 7   LoanAmount         429 non-null    float64 
 8   Loan_Amount_Term   429 non-null    float64 
 9   Credit_History     429 non-null    float64 
 10  Property_Area      429 non-null    category
dtypes: category(6), float64(4), int64(1)
memory usage: 23.3 KB

I get the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-178-aaa91a2d8719> in <module>
      6 
      7 
----> 8 model= lgb.train(params, train_data, 100,categorical_feature=cat_cols)

~\Anaconda3\lib\site-packages\lightgbm\engine.py in train(params, train_set, num_boost_round, valid_sets, valid_names, fobj, feval, init_model, feature_name, categorical_feature, early_stopping_rounds, evals_result, verbose_eval, learning_rates, keep_training_booster, callbacks)
    229     # construct booster
    230     try:
--> 231         booster = Booster(params=params, train_set=train_set)
    232         if is_valid_contain_train:
    233             booster.set_train_data_name(train_data_name)

~\Anaconda3\lib\site-packages\lightgbm\basic.py in __init__(self, params, train_set, model_file, model_str, silent)
   1981                     break
   1982             # construct booster object
-> 1983             train_set.construct()
   1984             # copy the parameters from train_set
   1985             params.update(train_set.get_params())

~\Anaconda3\lib\site-packages\lightgbm\basic.py in construct(self)
   1319             else:
   1320                 # create train
-> 1321                 self._lazy_init(self.data, label=self.label,
   1322                                 weight=self.weight, group=self.group,
   1323                                 init_score=self.init_score, predictor=self._predictor,

~\Anaconda3\lib\site-packages\lightgbm\basic.py in _lazy_init(self, data, label, reference, weight, group, init_score, predictor, silent, feature_name, categorical_feature, params)
   1133                 raise TypeError('Cannot initialize Dataset from {}'.format(type(data).__name__))
   1134         if label is not None:
-> 1135             self.set_label(label)
   1136         if self.get_label() is None:
   1137             raise ValueError("Label should not be None")

~\Anaconda3\lib\site-packages\lightgbm\basic.py in set_label(self, label)
   1648         self.label = label
   1649         if self.handle is not None:
-> 1650             label = list_to_1d_numpy(_label_from_pandas(label), name='label')
   1651             self.set_field('label', label)
   1652             self.label = self.get_field('label')  # original values can be modified at cpp side

~\Anaconda3\lib\site-packages\lightgbm\basic.py in list_to_1d_numpy(data, dtype, name)
     88     elif isinstance(data, Series):
     89         if _get_bad_pandas_dtypes([data.dtypes]):
---> 90             raise ValueError('Series.dtypes must be int, float or bool')
     91         return np.array(data, dtype=dtype, copy=False)  # SparseArray should be supported as well
     92     else:

ValueError: Series.dtypes must be int, float or bool


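Has anyone helped you yet? If not: the answer lies in converting your variables.

Go to this link:

The creators of LightGBM once faced the same question. In the link above, they (STRIKER) tell you that you should convert your variables with astype("category") (pandas/scikit) and label-encode them, because the feature columns need an integer value, specifically int32.

However, label encoding and astype("category") should normally do the same thing:

Another useful link is the advanced documentation on categorical features: it tells you that LightGBM cannot handle object (string) dtypes like the ones in your data.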
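The conversion described above can be sketched with pandas alone. This is a minimal example under stated assumptions: the toy DataFrame is hypothetical (only the column names are borrowed from the question's df.info() output), and it uses `.cat.codes` instead of scikit-learn's LabelEncoder. Both assign integers by sorted order of the distinct values, so they should normally agree.

```python
import pandas as pd

# Hypothetical toy frame; column names borrowed from the question's df.info()
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Male"],
    "Property_Area": ["Urban", "Rural", "Semiurban"],
    "ApplicantIncome": [5000, 3000, 4000],
})

cat_cols = ["Gender", "Property_Area"]
for col in cat_cols:
    # astype("category") sorts the distinct values;
    # .cat.codes returns each value's integer position in that sorted list
    df[col] = df[col].astype("category").cat.codes.astype("int32")

print(df.dtypes)
print(df)
```

After this step every column is numeric, and the names in `cat_cols` can still be passed as `categorical_feature` so that LightGBM treats the codes as categories rather than as ordered numbers.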

If you are still not comfortable with this explanation, here is a code snippet of mine from the Kaggle space_race_set. If you still have problems, just ask.

import lightgbm as lgb
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# train_df is the already-loaded DataFrame of the Kaggle dataset
cat_feats = ['Company Name', 'Night_and_Day', 'Rocket Type', 'Rocket Mission Type', 'State', 'Country']

# Label-encode each categorical column and store the codes as integers
for col in cat_feats:
    train_df[col] = LabelEncoder().fit_transform(train_df[col]).astype('int32')

# The label must also be int, float or bool; encode it first if it is not
y = train_df["Status Mission"]
X = train_df.drop(columns=["Status Mission"])

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

train_data = lgb.Dataset(X_train,
                         label=y_train,
                         categorical_feature=cat_feats,
                         free_raw_data=False)
# reference=train_data lets the validation set reuse the training set's binning
test_data = lgb.Dataset(X_test,
                        label=y_test,
                        reference=train_data,
                        categorical_feature=cat_feats,
                        free_raw_data=False)
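
One more thing worth checking: the traceback in the question is raised from set_label, so the dtype of the label Series itself matters, not only the feature columns. The sketch below is an illustration under assumptions: the label name `Loan_Status` and its "Y"/"N" values are hypothetical (the question does not show its label), and `is_lgb_friendly` is a helper made up for this example, not part of LightGBM.

```python
import pandas as pd
from pandas.api import types as ptypes

# Hypothetical label; the question does not show its actual target column
y_train = pd.Series(["Y", "N", "Y", "Y"], name="Loan_Status", dtype="category")

def is_lgb_friendly(s: pd.Series) -> bool:
    """Return True if the Series dtype is int, float or bool,
    which is what the error message asks for."""
    return (ptypes.is_bool_dtype(s)
            or ptypes.is_integer_dtype(s)
            or ptypes.is_float_dtype(s))

print(is_lgb_friendly(y_train))    # False: category dtype

# Replace the categories with their integer codes (N -> 0, Y -> 1)
y_encoded = y_train.cat.codes.astype("int32")

print(is_lgb_friendly(y_encoded))  # True: int32
```

If the check prints False for your own label, converting it this way before building the lgb.Dataset should avoid the ValueError.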
