Python: float32 ValueError in sklearn's feature_selector when using LightGBM
I keep getting this error and it is driving me crazy. It is definitely caused by specific variables, because if I subset the list down to a different set of columns it runs successfully... but I don't understand why.

I don't have a good reproduction here, but hopefully someone can suggest a way to test this, or help me spot the problem.

I pass a dataframe called clean to a function that splits it into train and test sets and runs RFECV with a LightGBM estimator.

The error points at the internal call feature_selector.fit.

I can print df.dtypes right before it executes and confirm that every column is float32.

Error: ValueError: Input contains NaN, infinity or a value too large for dtype('float32').
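For what it's worth, the dtype check alone doesn't rule this error out: a column can be float32 and still contain NaN or ±inf, and sklearn's finiteness check rejects both. A quick way to locate the offending columns (toy data, column names here are made up for illustration):

```python
import numpy as np
import pandas as pd

# Toy frame: both columns print as float32 in df.dtypes,
# yet column "b" contains an infinity that check_array rejects.
df = pd.DataFrame({
    "a": np.array([1.0, np.nan, 3.0], dtype=np.float32),
    "b": np.array([1.0, np.inf, 3.0], dtype=np.float32),
})
filled = df.fillna(0)  # removes NaN, but +/-inf survives

# Columns that still contain non-finite values after the fill:
bad_cols = filled.columns[~np.isfinite(filled).all()].tolist()
print(bad_cols)  # -> ['b']
```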
Traceback:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-56-81e642e89134> in <module>
----> 1 export = add_features(clean, TRAINING_FLAG)
<ipython-input-54-8d69f26acf60> in add_features(df, TRAINING_FLAG)
255
256 while True: # e.g. "loop forever"
--> 257 reduced_partial_feats = reduce_feats(df_new, partial_list,[t])
258
259 if len(partial_list) <= ceil(len(reduced_partial_feats) + (0.02 * partial_feat_count)):
<ipython-input-51-414034e1b855> in reduce_feats(df, inlist, target)
210 lgb.LGBMClassifier(**params), step=step_size, scoring="roc_auc", cv=CROSSFOLDS, verbose=1
211 )
--> 212 feature_selector.fit(x_train, y_train.values.ravel())
213
214 selected_features = [f for f in x_train.columns[feature_selector.ranking_ == 1]]
/opt/conda/envs/py3/lib/python3.6/site-packages/sklearn/feature_selection/rfe.py in fit(self, X, y, groups)
479 train/test set.
480 """
--> 481 X, y = check_X_y(X, y, "csr", ensure_min_features=2)
482
483 # Initialization
/opt/conda/envs/py3/lib/python3.6/site-packages/sklearn/utils/validation.py in check_X_y(X, y, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, multi_output, ensure_min_samples, ensure_min_features, y_numeric, warn_on_dtype, estimator)
717 ensure_min_features=ensure_min_features,
718 warn_on_dtype=warn_on_dtype,
--> 719 estimator=estimator)
720 if multi_output:
721 y = check_array(y, 'csr', force_all_finite=True, ensure_2d=False,
/opt/conda/envs/py3/lib/python3.6/site-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
540 if force_all_finite:
541 _assert_all_finite(array,
--> 542 allow_nan=force_all_finite == 'allow-nan')
543
544 if ensure_min_samples > 0:
/opt/conda/envs/py3/lib/python3.6/site-packages/sklearn/utils/validation.py in _assert_all_finite(X, allow_nan)
54 not allow_nan and not np.isfinite(X).all()):
55 type_err = 'infinity' if allow_nan else 'NaN, infinity'
---> 56 raise ValueError(msg_err.format(type_err, X.dtype))
57 # for object dtype data, we only check for NaNs (GH-13254)
58 elif X.dtype == np.dtype('object') and not allow_nan:
ValueError: Input contains NaN, infinity or a value too large for dtype('float32').
def reduce_feats(df, inlist, target):
    temp = df[inlist + target].copy()
    y = temp[target].iloc[:, 0].copy()
    x = temp.drop(target, axis=1).fillna(0)

    step_size = ceil(len(x.columns) / STEPS_PER_FOLD)

    x_train, x_valid, y_train, y_valid = train_test_split(
        x, y, test_size=TEST_SIZE, random_state=RANDOM_SEED
    )

    print(x_train.dtypes)  # THIS SHOWS THAT EVERYTHING IS FLOAT32!

    params = {
        "objective": "binary",
        "metric": "auc",
        "boosting_type": "gbdt",
        "is_unbalance": IS_UNBALANCE,
        "boost_from_average": True,
        "n_estimators": 100,
        "num_threads": -1,
        "num_leaves": 200,
        "min_data_in_leaf": 25,
        "max_depth": -1,
        "learning_rate": 0.1,
        "step": step_size
    }

    feature_selector = RFECV(
        lgb.LGBMClassifier(**params), step=step_size, scoring="roc_auc", cv=CROSSFOLDS, verbose=1
    )
    feature_selector.fit(x_train, y_train.values.ravel())

    selected_features = [f for f in x_train.columns[feature_selector.ranking_ == 1]]
    return selected_features
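If non-finite values are indeed the cause, note that fillna(0) only handles NaN; any ±inf in the frame passes straight through to feature_selector.fit. A minimal sketch of one possible workaround, mapping ±inf to NaN before the fill (whether 0 is a sensible fill value depends on your features):

```python
import numpy as np
import pandas as pd

def clean_non_finite(df):
    """Replace +/-inf with NaN, then fill every NaN with 0."""
    return df.replace([np.inf, -np.inf], np.nan).fillna(0)

# Example with one made-up float32 column:
x = pd.DataFrame({"f": np.array([1.0, np.inf, np.nan], dtype=np.float32)})
print(clean_non_finite(x)["f"].tolist())  # -> [1.0, 0.0, 0.0]
```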