Scikit learn GridSearchCV(): ValueError: Input contains NaN, infinity or a value too large for dtype('float64')

When I try to run GridSearchCV on an MLP classifier, I get the ValueError from the title. Of course, I checked whether my dataset contains any np.inf or np.nan, but it doesn't:

    print(np.any(np.isnan(X)))
returns False

    print(np.all(np.isfinite(X)))
returns True

I also cast all my values to np.float64:

    X = X.values.astype(np.float64)
    Y = Y.values

My scikit-learn version is 0.22.2.post1 (the latest version).

The code I am trying to run:

    import numpy as np
    from scipy.stats import randint as sp_randint
    from sklearn.metrics import accuracy_score, make_scorer, recall_score
    from sklearn.model_selection import GridSearchCV, StratifiedKFold
    from sklearn.neural_network import MLPClassifier

    # X_train and y_train come from the asker's own train/test split (not shown).

    hiddenlayers = [(sp_randint.rvs(100,600,1),sp_randint.rvs(100,600,1),), (sp_randint.rvs(100,600,1),)]
    alpha_range = 10.0 ** np.arange(-2, 1)


    param_grid_MLP = [{'solver': ['lbfgs'],
                       'hidden_layer_sizes': hiddenlayers,
                       'activation': ['identity', 'tanh', 'relu', 'logistic'],
                       'alpha': alpha_range
                      },
                      {'solver': ['sgd'],
                       'hidden_layer_sizes': hiddenlayers,
                       'activation': ['identity', 'tanh', 'relu', 'logistic'],
                       'alpha': alpha_range,
                       'learning_rate': ['constant', 'invscaling', 'adaptive']
                      },
                      {'solver': ['adam'],
                       'hidden_layer_sizes': hiddenlayers,
                       'activation': ['identity', 'tanh', 'relu', 'logistic'],
                       'alpha': alpha_range
                      }]

    mlp = MLPClassifier(random_state=0)
    cross_validation = StratifiedKFold(5)

    # scoring = {'AUC': 'roc_auc',
    #            'Accuracy': make_scorer(accuracy_score),
    #            'Recall': make_scorer(recall_score, pos_label='crafted'),
    #            'Precision': make_scorer(precision_score, pos_label='crafted')}

    scoring = {'AUC': 'roc_auc',
               'Accuracy': make_scorer(accuracy_score),
               'Recall': make_scorer(recall_score, pos_label='crafted')}

    grid_search_MLP = GridSearchCV(estimator=mlp,
                                   param_grid=param_grid_MLP,
                                   scoring=scoring,
                                   cv=cross_validation.split(X_train, y_train),
                                   refit='Recall',
                                   n_jobs=-1,
                                   verbose=True)

    grid_search_MLP.fit(X_train, y_train)

    print('Best score: {}'.format(grid_search_MLP.best_score_))
    print('Best index: {}'.format(grid_search_MLP.best_index_))
    print('Best parameters: {}'.format(grid_search_MLP.best_params_))

    mlp = grid_search_MLP.best_estimator_
    mlp

Full error traceback:

    Traceback (most recent call last):
      File "/usr/local/lib/python3.7/dist-packages/joblib/externals/loky/process_executor.py", line 418, in _process_worker
        r = call_item()
      File "/usr/local/lib/python3.7/dist-packages/joblib/externals/loky/process_executor.py", line 272, in __call__
        return self.fn(*self.args, **self.kwargs)
      File "/usr/local/lib/python3.7/dist-packages/joblib/_parallel_backends.py", line 608, in __call__
        return self.func(*args, **kwargs)
      File "/usr/local/lib/python3.7/dist-packages/joblib/parallel.py", line 256, in __call__
        for func, args, kwargs in self.items]
      File "/usr/local/lib/python3.7/dist-packages/joblib/parallel.py", line 256, in <listcomp>
        for func, args, kwargs in self.items]
      File "/usr/local/lib/python3.7/dist-packages/sklearn/model_selection/_validation.py", line 544, in _fit_and_score
        test_scores = _score(estimator, X_test, y_test, scorer)
      File "/usr/local/lib/python3.7/dist-packages/sklearn/model_selection/_validation.py", line 591, in _score
        scores = scorer(estimator, X_test, y_test)
      File "/usr/local/lib/python3.7/dist-packages/sklearn/metrics/_scorer.py", line 87, in __call__
        *args, **kwargs)
      File "/usr/local/lib/python3.7/dist-packages/sklearn/metrics/_scorer.py", line 332, in _score
        return self._sign * self._score_func(y, y_pred, **self._kwargs)
      File "/usr/local/lib/python3.7/dist-packages/sklearn/metrics/_ranking.py", line 369, in roc_auc_score
        y_score = check_array(y_score, ensure_2d=False)
      File "/usr/local/lib/python3.7/dist-packages/sklearn/utils/validation.py", line 578, in check_array
        allow_nan=force_all_finite == 'allow-nan')
      File "/usr/local/lib/python3.7/dist-packages/sklearn/utils/validation.py", line 60, in _assert_all_finite
        msg_dtype if msg_dtype is not None else X.dtype)
    ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
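The last frames of the traceback show where the check fails: inside roc_auc_score, on y_score, which is what the 'AUC' scorer computes from the fitted model on each test fold (for MLPClassifier, presumably its predict_proba output). So the checks on X above can pass while this error is still raised. A minimal sketch of the same finiteness check applied to a model's predicted scores instead of the input; the helper name nonfinite_rows is hypothetical, not part of scikit-learn:

```python
import numpy as np


def nonfinite_rows(scores):
    """Return the indices of rows whose predicted scores contain NaN or +/-inf.

    Hypothetical helper: 'scores' is assumed to be the output of something
    like fitted_model.predict_proba(X_test), either 1-D or 2-D.
    """
    scores = np.asarray(scores, dtype=np.float64)
    finite = np.isfinite(scores)
    if scores.ndim > 1:
        # A row counts as bad if any of its per-class scores is non-finite.
        finite = finite.all(axis=1)
    return np.flatnonzero(~finite)
```

For example, fitting one candidate on a single training fold and calling nonfinite_rows(model.predict_proba(X_test_fold)) (X_test_fold being a hypothetical held-out fold) would show whether the model itself, rather than the input data, is producing the NaNs.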

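It seems to me that some values in the array may be corrupted or non-numeric. Before converting to float, try checking whether the array contains values of any other type. Also try finding the minimum and maximum values in the array; that may help you locate the value that raises the error.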
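A minimal sketch of that suggestion, assuming the raw values (before the astype(np.float64) cast) can be gathered into a rectangular 2-D list or array. The helper inspect_raw_values is hypothetical; it lists every cell that is not a real number and reports the range of the finite numeric ones:

```python
import numbers

import numpy as np


def inspect_raw_values(raw):
    """List non-numeric cells and report the range of the numeric ones.

    Hypothetical helper: 'raw' is assumed to be a rectangular 2-D list or
    array of the values *before* they are cast to float64.
    """
    arr = np.asarray(raw, dtype=object)
    non_numeric = []  # (index, value) pairs that are not real numbers
    numeric = []
    for idx, value in np.ndenumerate(arr):
        # bool is a subclass of int, so it is flagged explicitly.
        if isinstance(value, numbers.Real) and not isinstance(value, bool):
            numeric.append(float(value))
        else:
            non_numeric.append((idx, value))
    values = np.array(numeric, dtype=np.float64)
    finite = values[np.isfinite(values)]
    return {
        "non_numeric": non_numeric,
        "non_finite_count": int(values.size - finite.size),
        "min": float(finite.min()) if finite.size else None,
        "max": float(finite.max()) if finite.size else None,
    }
```

Extreme min or max values, or a non-zero non_finite_count, point at the cells worth looking at first.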

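Try setting verbose to a large number, or run the three parts of the grid one at a time. If you find that sgd is the one causing the problem, it may be explained here.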
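Running the parts separately can be sketched as a small loop over the list of sub-grids. This is a hypothetical helper, not a scikit-learn API; run_search is assumed to be any callable that builds and fits a search for one sub-grid and raises ValueError on failure:

```python
from typing import Any, Callable, Dict, List


def run_grid_parts(
    param_grid: List[Dict[str, Any]],
    run_search: Callable[[Dict[str, Any]], Any],
) -> Dict[str, str]:
    """Run each sub-grid separately and record which ones raise ValueError.

    Hypothetical helper: 'run_search' is assumed to build and fit a search
    (e.g. a GridSearchCV) for a single sub-grid.
    """
    report: Dict[str, str] = {}
    for position, sub_grid in enumerate(param_grid):
        # Label each part by its solver(s), falling back to its position.
        label = ",".join(sub_grid.get("solver", [f"part{position}"]))
        try:
            run_search(sub_grid)
            report[label] = "ok"
        except ValueError as exc:
            report[label] = f"failed: {exc}"
    return report
```

With scikit-learn, run_search could be something like lambda grid: GridSearchCV(MLPClassifier(random_state=0), grid, scoring=scoring, cv=StratifiedKFold(5), refit='Recall', verbose=10).fit(X_train, y_train). Passing the StratifiedKFold object here, rather than cross_validation.split(X_train, y_train), matters: split() returns a generator, which is used up by the first search that consumes it.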

Where do you get the array from? A CSV file? How did you build it? Can you print it?