
Python Sklearn Pipelines: ValueError - expected number of features

I created a pipeline that basically loops over models and scalers and performs recursive feature elimination (RFE), as follows:

def train_models(models, scalers, X_train, y_train, X_val, y_val):
    best_results = {'f1_score': 0}

    for model in models:
        for scaler in scalers:
            for n_features in list(range(
                len(X_train.columns),
                int(len(X_train.columns)/2),
                -10
            )):
                rfe = RFE(
                    estimator=model,
                    n_features_to_select=n_features,
                    step=10
                )

                pipe = Pipeline([
                    ('scaler', scaler),
                    ('selector', rfe),
                    ('model', model)
                ])

                pipe.fit(X_train, y_train)

                y_pred = pipe.predict(X_val)
                results = evaluate(y_val, y_pred)  # Returns a dictionary of values
                results['pipeline'] = pipe
                results['y_pred'] = y_pred

                if results['f1_score'] > best_results['f1_score']:
                    best_results = results
                    print("Best F1: {}".format(best_results['f1_score']))

    return best_results
Inside the function the pipeline works fine: it predicts and scores the results correctly.

However, when I call pipeline.predict() outside the function, e.g.

best_result = train_models(models, scalers, X_train, y_train, X_val, y_val)
pipeline = best_result['pipeline']
pipeline.predict(X_val)
I get a ValueError about the expected number of features.

Here is what the pipeline looks like:

Pipeline(steps=[('scaler', StandardScaler()),
                ('selector',
                 RFE(estimator=LogisticRegression(C=1, max_iter=1000,
                                                  penalty='l1',
                                                  solver='liblinear'),
                     n_features_to_select=78, step=10)),
                ('model',
                 LogisticRegression(C=1, max_iter=1000, penalty='l1',
                                    solver='liblinear'))])
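My guess is that the model in the pipeline expects 48 features instead of 78, but I don't understand where the number 48 comes from, since n_features_to_select is set to 78 in the preceding RFE step.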


Any help would be greatly appreciated.

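I don't have your data, but after some arithmetic and guesswork based on what you shared, 48 looks like the last n_features value your nested loop tries. That makes me suspect the culprit is a shallow copy: every pipeline built in the loop holds a reference to the same model object, and Pipeline fits its steps in place without cloning them, so the final step of the saved pipeline ends up refit on the 48 features from the last iteration. I suggest changing this: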

    pipe = Pipeline([
        ('scaler', scaler), 
        ('selector', rfe),
        ('model', model)
    ])

to this, and then try again (after first running import copy, of course):

    pipe = Pipeline([
        ('scaler', scaler), 
        ('selector', rfe),
        ('model', copy.deepcopy(model))
    ])