fit vs. fit_transform in a Pipeline (Python 3.x, scikit-learn)

On this page, fit_transform is called to transform the data, as shown below:

from sklearn.pipeline import FeatureUnion, Pipeline

# text, length, words, ... are transformers defined earlier on the referenced page
feats = FeatureUnion([('text', text),
                      ('length', length),
                      ('words', words),
                      ('words_not_stopword', words_not_stopword),
                      ('avg_word_length', avg_word_length),
                      ('commas', commas)])

feature_processing = Pipeline([('feats', feats)])
feature_processing.fit_transform(X_train)
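
The transformers above (text, length, and so on) are not defined in this excerpt. As a rough, self-contained sketch of what fit_transform on a FeatureUnion returns, the example below uses two hypothetical feature functions (char_length and comma_count are made up for illustration) wrapped in FunctionTransformer. Each transformer produces one column and FeatureUnion stacks the columns side by side.

```python
import numpy as np
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import FunctionTransformer


def char_length(texts):
    # Hypothetical feature: number of characters in each text, as a column
    return np.array([[len(t)] for t in texts])


def comma_count(texts):
    # Hypothetical feature: number of commas in each text, as a column
    return np.array([[t.count(",")] for t in texts])


feats = FeatureUnion([
    ("length", FunctionTransformer(char_length)),
    ("commas", FunctionTransformer(comma_count)),
])

feature_processing = Pipeline([("feats", feats)])

X_train = ["a, b", "hello", "x, y, z"]

# fit_transform fits every transformer and returns the stacked feature matrix
features = feature_processing.fit_transform(X_train)
print(features.shape)  # one row per text, one column per transformer
print(features)
```

Calling fit_transform directly like this is mainly useful for inspecting the feature matrix. It is not required before training a pipeline that ends in a classifier.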

During training with the feature processing, however, it only calls fit and then predict:

import numpy as np
from sklearn.ensemble import RandomForestClassifier

pipeline = Pipeline([
    ('features', feats),
    ('classifier', RandomForestClassifier(random_state=42)),
])

pipeline.fit(X_train, y_train)

preds = pipeline.predict(X_test)
np.mean(preds == y_test)  # accuracy on the test set

The question is: in the second case, does fit also apply the transformation to X_train (as transform would), given that fit_transform is never called here?
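The sklearn pipeline has some nice features. It performs multiple tasks in a very clean way: we define our features, their transformations, and the classifier we want to apply, all in one object. In the first step of this pipeline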
pipeline = Pipeline([
    ('features',feats),
    ('classifier', RandomForestClassifier(random_state = 42)),
])

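you have defined the name of the feature step and its transformation (contained in feats), and in the second step you have defined the name of the classifier and the classifier itself.

Now, when you call pipeline.fit, it first fits the features and transforms them, and then fits the classifier on the transformed features. When you call pipeline.predict, it applies the already fitted transform to the new data before predicting. So the pipeline performs these steps for us.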
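
One way to check this yourself is to log which methods the pipeline calls on its feature step. The following is a minimal sketch under stated assumptions: LoggingScaler is a hypothetical transformer written only for this illustration (it is not part of scikit-learn), and the data is random. The sketch also repeats the same steps by hand to see whether the predictions match.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline

calls = []  # module-level log of the transformer methods that get invoked


class LoggingScaler(BaseEstimator, TransformerMixin):
    """Hypothetical transformer: centers each column and logs every call."""

    def fit(self, X, y=None):
        calls.append("fit")
        self.mean_ = np.asarray(X).mean(axis=0)
        return self

    def transform(self, X):
        calls.append("transform")
        return np.asarray(X) - self.mean_


# Random toy data, made up for this sketch
rng = np.random.RandomState(0)
X_train = rng.rand(20, 3)
y_train = np.arange(20) % 2
X_test = rng.rand(5, 3)

pipeline = Pipeline([
    ("features", LoggingScaler()),
    ("classifier", RandomForestClassifier(n_estimators=10, random_state=42)),
])

pipeline.fit(X_train, y_train)
calls_after_fit = list(calls)        # methods called during pipeline.fit

preds = pipeline.predict(X_test)
calls_after_predict = list(calls)    # the log now also covers pipeline.predict

print(calls_after_fit)
print(calls_after_predict)

# The same steps done by hand, without a Pipeline
scaler = LoggingScaler()
clf = RandomForestClassifier(n_estimators=10, random_state=42)
clf.fit(scaler.fit_transform(X_train), y_train)
manual_preds = clf.predict(scaler.transform(X_test))

print(np.array_equal(preds, manual_preds))
```

With the default Pipeline settings, the log after pipeline.fit should show fit followed by transform on the training data, and pipeline.predict should add one more transform for the test data. That is why calling fit_transform yourself is not needed in the second case.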