Python sklearn pipeline ValueError: all the input array dimensions except for the concatenation axis must match exactly

python numpy scikit-learn

I have a sklearn pipeline that extracts three different features:

manual_feats = Pipeline([
        ('FeatureUnion', FeatureUnion([
            ('segmenting_pip1', Pipeline([
                ('A_features', A_features()),
                ('segmentation', segmentation())
            ])),
            ('segmenting_pip2', Pipeline([
                ('B_features', B_features()),
                ('segmentation', segmentation())
            ])),
            ('segmenting_pip3', Pipeline([
                ('Z_features', Z_features()),
                ('segmentation', segmentation())
            ])),

        ])),
    ])

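Suppose features A and B both return an array of shape (# of records, 10, 20), while Z returns (# of records, 10, 15).
When I fit the pipeline with all the features, I get the following error: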

  File "C:\Python35\lib\site-packages\sklearn\pipeline.py", line 451, in _transform
    Xt = transform.transform(Xt)
  File "C:\Python35\lib\site-packages\sklearn\pipeline.py", line 829, in transform
    Xs = np.hstack(Xs)
  File "C:\Python35\lib\site-packages\numpy\core\shape_base.py", line 340, in hstack
    return _nx.concatenate(arrs, 1)
ValueError: all the input array dimensions except for the concatenation axis must match exactly
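If I exclude feature Z, the pipeline works, but the concatenation is applied on axis=1, giving an array of shape (# of records, 20, 20). What I want is an array of shape (# of records, 10, 40), with the concatenation applied on axis=2.
How can I get this with the Pipeline, without editing the library source code?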
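Edit:
I mentioned that concatenating A and B produces an array of shape (# of records, 10, 40). That is incorrect; it produces an array of shape (# of records, 20, 20). I will edit the question.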
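The shapes described in the question can be checked with NumPy alone. The following is a minimal sketch, assuming zero-filled stand-in arrays and an arbitrary record count of 4 (neither comes from the original post); it only illustrates which axis np.hstack joins on and why adding Z fails.

```python
import numpy as np

n_records = 4  # arbitrary record count, chosen only for this demonstration
A = np.zeros((n_records, 10, 20))
B = np.zeros((n_records, 10, 20))
Z = np.zeros((n_records, 10, 15))

# For arrays with two or more dimensions, hstack joins along axis=1.
print(np.hstack([A, B]).shape)                  # (4, 20, 20)

# Joining on the last axis gives the layout the question asks for.
print(np.concatenate([A, B], axis=2).shape)     # (4, 10, 40)
print(np.concatenate([A, B, Z], axis=2).shape)  # (4, 10, 55)

# With Z included, axis 2 differs (20 vs 15), so hstack raises ValueError.
try:
    np.hstack([A, B, Z])
    hstack_failed = False
except ValueError as err:
    hstack_failed = True
    print("ValueError:", err)
```

np.concatenate only requires the axes other than the concatenation axis to match, which is why joining on axis=2 accepts Z while joining on axis=1 does not.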
I solved this by creating a transformer that handles the concatenation process:

import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin


class append_split_3D(BaseEstimator, TransformerMixin):
    def __init__(self, segments_number=20, max_len=50, mode='append'):
        self.segments_number = segments_number
        self.max_len = max_len
        self.mode = mode
        self.appending_value = -5.123  # padding sentinel; must not occur in the data

    def fit(self, X, y=None):
        return self

    def transform(self, data):
        if self.mode == 'append':
            # Pad axis 2 up to max_len. A local variable keeps self.max_len
            # unchanged, so transform can be called more than once.
            pad_len = self.max_len - data.shape[2]
            appending = np.full((data.shape[0], data.shape[1], pad_len),
                                self.appending_value)
            return np.concatenate([data, appending], axis=2)
        elif self.mode == 'split':
            # Cut axis 1 into blocks of segments_number rows, strip the
            # padding from each block, and join the blocks on axis 2.
            tmp = []
            for item in range(0, data.shape[1], self.segments_number):
                tmp.append(data[:, item:(item + self.segments_number), :])
            tmp = [item[item != self.appending_value].reshape(
                       data.shape[0], self.segments_number, -1)
                   for item in tmp]
            return np.concatenate(tmp, axis=2)
        else:
            raise ValueError('Mode value is not defined: %r' % self.mode)
The whole pipeline then becomes:

manual_feats = Pipeline([
        ('FeatureUnion', FeatureUnion([
            ('segmenting_pip1', Pipeline([
                ('A_features', A_features()),
                ('segmentation', segmentation()),
                ('append', append_split_3D(max_len=50, mode='append')),
            ])),
            ('segmenting_pip2', Pipeline([
                ('B_features', B_features()),
                ('segmentation', segmentation()),
                ('append', append_split_3D(max_len=50, mode='append')),
            ])),
            ('segmenting_pip3', Pipeline([
                ('Z_features', Z_features()),
                ('segmentation', segmentation()),
                ('append', append_split_3D(max_len=50, mode='append')),
            ])),
        ])),
        ('split', append_split_3D(segments_number=10, mode='split')),
    ])
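What I do in this transformer is as follows. For example, the features A, B and Z that I use return the following arrays:

A: (# of records, 10, 20)

B: (# of records, 10, 20)

Z: (# of records, 10, 15)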

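In mode='append', I pad every array along axis=2 with an extra fixed value up to a maximum length of 50 (as an example). This gives all the arrays the same axis=2 dimension and allows the call Xs = np.hstack(Xs) to work.
The FeatureUnion then returns an array of shape (# of records, 30, 50).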
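Then, in mode='split', which I add at the end of the pipeline, I split the final array of shape (# of records, 30, 50) back into 3 arrays of shape (# of records, 10, 50), remove the extra fixed values, and concatenate the results along the last dimension.

The final array has shape (# of records, 10, 55). 55 is the sum of the arrays' third dimensions (20+20+15), which is exactly what I want.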
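The pad-then-split round trip described above can be traced with plain NumPy. This is a sketch under stated assumptions: random stand-in arrays, a record count of 4, and a hypothetical helper pad_last_axis that mirrors the 'append' mode. It also assumes the sentinel -5.123 never appears in real feature values, since a collision would make the 'split' step drop genuine data.

```python
import numpy as np

PAD = -5.123   # sentinel; assumed never to occur in real feature values
MAX_LEN = 50   # target length of axis 2 after padding
n = 4          # arbitrary record count for the demonstration

A = np.random.rand(n, 10, 20)
B = np.random.rand(n, 10, 20)
Z = np.random.rand(n, 10, 15)


def pad_last_axis(x, max_len=MAX_LEN, value=PAD):
    """Pad axis 2 with `value` until it has length `max_len`."""
    filler = np.full((x.shape[0], x.shape[1], max_len - x.shape[2]), value)
    return np.concatenate([x, filler], axis=2)


# 'append' step: every block becomes (n, 10, 50), so hstack succeeds.
stacked = np.hstack([pad_last_axis(x) for x in (A, B, Z)])
print(stacked.shape)  # (4, 30, 50)

# 'split' step: cut axis 1 back into blocks of 10 rows, drop the padding,
# and join the blocks on the last axis.
blocks = [stacked[:, i:i + 10, :] for i in range(0, stacked.shape[1], 10)]
blocks = [b[b != PAD].reshape(n, 10, -1) for b in blocks]
result = np.concatenate(blocks, axis=2)
print(result.shape)  # (4, 10, 55)
```

Because the padding sits at the end of each row, removing it and reshaping recovers each block in its original order, so the result matches a direct concatenation of A, B and Z on axis=2.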
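Look at the pipeline code and try to work out how the input is turned into Xt and then into Xs. It must be assuming that Xs is a list or array of 2d arrays that should be concatenated on the last axis. The first axis must match, but it does not (for some reason).
Thanks @hpaulj, but I don't want to change anything in the pipeline code. I can change the line Xs = np.hstack(Xs) to Xs = np.concatenate(Xs, axis=-1), and it works. I could change the axis of the concatenation there, but I would like something in my own code.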
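One way to keep the axis change in user code, as the last comment asks, might be a small union-like transformer that concatenates on the last axis itself. The sketch below is only an illustration: LastAxisUnion and ConstantBlock are hypothetical names, not part of scikit-learn, and ConstantBlock merely stands in for the feature and segmentation steps from the question. It does not reproduce FeatureUnion options such as parallel execution or transformer weights.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin, clone


class LastAxisUnion(BaseEstimator, TransformerMixin):
    """Hypothetical helper: fits several branches, joins outputs on the last axis."""

    def __init__(self, transformer_list):
        self.transformer_list = transformer_list

    def fit(self, X, y=None):
        # Fit a fresh copy of every branch on the same input.
        self.fitted_ = [clone(t).fit(X, y) for _, t in self.transformer_list]
        return self

    def transform(self, X):
        # Only the axes other than the last one have to match here.
        return np.concatenate([t.transform(X) for t in self.fitted_], axis=-1)


class ConstantBlock(BaseEstimator, TransformerMixin):
    """Stand-in for a feature extractor: returns (n_records, 10, width)."""

    def __init__(self, width=20):
        self.width = width

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return np.ones((len(X), 10, self.width))


X = np.zeros((4, 3))  # dummy input with 4 records
union = LastAxisUnion([
    ('a', ConstantBlock(width=20)),
    ('b', ConstantBlock(width=20)),
    ('z', ConstantBlock(width=15)),
])
out = union.fit(X).transform(X)
print(out.shape)  # (4, 10, 55)
```

Since each branch only needs fit and transform, the three segmenting pipelines from the question could presumably be passed in place of the stand-ins, which would avoid the padding sentinel altogether.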