Python ValueError: incorrect input array dimensions for CountVectorizer()


python scikit-learn


I get an error when I try to use CountVectorizer inside make_column_transformer() in an sklearn pipeline.

My DataFrame has two columns, 'desc-title' and 'SPchangeHigh'. Here is a two-row snippet:

import pandas as pd

features = pd.DataFrame([["T. Rowe Price sells most of its Tesla shares", 0.002152],
                         ["Gannett to retain all seats in MNG proxy fight", 0.002152]],
                        columns=["desc-title", "SPchangeHigh"])

I can run the following pipeline without any problem:

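from sklearn.compose import make_column_transformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

preprocess = make_column_transformer(
    (StandardScaler(), ['SPchangeHigh']),
    (OneHotEncoder(), ['desc-title'])
)
preprocess.fit_transform(features.head(2))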

However, when I replace OneHotEncoder() with CountVectorizer(tokenizer=tokenize), it fails:

preprocess = make_column_transformer(
    (StandardScaler(),['SPchangeHigh']),
    ( CountVectorizer(tokenizer=tokenize),['desc-title'])
)
preprocess.fit_transform(features.head(2))

The error I get is:

ValueError                                Traceback (most recent call last)
in <module>()
      3     ( CountVectorizer(tokenizer=tokenize),['desc-title'])
      4 )
----> 5 preprocess.fit_transform(features.head(2))

C:\anaconda3\lib\site-packages\sklearn\compose\_column_transformer.py in fit_transform(self, X, y)
    488         self._validate_output(Xs)
    489
--> 490         return self._hstack(list(Xs))
    491
    492     def transform(self, X):

C:\anaconda3\lib\site-packages\sklearn\compose\_column_transformer.py in _hstack(self, Xs)
    545         else:
    546             Xs = [f.toarray() if sparse.issparse(f) else f for f in Xs]
--> 547             return np.hstack(Xs)
    548
    549

C:\anaconda3\lib\site-packages\numpy\core\shape_base.py in hstack(tup)
    338         return _nx.concatenate(arrs, 0)
    339     else:
--> 340         return _nx.concatenate(arrs, 1)
    341
    342

ValueError: all the input array dimensions except for the concatenation axis must match exactly

I would appreciate it if someone could help me.

Remove the brackets around 'desc-title'. You need a 1-D array, not a column vector.

preprocess = make_column_transformer(
    (StandardScaler(),['SPchangeHigh']),
    ( CountVectorizer(),'desc-title')
)
preprocess.fit_transform(features.head(2))

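The difference between specifying the column selector as 'column' (a simple string) and ['column'] (a list with one element) is the shape of the array that is passed to the transformer. In the first case, a one-dimensional array is passed, while in the second case it is a two-dimensional array with one column, i.e. a column vector.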
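To make the shape difference concrete, here is a small sketch using the two rows from the question. It assumes the column transformer hands each transformer the same object that plain pandas indexing produces (a Series for 'desc-title', a one-column DataFrame for ['desc-title']), and it uses a default CountVectorizer() rather than the asker's custom tokenizer, which is not shown.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

features = pd.DataFrame(
    [["T. Rowe Price sells most of its Tesla shares", 0.002152],
     ["Gannett to retain all seats in MNG proxy fight", 0.002152]],
    columns=["desc-title", "SPchangeHigh"],
)

one_d = features["desc-title"]    # Series: what the selector 'desc-title' corresponds to
two_d = features[["desc-title"]]  # one-column DataFrame: what ['desc-title'] corresponds to

print(one_d.shape)  # (2,)
print(two_d.shape)  # (2, 1)

# Iterating a Series yields the two headlines, so the vectorizer sees two documents.
rows_from_1d = CountVectorizer().fit_transform(one_d).shape[0]

# Iterating a DataFrame yields its column labels, so the only "document"
# the vectorizer sees is the string "desc-title".
rows_from_2d = CountVectorizer().fit_transform(two_d).shape[0]

print(rows_from_1d, rows_from_2d)  # 2 1
```

With the list selector the vectorizer returns a single row, while StandardScaler returns two, which is why the horizontal stacking step complains about mismatched dimensions.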

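Be aware that some transformers expect a one-dimensional input (the label-oriented ones), while others, such as OneHotEncoder or an imputer, expect two-dimensional input with shape [n_samples, n_features].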
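As a rough illustration of those two expectations, the sketch below feeds the same 1-D array to both kinds of transformer. The two short strings are shortened stand-ins for the headlines in the question, and the variable names are only illustrative.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.preprocessing import OneHotEncoder

# Shortened, illustrative stand-ins for the two headlines.
titles = np.array(["Tesla shares", "proxy fight"], dtype=object)

# 1-D input suits CountVectorizer: one string per document.
counts = CountVectorizer().fit_transform(titles)
print(counts.shape)  # (2, 4)

# OneHotEncoder wants [n_samples, n_features], so the same 1-D array is rejected.
try:
    OneHotEncoder().fit(titles)
    rejected = False
except ValueError:
    rejected = True

# Reshaping to a single column gives the 2-D shape it expects.
encoded = OneHotEncoder().fit_transform(titles.reshape(-1, 1))
print(rejected, encoded.shape)  # True (2, 2)
```

So in a column transformer, a string selector is the natural fit for text vectorizers, and a list selector is the natural fit for encoders, scalers and imputers.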

What are you using as the tokenizer?

Sir, you just saved me hours of debugging! Thanks.
