Python: how to combine TfidfVectorizer with the remaining columns

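I am trying to run TF-IDF on a single column in Python and want to combine the output with the remaining columns of the DataFrame so that I can feed the result to a classifier. I am using FeatureUnion for the heterogeneous data, but for some reason I keep getting an error. I am using the following code: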

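from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import FeatureUnion, Pipeline

# dftrf is the input DataFrame; 'Name_x' is the text column
pipecols1 = [col for col in dftrf.columns if col != 'Name_x']
pipecols2 = ['Name_x']

class MySelector(BaseEstimator, TransformerMixin):
    # Selects the column(s) named by `key` from a DataFrame
    def __init__(self, key):
        self.key = key

    def fit(self, x, y=None):
        return self

    def transform(self, data_dict):
        return data_dict[self.key]

var = Pipeline([('var', MySelector(key=pipecols1))])

text = Pipeline([
    ('text', MySelector(key=pipecols2)),
    ('tfidf', TfidfVectorizer()),
])

feats = FeatureUnion(
    transformer_list=[('var', var), ('text', text)],
    transformer_weights={'var': 1, 'text': 1},
)

feature_processing = Pipeline([('feats', feats)])

feature_processing.fit(x, y)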
I keep getting the following error:

ValueError                                Traceback (most recent call last)
<ipython-input-61-b17725dbe418> in <module>
----> 1 feature_processing.fit_transform(dftrf)

~/.conda/envs/test_py3/lib/python3.6/site-packages/sklearn/pipeline.py in fit_transform(self, X, y, **fit_params)
    298         Xt, fit_params = self._fit(X, y, **fit_params)
    299         if hasattr(last_step, 'fit_transform'):
--> 300             return last_step.fit_transform(Xt, y, **fit_params)
    301         elif last_step is None:
    302             return Xt

~/.conda/envs/test_py3/lib/python3.6/site-packages/sklearn/pipeline.py in fit_transform(self, X, y, **fit_params)
    799         self._update_transformer_list(transformers)
    800         if any(sparse.issparse(f) for f in Xs):
--> 801             Xs = sparse.hstack(Xs).tocsr()
    802         else:
    803             Xs = np.hstack(Xs)

~/.local/lib/python3.6/site-packages/scipy/sparse/construct.py in hstack(blocks, format, dtype)
    463
    464     """
--> 465     return bmat([blocks], format=format, dtype=dtype)
    466
    467

~/.local/lib/python3.6/site-packages/scipy/sparse/construct.py in bmat(blocks, format, dtype)
    584                                                     exp=brow_lengths[i],
    585                                                     got=A.shape[0]))
--> 586                     raise ValueError(msg)
    587
    588                 if bcol_lengths[j] == 0:

ValueError: blocks[0,:] has incompatible row dimensions. Got blocks[0,1].shape[0] == 1, expected 999000.
pipecols2 is my text column; pipecols1 holds the columns I want to combine without any transformation.

Any ideas?
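
One possible reading of the error, offered as a sketch rather than a confirmed diagnosis: selecting with the list ['Name_x'] returns a one-column DataFrame, and iterating a DataFrame yields its column labels, so TfidfVectorizer would see a single "document" (the column name). That would match blocks[0,1].shape[0] == 1 in the traceback. Selecting with the plain string 'Name_x' returns a Series with one entry per row instead. The toy DataFrame, the column names num_a and num_b, and the ColumnSelector helper below are assumptions made up for illustration; only Name_x comes from the question.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import FeatureUnion, Pipeline


class ColumnSelector(BaseEstimator, TransformerMixin):
    """Hypothetical helper: a str key gives a 1-D Series, a list key a 2-D DataFrame."""

    def __init__(self, key):
        self.key = key

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X[self.key]


# Toy stand-in for dftrf; num_a and num_b are invented column names.
df = pd.DataFrame({
    "Name_x": ["red apple", "green apple", "yellow banana"],
    "num_a": [1.0, 2.0, 3.0],
    "num_b": [0.5, 0.1, 0.9],
})
other_cols = [c for c in df.columns if c != "Name_x"]

# Suspected cause: a one-column DataFrame is iterated by column label,
# so the vectorizer receives one document instead of one per row.
bad = TfidfVectorizer().fit_transform(df[["Name_x"]])
print(bad.shape)

feats = FeatureUnion([
    # List key: the remaining columns pass through unchanged as a 2-D block.
    ("var", ColumnSelector(key=other_cols)),
    ("text", Pipeline([
        # String key: a Series of documents, one per row.
        ("select", ColumnSelector(key="Name_x")),
        ("tfidf", TfidfVectorizer()),
    ])),
])

combined = feats.fit_transform(df)
print(combined.shape)
```

If this is indeed the cause, changing pipecols2 from ['Name_x'] to 'Name_x' in the original code may be enough. scikit-learn's ColumnTransformer with remainder='passthrough' is another option worth considering for this kind of mixed text and numeric input.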