NumPy: ValueError about incompatible row dimensions when training a model


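I am implementing a decision tree on a platform. Before training it, I want to transform a specific column with CountVectorizer, so I use a pipeline to keep things simple.

However, fitting the pipeline fails with an "incompatible row dimensions" error.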

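Error:

    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    <ipython-input-79-a981c354b190> in <module>()
    ----> 1 pipe.fit(x_train,y_train)

    7 frames
    /usr/local/lib/python3.6/dist-packages/scipy/sparse/construct.py in bmat(blocks, format, dtype)
        584                                                     exp=brow_lengths[i],
        585                                                     got=A.shape[0]))
    --> 586                     raise ValueError(msg)
        587 
        588                 if bcol_lengths[j] == 0:

    ValueError: blocks[0,:] has incompatible row dimensions. Got blocks[0,1].shape[0] == 2205, expected 1.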

Questions
  • How does the incompatible row dimensions error arise?
  • How can it be fixed?
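A likely mechanism, sketched on a made-up toy frame whose column names mirror the question. It assumes the original (unshown) code passed the review column to CountVectorizer as a list. Iterating over a pandas DataFrame yields its column labels, so CountVectorizer treats the column name as a single document and returns 1 row, while OneHotEncoder returns one row per sample. Stacking the two blocks side by side then fails, which matches "expected 1" in the traceback. The exact error wording depends on the SciPy version.

```python
import pandas as pd
from scipy import sparse
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.preprocessing import OneHotEncoder

# Toy data (made up); column names mirror the question
df = pd.DataFrame({
    "variation": ["a", "b", "c", "a"],
    "verified_reviews": ["adnf asd", "sdf dsa", "das j s", "sajd"],
})

# Iterating over a DataFrame yields its column labels, not its rows
print(list(df[["verified_reviews"]]))  # ['verified_reviews']

# 2D input (list of columns): one "document", the column name -> 1 row
wrong = CountVectorizer().fit_transform(df[["verified_reviews"]])
# 1D input (single column name): one document per row -> len(df) rows
right = CountVectorizer().fit_transform(df["verified_reviews"])
# OneHotEncoder expects 2D input and returns one row per sample
dummies = OneHotEncoder().fit_transform(df[["variation"]])

print(wrong.shape[0], right.shape[0], dummies.shape[0])  # 1 4 4

# ColumnTransformer stacks the blocks horizontally, so row counts must match
try:
    sparse.hstack([wrong, dummies])
except ValueError as err:
    print("ValueError:", err)  # wording varies between SciPy versions

stacked = sparse.hstack([right, dummies])
print(stacked.shape[0])  # 4
```

With the text column passed as a 1D Series, every block has one row per sample and the horizontal stack succeeds.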

  • Try passing the required column to ohe as a list, while passing a plain string to cv:

    import numpy as np
    import pandas as pd
    from sklearn.feature_extraction.text import CountVectorizer as cv
    from sklearn.preprocessing import OneHotEncoder as ohe
    from sklearn.compose import ColumnTransformer as ct
    from sklearn.pipeline import make_pipeline as mp
    from sklearn.tree import DecisionTreeClassifier as dtc
    
    # Small toy dataset with the same column layout as the real data
    data = pd.DataFrame({'rating': np.random.randint(0, 10, 6),
                         'variation': ['a', 'b', 'c', 'a', 'b', 'c'],
                         'verified_reviews': ['adnf asd', 'sdf dsa', 'das j s', 'asd jd s', 'sad jds a', 'sajd'],
                         'feedback': np.random.randint(0, 2, 6)})
    
    transformer = ct(transformers=[
        # CountVectorizer expects a 1D sequence of documents: plain string
        ('review_counts', cv(), 'verified_reviews'),
        # OneHotEncoder expects 2D input: list of column names
        ('variation_dummies', ohe(), ['variation'])],
        remainder='passthrough')  # 'rating' passes through unchanged
    
    pipe = mp(transformer, dtc(random_state=42))
    
    x = data[['rating', 'variation', 'verified_reviews']].copy()
    y = data.feedback
    
    pipe.fit(x, y)
    
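According to the documentation, whenever a transformer expects a 1D array as input, the column should be specified as a string ("xxx"). For transformers that expect 2D data, the column must be specified as a list of strings (["xxx"]). CountVectorizer expects a 1D iterable of text documents, whereas OneHotEncoder expects 2D input, which is why the two columns are specified differently.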
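To make the string-versus-list rule concrete, here is a small probe. InputProbe and seen are hypothetical names introduced only for this sketch. The probe records the dimensionality of whatever ColumnTransformer hands it, assuming the input is a pandas DataFrame: a plain string selector should deliver a 1D Series, and a one-element list a 2D DataFrame.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.compose import ColumnTransformer

# Hypothetical module-level record. ColumnTransformer clones its transformers,
# so state stored on the original instances would not be visible afterwards.
seen = {}


class InputProbe(TransformerMixin, BaseEstimator):
    """Hypothetical transformer that records the ndim of its input."""

    def __init__(self, tag="probe"):
        self.tag = tag

    def fit(self, X, y=None):
        seen[self.tag] = np.ndim(X)  # 1 for a Series, 2 for a DataFrame
        return self

    def transform(self, X):
        # Always return a 2D block with one row per sample
        return np.zeros((len(X), 1))


df = pd.DataFrame({"verified_reviews": ["adnf asd", "sdf dsa", "sajd"]})

probe_ct = ColumnTransformer([
    ("as_string", InputProbe("as_string"), "verified_reviews"),
    ("as_list", InputProbe("as_list"), ["verified_reviews"]),
])
out = probe_ct.fit_transform(df)

print(seen)       # {'as_string': 1, 'as_list': 2}
print(out.shape)  # (3, 2)
```

A transformer that iterates over its input, as CountVectorizer does, behaves very differently in these two cases. That is why the selector form matters.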