NumPy: ValueError for incompatible row dimensions when training a model
I am implementing a decision tree on a platform. Before fitting it, I want to transform a specific column with CountVectorizer. To keep things simple, I am using a pipeline, but fitting it fails with an error about incompatible row dimensions.
Error:
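---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-79-a981c354b190> in <module>()
----> 1 pipe.fit(x_train,y_train)
7 frames
/usr/local/lib/python3.6/dist-packages/scipy/sparse/construct.py in bmat(blocks, format, dtype)
584 exp=brow_lengths[i],
585 got=A.shape[0]))
--> 586 raise ValueError(msg)
587
588 if bcol_lengths[j] == 0:
ValueError: blocks[0,:] has incompatible row dimensions. Got blocks[0,1].shape[0] == 2205, expected 1.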
The problem is in how the columns are specified. Try passing the column for ohe as a list, and the column for cv as a plain string:
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer as cv
from sklearn.preprocessing import OneHotEncoder as ohe
from sklearn.compose import ColumnTransformer as ct
from sklearn.pipeline import make_pipeline as mp
from sklearn.tree import DecisionTreeClassifier as dtc

# Small toy dataset with a numeric, a categorical and a text column
data = pd.DataFrame({'rating': np.random.randint(0, 10, 6),
                     'variation': ['a', 'b', 'c', 'a', 'b', 'c'],
                     'verified_reviews': ['adnf asd', 'sdf dsa', 'das j s', 'asd jd s', 'sad jds a', 'sajd'],
                     'feedback': np.random.randint(0, 2, 6)})

# cv gets its column as a plain string (1D input); ohe gets a list (2D input)
transformer = ct(transformers=[('review_counts', cv(), 'verified_reviews'),
                               ('variation_dummies', ohe(), ['variation'])],
                 remainder='passthrough')  # 'rating' is passed through unchanged

pipe = mp(transformer, dtc(random_state=42))
x = data[['rating', 'variation', 'verified_reviews']].copy()
y = data.feedback
pipe.fit(x, y)
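According to the scikit-learn documentation, whenever a transformer expects a 1D array as input, the column is specified as a string ("xxx"). For transformers that expect 2D data, the column must be specified as a list of strings (["xxx"]). CountVectorizer expects a 1D sequence of documents, while OneHotEncoder expects 2D input.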
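To see why the column specification matters, here is a minimal sketch of what CountVectorizer receives in each case. It assumes the failing code passed the text column as a list (the question's code is not shown, so this is a guess), and the toy data is hypothetical, copied from the example above. A string selects a Series, which iterates over the reviews. A list selects a one-column DataFrame, which iterates over its column labels, so the vectorizer sees only one "document". That would be consistent with the "expected 1" in the error message.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

# Toy data (hypothetical; mirrors the text column from the example above)
data = pd.DataFrame({
    'verified_reviews': ['adnf asd', 'sdf dsa', 'das j s',
                         'asd jd s', 'sad jds a', 'sajd'],
})

# 1D selection: a Series. Iterating it yields one review per row.
as_series = data['verified_reviews']
rows_from_series = CountVectorizer().fit_transform(as_series).shape[0]

# 2D selection: a one-column DataFrame. Iterating it yields the column
# labels, so CountVectorizer treats the column name as the only document.
as_frame = data[['verified_reviews']]
rows_from_frame = CountVectorizer().fit_transform(as_frame).shape[0]

print(rows_from_series)  # 6, one row per review
print(rows_from_frame)   # 1, a single row for the column name
```

When ColumnTransformer later stacks the blocks side by side, a 1-row block from the vectorizer cannot be combined with blocks that have one row per sample, which is where the row dimension mismatch comes from.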