Python 2.7: "TypeError: expected string or buffer" in scikit-learn CountVectorizer


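I am trying to solve a classification problem. When I feed the text to CountVectorizer, it raises the error:

TypeError: expected string or buffer

Is something wrong with my dataset? It contains messages that mix numbers and words, and some messages also contain special characters.

Here is a sample of what the messages look like: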

0         I have not received my gifts which I ordered ok
1                 hth her wells idyll McGill kooky bbc.co
2                                   test test test 1 test
3                                                    test
4                         hello where is my reward points
5       hi, can you get koovs coupons or vouchers here...
Below is the code I am using for classification:

import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
df = pd.read_excel('training_data.xlsx')
X_train = df.message
print X_train.shape
map_class_label = {'checkin':0, 'greeting':1,'more reward options':2,'noclass':3, 'other':4,'points':5,
                           'referral points':6,'snapbill':7, 'thanks':8,'voucher not working':9,'voucher':10}
df['label_num'] = df['Final Category'].map(map_class_label)
y_train = df.label_num
vectorizer = CountVectorizer(lowercase=False,decode_error='ignore')
X_train_dtm = vectorizer.fit_transform(X_train)

You need to convert the column message to string, because some of the values in the data are numeric:

df = pd.read_excel('training_data.xlsx')
df['message'] = df['message'].values.astype('unicode')
...
...
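To see why numeric cells trigger this error, here is a minimal sketch using only the standard library. It assumes the tokenizer applies the regex `(?u)\b\w\w+\b` (believed to be CountVectorizer's default `token_pattern`; check your scikit-learn version), and the sample list and the helper `find_non_strings` are hypothetical. With `lowercase=False`, the first thing that touches each message is the regex, and a regex cannot scan a number.

```python
import re

# Assumed to match CountVectorizer's default token_pattern.
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")

# Hypothetical sample: the last entry is a number, as an Excel cell can be.
messages = ["hello where is my reward points", "test test test 1 test", 12345]

def find_non_strings(values):
    """Return (index, value) pairs that the regex tokenizer rejects."""
    bad = []
    for i, value in enumerate(values):
        try:
            TOKEN_PATTERN.findall(value)
        except TypeError:
            # Python 2 words this as "expected string or buffer".
            bad.append((i, value))
    return bad

bad_rows = find_non_strings(messages)

# Converting every entry to text removes the problem
# (the sample is ASCII-only, so this works on Python 2 and 3).
cleaned = [u"{}".format(value) for value in messages]
remaining = find_non_strings(cleaned)
```

Running a check like this on the real column before vectorizing shows which rows are not text; the `astype('unicode')` call above performs the conversion for the whole column at once.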

I got the same error by passing just a single string, like this:

cv.fit_transform('Making my way down,')

Instead, you must pass a list containing the string, like this:

cv.fit_transform(['Making my way down,', ])
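One way to guard against this mistake is to normalize the input before vectorizing. The helper below is a hypothetical sketch (`as_documents` is not part of scikit-learn) that wraps a lone string in a list and leaves other iterables as they are; it is written to run on both Python 2 and Python 3.

```python
try:
    string_types = (basestring,)  # Python 2
except NameError:
    string_types = (str, bytes)   # Python 3

def as_documents(raw):
    """Wrap a lone string in a list so it is treated as one document."""
    if isinstance(raw, string_types):
        return [raw]
    return list(raw)

docs = as_documents('Making my way down,')
```

The result could then be passed on as `cv.fit_transform(docs)`.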

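@jezrael Final Category is the class label (text data) for each message; I converted it to numeric values by mapping it into the label_num column. It is not missing from the dataset, I just did not show it. The problem occurs when I try to fit and transform the messages with CountVectorizer.
Does my solution work?
It cannot convert because of a UnicodeEncodeError. I also tried df.message.apply(str).
Hmmm, I have an idea - maybe set the message column to string in Excel?
Thanks a lot.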