Python CountVectorizer error: No such file or directory


python dataframe scikit-learn


I am trying to use CountVectorizer on documents, but I keep running into a "No such file or directory: 'id'" error.

My code:

import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

## goog_s and amaz_s are DataFrames loaded earlier (not shown here)

##%%time
## Creating a 2-level index for goog_s and amaz_s
goog_s['dataset_name'] = 'goog_s'
amaz_s['dataset_name'] = 'amaz_s'
amaz_s.rename(columns = {'title':'name'}, inplace = True)

## Creating a new Dataframe containing both goog_s and amaz_s
df_s = pd.concat([goog_s, amaz_s], axis = 0, join = 'outer', keys = ['goog_s', 'amaz_s'])

## Creating column info
df_s["info"] = df_s["name"].astype(str) + " " + df_s["description"]

## Creating countVectorizer
cv = CountVectorizer(input='filename', encoding='iso-8859-1',
                     decode_error='ignore', analyzer='word',
                     ngram_range=(1,1), stop_words='english',
                     binary=True)

cvRaw = cv.fit_transform(df_s)  ## this line raises the FileNotFoundError

I keep getting an error on the line

cvRaw = cv.fit_transform(df_s)

which reads:

FileNotFoundError: [Errno 2] No such file or directory: 'id'

My DataFrame df_s contains a column named id. I don't know why this error occurs.

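You should set the input parameter to 'content'. Otherwise CountVectorizer treats each item of the sequence you pass to fit() as a file name and tries to open that file. In your case those items are the column names of the df_s DataFrame, because iterating over a DataFrame yields its column labels, not its rows. That is why the error complains about a file called 'id'.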
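To make the failure mode concrete, here is a minimal sketch using a hypothetical two-column frame named df that stands in for df_s (only the leading column name, id, is taken from the question). It shows that iterating a DataFrame produces its column labels, and that with input='filename' the first label is handed to open(). It assumes there is no file literally named id in the working directory.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy frame standing in for df_s; only the 'id' column name is from the question.
df = pd.DataFrame({
    "id": [1, 2],
    "info": ["google pixel phone", "amazon kindle reader"],
})

# Iterating over a DataFrame yields its column labels, not its rows.
print(list(df))  # ['id', 'info']

# With input='filename', each "document" is treated as a path and opened,
# so the first label, 'id', is passed to open() and the call fails.
cv = CountVectorizer(input="filename")
caught = None
try:
    cv.fit_transform(df)
except FileNotFoundError as exc:
    caught = exc

# Assuming no file named 'id' exists in the current directory.
print(caught)
```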

For more explanation, see:

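Also, fit() expects an iterable of documents (one string per document), which here means a pd.Series such as a single column of df_s, not the whole DataFrame.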

I'm not sure what your intent is, but with input='content' set, here is my suggestion:

cvRaw = cv.fit_transform(df_s['info'])


You can't pass a DataFrame to CountVectorizer. From the DataFrame you have, which column's data do you specifically want to use?

@vb_ I want to create a CountVectorizer with binary 1-gram counts that ignores English stop words, i.e., a vectorizer that represents each record as a set of tokens. The columns in my dataset are: id, name, description, data_set (whether the row belongs to the Google or the Amazon dataset, since I combined the two), manufacturer, price and info. I'm not sure which column to pass to CountVectorizer.

The input to CountVectorizer should be strings. So you should first select the columns that contain the data, then combine their values and pass the result.