Python: Setting a CountVectorizer result as a pandas.DataFrame


I need to build a pandas.DataFrame from the feature matrix produced by CountVectorizer:

count_vect = CountVectorizer()
count_vect.fit(text)

xtrain_count = count_vect.transform(train_x)
SaveTxt = pandas.DataFrame()
SaveTxt['text']=xtrain_count

But on the last line,

SaveTxt['text']=xtrain_count

I get the following error:

 raise ValueError('Cannot set a frame with no defined index '
ValueError: Cannot set a frame with no defined index and a value that cannot be converted to a Series

How can I put the matrix returned by CountVectorizer into a DataFrame? The CountVectorizer result is a csr_matrix of about 20,000 rows by 200,000 columns, containing integers (1 to 6).

pd.DataFrame(my_csr_matrix.todense())

Here is a proof of concept:

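import random
import lorem
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

m = 10
random.seed(0)
# Generate m random lorem-ipsum paragraphs as sample documents
data = [lorem.paragraph() for _ in range(m)]
cv = CountVectorizer()
cv.fit(data)
# transform() returns a sparse matrix; densify it before building the DataFrame
df = pd.DataFrame(data=cv.transform(data).todense())
print(df.shape)
print(df.head())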

Result:

(10, 27)
0  1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26
0  1  2  2  3  3  0  2  0  3  1   2   2   2   1   1   5   3   2   1   3   1   0   2   2   1   4   4
1  0  0  4  1  0  0  1  3  0  3   2   0   1   0   1   1   1   5   3   2   0   0   1   0   0   3   1
2  0  2  3  1  1  1  2  0  2  0   1   1   1   1   1   3   2   0   1   2   1   4   3   0   1   2   5
3  3  3  4  7  1  2  4  2  2  0   1   2   1   1   0   0   0   2   1   3   2   2   2   2   0   3   4
4  2  3  1  2  3  4  1  1  4  3   2   4   2   2   3   3   2   0   2   3   2   5   4   3   2   1   2

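Can you share a small example of the contents of xtrain_count?
It is a matrix of about 20,000 rows and 200,000 columns, and the contents are integers (1 to 6).
Is it a numpy array?
It is a csr_matrix.
Are you trying to save a (large) matrix into a new column of an empty DataFrame?
Yes, I am aware of the consequences of putting data of this size into a DataFrame, but size/efficiency is not the subject of @Sina's question.
Are we dealing with an XY problem here?
Probably.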
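
On the size concern raised in the comments: a dense 20,000 x 200,000 matrix of 8-byte integers would need roughly 32 GB, so `todense()` may not fit in memory. One possible alternative, sketched here under the assumption of pandas 0.25 or later, is `pd.DataFrame.sparse.from_spmatrix`, which keeps the columns sparse. The small `csr_matrix` below is only a stand-in for the real `xtrain_count`.

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix

# Stand-in for xtrain_count: a small sparse count matrix (3 documents, 4 terms).
xtrain_count = csr_matrix(np.array([[1, 0, 0, 2],
                                    [0, 0, 3, 0],
                                    [0, 1, 0, 0]]))

# Build a DataFrame whose columns are sparse arrays instead of dense ones.
sparse_df = pd.DataFrame.sparse.from_spmatrix(xtrain_count)

print(sparse_df.shape)
print(sparse_df.sparse.density)  # fraction of entries that are actually stored
```

Whether this is fast enough for 200,000 columns would need to be measured on the real data; the sketch only shows that the dense intermediate can be avoided.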