Pandas: hstack a CSR matrix with a pandas array (pandas, numpy, scipy, sparse-matrix)

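I am working on an exercise with Amazon reviews; the code is below. Basically, I cannot add a column (a pandas array) to the CSR matrix that I get after applying BoW. Even though the number of rows in both matrices matches, I cannot get it to work.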

import sqlite3
import pandas as pd
import numpy as np
import nltk
import string
import matplotlib.pyplot as plt
import seaborn as sns
import scipy
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import confusion_matrix
from sklearn import metrics
from sklearn.metrics import roc_curve, auc
from nltk.stem.porter import PorterStemmer
from sklearn.manifold import TSNE

#Create Connection to sqlite3
con = sqlite3.connect('C:/Users/609316120/Desktop/Python/Amazon_Review_Exercise/database/database.sqlite')

filtered_data = pd.read_sql_query("""select * from Reviews where Score != 3""", con)
def partition(x):
    if x < 3:
        return 'negative'
    return 'positive'

actualScore = filtered_data['Score']
actualScore.head()
positiveNegative = actualScore.map(partition)
positiveNegative.head(10)
filtered_data['Score'] = positiveNegative
filtered_data.head(1)
filtered_data.shape

display = pd.read_sql_query("""select * from Reviews where Score !=3 and Userid="AR5J8UI46CURR" ORDER BY PRODUCTID""", con)

sorted_data = filtered_data.sort_values('ProductId', axis=0, ascending=True, inplace=False, kind='quicksort', na_position='last')

final=sorted_data.drop_duplicates(subset={"UserId","ProfileName","Time","Text"}, keep='first', inplace=False)

final.shape

display = pd.read_sql_query(""" select * from reviews where score != 3 and id=44737 or id = 64422 order by productid""", con)

final=final[final.HelpfulnessNumerator<=final.HelpfulnessDenominator]

final['Score'].value_counts()

count_vect = CountVectorizer()

final_counts = count_vect.fit_transform(final['Text'].values)

final_counts.shape

type(final_counts)

positive_negative = final['Score']

#Below is giving error
final_counts = hstack((final_counts,positive_negative))

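sparse.hstack combines the coo-format versions of its inputs into a new coo-format matrix.

final_counts is a csr matrix, so the sparse.coo_matrix(final_counts) conversion is trivial.

positive_negative is a column of a DataFrame. Look at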

 sparse.coo_matrix(positive_negative)

It is probably a (1,n) sparse matrix. But to combine it with final_counts it needs to have shape (n,1).

Try creating the sparse matrix and transposing it:

sparse.hstack((final_counts, sparse.coo_matrix(positive_negative).T))
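The shapes involved can be checked on toy data. This is a minimal sketch, not the asker's data: `counts` and `labels` are made-up stand-ins, and it assumes the label column already has a numeric dtype (an object-dtype column fails at the hstack step).

```python
import numpy as np
import pandas as pd
from scipy import sparse

# Toy stand-ins: a 3x4 CSR "bag of words" matrix and a numeric label column.
counts = sparse.csr_matrix(np.array([[1, 0, 2, 0],
                                     [0, 0, 0, 3],
                                     [4, 0, 0, 0]]))
labels = pd.Series([1, 0, 1])

# A 1-D input becomes a single-row sparse matrix, shape (1, n).
row = sparse.coo_matrix(labels.to_numpy())
print(row.shape)      # (1, 3)

# Transposing gives the (n, 1) column that hstack can append.
col = row.T
print(col.shape)      # (3, 1)

# Ask for CSR output explicitly; the default result format is COO.
merged = sparse.hstack((counts, col), format="csr")
print(merged.shape)   # (3, 5)
```

The last column of `merged` holds the labels, and the result stays sparse throughout.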

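I faced the same issue with sparse matrices. You can convert the CSR matrix to a dense matrix with todense() and then use np.hstack((dataframe.values, converted_dense_matrix)). That works fine. numpy.hstack cannot handle sparse matrices. However, for very large datasets, converting to a dense matrix is not a good idea. In your case, scipy's hstack will not work because the data types passed to hstack differ (int and object).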
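A small sketch of the dense route described above, on made-up data (`counts` and `labels` are illustrative names). It uses `toarray()` rather than `todense()` so that the result is a plain ndarray, and it assumes the data is small enough to fit in memory as a dense array.

```python
import numpy as np
import pandas as pd
from scipy import sparse

# Toy stand-ins for the count matrix and a numeric label column.
counts = sparse.csr_matrix(np.array([[1, 0],
                                     [0, 2],
                                     [3, 0]]))
labels = pd.Series([1, 0, 1])

# Densify the sparse matrix; this allocates rows * columns elements.
dense_counts = counts.toarray()

# np.hstack needs 2-D inputs with equal row counts, so make the labels (n, 1).
label_col = labels.to_numpy().reshape(-1, 1)

dense_merged = np.hstack((dense_counts, label_col))
print(dense_merged.shape)   # (3, 3)
```

For a matrix of the size in the question (364171 rows by 115281 columns), the dense array would hold roughly 4.2e10 elements, which is why the sparse route is usually preferable.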

Try positive_negative = final['Score'].values and then scipy.sparse.hstack. If that does not work, can you give me the output of positive_negative.dtype?

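What error does it give?

I was missing something, which caused that error. But now the problem is that after adding a column to the csr_matrix, my final shape is (364172,), whereas I expected (364171, 115282). Below is an extension of the code above:

>>> final_counts.shape
(364171, 115281)
>>> type(final_counts)
>>> positive_negative.shape
(364171,)
>>> type(positive_negative)
>>> final_counts = np.hstack((final_counts, positive_negative))
>>> final_counts.shape
(364172,)

np.hstack??? That is not the right hstack for sparse matrices! It wraps the sparse matrix in an object-dtype array with shape (1,).

I tried merged_data = scipy.sparse.hstack((final_counts, scipy.sparse.coo_matrix(positive_negative).T)) but got the error TypeError: no supported conversion for types: (dtype('int64'), dtype('O'))

It looks like your DataFrame column has object dtype. It created the coo_matrix successfully, but sparse.hstack cannot create a new matrix from a mix of an int64 matrix and an object (O) matrix. The sparse code has no special provisions for object dtype.

So is there any other workaround? My final requirement is to add a column from a Pandas DataFrame to the CSR matrix.

Can't you change the dtype? Use the astype method?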
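One way to act on the dtype suggestion, sketched on toy data. Since the Score column holds the strings 'positive' and 'negative', astype alone cannot turn it into numbers, so the sketch maps the labels to integers first. The 1/0 encoding and the names `counts`, `score`, and `encoded` are assumptions made for illustration, not part of the original code.

```python
import numpy as np
import pandas as pd
from scipy import sparse

# Toy stand-in for the CountVectorizer output.
counts = sparse.csr_matrix(np.array([[1, 0],
                                     [0, 2],
                                     [3, 0]]))

# String labels give the Series an object dtype, as in the question.
score = pd.Series(["positive", "negative", "positive"], dtype=object)

# Hypothetical encoding: 1 for 'positive', 0 for 'negative'.
encoded = score.map({"positive": 1, "negative": 0}).astype("int64")

# Build an (n, 1) sparse column and append it to the count matrix.
label_col = sparse.csr_matrix(encoded.to_numpy().reshape(-1, 1))
merged = sparse.hstack((counts, label_col), format="csr")
print(merged.shape)   # (3, 3)
```

With both blocks numeric, scipy.sparse.hstack can pick a common dtype and the result keeps the CSR format.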

I used the line below but am still getting an error:

merged_data = scipy.sparse.hstack((final_counts, scipy.sparse.coo_matrix(positive_negative).T))

The error is below:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'sparse' is not defined
>>> merged_data = scipy.sparse.hstack((final_counts, sparse.coo_matrix(positive_negative).T))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'sparse' is not defined
>>> merged_data = scipy.sparse.hstack((final_counts, scipy.sparse.coo_matrix(positive_negative).T))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python34\lib\site-packages\scipy\sparse\construct.py", line 464, in hstack
    return bmat([blocks], format=format, dtype=dtype)
  File "C:\Python34\lib\site-packages\scipy\sparse\construct.py", line 600, in bmat
    dtype = upcast(*all_dtypes) if all_dtypes else None
  File "C:\Python34\lib\site-packages\scipy\sparse\sputils.py", line 52, in upcast
    raise TypeError('no supported conversion for types: %r' % (args,))
TypeError: no supported conversion for types: (dtype('int64'), dtype('O'))