
Scikit-learn: performing SVD decomposition on a large sparse matrix



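I have saved features extracted from text data in sparse matrix format using pickle; the matrix has shape (323549419259). I am trying to run singular value decomposition on it with the sklearn library, but I keep getting a MemoryError, which suggests my computer is not powerful enough to perform the feature reduction. Is there a more efficient way to do this? This is the code I am using: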

import numpy as np
from sklearn.decomposition import TruncatedSVD
import pickle


# Pickle files must be opened in binary mode in Python 3: open(..., 'rb')
with open('vectors.pickle', 'rb') as f:
    matrix = pickle.load(f)  # Loads the TF-IDF vectors


print('LSAING . . .')
lsa = TruncatedSVD(n_components=300, n_iter=10)  # Perform the feature reduction

# fit_transform fits the model and projects the data in a single call,
# so a separate lsa.fit(matrix) beforehand only repeats the work.
matrix_T = lsa.fit_transform(matrix)

print('Dumping')
with open('lsa.pickle', 'wb') as f:
    pickle.dump(lsa, f)  # Dumps the fitted model; the new features are in matrix_T

I am using an i7-5500 processor with 8 GB of RAM.

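In my case, I used a large swap space as a workaround.

So you are suggesting that I increase the swap partition?

Maybe try sklearn.decomposition.NMF, in case it handles sparsity better, since it is inherently a sparse method.

You could also try reducing the number of components, for example to 150.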
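
The suggestion to use fewer components can be sketched as follows. This is only a minimal illustration under stated assumptions, not a guaranteed fix for the MemoryError: the small random matrix is a stand-in for the pickled TF-IDF vectors, the cast to float32 is an extra memory-saving step that was not proposed in the thread, and n_iter=5 is an arbitrary choice. Whether the real matrix then fits in 8 GB of RAM still depends on its actual size and density.

```python
import numpy as np
from scipy import sparse
from sklearn.decomposition import TruncatedSVD

# Stand-in for the pickled TF-IDF matrix (assumption: CSR format, non-negative values).
matrix = sparse.random(2000, 5000, density=0.001, format="csr",
                       dtype=np.float64, random_state=0)

# Assumption: float32 precision is acceptable; it halves the size of the stored values.
matrix = matrix.astype(np.float32)

# Fewer components means smaller intermediate matrices and a smaller dense output.
n_components = 150
lsa = TruncatedSVD(n_components=n_components, n_iter=5, random_state=0)
reduced = lsa.fit_transform(matrix)  # dense array, one row per document

# Size of the dense result: n_samples * n_components * bytes per element.
dense_output_mb = reduced.nbytes / 1e6
print(reduced.shape, f"{dense_output_mb:.2f} MB")
```

The output of fit_transform is always dense, so its size grows linearly with both the number of rows and n_components; halving the number of components roughly halves that part of the memory cost.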