Python 3.x: Is a for loop needed with IncrementalPCA to keep memory usage constant?

python-3.x scikit-learn

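In the past, I tried using scikit-learn's IncrementalPCA to reduce memory usage. I used an existing answer as the template for my code. But as @aarslan said in the comments section: "I noticed that the explained variance seems to decrease in each iteration." I have always been suspicious of the last for loop in the given answer. So my question is: do I need a for loop to keep memory usage constant during the partial_fit step, or is batch_size enough? You can find the code below: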

import h5py
import numpy as np
from sklearn.decomposition import IncrementalPCA

h5 = h5py.File('rand-1Mx1K.h5', 'r')  # open read-only
data = h5['data']  # it's ok, the dataset is not fetched to memory yet

n = data.shape[0]  # how many rows we have in the dataset
chunk_size = 1000  # how many rows we feed to IPCA at a time; must divide n, or the trailing rows are skipped
ipca = IncrementalPCA(n_components=10, batch_size=16)

for i in range(0, n // chunk_size):
    # slicing the h5py dataset reads only this chunk into memory
    ipca.partial_fit(data[i*chunk_size : (i+1)*chunk_size])

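An old question, but yes, the for loop is needed. The batch_size= parameter is only used by the .fit() method, not by .partial_fit(). From the scikit-learn documentation:

batch_size : int, default=None

The number of samples to use for each batch. Only used when calling fit.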
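To make the relationship between the two methods concrete, here is a minimal sketch that compares a manual partial_fit loop with a single fit() call using batch_size. It assumes a small synthetic in-memory array as a stand-in for the HDF5 dataset, and the names X, ipca_loop, ipca_fit and same are illustrative only. With fit() the whole array has to be passed in at once, which is what the chunked loop over the h5py dataset avoids.

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA

# Assumption: small synthetic data standing in for the on-disk dataset.
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 20))
chunk_size = 100  # divides the number of rows evenly

# Manual loop: each partial_fit call only sees one chunk of rows.
# batch_size is left unset here because partial_fit does not use it.
ipca_loop = IncrementalPCA(n_components=5)
for start in range(0, X.shape[0], chunk_size):
    ipca_loop.partial_fit(X[start:start + chunk_size])

# fit(): splits X into batches of batch_size rows internally,
# but needs the full array handed to it in one call.
ipca_fit = IncrementalPCA(n_components=5, batch_size=chunk_size)
ipca_fit.fit(X)

# With identical chunk boundaries, both routes should agree.
same = np.allclose(ipca_loop.components_, ipca_fit.components_)
print("components match:", same)
print("rows seen by the loop:", ipca_loop.n_samples_seen_)
```

Each chunk passed to partial_fit needs at least n_components rows, so a short trailing chunk may have to be merged into the previous one when the chunk size does not divide the row count.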