Bug in scikit-learn PCA or in NumPy eigendecomposition?
I have a dataset with 400 features. Here is what I did:
# approach 1
d_cov = np.cov(d_train.transpose())
eigens, mypca = LA.eig(d_cov) # LA = np.linalg; assumes the result is also sorted by eigenvalue
# approach 2
pca = PCA(n_components=300)
d_fit = pca.fit_transform(d_train)
pc = pca.components_
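Now, these two should be the same, right? After all, PCA is just the eigendecomposition of the covariance matrix.
But in my case they are very different.
How is that possible? Have I made a mistake somewhere above?
Edit: comparing the variances: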
import numpy as np
LA = np.linalg
d_train = np.random.randn(100, 10)
d_cov = np.cov(d_train.transpose())
eigens, mypca = LA.eig(d_cov)
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
pca = PCA(n_components=10)
d_fit = pca.fit_transform(d_train)
pc = pca.components_
ve = pca.explained_variance_
#mypca[0,:], pc[0,:] pc.transpose()[0,:]
plt.plot(list(range(len(eigens))), [ x.transpose().dot(d_cov).dot(x) for x,y in zip(mypca, eigens) ])
plt.plot(list(range(len(ve))), ve)
plt.show()
print(mypca, '\n---\n' , pc)
I'm no expert on PCA, but if I transpose one of the matrices I seem to get similar values:
>>> import numpy as np
>>> LA = np.linalg
>>> d_train = np.random.randn(100, 10)
>>> d_cov = np.cov(d_train.transpose())
>>> eigens, mypca = LA.eig(d_cov)
>>> from sklearn.decomposition import PCA
>>> pca = PCA(n_components=10)
>>> d_fit = pca.fit_transform(d_train)
>>> pc = pca.components_
>>> mypca[0,:]
array([-0.44255435, -0.77430549, -0.14479638, -0.06459874, 0.24772212,
0.20780185, 0.22388151, -0.05069543, -0.14515676, -0.03385801])
>>> pc[0,:]
array([-0.44255435, -0.24050535, -0.17313927, 0.07182494, 0.09748632,
0.17910516, 0.26125107, 0.71309764, 0.17276004, 0.25095447])
>>> pc.transpose()[0,:]
array([-0.44255435, 0.77430549, 0.14479638, -0.06459874, 0.24772212,
-0.20780185, 0.22388151, -0.03385801, 0.14515676, 0.05069543])
>>> list(zip(pc.transpose()[:,0], mypca[:,0]))
[(-0.44255435328718207, -0.44255435328718096),
(-0.24050535133912765, -0.2405053513391287),
(-0.17313926714559819, -0.17313926714559785),
(0.07182494253930383, 0.0718249425393035),
(0.09748631534772645, 0.09748631534772684),
(0.17910516453826955, 0.17910516453826758),
(0.2612510722861703, 0.2612510722861689),
(0.7130976419217306, 0.7130976419217326),
(0.17276004381786172, 0.17276004381786136),
(0.25095447415020183, 0.2509544741502009)]
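The same comparison can be made without reading the numbers by eye. This is only a minimal sketch, and several things in it are my own assumptions rather than part of the question: the fixed random seed, the use of `np.linalg.eigh` in place of `LA.eig` (the covariance matrix is symmetric, and `eigh` returns real eigenvalues in ascending order), and the sign alignment step, which is there because an eigenvector is only defined up to its sign.

```python
import numpy as np
from sklearn.decomposition import PCA

# Assumption: a fixed seed so the run is repeatable.
rng = np.random.default_rng(0)
d_train = rng.standard_normal((100, 10))

d_cov = np.cov(d_train.T)

# eigh is swapped in for eig here: it is meant for symmetric matrices
# and returns eigenvalues in ascending order, so reverse to get descending.
eigvals, eigvecs = np.linalg.eigh(d_cov)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

pca = PCA(n_components=10).fit(d_train)
pc = pca.components_  # one principal axis per ROW

# Eigenvectors are the COLUMNS of eigvecs, so compare against pc.T.
# Flip each column so that it points the same way as the matching PCA axis.
signs = np.sign(np.sum(eigvecs * pc.T, axis=0))
aligned = eigvecs * signs

vectors_match = np.allclose(aligned, pc.T, atol=1e-6)
values_match = np.allclose(eigvals, pca.explained_variance_)
print(vectors_match, values_match)
```

If both flags come out true, the two approaches agree once the eigenvectors are read as columns, sorted by eigenvalue, and given a consistent sign.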
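You need to read the documentation more carefully. NumPy's documentation is great and very thorough; very often you will find the solution to your problem just by reading it.
With a modified version of your code (imports at the top of the snippet, .T instead of .transpose(), PEP 8), the two curves are exactly the same.
The important change is that I iterate over my_pca.T, not my_pca.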
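A minimal sketch of such a modified version might look like the following. It is not the exact listing, and it makes a few assumptions of its own: a fixed random seed, a numeric `np.allclose` check in place of the plot, and an explicit sort, since `np.linalg.eig` does not promise any ordering of the eigenvalues.

```python
import numpy as np
from sklearn.decomposition import PCA

# Assumption: a fixed seed so the run is repeatable.
rng = np.random.default_rng(0)
d_train = rng.standard_normal((100, 10))

d_cov = np.cov(d_train.T)
eigens, my_pca = np.linalg.eig(d_cov)

# Iterate over my_pca.T so that each x is a COLUMN of my_pca,
# i.e. an actual eigenvector of d_cov.
my_variances = np.array([x.T.dot(d_cov).dot(x) for x in my_pca.T])

pca = PCA(n_components=10)
pca.fit(d_train)
ve = pca.explained_variance_  # sorted in descending order

# Sort our values the same way before comparing the two "curves".
my_sorted = np.sort(my_variances)[::-1]
curves_match = np.allclose(my_sorted, ve)
print(curves_match)
```

The two arrays `my_sorted` and `ve` can be passed to `plt.plot` exactly as in the question to see the curves overlap.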
Signature: np.linalg.eig(a)
Docstring:
Compute the eigenvalues and right eigenvectors of a square array.
Parameters
----------
a : (..., M, M) array
Matrices for which the eigenvalues and right eigenvectors will
be computed
Returns
-------
w : (..., M) array
# not important for you
v : (..., M, M) array
The normalized (unit "length") eigenvectors, such that the
column ``v[:,i]`` is the eigenvector corresponding to the
eigenvalue ``w[i]``.
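As the docstring says, the eigenvectors are returned as the columns of my_pca, not the rows. `for x in my_pca` was iterating over the rows.

Comments:
Can you show your import statements? Have you tried pc.transpose()?
Possible duplicate.
@P.Camilleri But even the variances are not the same? At least the variance explained by the PCs and by the eigenvectors should be the same, right? See the edit.
FYI, use pc.T instead of pc.transpose(); it is shorter and does the same thing.
They are not the same, see: 0.20780185 and -0.20780185.
@mourinho See the duplicate question I flagged; there is a sign ambiguity when performing PCA.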