Different results for PCA, truncated_svd and svd on Python numpy and sklearn
In sklearn there are different ways to compute the first principal component, and each method gives me a different result. Why?
import numpy as np
import scipy as sp
import scipy.sparse.linalg  # makes sp.sparse.linalg available
import sklearn as sk
import sklearn.decomposition
import sklearn.preprocessing


def gen_data_3_1():
    # generate the data 3.1
    m = 1000  # number of samples
    n = 10  # number of variables
    d1 = np.random.normal(loc=0, scale=100, size=(m, 1))
    d2 = np.random.normal(loc=0, scale=121, size=(m, 1))
    d3 = -0.2 * d1 + 0.9 * d2
    z = np.zeros(shape=(m, 1))
    for i in range(4):
        z = np.hstack([z, d1 + np.random.normal(size=(m, 1))])
    for i in range(4):
        z = np.hstack([z, d2 + np.random.normal(size=(m, 1))])
    for i in range(2):
        z = np.hstack([z, d3 + np.random.normal(size=(m, 1))])
    z = z[:, 1:11]  # drop the leading placeholder column of zeros
    z = sk.preprocessing.scale(z, axis=0)
    return z


x = gen_data_3_1()  # generate the sample dataset
x = sk.preprocessing.scale(x)  # standardize the data (zero mean, unit variance)
pca = sk.decomposition.PCA().fit(x)  # compute the PCA of x and print the first principal component
print("first pca components=", pca.components_[:, 0])
u, s, v = sp.sparse.linalg.svds(x)  # the first column of v.T is the first principal component
print("first svd components=", v.T[:, 0])
trsvd = sk.decomposition.TruncatedSVD(n_components=3).fit(x)  # the first component is the first principal component
print("first component TruncatedSVD=", trsvd.components_[0])
--
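Because PCA, SVD and truncated SVD are different methods.
PCA calls SVD, but it centers the data first. Truncated SVD truncates the vectors (it keeps only n_components of them) and, unlike PCA, does not center the data. Since x is already standardized here, the first TruncatedSVD component should match the first PCA component up to sign.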
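The centering point can be checked with a minimal sketch. It assumes a small random toy matrix with a large column mean (not the question's data), and the helper same_up_to_sign is a hypothetical name introduced only for this illustration. It shows that PCA's first component is the first right singular vector of the mean-centered data, and that TruncatedSVD agrees with it only when it is given centered data.

```python
import numpy as np
from sklearn.decomposition import PCA, TruncatedSVD


def same_up_to_sign(a, b, atol=1e-6):
    """Hypothetical helper: singular vectors are only defined up to a sign flip."""
    return np.allclose(a, b, atol=atol) or np.allclose(a, -b, atol=atol)


rng = np.random.default_rng(0)
# Toy data (not the question's data) with a large non-zero column mean,
# so that centering actually changes the result.
x = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5)) + 10.0

# PCA centers the columns and then takes an SVD.
pca_first = PCA(n_components=2, svd_solver="full").fit(x).components_[0]
xc = x - x.mean(axis=0)
_, _, vt = np.linalg.svd(xc, full_matrices=False)
pca_is_centered_svd = same_up_to_sign(pca_first, vt[0])

# TruncatedSVD decomposes the data exactly as given, without centering.
raw_first = TruncatedSVD(n_components=2, algorithm="arpack",
                         random_state=0).fit(x).components_[0]
centered_first = TruncatedSVD(n_components=2, algorithm="arpack",
                              random_state=0).fit(xc).components_[0]

print("PCA == SVD of centered data:", pca_is_centered_svd)
print("PCA == TruncatedSVD on raw data:", same_up_to_sign(pca_first, raw_first))
print("PCA == TruncatedSVD on centered data:", same_up_to_sign(pca_first, centered_first))
```

On the raw, uncentered matrix the top singular vector of TruncatedSVD is pulled toward the mean direction, which is why it differs from PCA there.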
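svds is a different method from svd because it is a sparse, iterative solver that computes only k singular triplets (k=6 by default). By the way, sk.decomposition.PCA returns its results sorted by decreasing explained variance (i.e. in order of decreasing singular values), while sparse.linalg.svds returns them in order of increasing singular values, so print("first svd components=", v.T[:,0]) should be print("first svd components=", v.T[:,-1]). Also note that pca.components_ stores one component per row, so the first principal component is pca.components_[0], not pca.components_[:,0].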
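A minimal sketch of a like-for-like comparison between PCA and svds. It assumes a random standardized matrix as a stand-in for gen_data_3_1() (shape and seed are arbitrary choices). Rather than relying on the order svds returns, it sorts the singular triplets by decreasing singular value explicitly, takes PCA's first component as a row, and compares the two up to sign.

```python
import numpy as np
from scipy.sparse.linalg import svds
from sklearn.decomposition import PCA
from sklearn.preprocessing import scale

rng = np.random.default_rng(1)
# Stand-in for gen_data_3_1(): correlated random columns, standardized
# column-wise, so the data are already centered.
x = scale(rng.normal(size=(1000, 10)) @ rng.normal(size=(10, 10)))

# svds computes only k singular triplets (k=6 is its default).
u, s, vt = svds(x, k=6)
order = np.argsort(s)[::-1]  # indices sorted so the largest singular value comes first
s, vt = s[order], vt[order]

pca = PCA(svd_solver="full").fit(x)
pca_first = pca.components_[0]  # components are stored as rows
svds_first = vt[0]              # right singular vector of the largest singular value

# Unit vectors that agree up to sign have |dot product| == 1.
print("|<pca_first, svds_first>| =", abs(pca_first @ svds_first))
print("singular values agree:", np.allclose(s, pca.singular_values_[:6]))
```

Sorting with argsort keeps the comparison valid whichever order svds returns. Because the data are already centered, the svds singular values should also match the first six entries of pca.singular_values_.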
Output of the question's script (one run; the data are random):
first pca components= [-0.04201262 0.49555992 0.53885401 -0.67007959 0.0217131 -0.02535204
0.03105254 -0.07313795 -0.07640555 -0.00442718]
first svd components= [ 0.02535204 -0.1317925 0.12071112 -0.0323422 0.20165568 -0.25104996
-0.0278177 0.17856688 -0.69344318 0.59089451]
first component TruncatedSVD= [-0.04201262 -0.04230353 -0.04213402 -0.04221069 0.4058159 0.40584108
0.40581564 0.40584842 0.40872029 0.40870925]