Manual Python PCA implementation produces an incorrect plot in which the eigenvectors are not orthogonal
I need to plot my eigenvectors, which I compute like this:
def fit(self, X):
    '''
    fits sorted eigenvalues and eigenvectors to class attributes.
    same goes for variance and explained variance.
    '''
    n_samples = X.shape[0]
    # We center the data and compute the sample covariance matrix.
    X -= np.mean(X, axis=0)
    self.cov_matrix_ = np.dot(X.T, X) / (n_samples - 1)
    #test = np.cov(X)
    #Negative values are ignored with eigh
    (self.eigvalues_, self.components_) = np.linalg.eigh(self.cov_matrix_)
    idx = self.eigvalues_.argsort()[::-1]
    self.eigvalues_ = self.eigvalues_[idx]
    self.components_ = self.components_[:, idx]
    self.variance_ = np.sum(self.eigvalues_)
    self.explained_variance_ = self.eigvalues_ / self.variance_

def transform(self, X):
    # project data onto eigenvectors
    print(self.components_.shape, X.shape)
    self.projected_ = X @ self.components_.T
    return self.projected_

pca = PCA()
pca.fit(subsample)
#pca.transform(subsample)
plt.scatter(subsample[:, 0], subsample[:, 1], edgecolor='none', alpha=0.5)
plt.quiver(pca.components_[0, 0], pca.components_[0, 1],
           angles='xy', scale_units='xy', scale=1, width=0.002)
plt.quiver(pca.components_[1, 0], pca.components_[1, 1],
           angles='xy', scale_units='xy', scale=1, width=0.002)
onto a plot of the first two features of my dataset.

My self.components_ (the 240 eigenvectors of a 100x240 dataset) has shape 240x240.
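For reference (my own sketch, not part of the original question): np.linalg.eigh returns the eigenvalues in ascending order and the eigenvectors as the columns of its second return value, so v[:, i] pairs with w[i]. A quick check on a small symmetric matrix standing in for a covariance matrix:

```python
import numpy as np

# Small symmetric matrix standing in for a covariance matrix.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

w, v = np.linalg.eigh(A)  # eigenvalues ascending; eigenvectors are COLUMNS of v
for i in range(len(w)):
    # Each column satisfies the eigenvector equation A @ v[:, i] == w[i] * v[:, i].
    assert np.allclose(A @ v[:, i], w[i] * v[:, i])

# The columns are orthonormal: v.T @ v is the identity.
assert np.allclose(v.T @ v, np.eye(2))
```

Keeping this column convention in mind matters below, because the plotting code indexes components_ by row.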
After plotting the first two entries of my two eigenvectors with the largest eigenvalues, the result looks like this:
What am I doing wrong?

You should sort the eigenvectors by row rather than by column, that is,
self.components_ = self.components_[:,idx]
should be
self.components_ = self.components_[idx]
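To make the layout convention explicit (this is my own sketch, not part of the original answer): since eigh stores eigenvectors as columns, an alternative that keeps the mathematics unambiguous is to sort the columns and then transpose, so that row i of components_ is the i-th principal axis (the row-per-component layout sklearn uses), which matches indexing like components_[0] when plotting:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
X -= X.mean(axis=0)
cov = X.T @ X / (X.shape[0] - 1)

w, v = np.linalg.eigh(cov)   # eigenvectors are COLUMNS of v, eigenvalues ascending
idx = w.argsort()[::-1]
components = v[:, idx].T     # row i is now the i-th principal axis, sorted descending

# Each row is a unit eigenvector of the covariance matrix:
for i, row in enumerate(components):
    assert np.allclose(cov @ row, w[idx][i] * row)
```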
Also, make sure the plot uses an equal aspect ratio, since otherwise the quiver arrows can appear misaligned:
plt.gca().set_aspect('equal')
It is best to include a minimal working example with your code, so please keep that in mind next time :). To build one I had to infer what the rest of your code probably looks like. In any case, here is my proposed code:
import numpy as np
from matplotlib import pyplot as plt

class PCA:
    def fit(self, X):
        '''
        fits sorted eigenvalues and eigenvectors to class attributes.
        same goes for variance and explained variance.
        '''
        n_samples = X.shape[0]
        # We center the data and compute the sample covariance matrix.
        X -= np.mean(X, axis=0)
        self.cov_matrix_ = np.dot(X.T, X) / (n_samples - 1)
        #test = np.cov(X)
        #Negative values are ignored with eigh
        (self.eigvalues_, self.components_) = np.linalg.eigh(self.cov_matrix_)
        idx = self.eigvalues_.argsort()[::-1]
        self.eigvalues_ = self.eigvalues_[idx]
        self.components_ = self.components_[idx]
        self.variance_ = np.sum(self.eigvalues_)
        self.explained_variance_ = self.eigvalues_ / self.variance_

    def transform(self, X):
        # project data onto eigenvectors
        print(self.components_.shape, X.shape)
        self.projected_ = X @ self.components_.T
        return self.projected_

pca = PCA()

# Generate some dummy data
subsample = np.random.randn(69, 2) * 0.1
subsample[:, 0] = subsample[:, 0] * 8
subsample[:, 1] = subsample[:, 0] * 2 + subsample[:, 1]  # Add some correlations

pca.fit(subsample)

plt.scatter(subsample[:, 0], subsample[:, 1], edgecolor='none', alpha=0.5)
plt.quiver(pca.components_[0, 0] * 2, pca.components_[0, 1] * 2,  # *2 to make arrows larger
           angles='xy', scale_units='xy', scale=1, width=0.006)
plt.quiver(pca.components_[1, 0] * 2, pca.components_[1, 1] * 2,
           angles='xy', scale_units='xy', scale=1, width=0.006)
plt.gca().set_aspect('equal')
plt.show()
Hi, just a reminder that you should paste your code as actual text using backticks; that makes it much easier to copy-paste so others can debug it. I'm not very good at numpy slicing lol, so I missed that small detail. I also found that for 2D data the PCA eigenvectors are orthogonal in 2D space, but for my higher-dimensional data they are not orthogonal once decomposed down to 2D, and it took some time to figure that out ¯\_(ツ)_/¯

No problem! Hmm, that is a bit strange, since they should be orthogonal to each other.
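A possible explanation for the observation above (my own sketch, not from the thread): the full columns of an orthogonal eigenvector matrix are orthonormal, but truncating high-dimensional eigenvectors to their first two coordinates generally does not preserve orthogonality, which would match seeing non-perpendicular arrows when plotting only two features of a 240-dimensional dataset:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))
cov = A @ A.T                       # symmetric positive semi-definite "covariance"
w, v = np.linalg.eigh(cov)

# The full eigenvectors are orthogonal...
assert abs(v[:, -1] @ v[:, -2]) < 1e-10

# ...but their first two coordinates generally are not.
p, q = v[:2, -1], v[:2, -2]
print(p @ q)  # generally nonzero: the truncated vectors need not be perpendicular
```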