Python: Making a dataset to test PCA in Sklearn?


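I want to test my PCA workflow. To do that, I would like to create a dataset with 3 features that have a known set of relationships between them, then apply PCA and check whether those relationships are captured. What is the simplest way to do this in Python?

Thanks, everyone!

You can create samples in which two features are independent of each other and the third feature is a linear combination of the other two.

For example: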

import numpy as np

N_SAMPLES = 1000
rng = np.random.default_rng(0)  # seeded so the dataset is reproducible

# Three columns of uniform random values in [0, 1)
samples = rng.random((N_SAMPLES, 3))

# Column 1 becomes the dependent feature: a linear combination of
# columns 0 and 2, which remain independent of each other
samples[:, 1] = 3 * samples[:, 0] - 2 * samples[:, 2]
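Before running PCA, it can help to confirm that the dataset really has only two independent directions. The following is a minimal sketch using NumPy alone; the seed and the variable names `centered`, `rank`, and `singular_values` are choices made for this illustration and are not part of the original answer.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen arbitrarily for this sketch
samples = rng.random((1000, 3))
samples[:, 1] = 3 * samples[:, 0] - 2 * samples[:, 2]

# Center the columns (PCA does this internally), then inspect the rank.
centered = samples - samples.mean(axis=0)
rank = np.linalg.matrix_rank(centered)
singular_values = np.linalg.svd(centered, compute_uv=False)

print(rank)             # number of linearly independent directions
print(singular_values)  # the last value should be negligible next to the first two
```

A rank of 2 for a 3-column matrix means one column carries no information beyond the other two, which is the situation PCA is expected to detect.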

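Now, if you run PCA on these samples with two principal components, the explained variance ratios should sum to 1 (up to floating-point rounding), because the data lie in a two-dimensional plane.

For example: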

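import numpy as np
from sklearn.decomposition import PCA

pca2 = PCA(n_components=2)
pca2.fit(samples)

# Compare with a tolerance: an exact == 1.0 check can fail due to floating-point rounding
assert np.isclose(pca2.explained_variance_ratio_.sum(), 1.0)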
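Another way to see that two components capture everything is to project the data down to two dimensions and map it back. The sketch below assumes the same synthetic dataset as above; `svd_solver="full"` is set only to keep the result deterministic, and the names `reduced`, `restored`, and `max_error` are introduced here for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)  # seed chosen arbitrarily for this sketch
samples = rng.random((1000, 3))
samples[:, 1] = 3 * samples[:, 0] - 2 * samples[:, 2]

pca2 = PCA(n_components=2, svd_solver="full").fit(samples)

reduced = pca2.transform(samples)           # shape (1000, 2)
restored = pca2.inverse_transform(reduced)  # back to shape (1000, 3)

# Largest absolute difference between the original and the round trip
max_error = np.abs(restored - samples).max()
print(max_error)
```

If the third feature were not an exact linear combination of the others (for example, with added noise), the round-trip error would no longer be negligible.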

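Thanks, but once I have run PCA, how do I find the relationship? Are the components that give the linear combination the same as the weights used to build the samples?

No, that cannot be determined in general. Depending on the size of the data, PCA may be computed with a randomized solver, and it may find different principal components when those different answers have the same explained variance ratio.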
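For this particular noise-free dataset there is one related check worth trying, offered here as a hedged sketch rather than a general recipe. The leading components are not the weights, but if all three components are kept, the last one (with near-zero variance) is orthogonal to the plane the data lie in. Since the samples satisfy `3*x0 - x1 - 2*x2 = 0`, that direction should be proportional to `(3, -1, -2)`. The names `normal` and `scaled` are introduced for this illustration, and `svd_solver="full"` is an assumption made to avoid any randomized solver.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)  # seed chosen arbitrarily for this sketch
samples = rng.random((1000, 3))
samples[:, 1] = 3 * samples[:, 0] - 2 * samples[:, 2]

# Keep all three components; the last one has (near) zero variance.
pca3 = PCA(n_components=3, svd_solver="full").fit(samples)
normal = pca3.components_[-1]

# Components are unit vectors with arbitrary sign, so rescale so that
# the coefficient of column 1 is -1 before comparing with the weights.
scaled = normal / -normal[1]

print(pca3.explained_variance_ratio_)  # last entry should be close to 0
print(scaled)                          # should be close to (3, -1, -2)
```

With noisy or higher-dimensional data, the low-variance directions are only approximate and need not line up with any single generating equation, so this reading should be treated with caution.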