How to compute factor analysis scores using Python (scikit-learn)?

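I need to perform an exploratory factor analysis and compute a score for each observation in Python, assuming there is only one underlying factor. sklearn.decomposition.FactorAnalysis() seems to be the way to go, but unfortunately the documentation and the example (I could not find any other examples) are not clear enough for me to work out how to get this done.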

I have the following test file with 41 observations of 29 variables (test.csv):

Using code that I put together from the official example, I get strange results. Code:

from sklearn import decomposition, preprocessing
from sklearn.model_selection import cross_val_score  # sklearn.cross_validation was removed in scikit-learn 0.20
import csv
import numpy as np

data = np.genfromtxt('test.csv', delimiter=',')

def compute_scores(X):
    n_components = np.arange(0, len(X), 1)
    X = preprocessing.scale(X) # data normalisation attempt
    pca = decomposition.PCA()
    fa = decomposition.FactorAnalysis(n_components=1)

    pca_scores, fa_scores = [], []
    for n in n_components:
        pca.n_components = n
        fa.n_components = n
        #pca_scores.append(np.mean(cross_val_score(pca, X))) # if I attempt to compute pca_scores I get the error.
        fa_scores.append(np.mean(cross_val_score(fa, X)))

    print(pca_scores, fa_scores)

compute_scores(data)

Output:

[],
 [-947738125363.77405,
  -947738145459.86035,
  -947738159924.70471,
  -947738174662.89746,
  -947738206142.62854,
  -947738179314.44739,
  -947738220921.50684,
  -947738223447.3678,
  -947738277298.33545,
  -947738383772.58606,
  -947738415104.84912,
  -947738406361.44482,
  -947738394379.30359,
  -947738456528.69275,
  -947738501001.14319,
  -947738991338.98291,
  -947739381280.06506,
  -947739389033.33557,
  -947739434992.48047,
  -947739549511.2655,
  -947739355699.70959,
  -947739879828.51514,
  -947739898216.39099,
  -947739905804.71033,
  -947739902618.47791,
  -947738564594.54639,
  -948816122907.87366,
  -947744046601.55029,
  -947738624937.61292,
  -947738625325.73486,
  -947738626111.14441,
  -947738624973.92188,
  -947738625200.06946,
  -947738625568.65027,
  -947738625528.69666,
  -947738625359.41992,
  -947738624906.67529,
  -947738625652.12439,
  -947739509002.01868,
  -947738625426.81946,
  -947738625380.45837]
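This result is far from what I expected. Here is the R code for the same task and the same data. Its output looks fine (the results are close to the output of an IBM program that can perform FA):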


So I would like to get similar results in Python (I know I won't get exactly the same numbers), but I don't know how to obtain the scores.

It seems I have figured out how to get the scores:

from sklearn import decomposition, preprocessing
import numpy as np

data = np.genfromtxt('rangir_test.csv', delimiter=',')
data = data[~np.isnan(data).any(axis=1)]
data_normal = preprocessing.scale(data)
fa = decomposition.FactorAnalysis(n_components = 1)
fa.fit(data_normal)
for score in fa.score_samples(data_normal):
    print(score)

Unfortunately, the output (see below) is very different from the output of factanal(). Any advice on decomposition.FactorAnalysis would be appreciated.

scikit-learn scores output:

-69.8587183816
-116.353511148
-24.1529840248
-36.5366398005
-7.87165586175
-24.9012815104
-23.9148486368
-10.047780535
-4.03376369723
-7.07428842783
-7.44222705099
-6.25705487929
-13.2313513762
-13.3253819521
-9.23993173528
-7.141616656
-5.57915693405
-6.82400483045
-15.0906961724
-3.37447211233
-5.41032267015
-5.75224753811
-19.7230390792
-6.75268922909
-4.04911793705
-10.6062761691
-3.17417070498
-9.95916350005
-3.25893428094
-3.88566777358
-3.30908856716
-3.58141292341
-3.90778368669
-4.01462493538
-11.6683969455
-5.30068548445
-24.3400870389
-7.66035331181
-13.8321672858
-8.93461397086
-17.4068326999
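
For reference, score_samples in scikit-learn returns the log-likelihood of each sample under the fitted model, while transform (or fit_transform) returns the estimated values of the latent factor for each observation. The sketch below illustrates the difference. It uses synthetic one-factor data as a stand-in for test.csv (an assumption, since the file is not reproduced here), and the names latent, loadings and X are illustrative only.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.preprocessing import scale

# Synthetic stand-in for test.csv (assumption): 41 observations of 29
# variables driven by a single latent factor plus independent noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(41, 1))
loadings = rng.uniform(0.5, 1.0, size=(1, 29))
X = scale(latent @ loadings + rng.normal(scale=0.5, size=(41, 29)))

fa = FactorAnalysis(n_components=1, random_state=0)

# Estimated latent factor values: one row per observation, one column per factor.
factor_scores = fa.fit_transform(X)

# Log-likelihood of each observation under the fitted model (not factor scores).
log_lik = fa.score_samples(X)

print(factor_scores.shape, log_lik.shape)

# The estimated scores should track the simulated factor up to sign.
agreement = abs(np.corrcoef(factor_scores[:, 0], latent[:, 0])[0, 1])
print(agreement)
```

Even with transform, the numbers may differ from factanal() in sign and scale, so comparing the two by correlation is probably more informative than comparing raw values.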

This is very late, but it may still be of interest to the OP or to others who arrive here from Google.

For everyone who uses R's factanal, there is a Python package that wraps the R factanal function, so you can call it from Python with a data frame like this:

from factanal.wrapper import factanal

fa_res = factanal(pdf, factors=4, scores='regression', rotation='promax',
                  verbose=True, return_dict=True)

More information:

Install it with:

pip install factanal

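Just a comment, but I would be careful about interpreting a factor analysis of 29 variables based on only 41 observations. You should look into the Kaiser-Meyer-Olkin test to see whether your data are suitable for factor analysis.

@jalapic, yes, I know I need at least twice as many observations as variables, but my client wants it this way (I told him the results would be misleading).

The problem you are having is that Python is outputting log-likelihoods (hence the negative values), whereas factanal outputs z-scores.

This is a 3-year-old question, but I wonder whether the following command could be used as well:
scores = fa.fit_transform(data_normal)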
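
The Kaiser-Meyer-Olkin check mentioned in the comments can be sketched with NumPy alone. This is a minimal illustration of the overall KMO measure, computed from the correlation matrix and the partial correlations derived from its inverse. The function name kmo_overall and the synthetic data are assumptions made for the example, not part of any library used in this thread.

```python
import numpy as np

def kmo_overall(X):
    """Overall Kaiser-Meyer-Olkin measure for X (rows = observations)."""
    corr = np.corrcoef(X, rowvar=False)
    inv = np.linalg.pinv(corr)  # pseudo-inverse tolerates near-singular matrices
    norm = np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    partial = -inv / norm       # partial correlation for each pair of variables
    off_diag = ~np.eye(corr.shape[0], dtype=bool)
    r2 = (corr[off_diag] ** 2).sum()
    p2 = (partial[off_diag] ** 2).sum()
    return r2 / (r2 + p2)

# Synthetic data (assumption): 41 observations of 29 variables.
rng = np.random.default_rng(0)
latent = rng.normal(size=(41, 1))
loadings = rng.uniform(0.5, 1.0, size=(1, 29))
X_factor = latent @ loadings + rng.normal(scale=0.5, size=(41, 29))  # one common factor
X_noise = rng.normal(size=(41, 29))                                  # no common factor

kmo_factor = kmo_overall(X_factor)
kmo_noise = kmo_overall(X_noise)
print(kmo_factor, kmo_noise)
```

Values closer to 1 suggest that the variables share enough common variance for factor analysis, and values below roughly 0.5 are commonly read as a warning sign. With so few observations per variable the estimate itself is noisy, so it is best treated as a rough guide.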