Python IndexError: index 1 is out of bounds for LDA

python matplotlib scikit-learn


My dataset looks like this:

   Out  Revolver     Ratio  Num  ...
0    1  0.766127  0.802982    0  ...
1    0  0.957151  0.121876    1
2    0  0.658180  0.085113    0
3    0  0.233810  0.036050    3
4    1  0.907239  0.024926    5
...

Out can only take the values 0 and 1. I then tried to generate PCA and LDA plots using code similar to the one here.

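I can get the PCA plot to work. However, it makes no sense, because it shows only 2 points: one at around (-4000, 30) and the other at around (2400, 23.7). I don't see a cloud of data points like the one in the linked plot.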

The LDA plot does not work and gives the error message

IndexError: index 1 is out of bounds for axis 1 with size 1

I also tried the code below to generate the LDA plot, but got the same error:

for c, i, name in zip("rgb", [0, 1], names):
    plt.scatter(x=X_LDA_sklearn[:, 0][yf==i], y=X_LDA_sklearn[:, 1][yf==i], c=c, label=name)
plt.legend()

Does anyone know what is going on here?

Edit: here are my imports:

import pandas as pd
from pandas import Series,DataFrame
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import csv

from sklearn.linear_model import LogisticRegression
from sklearn.cross_validation import train_test_split
from sklearn import metrics
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.lda import LDA

As for where the errors occur, I get

FutureWarning: in the future, boolean array-likes will be handled as a boolean array index
plt.scatter(X_r[yf == i,0], X_r[yf == i, 1], c=c, label=name)

FutureWarning: in the future, boolean array-likes will be handled as a boolean array index

on the line inside the for loop of the PCA plot.

As for LDA, on the line

plt.scatter(X_r2[yf == i, 0], X_r2[yf == i, 1], c=c, label=name)

I get

FutureWarning: in the future, boolean array-likes will be handled as a boolean array index
plt.scatter(X_r[yf == i,0], X_r[yf == i, 1], c=c, label=name)

FutureWarning: in the future, boolean array-likes will be handled as a boolean array index

and the IndexError quoted above.

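The reason you see this error is that X_r2 contains only one column (at least for the data you provided). LDA can produce at most min(n_classes - 1, n_features) components, so with only two classes in Out there is a single column. However, in the expression y=X_LDA_sklearn[:, 1][yf==i] you try to access a second column, which raises the error you observed.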
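The shape problem can be reproduced on a small synthetic dataset. This is only a sketch: the data and the names X_demo, y_demo and X_demo_lda are made up for illustration and are not the asker's data.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Synthetic data (an assumption, not the asker's): 6 samples, 3 features, 2 classes.
rng = np.random.RandomState(0)
X_demo = rng.rand(6, 3)
y_demo = np.array([0, 0, 0, 1, 1, 1])

# With two classes, LDA keeps at most n_classes - 1 = 1 component.
lda_demo = LinearDiscriminantAnalysis()
X_demo_lda = lda_demo.fit(X_demo, y_demo).transform(X_demo)
lda_shape = X_demo_lda.shape
print(lda_shape)

# Asking for a second column fails, because only column 0 exists.
try:
    X_demo_lda[:, 1]
    error_message = None
except IndexError as exc:
    error_message = str(exc)
print(error_message)
```

Printing the shape of the transformed array before plotting is usually the quickest way to spot this.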

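I added a third class to the sample data you provided (with only two classes, the dimensionality reduction is not very meaningful) and converted the DataFrame to arrays. It now runs fine and produces the following plots (not very informative, given the small amount of data):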

Here is the updated code:

import pandas as pd
from pandas import Series,DataFrame
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import csv

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn import metrics
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

trainDF = pd.DataFrame({'Out': [1, 0, 0, 0, 1, 3, 3],
                        'Revolver': [0.766, 0.957, 0.658, 0.233, 0.907, 0.1, 0.15],
                        'Ratio': [0.803, 0.121, 0.085, 0.036, 0.024, 0.6, 0.8],
                        'Num': [0, 1, 0, 3, 5, 4, 4]})
#drop NA values
trainDF = trainDF.dropna()

# replace the values 8 and 17 in 'Num' with the column median
trainDF.loc[trainDF['Num'].isin([8, 17]), 'Num'] = trainDF['Num'].median()

# convert the target column to a numpy array
y = trainDF['Out'].to_numpy()

# convert the feature columns to a numpy array
X = trainDF.drop(columns='Out').to_numpy()

target_names = ['out', 'in']

pca = PCA(n_components=2)
X_r = pca.fit(X).transform(X)

lda = LinearDiscriminantAnalysis(n_components=2)
X_r2 = lda.fit(X, y).transform(X)

# Percentage of variance explained for each components
print('explained variance ratio (first two components): %s'
      % str(pca.explained_variance_ratio_))

plt.figure()
for c, i, target_name in zip("rgb", [0, 1], target_names):
    plt.scatter(X_r[y == i, 0], X_r[y == i, 1], c=c, label=target_name)
plt.legend()
plt.title('PCA of Out')

plt.figure()
for c, i, target_name in zip("rgb", [0, 1], target_names):
    plt.scatter(X_r2[y == i, 0], X_r2[y == i, 1], c=c, label=target_name)
plt.legend()
plt.title('LDA of Out')

plt.show()
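Converting to NumPy arrays matters for the boolean masks used in the plotting loops. The FutureWarning in the question suggests that the mask there was a boolean array-like rather than a NumPy boolean array. Older NumPy versions could treat such a mask as the integer indices 0 and 1, which would be consistent with seeing only two points, although that cannot be confirmed without the data-loading code. A minimal sketch with made-up values (X_small and labels are illustrative names):

```python
import numpy as np

# Made-up projected data: 4 samples, 2 components.
X_small = np.array([[1.0, 10.0],
                    [2.0, 20.0],
                    [3.0, 30.0],
                    [4.0, 40.0]])
labels = [1, 0, 0, 1]  # a plain Python list, as it might come from a file

# Convert the labels first, so the comparison is element-wise
# and yields a boolean ndarray with one entry per sample.
y_small = np.asarray(labels)
mask = y_small == 1

# Select column 0 of the rows whose label is 1.
selected = X_small[mask, 0]
print(mask.dtype, selected)
```

Comparing the plain list directly (labels == 1) would give a single False rather than a per-row mask, which is why the conversion comes first.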

So whenever you run into these "index out of bounds" errors, always check the dimensions of your arrays first.
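One way to build that check into the plotting code is a small guard that inspects the shape before indexing. The helper name projection_xy is hypothetical, and falling back to a flat line at y = 0 for a single-component projection is just one possible choice.

```python
import numpy as np

def projection_xy(X_proj):
    """Return (x, y) coordinates for a scatter plot of a projection.

    Hypothetical helper: uses columns 0 and 1 when both exist, and
    places the points on the line y = 0 when there is only one column.
    """
    X_proj = np.asarray(X_proj)
    if X_proj.ndim != 2 or X_proj.shape[1] < 1:
        raise ValueError("expected a 2-D array with at least one column, "
                         "got shape %s" % (X_proj.shape,))
    if X_proj.shape[1] >= 2:
        return X_proj[:, 0], X_proj[:, 1]
    # Only one component (e.g. LDA with two classes): plot it in 1-D.
    return X_proj[:, 0], np.zeros(X_proj.shape[0])

# Usage with a made-up single-column projection.
x_vals, y_vals = projection_xy(np.array([[0.5], [-1.2], [2.0]]))
print(x_vals, y_vals)
```

The returned pair can be passed straight to plt.scatter, and the same boolean mask can be applied to both arrays.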

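Can you add the import statements and tell us on which line the error occurs? Also, how are Train and newTrain defined, and how do you read the data? This is clearly a dimension problem, so it would help a lot if you told us how you create the data you are using. :)

You can see my code. There is one thing I forgot to put in the pastebin: I forgot to add the line X=trainDF right above this line.

I found the error and corrected the code. Please let me know whether this solves your problem.

Thank you very much for your help. I have just one concern: I value my privacy highly, so I would appreciate it if you could change target_names to something different, such as [out, in], and change plt.title to something like plt.title('PCA of out'). Thanks.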