Python sklearn: spectral clustering error, "array must not contain infs or NaNs"
I am using spectral clustering with a similarity matrix as its main argument. My matrix looks like:
[[ 1.00000000e+00 8.47085137e-01 8.49644498e-01 8.49746438e-01
2.96473454e-01 8.50540412e-01 8.49462072e-01 8.50839475e-01
8.45951343e-01 5.76448265e-01 8.48265736e-01 8.43378943e-01
3.75348067e-01 1.17626480e-01 2.50357519e-01 8.50495202e-01
9.97541755e-01 8.49835674e-01 8.48770171e-01 8.45869271e-01
-5.97205241e-02]
[ 8.47085137e-01 1.00000000e+00 9.98547894e-01 9.98803332e-01
2.22305018e-01 9.98755219e-01 9.98502380e-01 9.98402601e-01
9.98778885e-01 5.66416311e-01 9.98639207e-01 9.98452172e-01
-6.10479042e-02 2.46741344e-02 -4.14116930e-03 9.98357419e-01
8.48955204e-01 9.98525354e-01 9.98900440e-01 9.98426618e-01
-6.51839614e-02]
[ 8.49644498e-01 9.98547894e-01 1.00000000e+00 9.98764222e-01
1.59017501e-01 9.98777492e-01 9.98797005e-01 9.98756310e-01
9.98785822e-01 5.71955127e-01 9.98834038e-01 9.98652820e-01
-5.95467715e-02 1.98107829e-02 -3.88527970e-03 9.98810942e-01
8.51337460e-01 9.98882675e-01 9.98815975e-01 9.98789494e-01
-6.69662309e-02]
[ 8.49746438e-01 9.98803332e-01 9.98764222e-01 1.00000000e+00
4.73518047e-01 9.98684853e-01 9.98839959e-01 9.99029920e-01
9.98804479e-01 5.67855583e-01 9.98759386e-01 9.98796277e-01
-6.07517782e-02 1.71388383e-02 -3.20996100e-03 9.98669121e-01
8.51600753e-01 9.98681806e-01 9.99072484e-01 9.98702177e-01
-6.29855810e-02]
[ 3.52784328e-01 2.41076867e-01 2.01621082e-01 4.11538647e-01
9.92999574e-01 2.09351787e-01 2.12464918e-01 1.84566399e-01
2.82162287e-01 8.88835155e-01 1.90613041e-01 2.12150578e-01
2.92104260e-01 6.25221827e-02 8.70607365e-01 2.88645877e-01
3.09283827e-01 2.81253950e-01 1.80307149e-01 2.49082955e-01
5.46192492e-02]
...
[ -5.97205241e-02 -6.51839614e-02 -6.69662309e-02 -6.29855810e-02
7.86918277e-02 -6.49002943e-02 -6.12003747e-02 -6.34500592e-02
-6.75593439e-02 7.23869691e-02 -6.20686862e-02 -5.94039824e-02
-1.00101778e-01 -1.14667128e-01 5.57606897e-02 -6.32884559e-02
-5.33734526e-02 -5.90822523e-02 -6.17068052e-02 -5.76615359e-02
1.00000000e+00]]
My code is similar to the example in the documentation:
from sklearn.cluster import SpectralClustering
cl = SpectralClustering(n_clusters=4, affinity='precomputed')
y = cl.fit_predict(matrix)
But I get the following error:
/usr/local/lib/python2.7/dist-packages/scikit_learn-0.17.1-py2.7-linux-x86_64.egg/sklearn/utils/validation.py:629: UserWarning: Array is not symmetric, and will be converted to symmetric by average with its transpose.
warnings.warn("Array is not symmetric, and will be converted "
/usr/local/lib/python2.7/dist-packages/scikit_learn-0.17.1-py2.7-linux-x86_64.egg/sklearn/utils/graph.py:172: RuntimeWarning: invalid value encountered in sqrt
w = np.sqrt(w)
Traceback (most recent call last):
File "/home/mahmood/PycharmProjects/sentence2vec/graphClustering.py", line 23, in <module>
y = cl.fit_predict(matrix)
File "/usr/local/lib/python2.7/dist-packages/scikit_learn-0.17.1-py2.7-linux-x86_64.egg/sklearn/base.py", line 371, in fit_predict
self.fit(X)
File "/usr/local/lib/python2.7/dist-packages/scikit_learn-0.17.1-py2.7-linux-x86_64.egg/sklearn/cluster/spectral.py", line 454, in fit
assign_labels=self.assign_labels)
File "/usr/local/lib/python2.7/dist-packages/scikit_learn-0.17.1-py2.7-linux-x86_64.egg/sklearn/cluster/spectral.py", line 258, in spectral_clustering
eigen_tol=eigen_tol, drop_first=False)
File "/usr/local/lib/python2.7/dist-packages/scikit_learn-0.17.1-py2.7-linux-x86_64.egg/sklearn/manifold/spectral_embedding_.py", line 254, in spectral_embedding
tol=eigen_tol)
File "/usr/lib/python2.7/dist-packages/scipy/sparse/linalg/eigen/arpack/arpack.py", line 1545, in eigsh
symmetric=True, tol=tol)
File "/usr/lib/python2.7/dist-packages/scipy/sparse/linalg/eigen/arpack/arpack.py", line 1033, in get_OPinv_matvec
return LuInv(A).matvec
File "/usr/lib/python2.7/dist-packages/scipy/sparse/linalg/interface.py", line 142, in __new__
obj.__init__(*args, **kwargs)
File "/usr/lib/python2.7/dist-packages/scipy/sparse/linalg/eigen/arpack/arpack.py", line 922, in __init__
self.M_lu = lu_factor(M)
File "/usr/lib/python2.7/dist-packages/scipy/linalg/decomp_lu.py", line 58, in lu_factor
a1 = asarray_chkfinite(a)
File "/usr/lib/python2.7/dist-packages/numpy/lib/function_base.py", line 1022, in asarray_chkfinite
"array must not contain infs or NaNs")
ValueError: array must not contain infs or NaNs
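The first warning is acceptable, because the matrix is not symmetric, but there are no inf or NaN values in the matrix.

The NaN values arise because the matrix is not a similarity matrix: your data contains negative similarities! When the sqrt of these values is taken, you get NaN, hence the error.
These warnings are not just for fun: matrix factorization techniques have strict requirements that must be met for them to work and return meaningful results.
Fix the negative similarities first, then try again.

How can I check whether the matrix contains NaN/Inf values? I looked through the printed matrix and there seem to be no NaN/Inf values.

That is only part of the matrix, and it is printed with limited precision. You have to check every element to be sure.

numpy.any(numpy.isnan(matrix)) and numpy.any(numpy.isinf(matrix)) both return False. I think that is enough. I followed your advice, but only the first warning went away. The second warning and the error are still there!

You must not have negative values either. sqrt(-1) = nan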
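
The checks discussed above, plus the check for negative entries, can be sketched as follows. This is only a minimal illustration: the helper names `diagnose_affinity` and `to_nonnegative_affinity` and the small `sample` matrix are made up for this example, and clipping negative similarities to zero is just one possible repair, not necessarily the one that suits your data.

```python
import numpy as np

def diagnose_affinity(matrix):
    """Report the properties that matter for a precomputed affinity matrix.

    Hypothetical helper for illustration; not part of scikit-learn.
    """
    m = np.asarray(matrix, dtype=float)
    return {
        "has_nan": bool(np.isnan(m).any()),
        "has_inf": bool(np.isinf(m).any()),
        "has_negative": bool((m < 0).any()),
        "is_symmetric": bool(np.allclose(m, m.T)),
        "min_value": float(m.min()),
    }

def to_nonnegative_affinity(matrix):
    """One possible repair (an assumption, not the only option):
    average with the transpose, then clip negative similarities to 0.
    """
    m = np.asarray(matrix, dtype=float)
    m = (m + m.T) / 2.0          # make the matrix symmetric
    return np.clip(m, 0.0, None)  # remove negative similarities

# Made-up 3x3 matrix with the same kinds of problems as the one in the question:
# slightly asymmetric and containing negative entries, but no NaN/Inf.
sample = np.array([
    [1.00, 0.85, -0.06],
    [0.84, 1.00, -0.07],
    [-0.06, -0.06, 1.00],
])

print(diagnose_affinity(sample))
fixed = to_nonnegative_affinity(sample)
print(diagnose_affinity(fixed))
```

After such a repair, `fixed` could be passed to `fit_predict` in place of `matrix`, exactly as in the question's code. If the values are cosine-like similarities in [-1, 1], rescaling with `(m + 1) / 2` instead of clipping is another option that preserves the ordering of the similarities.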