Python ValueError: Input contains NaN, infinity or a value too large for dtype('float64') when using scikit-learn

python scikit-learn

cocluster.fit(X)

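I chose SpectralCoclustering to cluster about 30k tweets. Everything went well until I passed the data X to cocluster.fit(X).

That call raises the ValueError shown in the traceback near the end of this post. Here is my code: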

from sklearn.cluster.bicluster import SpectralCoclustering
from sklearn.feature_extraction.text import TfidfVectorizer
def number_normalizer(tokens):
    """ Map all numeric tokens to a placeholder.
    For many applications, tokens that begin with a number are not directly
    useful, but the fact that such a token exists can be relevant.  By applying
    this form of dimensionality reduction, some methods may perform better.
    """
    return ("#NUMBER" if token[0].isdigit() else token for token in tokens)


class NumberNormalizingVectorizer(TfidfVectorizer):

    def build_tokenizer(self):
        tokenize = super(NumberNormalizingVectorizer, self).build_tokenizer()
        return lambda doc: list(number_normalizer(tokenize(doc)))

vectorizer = NumberNormalizingVectorizer(stop_words='english', min_df=5)
cocluster = SpectralCoclustering(n_clusters=5, svd_method='arpack', random_state=0)
X = vectorizer.fit_transform(data)

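When I evaluate the check from the code that raises the error (the expression at the bottom of this post), it returns False. It should be True when the error occurs, right?

So is there any other way to track down this bug? Thanks.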

False
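One further check, offered as a possibility rather than anything confirmed in this thread: with stop_words='english' and min_df=5, some tweets may lose every token, which would leave all-zero rows in X. Assuming the co-clustering step rescales rows and columns by a function of their sums (an assumption about the library internals, not verified against the installed version), a zero sum would turn into inf even though X itself is finite. The helper name zero_sum_rows_and_cols and the toy matrix X_demo below are hypothetical, used only for illustration.

```python
import numpy as np


def zero_sum_rows_and_cols(X):
    """Return indices of rows and columns of X whose entries sum to zero.

    Hypothetical helper. It only relies on X.sum(axis=...), which both a
    dense ndarray and a scipy.sparse matrix provide.
    """
    row_sums = np.asarray(X.sum(axis=1)).ravel()
    col_sums = np.asarray(X.sum(axis=0)).ravel()
    return np.flatnonzero(row_sums == 0), np.flatnonzero(col_sums == 0)


# Toy stand-in for a TF-IDF matrix: row 1 has no surviving terms.
X_demo = np.array([[0.5, 0.0, 0.5],
                   [0.0, 0.0, 0.0],
                   [0.0, 0.0, 1.0]])

empty_rows, empty_cols = zero_sum_rows_and_cols(X_demo)
print(empty_rows)  # [1]
print(empty_cols)  # [1]

# Scaling by 1/sqrt(row sum) yields inf for the all-zero row,
# although every entry of X_demo is finite.
with np.errstate(divide='ignore'):
    scale = 1.0 / np.sqrt(X_demo.sum(axis=1))
print(np.isfinite(scale).all())  # False
```

If the same helper reports empty rows for the real X, dropping those documents before calling fit would be one thing to try.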

You should post some sample data that produces this error. Also, in the last line, change

np.isfinite(X).all()

to

np.isfinite(X).any()

@VivekKumar The source code says it is .all(), so why change it to .any()?

Ah, yes. My bad, I overlooked the preceding `not`. You are correct.
.env/lib/python3.5/site-packages/sklearn/utils/validation.py", line 43, in _assert_all_finite
    " or a value too large for %r." % X.dtype)
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

X.dtype.char in np.typecodes['AllFloat'] and not np.isfinite(X.sum()) and not np.isfinite(X).all()
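To see how the quoted condition behaves, here is a minimal sketch that wraps it in a function and runs it on small dense arrays. The function name old_style_nonfinite_check and the three test arrays are made up for this illustration; only the boolean expression itself comes from the line above.

```python
import numpy as np


def old_style_nonfinite_check(X):
    """Evaluate the quoted condition; True means the ValueError would be raised."""
    X = np.asarray(X)
    return bool(
        X.dtype.char in np.typecodes['AllFloat']
        and not np.isfinite(X.sum())    # cheap test first: is the total finite?
        and not np.isfinite(X).all()    # only reached when the total is not finite
    )


clean = np.array([[0.0, 1.5], [2.0, 0.0]])
with_nan = np.array([[0.0, np.nan], [2.0, 0.0]])
with_inf = np.array([[0.0, np.inf], [2.0, 0.0]])

print(old_style_nonfinite_check(clean))     # False
print(old_style_nonfinite_check(with_nan))  # True
print(old_style_nonfinite_check(with_inf))  # True
```

A False result for X therefore suggests that X itself is finite, in which case the array that fails the check is probably an intermediate one computed inside fit rather than the input.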