Python: error with decision trees of different depths


I am trying to compute the test and training error for decision trees of different depths:

train_error = []
test_error = []
for i in range(3, 21):
    X_train, X_test, y_train, y_test = train_test_split(womendata, womeny, test_size=0.4, random_state=1)
    decision_tree = tree.DecisionTreeClassifier(criterion='gini', splitter='best', max_depth=i, class_weight='balanced', min_samples_split=i)
    clf = decision_tree.fit(X_train, y_train)
    train_error.append(1 - clf.score(X_train, y_train))
    test_error.append(1 - clf.score(X_test, y_test))

In Python 3, I get an error:

Traceback (most recent call last):
  File "<stdin>", line 4, in <module>
  File "/usr/local/lib/python3.4/dist-packages/sklearn/tree/tree.py", line 154, in fit
    X = check_array(X, dtype=DTYPE, accept_sparse="csc")
  File "/usr/local/lib/python3.4/dist-packages/sklearn/utils/validation.py", line 398, in check_array
    _assert_all_finite(array)
  File "/usr/local/lib/python3.4/dist-packages/sklearn/utils/validation.py", line 54, in _assert_all_finite
    " or a value too large for %r." % X.dtype)

ValueError: Input contains NaN, infinity or a value too large for dtype('float32').

Both womendata and womeny have the same length, and there is no missing data in the set.

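According to the error, the data array you provided contains invalid values:

ValueError: Input contains NaN, infinity or a value too large for dtype('float32').

Please check that your data is valid, meaning:

  • there are no NaN values in womendata or womeny
  • there are no Inf values in womendata or womeny
  • all values lie within the float32 min and max range

You can use the following code to check: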

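    import numpy as np

    # The error message refers to float32, so check against the float32 range
    info = np.finfo(np.float32)

    for x in [womendata, womeny]:
        # Assumes the data is numeric; the conversion fails on non-numeric values
        x = np.asarray(x, dtype=np.float64)
        # `x is not np.nan` is always True for an array, so use np.isnan instead
        assert not np.isnan(x).any(), 'data contains NaN value'
        assert not np.isinf(x).any(), 'data contains infinity value'
        assert np.all(x <= info.max) and np.all(x >= info.min), 'not all values in range'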
    
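If one of the checks fails, it helps to know where the offending values are. The following is a minimal sketch, assuming the features are numeric and fit in a NumPy array. The arrays `X` and `y` are hypothetical stand-ins for womendata and womeny, and the helper `finite_row_mask` is made up for illustration. Dropping the affected rows is only one possible way to handle them; imputing the values may suit your data better.

```python
import numpy as np


def finite_row_mask(X):
    """Return a boolean mask that is True for rows containing only finite values."""
    X = np.asarray(X, dtype=np.float64)
    if X.ndim == 1:
        X = X.reshape(-1, 1)
    return np.isfinite(X).all(axis=1)


# Hypothetical stand-in data with a few invalid entries injected on purpose
rng = np.random.RandomState(1)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # labels computed before corrupting X
X[3, 2] = np.nan
X[17, 0] = np.inf
X[42, 4] = -np.inf

# (row, column) positions of every NaN or infinite entry
bad_positions = np.argwhere(~np.isfinite(X))
print("non-finite entries at (row, column):", bad_positions.tolist())

# Keep only the rows in which every feature is finite
mask = finite_row_mask(X)
X_clean, y_clean = X[mask], y[mask]
print("kept", X_clean.shape[0], "of", X.shape[0], "rows")
```

The cleaned arrays `X_clean` and `y_clean` can then be passed to train_test_split in place of the original data.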