Python Pandas and scikit-learn: ValueError: Input contains NaN, infinity or a value too large for dtype('float64') - Fatal编程技术网

I'm trying to fit a k-nearest neighbors model to my data. However, I get this error:

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
Here is the code for my k-nearest neighbors algorithm:

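import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_squared_error

def knn_train_test_new(training_col, target_col, df):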
    np.random.seed(1)
    df = df.loc[np.random.permutation(len(df))]
    # shuffled_index = np.random.permutation(df.index)
    # df = df.reindex(shuffled_index)

    train_df = df.iloc[0:150] #training set has 75% of the data
    test_df = df.iloc[150:] #test set has 25% of the data  
    k = [5]

    rmse = {}

    for k_val in k:

        model = KNeighborsRegressor(n_neighbors = k_val)
        model.fit(train_df[training_col], train_df[target_col])

        predictions = model.predict(test_df[training_col])

        mse = mean_squared_error(test_df[target_col], predictions)

        rmse[k_val] = (mse ** 0.5)

    return rmse

two_features = ["width", "wheel-base"]
rmse_val = knn_train_test(two_features, 'price', numeric_cars)
And the first five rows of my DataFrame:

numeric_cars.head()

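When I use shuffled_index (which I have commented out) instead of np.random.permutation(len(df)), I don't get this error. I'm not clear on the difference between the two.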
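One likely explanation, assuming `numeric_cars` has a non-contiguous index (for example, because some rows were dropped earlier): `np.random.permutation(len(df))` produces positions 0 to len(df)-1, but `.loc` interprets them as index labels. `np.random.permutation(df.index)` instead shuffles the labels that actually exist. The small frame below is hypothetical, and `df.reindex` is used to show the label lookup in a way that does not depend on the pandas version.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for numeric_cars. dropna() removes rows 2 and 4,
# so the index labels become [0, 1, 3] and no longer match positions.
df = pd.DataFrame(
    {"width": [64.1, 65.5, np.nan, 66.2, 66.4],
     "wheel-base": [88.6, 94.5, 99.8, 99.4, np.nan],
     "price": [13495.0, 16500.0, 13950.0, 17450.0, 15250.0]}
).dropna()
print(df.index.tolist())  # [0, 1, 3]

np.random.seed(1)
positions = np.random.permutation(len(df))  # a shuffle of 0, 1, 2

# Treating positions as labels: label 2 does not exist, so reindex yields
# an all-NaN row. Older pandas .loc behaved this way (with a warning);
# recent pandas .loc raises a KeyError instead.
by_label = df.reindex(positions)

# Fix 1: shuffle the labels that exist, as the commented-out code does.
shuffled_by_label = df.loc[np.random.permutation(df.index)]

# Fix 2: keep permuting positions, but select them with .iloc.
shuffled_by_position = df.iloc[np.random.permutation(len(df))]
```

With either fix, every selected row is a real row, so no NaN rows reach `model.fit`.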

Full error traceback:

ValueError                                Traceback (most recent call last)
<ipython-input-13-03e0dd7acfd0> in <module>()
     25 
     26 two_features = ["width", "wheel-base"]
---> 27 rmse_val = knn_train_test_new(two_features, 'price', numeric_cars)
     28 
     29 #rmse_results = {}

<ipython-input-13-03e0dd7acfd0> in knn_train_test_new(training_col, target_col, df)
     14 
     15         model = KNeighborsRegressor(n_neighbors = k_val)
---> 16         model.fit(train_df[training_col], train_df[target_col])
     17 
     18         predictions = model.predict(test_df[training_col])

~\Anaconda3\lib\site-packages\sklearn\neighbors\base.py in fit(self, X, y)
    743         """
    744         if not isinstance(X, (KDTree, BallTree)):
--> 745             X, y = check_X_y(X, y, "csr", multi_output=True)
    746         self._y = y
    747         return self._fit(X)

~\Anaconda3\lib\site-packages\sklearn\utils\validation.py in check_X_y(X, y, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, multi_output, ensure_min_samples, ensure_min_features, y_numeric, warn_on_dtype, estimator)
    571     X = check_array(X, accept_sparse, dtype, order, copy, force_all_finite,
    572                     ensure_2d, allow_nd, ensure_min_samples,
--> 573                     ensure_min_features, warn_on_dtype, estimator)
    574     if multi_output:
    575         y = check_array(y, 'csr', force_all_finite=True, ensure_2d=False,

~\Anaconda3\lib\site-packages\sklearn\utils\validation.py in check_array(array, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
    451                              % (array.ndim, estimator_name))
    452         if force_all_finite:
--> 453             _assert_all_finite(array)
    454 
    455     shape_repr = _shape_repr(array.shape)

~\Anaconda3\lib\site-packages\sklearn\utils\validation.py in _assert_all_finite(X)
     42             and not np.isfinite(X).all()):
     43         raise ValueError("Input contains NaN, infinity"
---> 44                          " or a value too large for %r." % X.dtype)
     45 
     46 

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
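The last frame of the traceback shows the test that fails: `np.isfinite(X).all()`. The sketch below is a simplified re-creation of that test for illustration only, not sklearn's actual code; the helper name `all_finite` and the sample arrays are hypothetical. It shows that a single NaN or infinity anywhere in the input is enough to trigger the error.

```python
import numpy as np

def all_finite(X):
    """Simplified re-creation (not sklearn's code) of the test made in
    _assert_all_finite: True only if no entry is NaN, +inf or -inf."""
    return bool(np.isfinite(np.asarray(X, dtype=np.float64)).all())

clean = [[64.1, 88.6], [65.5, 94.5]]
with_nan = [[64.1, 88.6], [np.nan, 94.5]]
with_inf = [[64.1, 88.6], [np.inf, 94.5]]

print(all_finite(clean))     # True
print(all_finite(with_nan))  # False
print(all_finite(with_inf))  # False
```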

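Post the full error traceback.
@Denziloe I have edited the question; look at the second arrow.
The traceback shows that the code you ran to get the error is different from the code you posted here. Change it to the code you posted here and you won't get this error. It should then be clear why your code errors.
The error is fairly self-explanatory: you have a bad data point. When you shuffle the index, you may not get that data point because you didn't select that row. Look for data points that differ from the rest (null, infinity, etc.).
@MJP Just store the values that cause the error in a variable and then inspect them. sklearn isn't lying: if it says you gave it bad values, you gave it bad values.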
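Following the last comment's advice, here is a minimal sketch for locating the offending rows before calling `fit`. It assumes the feature and target columns are numeric; the helper `non_finite_rows` and the sample `train_df` are hypothetical. The target column is checked too because, per the traceback, sklearn also validates `y` with `force_all_finite=True`.

```python
import numpy as np
import pandas as pd

def non_finite_rows(df, cols):
    """Return the rows of df whose values in cols contain NaN or +/-inf."""
    values = df[cols].to_numpy(dtype=np.float64)
    bad = ~np.isfinite(values).all(axis=1)  # True where any column is bad
    return df[bad]

# Hypothetical example data standing in for train_df.
train_df = pd.DataFrame(
    {"width": [64.1, np.nan, 66.2],
     "wheel-base": [88.6, 94.5, np.inf],
     "price": [13495.0, 16500.0, 17450.0]},
    index=[10, 11, 12],
)
print(non_finite_rows(train_df, ["width", "wheel-base", "price"]))
```

The returned index labels show which rows to inspect, and comparing them against the original frame's index also reveals rows that were created by a label lookup rather than taken from the data.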