Python ValueError: Input contains NaN, infinity or a value too large for dtype('float32')
cs-training.csv looks like this:
+----+------------------+--------------------------------------+-----+--------------------------------------+-------------+---------------+---------------------------------+-------------------------+------------------------------+--------------------------------------+--------------------+
| | SeriousDlqin2yrs | RevolvingUtilizationOfUnsecuredLines | age | NumberOfTime30-59DaysPastDueNotWorse | DebtRatio | MonthlyIncome | NumberOfOpenCreditLinesAndLoans | NumberOfTimes90DaysLate | NumberRealEstateLoansOrLines | NumberOfTime60-89DaysPastDueNotWorse | NumberOfDependents |
+----+------------------+--------------------------------------+-----+--------------------------------------+-------------+---------------+---------------------------------+-------------------------+------------------------------+--------------------------------------+--------------------+
| 1 | 1 | 0.766126609 | 45 | 2 | 0.802982129 | 9120 | 13 | 0 | 6 | 0 | 2 |
| 2 | 0 | 0.957151019 | 40 | 0 | 0.121876201 | 2600 | 4 | 0 | 0 | 0 | 1 |
| 3 | 0 | 0.65818014 | 38 | 1 | 0.085113375 | 3042 | 2 | 1 | 0 | 0 | 0 |
| 4 | 0 | 0.233809776 | 30 | 0 | 0.036049682 | 3300 | 5 | 0 | 0 | 0 | 0 |
| 5 | 0 | 0.9072394 | 49 | 1 | 0.024925695 | 63588 | 7 | 0 | 1 | 0 | 0 |
| 6 | 0 | 0.213178682 | 74 | 0 | 0.375606969 | 3500 | 3 | 0 | 1 | 0 | 1 |
| 7 | 0 | 0.305682465 | 57 | 0 | 5710 | NA | 8 | 0 | 3 | 0 | 0 |
| 8 | 0 | 0.754463648 | 39 | 0 | 0.209940017 | 3500 | 8 | 0 | 0 | 0 | 0 |
| 9 | 0 | 0.116950644 | 27 | 0 | 46 | NA | 2 | 0 | 0 | 0 | NA |
| 10 | 0 | 0.189169052 | 57 | 0 | 0.606290901 | 23684 | 9 | 0 | 4 | 0 | 2 |
| 11 | 0 | 0.644225962 | 30 | 0 | 0.30947621 | 2500 | 5 | 0 | 0 | 0 | 0 |
| 12 | 0 | 0.01879812 | 51 | 0 | 0.53152876 | 6501 | 7 | 0 | 2 | 0 | 2 |
| 13 | 0 | 0.010351857 | 46 | 0 | 0.298354075 | 12454 | 13 | 0 | 2 | 0 | 2 |
| 14 | 1 | 0.964672555 | 40 | 3 | 0.382964747 | 13700 | 9 | 3 | 1 | 1 | 2 |
| 15 | 0 | 0.019656581 | 76 | 0 | 477 | 0 | 6 | 0 | 1 | 0 | 0 |
| 16 | 0 | 0.548458062 | 64 | 0 | 0.209891754 | 11362 | 7 | 0 | 1 | 0 | 2 |
| 17 | 0 | 0.061086118 | 78 | 0 | 2058 | NA | 10 | 0 | 2 | 0 | 0 |
| 18 | 0 | 0.166284079 | 53 | 0 | 0.18827406 | 8800 | 7 | 0 | 0 | 0 | 0 |
| 19 | 0 | 0.221812771 | 43 | 0 | 0.527887839 | 3280 | 7 | 0 | 1 | 0 | 2 |
| 20 | 0 | 0.602794411 | 25 | 0 | 0.065868263 | 333 | 2 | 0 | 0 | 0 | 0 |
| 21 | 0 | 0.200923382 | 43 | 0 | 0.430046338 | 12300 | 10 | 0 | 2 | 0 | 0 |
+----+------------------+--------------------------------------+-----+--------------------------------------+-------------+---------------+---------------------------------+-------------------------+------------------------------+--------------------------------------+--------------------+
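How pandas parses a file shaped like this matters for the code that follows. The sketch below is a minimal illustration, not the real file: it copies three rows from the table above into an in-memory CSV and assumes the row number is stored under an empty header field. Under that assumption, `pd.read_csv` turns the literal `NA` into NaN and keeps the row number as a regular column, so every positional column index shifts by one unless `index_col=0` is passed.

```python
import io
import pandas as pd

# Three rows copied from the table above. The leading empty header field is
# an assumption about how the real file stores the row number.
csv_text = """,SeriousDlqin2yrs,RevolvingUtilizationOfUnsecuredLines,age,NumberOfTime30-59DaysPastDueNotWorse,DebtRatio,MonthlyIncome,NumberOfOpenCreditLinesAndLoans,NumberOfTimes90DaysLate,NumberRealEstateLoansOrLines,NumberOfTime60-89DaysPastDueNotWorse,NumberOfDependents
6,0,0.213178682,74,0,0.375606969,3500,3,0,1,0,1
7,0,0.305682465,57,0,5710,NA,8,0,3,0,0
9,0,0.116950644,27,0,46,NA,2,0,0,0,NA
"""

df = pd.read_csv(io.StringIO(csv_text))

# "NA" is one of pandas' default missing-value markers, so it is read as NaN.
print(df["MonthlyIncome"].isna().tolist())

# The unnamed first column is kept as data, so positions shift by one:
# position 5 is DebtRatio and position 6 is MonthlyIncome.
print(df.columns[0], "|", df.columns[5], "|", df.columns[6])

# With index_col=0 the row number becomes the index and position 5
# is MonthlyIncome again.
df_indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(df_indexed.columns[5])
```

If the real file has the same layout, a positional selection such as `[5, 0, 1, 2, 3, 4, 6, 7, 8, 9]` picks different columns depending on whether `index_col=0` was used.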
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestRegressor


# use a random forest to predict and fill the missing values
def set_missing(df):
    # move column 5 to the front so it can be used as the target
    process_df = df.iloc[:, [5, 0, 1, 2, 3, 4, 6, 7, 8, 9]]
    known = process_df[process_df.MonthlyIncome.notnull()].to_numpy()
    unknown = process_df[process_df.MonthlyIncome.isnull()].to_numpy()
    X = known[:, 1:]
    y = known[:, 0]
    rfr = RandomForestRegressor(random_state=0, n_estimators=200, max_depth=3, n_jobs=-1)
    rfr.fit(X, y)
    predicted = rfr.predict(unknown[:, 1:]).round(0)
    print(predicted)
    # fill the nulls; this is the line that goes wrong
    df.loc[(df.MonthlyIncome.isnull()), 'MonthlyIncome'] = predicted
    return df


if __name__ == '__main__':
    data = pd.read_csv('cs-training.csv')
    data.describe().to_csv('DataDescribe.csv')
    data = set_missing(data)
    data = data.dropna()
    data = data.drop_duplicates()
    data.to_csv('MissingData.csv', index=False)
    data.describe().to_csv('MissingDataDescribe.csv')
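I have already checked the page about "ValueError: Input contains NaN, infinity or a value too large for dtype('float32')", but my situation seems to be different. I hope someone knows why this happens and how to fix it. Please help. Thanks.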
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
in <module>()
----> 1 data = set_missing(data)

in set_missing(df)
     13     rfr.fit(X, y)
     14
---> 15     predicted = rfr.predict(unknown[:, 1:]).round(0)
     16     print(predicted)
     17

D:\Program Files (x86)\Anaconda3\lib\site-packages\sklearn\ensemble\forest.py in predict(self, X)
    683         """
    684         # Check data
--> 685         X = self._validate_X_predict(X)
    686
    687         # Assign chunk of trees to jobs

D:\Program Files (x86)\Anaconda3\lib\site-packages\sklearn\ensemble\forest.py in _validate_X_predict(self, X)
    353                                  "call `fit` before exploiting the model.")
    354
--> 355         return self.estimators_[0]._validate_X_predict(X, check_input=True)
    356
    357     @property

D:\Program Files (x86)\Anaconda3\lib\site-packages\sklearn\tree\tree.py in _validate_X_predict(self, X, check_input)
    363
    364         if check_input:
--> 365             X = check_array(X, dtype=DTYPE, accept_sparse="csr")
    366             if issparse(X) and (X.indices.dtype != np.intc or
    367                                 X.indptr.dtype != np.intc):

D:\Program Files (x86)\Anaconda3\lib\site-packages\sklearn\utils\validation.py in check_array(array, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
    405                              % (array.ndim, estimator_name))
    406         if force_all_finite:
--> 407             _assert_all_finite(array)
    408
    409     shape_repr = _shape_repr(array.shape)

D:\Program Files (x86)\Anaconda3\lib\site-packages\sklearn\utils\validation.py in _assert_all_finite(X)
     56             and not np.isfinite(X).all()):
     57         raise ValueError("Input contains NaN, infinity"
---> 58                          " or a value too large for %r." % X.dtype)
     59
     60

ValueError: Input contains NaN, infinity or a value too large for dtype('float32').
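The traceback ends inside `check_array`, called from `rfr.predict`, so the array being rejected is the one passed to `predict`, i.e. `unknown[:, 1:]`. A quick way to narrow this down is to ask which columns of that array hold non-finite values. The helper `nan_columns` below is a hypothetical name introduced for illustration, and the small `features` array is made up to stand in for `unknown[:, 1:]`.

```python
import numpy as np


def nan_columns(X):
    """Return the indices of columns in a 2-D array that contain NaN or inf."""
    X = np.asarray(X, dtype=float)
    bad = ~np.isfinite(X)  # True wherever a value is NaN or +/-inf
    return np.flatnonzero(bad.any(axis=0)).tolist()


# Made-up stand-in for `unknown[:, 1:]`: column 2 carries NaN in both rows.
features = np.array([[0.0, 57.0, np.nan, 8.0],
                     [0.0, 27.0, np.nan, 2.0]])
print(nan_columns(features))
```

Running the same check on the real `unknown[:, 1:]` would show whether a feature column, rather than the target, still contains missing values at prediction time.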
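The "MonthlyIncome" column contains the value "NA", and your code does not seem to handle that value.
You have NA values in your DataFrame.
@Juliusz I need to predict it, so isn't it supposed to contain NA?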
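Having NaN in the target is fine here, because those rows are only used for prediction. What `predict` cannot accept is NaN in the feature columns. One possible cause, assuming the CSV has an unnamed leading row-number column, is that the positional list `[5, 0, 1, 2, 3, 4, 6, 7, 8, 9]` puts `DebtRatio` first and leaves `MonthlyIncome` (position 6) among the features. The sketch below avoids positions by selecting columns by name. The function `fill_missing_income` and the synthetic `demo` frame are illustrative assumptions, not the asker's data or a definitive fix.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

TARGET = "MonthlyIncome"


def fill_missing_income(df, feature_cols):
    """Fill NaN in TARGET with random-forest predictions from feature_cols."""
    df = df.copy()
    missing = df[TARGET].isna()
    if not missing.any():
        return df
    # The features themselves must be finite, otherwise predict() raises
    # the same ValueError as in the question.
    if df[feature_cols].isna().any().any():
        raise ValueError("feature columns contain NaN; impute or drop them first")
    rfr = RandomForestRegressor(random_state=0, n_estimators=200, max_depth=3)
    # Train only on rows where the target is known.
    rfr.fit(df.loc[~missing, feature_cols], df.loc[~missing, TARGET])
    # Predict the target for the rows where it is missing.
    df.loc[missing, TARGET] = rfr.predict(df.loc[missing, feature_cols]).round(0)
    return df


# Synthetic demo data, made up for illustration (not from cs-training.csv).
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "age": rng.integers(21, 80, size=50),
    "DebtRatio": rng.random(50),
    "MonthlyIncome": rng.integers(1000, 10000, size=50).astype(float),
})
demo.loc[[3, 7, 11], "MonthlyIncome"] = np.nan

filled = fill_missing_income(demo, ["age", "DebtRatio"])
print(filled["MonthlyIncome"].isna().sum())
```

For the real file, `feature_cols` would list the other columns by name, leaving out `MonthlyIncome` itself and any column that still has missing values, such as `NumberOfDependents` in the sample rows.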