Python: How to fix the "Found input variables with inconsistent numbers of samples: [100, 50]" error?
Tags: python, arrays, numpy, scikit-learn, jupyter-notebook
I am getting this error and don't know how to resolve it. I want to use two x variables in my regression, so I put both of them in the code. However, I get this error and don't know how to reshape my array to fix it.
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import r2_score,mean_squared_error
X = maindf[['Graduate Degree','Asian American Population']].values.reshape(-1,1)
Y = maindf["Democrats 2016"].values.reshape(-1,1)
x_train, x_test, y_train, y_test, = train_test_split(X, Y,train_size=49, random_state=np.random)
DecisionTreeRegModel = DecisionTreeRegressor(max_depth=3).fit(x_train, y_train)
y_pred = DecisionTreeRegModel.predict(x_test)
from sklearn import tree
Here is the error:
ValueError Traceback (most recent call last)
<ipython-input-85-9aaccff5b23d> in <module>
5 X = maindf[['Graduate Degree','Asian American Population']].values.reshape(-1,1)
6 Y = maindf["Democrats 2016"].values.reshape(-1,1)
----> 7 x_train, x_test, y_train, y_test, = train_test_split(X, Y,train_size=49, random_state=np.random)
8 DecisionTreeRegModel = DecisionTreeRegressor(max_depth=3).fit(x_train, y_train)
9 y_pred = DecisionTreeRegModel.predict(x_test)
~\anaconda3\lib\site-packages\sklearn\model_selection\_split.py in train_test_split(*arrays, **options)
2125 raise TypeError("Invalid parameters passed: %s" % str(options))
2126
-> 2127 arrays = indexable(*arrays)
2128
2129 n_samples = _num_samples(arrays[0])
~\anaconda3\lib\site-packages\sklearn\utils\validation.py in indexable(*iterables)
291 """
292 result = [_make_indexable(X) for X in iterables]
--> 293 check_consistent_length(*result)
294 return result
295
~\anaconda3\lib\site-packages\sklearn\utils\validation.py in check_consistent_length(*arrays)
254 uniques = np.unique(lengths)
255 if len(uniques) > 1:
--> 256 raise ValueError("Found input variables with inconsistent numbers of"
257 " samples: %r" % [int(l) for l in lengths])
258
ValueError: Found input variables with inconsistent numbers of samples: [100, 50]
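You don't need to reshape the predictors. Calling reshape(-1,1) on the two-column array flattens it into a single column with 100 rows, which no longer matches the 50 rows of Y. So instead of: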
X = maindf[['Graduate Degree','Asian American Population']].values.reshape(-1,1)
do:
X = maindf[['Graduate Degree','Asian American Population']]
Here is the code running on a sample dataset:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import r2_score,mean_squared_error
maindf = pd.DataFrame({'Graduate Degree':np.random.choice([0,1],100),
'Asian American Population':np.random.choice([0,1],100),
"Democrats 2016":np.random.choice([0,1],100)})
X = maindf[['Graduate Degree','Asian American Population']]
Y = maindf["Democrats 2016"].values.reshape(-1,1)
x_train, x_test, y_train, y_test, = train_test_split(X, Y,train_size=49, random_state=np.random)
DecisionTreeRegModel = DecisionTreeRegressor(max_depth=3).fit(x_train, y_train)
y_pred = DecisionTreeRegModel.predict(x_test)
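To see where the 100 in the error message comes from, here is a minimal sketch. It assumes a hypothetical 50-row frame filled with random values (not the asker's data) and compares the shape of the two-column selection with and without reshape(-1,1).

```python
import numpy as np
import pandas as pd

# Hypothetical 50-row stand-in for maindf (random values, illustration only)
rng = np.random.default_rng(0)
maindf = pd.DataFrame({
    "Graduate Degree": rng.random(50),
    "Asian American Population": rng.random(50),
    "Democrats 2016": rng.random(50),
})

features = maindf[["Graduate Degree", "Asian American Population"]]

# One row per sample, one column per feature: what scikit-learn expects
X_ok = features.to_numpy()

# reshape(-1, 1) interleaves both features into a single column,
# doubling the number of rows
X_flat = features.to_numpy().reshape(-1, 1)

Y = maindf["Democrats 2016"].to_numpy()

print(X_ok.shape)    # (50, 2)
print(X_flat.shape)  # (100, 1)
print(Y.shape)       # (50,)
```

With the flattened version, train_test_split receives 100 rows for X and 50 for Y, which is exactly the [100, 50] mismatch it reports.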
Did you check X.shape and Y.shape to see why train_test_split sees one input with 100 rows and the other with only 50? As an aside, pandas recommends to_numpy() over .values; see the pandas documentation.
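The check suggested in this comment could look roughly like the sketch below. The 50-row frame is hypothetical random data, and the fixed random_state=0 is an assumption made only so the split is reproducible. The sketch uses to_numpy() and confirms that the row counts agree before splitting.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical 50-row stand-in for maindf (random values, illustration only)
rng = np.random.default_rng(0)
maindf = pd.DataFrame({
    "Graduate Degree": rng.random(50),
    "Asian American Population": rng.random(50),
    "Democrats 2016": rng.random(50),
})

# to_numpy() instead of .values, and no reshape on the feature matrix
X = maindf[["Graduate Degree", "Asian American Population"]].to_numpy()
y = maindf["Democrats 2016"].to_numpy()

# Inspect the shapes before handing the arrays to scikit-learn
print(X.shape, y.shape)  # (50, 2) (50,)
assert X.shape[0] == y.shape[0], "X and y must have the same number of rows"

# random_state=0 is an assumption here, chosen for reproducibility
x_train, x_test, y_train, y_test = train_test_split(
    X, y, train_size=49, random_state=0
)
print(x_train.shape, x_test.shape)  # (49, 2) (1, 2)
```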