Python: How to fix the "Found input variables with inconsistent numbers of samples: [100, 50]" error?


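I get this error and don't know how to fix it. I want to use two x variables in my regression, so I put both of them in the code. However, I get this error and don't know how to reshape my arrays to fix it.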

from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import r2_score,mean_squared_error

X = maindf[['Graduate Degree','Asian American Population']].values.reshape(-1,1)
Y = maindf["Democrats 2016"].values.reshape(-1,1)
x_train, x_test, y_train, y_test, = train_test_split(X, Y,train_size=49, random_state=np.random)
DecisionTreeRegModel = DecisionTreeRegressor(max_depth=3).fit(x_train, y_train)
y_pred = DecisionTreeRegModel.predict(x_test)
from sklearn import tree
Here is the error:

ValueError                                Traceback (most recent call last)
<ipython-input-85-9aaccff5b23d> in <module>
      5 X = maindf[['Graduate Degree','Asian American Population']].values.reshape(-1,1)
      6 Y = maindf["Democrats 2016"].values.reshape(-1,1)
----> 7 x_train, x_test, y_train, y_test, = train_test_split(X, Y,train_size=49, random_state=np.random)
      8 DecisionTreeRegModel = DecisionTreeRegressor(max_depth=3).fit(x_train, y_train)
      9 y_pred = DecisionTreeRegModel.predict(x_test)

~\anaconda3\lib\site-packages\sklearn\model_selection\_split.py in train_test_split(*arrays, **options)
   2125         raise TypeError("Invalid parameters passed: %s" % str(options))
   2126 
-> 2127     arrays = indexable(*arrays)
   2128 
   2129     n_samples = _num_samples(arrays[0])

~\anaconda3\lib\site-packages\sklearn\utils\validation.py in indexable(*iterables)
    291     """
    292     result = [_make_indexable(X) for X in iterables]
--> 293     check_consistent_length(*result)
    294     return result
    295 

~\anaconda3\lib\site-packages\sklearn\utils\validation.py in check_consistent_length(*arrays)
    254     uniques = np.unique(lengths)
    255     if len(uniques) > 1:
--> 256         raise ValueError("Found input variables with inconsistent numbers of"
    257                          " samples: %r" % [int(l) for l in lengths])
    258 

ValueError: Found input variables with inconsistent numbers of samples: [100, 50]

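You don't need to reshape the predictors. Calling reshape(-1,1) on the two-column array flattens it into a single column with twice as many rows (100 instead of 50), so it no longer matches Y. So instead of:

X = maindf[['Graduate Degree','Asian American Population']].values.reshape(-1,1)
do:

X = maindf[['Graduate Degree','Asian American Population']]

Here is the code run on an example dataset: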

import numpy as np
import pandas as pd

from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import r2_score,mean_squared_error

maindf = pd.DataFrame({'Graduate Degree':np.random.choice([0,1],100),
                      'Asian American Population':np.random.choice([0,1],100),
                      "Democrats 2016":np.random.choice([0,1],100)})

X = maindf[['Graduate Degree','Asian American Population']]
Y = maindf["Democrats 2016"].values.reshape(-1,1)
x_train, x_test, y_train, y_test, = train_test_split(X, Y,train_size=49, random_state=np.random)
DecisionTreeRegModel = DecisionTreeRegressor(max_depth=3).fit(x_train, y_train)
y_pred = DecisionTreeRegModel.predict(x_test)
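
To see where the [100, 50] in the message comes from, here is a minimal sketch on synthetic data. The names features, target and flattened are made up for illustration, and the 50-row size is an assumption inferred from the error message rather than taken from the asker's real dataframe.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data: 50 samples, 2 feature columns (assumed size).
rng = np.random.default_rng(0)
features = rng.integers(0, 2, size=(50, 2))
target = rng.integers(0, 2, size=50)

# reshape(-1, 1) stacks both columns into one: 50 * 2 = 100 rows, 1 column.
flattened = features.reshape(-1, 1)
print(features.shape, flattened.shape)

# Passing the flattened array together with a 50-row target triggers the error.
try:
    train_test_split(flattened, target.reshape(-1, 1), train_size=24, random_state=0)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```

The first dimension is what scikit-learn treats as the number of samples, so any reshape that changes it for X but not for Y will fail the consistency check.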

Have you checked X.shape and Y.shape to see why train_test_split sees one input with 100 rows and the other with only 50? Aside from that, it is recommended to use to_numpy() rather than .values.
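
A rough sketch of what that comment suggests, using to_numpy() and checking the shapes before splitting. The dataframe here is random placeholder data with an assumed 50 rows, and the variable name model is hypothetical; only the column names come from the question.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Placeholder data with the question's column names (values are random, not real).
rng = np.random.default_rng(0)
maindf = pd.DataFrame({
    "Graduate Degree": rng.random(50),
    "Asian American Population": rng.random(50),
    "Democrats 2016": rng.random(50),
})

# to_numpy() keeps the (n_samples, n_features) layout; no reshape is needed.
X = maindf[["Graduate Degree", "Asian American Population"]].to_numpy()
Y = maindf["Democrats 2016"].to_numpy()

# Both arrays must agree on the first dimension before they are split.
print(X.shape, Y.shape)
assert X.shape[0] == Y.shape[0]

x_train, x_test, y_train, y_test = train_test_split(X, Y, train_size=25, random_state=0)
model = DecisionTreeRegressor(max_depth=3).fit(x_train, y_train)
y_pred = model.predict(x_test)
```

Printing the shapes right before train_test_split is usually the quickest way to spot which array was reshaped incorrectly.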