Python sklearn issue: Found arrays with inconsistent numbers of samples when doing regression

python arrays numpy machine-learning scikit-learn

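This question seems to have been asked before, but I can't comment on the accepted answer to ask for further clarification, and I can't make sense of the solution provided there.

I'm trying to learn how to use sklearn with my own data. I've essentially just taken the annual % change in GDP for two different countries over the past 100 years. For now I'm only trying to learn with a single variable. What I basically want to do is use sklearn to predict what the GDP % change for country A will be, given the % change in country B's GDP.

The problem is that I receive an error saying:

ValueError: Found arrays with inconsistent numbers of samples: [  1 107]

Here is my code: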

import sklearn.linear_model as lm
import numpy as np
import scipy.stats as st
import matplotlib.pyplot as plt
import matplotlib.dates as mdates


def bytespdate2num(fmt, encoding='utf-8'):#function to convert bytes to string for the dates.
    strconverter = mdates.strpdate2num(fmt)
    def bytesconverter(b):
        s = b.decode(encoding)
        return strconverter(s)
    return bytesconverter

dataCSV = open('combined_data.csv')

comb_data = []

for line in dataCSV:
    comb_data.append(line)

date, chngdpchange, ausgdpchange = np.loadtxt(comb_data, delimiter=',', unpack=True, converters={0: bytespdate2num('%d/%m/%Y')})


chntrain = chngdpchange[:-1]
chntest = chngdpchange[-1:]

austrain = ausgdpchange[:-1]
austest = ausgdpchange[-1:]

regr = lm.LinearRegression()
regr.fit(chntrain, austrain)

print('Coefficients: \n', regr.coef_)

print("Residual sum of squares: %.2f"
      % np.mean((regr.predict(chntest) - austest) ** 2))

print('Variance score: %.2f' % regr.score(chntest, austest))

plt.scatter(chntest, austest,  color='black')
plt.plot(chntest, regr.predict(chntest), color='blue')

plt.xticks(())
plt.yticks(())

plt.show()

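What am I doing wrong? I essentially tried to apply the sklearn tutorial (which uses a diabetes data set) to my own simple data. My data contains only the date, country A's % change in GDP for that year, and country B's % change in GDP for that same year.

I tried the solutions from the earlier question, but just got the exact same error.

Here is the full traceback in case you want to see it: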

Traceback (most recent call last):
  File "D:\My Stuff\Dropbox\Python\Python projects\test regression\tester.py", line 34, in <module>
    regr.fit(chntrain, austrain)
  File "D:\Programs\Installed\Python34\lib\site-packages\sklearn\linear_model\base.py", line 376, in fit
    y_numeric=True, multi_output=True)
  File "D:\Programs\Installed\Python34\lib\site-packages\sklearn\utils\validation.py", line 454, in check_X_y
    check_consistent_length(X, y)
  File "D:\Programs\Installed\Python34\lib\site-packages\sklearn\utils\validation.py", line 174, in check_consistent_length
    "%s" % str(uniques))
ValueError: Found arrays with inconsistent numbers of samples: [  1 107]

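The call regr.fit(chntrain, austrain) doesn't look right.

The first argument to fit should be X, the feature matrix, with one feature vector per sample. The second argument should be y, the vector of correct answers (targets) associated with X.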

For example, if you have GDP data, you might have:

X[0] = [43, 23, 52] -> y[0] = 5
# meaning the first year had the features [43, 23, 52] (I just made them up)
# and the change that year was 5
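
To make the expected layout concrete, here is a minimal sketch with made-up numbers (the values are invented for illustration and are not from the asker's file). scikit-learn estimators expect X with shape (n_samples, n_features) and y with one entry per sample:

```python
import numpy as np

# Made-up data: 3 samples (years), each with 3 features.
X = np.array([[43, 23, 52],
              [40, 25, 50],
              [45, 21, 55]])
# One target per sample (again invented for illustration).
y = np.array([5, 4, 6])

print(X.shape)  # (3, 3): n_samples rows, n_features columns
print(y.shape)  # (3,): one target per row of X

# The consistency check behind the error compares these two counts.
assert X.shape[0] == y.shape[0]
```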

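Judging by your variable names, chntrain and austrain are both feature vectors. Judging by how you load your data, maybe the last column is the target?

Maybe you need to do something like: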

chntrain_X, chntrain_y = chntrain[:, :-1], chntrain[:, -1]
# you can do the same with austrain and concatenate them or test on them if this part works
regr.fit(chntrain_X, chntrain_y)

But we can't tell without knowing the exact storage format of your data.

Try changing chntrain to a 2-D array instead of a 1-D one, i.e. reshape it to (len(chntrain), 1).

For prediction, also change chntest to a 2-D array.

In fit(X, y), the input parameter X is supposed to be a 2-D array. If X in your data is only one-dimensional, you can reshape it into a 2-D array like this:

regr.fit(chntrain_X.reshape(len(chntrain_X), 1), chntrain_Y)
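
Applied to the question's variable names, the reshape could look like the sketch below. The two series are synthetic stand-ins (random values with an exact linear relation), chosen only so the fitted coefficient can be sanity-checked; they are not the asker's GDP data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-ins for the two GDP series (values are invented).
rng = np.random.default_rng(0)
chngdpchange = rng.normal(size=108)          # 1-D, shape (108,)
ausgdpchange = 0.5 * chngdpchange + 1.0      # exact linear relation

# Hold out the last observation, as in the question.
chntrain, chntest = chngdpchange[:-1], chngdpchange[-1:]
austrain, austest = ausgdpchange[:-1], ausgdpchange[-1:]

regr = LinearRegression()
# reshape(-1, 1) turns shape (107,) into (107, 1): 107 samples, 1 feature.
regr.fit(chntrain.reshape(-1, 1), austrain)

print(regr.coef_, regr.intercept_)
# The test input needs the same 2-D layout before predicting.
prediction = regr.predict(chntest.reshape(-1, 1))
print(prediction, austest)
```

Note that with a single held-out sample, regr.score is not meaningful, since R^2 needs at least two samples; holding out more observations would be needed for that line of the original script.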

I ran into a similar problem and found a solution.

You are getting the following error:

ValueError: Found arrays with inconsistent numbers of samples: [  1 107]

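The [1 107] part is basically saying that your array is the wrong way around. sklearn thinks X has one row and 107 columns of data (one sample with 107 features), while y has 107 samples.

To fix this, try transposing the X data like so: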

chntrain.T

Then re-run the fit:

regr.fit(chntrain, austrain)

Depending on what your austrain data looks like, you may need to transpose that too.
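
One caveat worth checking, shown here with a made-up array: in NumPy, transposing a 1-D array returns an array of the same shape, so .T only helps when the data is already 2-D with shape (1, 107). For a 1-D array, a reshape is needed instead.

```python
import numpy as np

a = np.arange(107.0)       # stand-in for a 1-D chntrain, shape (107,)
print(a.T.shape)           # (107,): .T leaves a 1-D array unchanged

row = a.reshape(1, -1)     # a 2-D row, shape (1, 107)
print(row.T.shape)         # (107, 1): transposing works once it is 2-D
```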

You can also use np.newaxis, for example:

X = X[:, np.newaxis]
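
As a quick check that indexing with np.newaxis gives the same column layout as a reshape, here is a small sketch with invented values:

```python
import numpy as np

X = np.array([1.5, 2.0, 2.5, 3.0])   # invented 1-D feature values
X_col = X[:, np.newaxis]             # adds a trailing axis of length 1

print(X_col.shape)                               # (4, 1)
print(np.array_equal(X_col, X.reshape(-1, 1)))   # True
```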

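Check the shapes of chntrain and austrain before you split them into training/test sets. They should have the same shape; the error seems to indicate that the sizes are not the same.

How would I do that? I've been googling, but every solution that has me reshape or find the shape gives an error: IndexError: too many indices for array

For example:

print(chngdpchange.shape, ausgdpchange.shape)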
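
The IndexError mentioned in this thread is what NumPy raises when a 1-D array is indexed with two indices, which suggests that each column returned by np.loadtxt(..., unpack=True) is 1-D. A small sketch with invented stand-in values shows both the shape check and how that error arises:

```python
import numpy as np

# Invented stand-ins for the two columns loaded with unpack=True.
chngdpchange = np.linspace(-1.0, 1.0, 108)
ausgdpchange = np.linspace(0.0, 2.0, 108)

print(chngdpchange.shape, ausgdpchange.shape)  # (108,) (108,)
print(chngdpchange.ndim)                       # 1

try:
    chngdpchange[:, :-1]       # two indices on a 1-D array
except IndexError as exc:
    message = str(exc)
    print("IndexError:", message)
```

If the shapes print as (108,), slicing such as chntrain[:, :-1] cannot work, and reshaping to a single column is the more likely fix.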