Python ValueError: Number of features of the model must match the input. Model n_features is 11 and input n_features is 2

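When I run the code below in a Jupyter notebook, I get a ValueError:

ValueError: Number of features of the model must match the input. Model n_features is 11 and input n_features is 2

How can I fix this?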

# Visualising the Training set results
from matplotlib.colors import ListedColormap
X_set, y_set = X_train, y_train
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 1, stop = X_set[:, 0].max() + 1, step = 0.01),
                     np.arange(start = X_set[:, 1].min() - 1, stop = X_set[:, 1].max() + 1, step = 0.01))
plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha = 0.75, cmap = ListedColormap(('red', 'green')))
I get the following error:

ValueError                                Traceback (most recent call last)
<ipython-input-42-bc13e66e79fe> in <module>
      4 X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 1, stop = X_set[:, 0].max() + 1, step = 0.01),
      5                      np.arange(start = X_set[:, 1].min() - 1, stop = X_set[:, 1].max() + 1, step = 0.01))
----> 6 plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
      7              alpha = 0.75, cmap = ListedColormap(('red', 'green')))
      8 plt.xlim(X1.min(), X1.max())

~\anaconda3\lib\site-packages\sklearn\ensemble\_forest.py in predict(self, X)
    627             The predicted classes.
    628         """
--> 629         proba = self.predict_proba(X)
    630 
    631         if self.n_outputs_ == 1:

~\anaconda3\lib\site-packages\sklearn\ensemble\_forest.py in predict_proba(self, X)
    671         check_is_fitted(self)
    672         # Check data
--> 673         X = self._validate_X_predict(X)
    674 
    675         # Assign chunk of trees to jobs

~\anaconda3\lib\site-packages\sklearn\ensemble\_forest.py in _validate_X_predict(self, X)
    419         check_is_fitted(self)
    420 
--> 421         return self.estimators_[0]._validate_X_predict(X, check_input=True)
    422 
    423     @property

~\anaconda3\lib\site-packages\sklearn\tree\_classes.py in _validate_X_predict(self, X, check_input)
    394         n_features = X.shape[1]
    395         if self.n_features_ != n_features:
--> 396             raise ValueError("Number of features of the model must "
    397                              "match the input. Model n_features is %s and "
    398                              "input n_features is %s "

ValueError: Number of features of the model must match the input. Model n_features is 11 and input n_features is 2 

Full model code:

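I have fixed the code according to how I understand the problem, adding a few extra lines. The main issue is that you pass only columns 1 and 2 to the prediction, while the predictor needs all 11 columns (1-11). Columns 3-11 therefore have to be filled in somehow. At the very least you can fill them with zeros.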

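In my solution I sort the data by column 1. Then, when building the mesh grid, I approximate the columns 3-11 that the prediction needs by looking up the row whose column 1 value is close to each X1 value of the grid. In other words, I try to find a good approximation of columns 3-11 given only column 1, instead of filling columns 3-11 with zeros, which would also work.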
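The zero-filling option mentioned above can be sketched as follows. This is only an illustration, not the code from the answer: the helper name pad_grid_to_n_features and the small demo grid are hypothetical, and it assumes the two plotted features are the first two columns of the training data.

```python
import numpy as np

def pad_grid_to_n_features(grid_2d, n_features, fill_values=None):
    """Return an (n_points, n_features) array: the given columns first,
    followed by constant fill values for the remaining columns."""
    grid_2d = np.asarray(grid_2d, dtype=float)
    n_points, n_given = grid_2d.shape
    n_missing = n_features - n_given
    if fill_values is None:
        # Assumption: zeros are a sensible default. After StandardScaler,
        # zero is the training mean of each scaled column.
        fill_values = np.zeros(n_missing)
    fill_values = np.asarray(fill_values, dtype=float)
    if fill_values.shape != (n_missing,):
        raise ValueError("fill_values must have one entry per missing column")
    padded = np.empty((n_points, n_features))
    padded[:, :n_given] = grid_2d
    padded[:, n_given:] = fill_values  # broadcast the constants down every row
    return padded

# Small demo grid (hypothetical values, much coarser than step = 0.01)
X1, X2 = np.meshgrid(np.linspace(-1.0, 1.0, 5), np.linspace(0.0, 1.0, 3))
grid = np.array([X1.ravel(), X2.ravel()]).T            # 2 columns
grid_11 = pad_grid_to_n_features(grid, n_features=11)  # 11 columns
print(grid.shape, grid_11.shape)
```

With a fitted 11-feature model, classifier.predict(grid_11).reshape(X1.shape) would then have the shape that plt.contourf expects. Keep in mind that the resulting plot shows only one slice of the decision surface, the one where columns 3-11 are held at the fill values.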

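In addition, I commented out the line
from sklearn.cross_validation import train_test_split
and replaced it with
from sklearn.model_selection import train_test_split
because the first line relies on an old version of sklearn. In newer versions only the second line works, since the submodule was renamed. Pick whichever variant matches your installation.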
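If the same notebook has to run on both old and new installations, one common pattern is to try the new import first and fall back to the old one. This is a minimal sketch, not part of the answer's code. As far as I know, sklearn.model_selection appeared in scikit-learn 0.18 and sklearn.cross_validation was removed in 0.20, so the fallback branch only matters on very old versions.

```python
try:
    # Current location of train_test_split
    from sklearn.model_selection import train_test_split
except ImportError:
    # Fallback for old scikit-learn releases that still ship cross_validation
    from sklearn.cross_validation import train_test_split

# Either way, the same name is available afterwards
print(train_test_split.__module__)
```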

# Random Forest Classification

# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset
dataset = pd.read_csv('finalplacementdata3.csv')
X = dataset.iloc[:, range(1, 12)].values
y = dataset.iloc[:, 12].values

siX = np.lexsort((X[:, 1], X[:, 0]))
sX, sy = X[siX], y[siX]

# Splitting the dataset into the Training set and Test set
#from sklearn.cross_validation import train_test_split
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)

# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

# Fitting Random Forest Classification to the Training set
from sklearn.ensemble import RandomForestClassifier
classifier = RandomForestClassifier(n_estimators = 10, criterion = 'entropy', random_state = 0)
classifier.fit(X_train, y_train)

# Predicting the Test set results
y_pred = classifier.predict(X_test)

# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)

# Visualising the Training set results
from matplotlib.colors import ListedColormap
X_set, y_set = X_train, y_train
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 1, stop = X_set[:, 0].max() + 1, step = 0.01),
                     np.arange(start = X_set[:, 1].min() - 1, stop = X_set[:, 1].max() + 1, step = 0.01))
                     
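# Scale the sorted rows so they are in the same units as the grid and the fitted model
# (standardisation keeps the sort order of column 0)
sXs = sc.transform(sX)
# For each grid point: index of the first sorted row whose column 0 is >= X1, clipped to the last row
riX = np.minimum(sXs.shape[0] - 1, np.searchsorted(sXs[:, 0], X1.ravel()))
rX = sXs[riX]  # index the sorted array that searchsorted was run on, not the unsorted X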

plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()] + list(rX[:, 2:].T)).T).reshape(X1.shape),
             alpha = 0.75, cmap = ListedColormap(('red', 'green')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                c = ListedColormap(('red', 'green'))(i), label = j)
plt.title('Random Forest Classification (Training set)')
plt.xlabel('Quants')
plt.ylabel('CGPA')
plt.legend()
plt.show()

# Visualising the Test set results
from matplotlib.colors import ListedColormap
X_set, y_set = X_test, y_test
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 1, stop = X_set[:, 0].max() + 1, step = 0.01),
                     np.arange(start = X_set[:, 1].min() - 1, stop = X_set[:, 1].max() + 1, step = 0.01))

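# Same lookup as for the training plot: scaled, sorted rows supply columns 3-11
sXs = sc.transform(sX)
riX = np.minimum(sXs.shape[0] - 1, np.searchsorted(sXs[:, 0], X1.ravel()))
rX = sXs[riX]  # index the sorted array that searchsorted was run on, not the unsorted X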

plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()] + list(rX[:, 2:].T)).T).reshape(X1.shape),
             alpha = 0.75, cmap = ListedColormap(('red', 'green')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                c = ListedColormap(('red', 'green'))(i), label = j)
plt.title('Random Forest Classification (Test set)')
plt.xlabel('Quants')
plt.ylabel('CGPA')
plt.legend()
plt.show()

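Your model (classifier) was trained with 11 numbers in each X input, but you are giving it 2. That is, your prediction array
np.array([X1.ravel(), X2.ravel()]).T
has only two columns, but it should have 11. If you post the code of the model, we can investigate the problem. Alternatively, you can build 11 columns from the same 11 X's as above (X1, X2, X3 ... X11), preferably as an array.

@Arty Please check the full model code from here -->

Yes, in your code you train the model to predict column 12 from columns 1-11. So in the last part of your code, where you visualise and predict (and where the exception is raised), you supply only the two columns X1 and X2, but you need to supply all 11.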