Python ValueError: Number of features of the model must match the input. Model n_features is 11 and input n_features is 2
Tags: python, numpy, machine-learning, jupyter-notebook, data-science
When I run the code below in a Jupyter notebook, I get this error: ValueError: Number of features of the model must match the input. Model n_features is 11 and input n_features is 2. How can I solve this?
# Visualising the Training set results
from matplotlib.colors import ListedColormap
X_set, y_set = X_train, y_train
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 1, stop = X_set[:, 0].max() + 1, step = 0.01),
np.arange(start = X_set[:, 1].min() - 1, stop = X_set[:, 1].max() + 1, step = 0.01))
plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
alpha = 0.75, cmap = ListedColormap(('red', 'green')))
I get the following error:
ValueError Traceback (most recent call last)
<ipython-input-42-bc13e66e79fe> in <module>
4 X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 1, stop = X_set[:, 0].max() + 1, step = 0.01),
5 np.arange(start = X_set[:, 1].min() - 1, stop = X_set[:, 1].max() + 1, step = 0.01))
----> 6 plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
7 alpha = 0.75, cmap = ListedColormap(('red', 'green')))
8 plt.xlim(X1.min(), X1.max())
~\anaconda3\lib\site-packages\sklearn\ensemble\_forest.py in predict(self, X)
627 The predicted classes.
628 """
--> 629 proba = self.predict_proba(X)
630
631 if self.n_outputs_ == 1:
~\anaconda3\lib\site-packages\sklearn\ensemble\_forest.py in predict_proba(self, X)
671 check_is_fitted(self)
672 # Check data
--> 673 X = self._validate_X_predict(X)
674
675 # Assign chunk of trees to jobs
~\anaconda3\lib\site-packages\sklearn\ensemble\_forest.py in _validate_X_predict(self, X)
419 check_is_fitted(self)
420
--> 421 return self.estimators_[0]._validate_X_predict(X, check_input=True)
422
423 @property
~\anaconda3\lib\site-packages\sklearn\tree\_classes.py in _validate_X_predict(self, X, check_input)
394 n_features = X.shape[1]
395 if self.n_features_ != n_features:
--> 396 raise ValueError("Number of features of the model must "
397 "match the input. Model n_features is %s and "
398 "input n_features is %s "
ValueError: Number of features of the model must match the input. Model n_features is 11 and input n_features is 2
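Full model code:
I fixed the code according to how I understand the problem, adding a few extra lines. The main problem is that you pass only columns 1 and 2 to the prediction, but the predictor needs all 11 columns, 1-11. So columns 3-11 have to be filled in somehow. At the very least you can fill them with zeros.
In my solution I sort the data by column 1. Then, when building the mesh grid with np.meshgrid, I approximate the columns 3-11 needed for prediction by looking up, for each grid value X1, the row whose column 1 value is closest to it. In other words, I try to find the best approximation of columns 3-11 given only column 1, rather than filling columns 3-11 with zeros, which would also work.
I also commented out the line from sklearn.cross_validation import train_test_split and replaced it with from sklearn.model_selection import train_test_split. The first line works only with old versions of sklearn; the submodule was renamed, so in newer versions only the second line works. Choose whichever variant matches your installation.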
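The simpler zero-fill variant mentioned above can be sketched as follows. This is only an illustration on synthetic data, not the asker's dataset: the use of make_classification, the helper name predict_on_grid, and the coarser grid step of 0.1 are all assumptions made for the example. Because the features are standardized, a zero in a column corresponds to that column's training mean.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the 11-feature dataset (assumption, not the real data).
X, y = make_classification(n_samples=200, n_features=11, n_informative=5, random_state=0)
X_scaled = StandardScaler().fit_transform(X)

classifier = RandomForestClassifier(n_estimators=10, criterion='entropy', random_state=0)
classifier.fit(X_scaled, y)

def predict_on_grid(model, x1, x2, n_features):
    """Hypothetical helper: predict on a 2-D grid, zero-filling the other columns."""
    grid = np.zeros((x1.size, n_features))
    grid[:, 0] = x1.ravel()  # first plotted feature
    grid[:, 1] = x2.ravel()  # second plotted feature
    return model.predict(grid).reshape(x1.shape)

X1, X2 = np.meshgrid(
    np.arange(X_scaled[:, 0].min() - 1, X_scaled[:, 0].max() + 1, 0.1),
    np.arange(X_scaled[:, 1].min() - 1, X_scaled[:, 1].max() + 1, 0.1),
)
Z = predict_on_grid(classifier, X1, X2, n_features=X_scaled.shape[1])
print(Z.shape == X1.shape)
```

The resulting Z can be passed to plt.contourf(X1, X2, Z, ...) in place of the failing two-column prediction. Keep in mind that the plot then shows the decision regions only for the slice where every other feature sits at its mean.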
# Random Forest Classification
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Importing the dataset
dataset = pd.read_csv('finalplacementdata3.csv')
X = dataset.iloc[:, range(1, 12)].values
y = dataset.iloc[:, 12].values
siX = np.lexsort((X[:, 1], X[:, 0]))
sX, sy = X[siX], y[siX]
# Splitting the dataset into the Training set and Test set
#from sklearn.cross_validation import train_test_split
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)
# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
# Fitting Random Forest Classification to the Training set
from sklearn.ensemble import RandomForestClassifier
classifier = RandomForestClassifier(n_estimators = 10, criterion = 'entropy', random_state = 0)
classifier.fit(X_train, y_train)
# Predicting the Test set results
y_pred = classifier.predict(X_test)
# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
# Visualising the Training set results
from matplotlib.colors import ListedColormap
X_set, y_set = X_train, y_train
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 1, stop = X_set[:, 0].max() + 1, step = 0.01),
                     np.arange(start = X_set[:, 1].min() - 1, stop = X_set[:, 1].max() + 1, step = 0.01))
# Fill columns 3-11 from the sorted data. The grid is in scaled units, so scale sX first
# (standard scaling keeps the sort order), then index the sorted array, not the unsorted X.
sXs = sc.transform(sX)
riX = np.minimum(sX.shape[0] - 1, np.searchsorted(sXs[:, 0], X1.ravel()))
rX = sXs[riX]
plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()] + list(rX[:, 2:].T)).T).reshape(X1.shape),
             alpha = 0.75, cmap = ListedColormap(('red', 'green')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                c = ListedColormap(('red', 'green'))(i), label = j)
plt.title('Random Forest Classification (Training set)')
plt.xlabel('Quants')
plt.ylabel('CGPA')
plt.legend()
plt.show()
# Visualising the Test set results
from matplotlib.colors import ListedColormap
X_set, y_set = X_test, y_test
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 1, stop = X_set[:, 0].max() + 1, step = 0.01),
                     np.arange(start = X_set[:, 1].min() - 1, stop = X_set[:, 1].max() + 1, step = 0.01))
# Same lookup as for the training plot: scale the sorted data and index the sorted array.
sXs = sc.transform(sX)
riX = np.minimum(sX.shape[0] - 1, np.searchsorted(sXs[:, 0], X1.ravel()))
rX = sXs[riX]
plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()] + list(rX[:, 2:].T)).T).reshape(X1.shape),
             alpha = 0.75, cmap = ListedColormap(('red', 'green')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                c = ListedColormap(('red', 'green'))(i), label = j)
plt.title('Random Forest Classification (Test set)')
plt.xlabel('Quants')
plt.ylabel('CGPA')
plt.legend()
plt.show()
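The lookup used in the code above (sort by the first column, then np.searchsorted, then clip the index) can be illustrated on a tiny array. The values below are made up for the example. Note that np.searchsorted returns the insertion position, i.e. the first sorted row whose key is not smaller than the query, so this is an approximation of the nearest row rather than an exact nearest-neighbour search.

```python
import numpy as np

# Made-up rows: column 0 is the lookup key, column 1 stands in for "the other features".
rows = np.array([[0.5, 10.0],
                 [-1.0, 20.0],
                 [2.0, 30.0]])

order = np.argsort(rows[:, 0])
sorted_rows = rows[order]  # keys are now ascending: -1.0, 0.5, 2.0

query = np.array([-2.0, 0.0, 1.0, 5.0])

# Insertion positions, clipped so that queries beyond the largest key reuse the last row.
idx = np.minimum(len(sorted_rows) - 1, np.searchsorted(sorted_rows[:, 0], query))

# Take the remaining columns from the *sorted* array, since idx refers to sorted positions.
filled = sorted_rows[idx, 1:]
print(idx)
print(filled.ravel())
```

The important detail is that the indices returned by np.searchsorted are positions in the sorted array, so the filler values must be read from that same sorted array.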
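Your model (classifier) was trained with 11 numbers in each X input, but you are feeding it 2. That is, your prediction array np.array([X1.ravel(), X2.ravel()]).T has only two columns, but it should have 11. If you post the code for your model, we can look into the problem. Alternatively, you can build 11 columns in the same way as the 11 X columns used for training, i.e. X1, X2, X3 ... X11, preferably as a single array.
@Arty please check the complete model code from here -->
Yes, in your code you train the model to predict column 12 from columns 1-11. So in the last part of the code, where you visualise and predict (and where the exception is raised), you supply only the two columns X1, X2, but you need to supply 11.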
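A quick way to confirm the mismatch described in these comments is to compare the column count of the prediction input with the number of features the fitted model expects. The sketch below uses random stand-in data (an assumption) and the n_features_in_ attribute, which is available in recent scikit-learn releases; older releases such as the one in the traceback expose n_features_ instead.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Random stand-in data with 11 features (assumption, not the asker's dataset).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(50, 11))
y_train = rng.integers(0, 2, size=50)

classifier = RandomForestClassifier(n_estimators=10, random_state=0)
classifier.fit(X_train, y_train)

# A two-column input, built the same way as the failing call in the question.
grid = np.array([np.linspace(-1, 1, 5), np.linspace(-1, 1, 5)]).T

expected = classifier.n_features_in_  # number of columns seen during fit
if grid.shape[1] != expected:
    print(f"Input has {grid.shape[1]} columns, but the model expects {expected}")
```

Running a check like this before calling predict makes it clear which side of the call has to change: either the input gains the missing columns, or the model is trained on fewer features.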