Python: Number of features of the model must match the input. Model n_features is 16 and input n_features is 1
Tags: python, python-3.x, dataframe, scikit-learn, random-forest
I am working with a Kaggle stroke dataset. After building a RandomForestClassifier, I ran RandomizedSearchCV on it. I don't understand why the error mentions n_features 16, and it confuses me. I am new to data science, so I don't even know what I did wrong.
import pandas as pd
df = pd.read_csv("healthcare-dataset-stroke-data.csv")
print(df)
df.dropna(inplace=True)
df.isnull().sum()
df.corr()
final_dataset=pd.get_dummies(df,drop_first=True)
print(final_dataset)
import seaborn as sns
import matplotlib.pyplot as plt
corrmat= final_dataset.corr()
top_corr_features = corrmat.index
plt.figure(figsize=(20,20))
g=sns.heatmap(final_dataset[top_corr_features].corr(),annot=True,cmap="RdYlGn")
final_dataset.columns
X = final_dataset[['age', 'hypertension', 'heart_disease', 'avg_glucose_level',
'bmi','gender_Male', 'gender_Other', 'ever_married_Yes',
'work_type_Never_worked', 'work_type_Private',
'work_type_Self-employed', 'work_type_children', 'Residence_type_Urban',
'smoking_status_formerly smoked', 'smoking_status_never smoked',
'smoking_status_smokes']]
y = final_dataset[['stroke']]
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y, test_size=0.2)
y_train.shape
y_test.shape
from sklearn.ensemble import RandomForestClassifier
rf = RandomForestClassifier()
"""Hyperparameters"""
import numpy as np
n_estimators = [int(x) for x in np.linspace(100,1200,12)]
max_features = ["auto", "sqrt"]
max_depth = [int(x) for x in np.linspace(5,30,6)]
min_samples_split = [2,5,10,15,100]
min_samples_leaf = [1,2,5,10]
# Create the random grid
random_grid = {'n_estimators': n_estimators,
'max_features': max_features,
'max_depth': max_depth,
'min_samples_split': min_samples_split,
'min_samples_leaf': min_samples_leaf}
print(random_grid)
from sklearn.model_selection import RandomizedSearchCV
rf_random = RandomizedSearchCV(estimator = rf, param_distributions = random_grid,scoring='neg_mean_squared_error', n_iter = 10, cv = 5, verbose=2, random_state=42, n_jobs = 1)
rf_random.fit(X_train,y_train.values.ravel())
predictions = rf_random.predict(X_test)
print(rf.score(y_test,predictions))
The error I get:
Traceback (most recent call last):
File "d:/object_detection/untitled3.py", line 82, in <module>
print(rf.score(y_test,predictions))
File "C:\Users\Amit\anaconda3\lib\site-packages\sklearn\base.py", line 499, in score
return accuracy_score(y, self.predict(X), sample_weight=sample_weight)
File "C:\Users\Amit\anaconda3\lib\site-packages\sklearn\ensemble\_forest.py", line 629, in predict
proba = self.predict_proba(X)
File "C:\Users\Amit\anaconda3\lib\site-packages\sklearn\ensemble\_forest.py", line 673, in predict_proba
X = self._validate_X_predict(X)
File "C:\Users\Amit\anaconda3\lib\site-packages\sklearn\ensemble\_forest.py", line 421, in _validate_X_predict
return self.estimators_[0]._validate_X_predict(X, check_input=True)
File "C:\Users\Amit\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 396, in _validate_X_predict
raise ValueError("Number of features of the model must "
ValueError: Number of features of the model must match the input. Model n_features is 16 and input n_features is 1
Output
0.9663951120162932
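The traceback points at rf.score(y_test, predictions). In scikit-learn, score takes the feature matrix first and the true labels second, so passing y_test (a single column) as the first argument gives an input with n_features 1, while the model was trained on the 16 columns in X. Note also that RandomizedSearchCV fits clones of the estimator it is given, so the tuned model is available as best_estimator_ on the search object. Below is a minimal sketch of scoring in the expected order. It is not the asker's exact pipeline: the data is a synthetic stand-in from make_classification with 16 features, and the small grid, the names search and param_distributions, and scoring="accuracy" are assumptions chosen to keep the example fast.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Synthetic stand-in for the stroke data (assumption): 16 features, binary target.
X, y = make_classification(n_samples=300, n_features=16, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Deliberately small search space so the sketch runs quickly.
param_distributions = {
    "n_estimators": [20, 50],
    "max_depth": [5, 10],
    "min_samples_split": [2, 5],
}
search = RandomizedSearchCV(
    estimator=RandomForestClassifier(random_state=42),
    param_distributions=param_distributions,
    scoring="accuracy",
    n_iter=3,
    cv=3,
    random_state=42,
)
search.fit(X_train, y_train)

# predict() on the search object delegates to the refitted best estimator.
predictions = search.predict(X_test)

# score(X, y): features first, true labels second.
best_model = search.best_estimator_
accuracy_from_score = best_model.score(X_test, y_test)

# Same number computed from the predictions: accuracy_score(y_true, y_pred).
accuracy_from_predictions = accuracy_score(y_test, predictions)

print(accuracy_from_score, accuracy_from_predictions)
```

Applied to the code in the question, the corresponding call would be rf_random.best_estimator_.score(X_test, y_test), or accuracy_score(y_test, predictions) if the predictions are already computed.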