Python: Number of features of the model must match the input. Model n_features is 16 and input n_features is 1

Tags: python, python-3.x, dataframe, scikit-learn, random-forest

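I am working with a Kaggle stroke dataset. After building a model with RandomForestClassifier, I tuned it with RandomizedSearchCV. I don't understand why the error reports n_features as 16, which confuses me. I am new to data science, so I don't even know what I did wrong.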

import pandas as pd
df = pd.read_csv("healthcare-dataset-stroke-data.csv")
print(df)

df.dropna(inplace=True)

df.isnull().sum()

df.corr()

final_dataset=pd.get_dummies(df,drop_first=True)

print(final_dataset)

import seaborn as sns
import matplotlib.pyplot as plt

corrmat= final_dataset.corr()
top_corr_features = corrmat.index
plt.figure(figsize=(20,20)) 
g=sns.heatmap(final_dataset[top_corr_features].corr(),annot=True,cmap="RdYlGn")

final_dataset.columns

X = final_dataset[['age', 'hypertension', 'heart_disease', 'avg_glucose_level',
       'bmi','gender_Male', 'gender_Other', 'ever_married_Yes',
       'work_type_Never_worked', 'work_type_Private',
       'work_type_Self-employed', 'work_type_children', 'Residence_type_Urban',
       'smoking_status_formerly smoked', 'smoking_status_never smoked',
       'smoking_status_smokes']]

y = final_dataset[['stroke']]

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y, test_size=0.2)
y_train.shape
y_test.shape

from sklearn.ensemble import  RandomForestClassifier
rf = RandomForestClassifier()

"""Hyperparameters"""

import numpy as np
n_estimators = [int(x) for x in np.linspace(100,1200,12)]
max_features = ["auto", "sqrt"]
max_depth = [int(x) for x in np.linspace(5,30,6)]
min_samples_split = [2,5,10,15,100]
min_samples_leaf = [1,2,5,10]

# Create the random grid
random_grid = {'n_estimators': n_estimators,
               'max_features': max_features,
               'max_depth': max_depth,
               'min_samples_split': min_samples_split,
               'min_samples_leaf': min_samples_leaf}

print(random_grid)

from sklearn.model_selection import RandomizedSearchCV
rf_random = RandomizedSearchCV(estimator = rf, param_distributions = random_grid,scoring='neg_mean_squared_error', n_iter = 10, cv = 5, verbose=2, random_state=42, n_jobs = 1)

rf_random.fit(X_train,y_train.values.ravel())

predictions = rf_random.predict(X_test)



print(rf.score(y_test,predictions))
The error I get:

Traceback (most recent call last):
  File "d:/object_detection/untitled3.py", line 82, in <module>
    print(rf.score(y_test,predictions))
  File "C:\Users\Amit\anaconda3\lib\site-packages\sklearn\base.py", line 499, in score
    return accuracy_score(y, self.predict(X), sample_weight=sample_weight)
  File "C:\Users\Amit\anaconda3\lib\site-packages\sklearn\ensemble\_forest.py", line 629, in predict
    proba = self.predict_proba(X)
  File "C:\Users\Amit\anaconda3\lib\site-packages\sklearn\ensemble\_forest.py", line 673, in predict_proba
    X = self._validate_X_predict(X)
  File "C:\Users\Amit\anaconda3\lib\site-packages\sklearn\ensemble\_forest.py", line 421, in _validate_X_predict
    return self.estimators_[0]._validate_X_predict(X, check_input=True)
  File "C:\Users\Amit\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 396, in _validate_X_predict
    raise ValueError("Number of features of the model must "
ValueError: Number of features of the model must match the input. Model n_features is 16 and input n_features is 1
Output

0.9663951120162932
0.9663951120162932
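
The traceback points at the scoring call rather than at the search itself: `score` expects the feature matrix first and the true labels second, so `rf.score(y_test, predictions)` hands the model a single-column input where it was trained on 16 columns. The following is a minimal sketch of one way to score the tuned model, not the original poster's code. It assumes synthetic data from `make_classification` in place of the stroke CSV, a much smaller hypothetical parameter grid to keep it quick, and `scoring="accuracy"` swapped in for `neg_mean_squared_error`, since the latter is a regression metric.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Synthetic stand-in for the stroke data (assumption): 16 features,
# matching the feature count reported in the traceback.
X, y = make_classification(n_samples=300, n_features=16, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Small illustrative grid (hypothetical values, not the grid from the question).
param_distributions = {"n_estimators": [10, 20], "max_depth": [3, 5]}

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    param_distributions=param_distributions,
    scoring="accuracy",  # a classification metric
    n_iter=2,
    cv=3,
    random_state=42,
)
search.fit(X_train, y_train)

predictions = search.predict(X_test)

# Option 1: score the fitted search object with (features, true labels).
score_from_search = search.score(X_test, y_test)

# Option 2: compare true labels with the predictions already computed.
score_from_metric = accuracy_score(y_test, predictions)

print(score_from_search, score_from_metric)

# Reproducing the mistake: passing the labels as if they were features
# gives the model a 1-column input and raises ValueError.
mismatch_raised = False
try:
    search.best_estimator_.score(y_test.reshape(-1, 1), predictions)
except ValueError as exc:
    mismatch_raised = True
    print("ValueError:", exc)
```

Both options report the same accuracy here, because the search object predicts with its refitted best estimator. Note also that the question calls `score` on `rf`, the estimator passed into the search, whereas the tuned model lives in `rf_random` (or `rf_random.best_estimator_`).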