Python 3.x: How to plot the random forest tree corresponding to the best parameters

python-3.x machine-learning scikit-learn

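Python: 3.6

Windows: 10

I have a few questions about random forests and the problem at hand:

I am using a grid search to run a regression problem with a random forest. I want to plot the tree corresponding to the best-fit parameters found by the search. Here is the code: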

    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import RandomizedSearchCV, train_test_split

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=55)

    # Use the random grid to search for the best hyperparameters.
    # First create the base model to tune.
    rf = RandomForestRegressor()
    # Random search of parameters, using 5-fold cross-validation,
    # sampling 50 different combinations, and using all available cores.
    # random_grid (the parameter distributions) is defined elsewhere.
    rf_random = RandomizedSearchCV(estimator=rf, param_distributions=random_grid, n_iter=50, cv=5, verbose=2, random_state=56, n_jobs=-1)
    # Fit the random search model
    rf_random.fit(X_train, y_train)

    rf_random.best_params_

The best parameters turned out to be:

    {'n_estimators': 1000,
     'min_samples_split': 5,
     'min_samples_leaf': 1,
     'max_features': 'auto',
     'max_depth': 5,
     'bootstrap': True}

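How can I plot this tree using the above parameters?

My dependent variable `y` lies in the range [0, 1] (continuous), and all predictor variables are either binary or categorical. Which algorithm generally works well for this kind of input and output feature space? I tried a random forest, but the results were not very good. Note that the `y` variable here is a ratio, which is why it lies between 0 and 1. Example: `expense on food / total expense`.

The above data is skewed: in 60% of the data the dependent variable `y` takes a single fixed value, and in the rest of the data it lies between 0 and 1, for example `0.66`, `0.87`, and so on.

Since my data has only binary `{0,1}` and categorical `{A,B,C}` variables, do I need to convert them to one-hot encoded variables to use a random forest?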

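Before answering your questions, allow me to take a step back.

Ideally, you should drill down further into the `best_params_` of `RandomizedSearchCV` with `GridSearchCV`. `RandomizedSearchCV` goes over your parameters without trying every possible option. Then, once you have the `best_params_` from `RandomizedSearchCV`, you can investigate all possible options within a narrower range.

You did not include the `random_grid` parameters in your code, but I would expect you to run a `GridSearchCV` like this:

    from sklearn.model_selection import GridSearchCV

    # Create the parameter grid based on the results of RandomizedSearchCV
    param_grid = {
        'max_depth': [4, 5, 6],
        'min_samples_leaf': [1, 2],
        'min_samples_split': [4, 5, 6],
        'n_estimators': [990, 1000, 1010]
    }
    # Set up the grid search (GridSearchCV takes no random_state argument)
    grid_search = GridSearchCV(estimator=rf, param_grid=param_grid,
                               cv=5, n_jobs=-1, verbose=2)
    # Fit the grid search model
    grid_search.fit(X_train, y_train)

What the above does is go through every possible combination of parameters in `param_grid` and give you the best parameters.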
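As a minimal sketch of the two-stage search described above: the question's `random_grid` and data are not shown, so the synthetic data (`X_demo`, `y_demo`), the coarse search space (`coarse_space`), and the small forest sizes below are all assumptions chosen only to keep the example quick to run.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

# Synthetic stand-in data (assumption): binary predictors, target clipped to [0, 1].
rng = np.random.default_rng(0)
X_demo = rng.integers(0, 2, size=(200, 6))
y_demo = np.clip(X_demo[:, :3].mean(axis=1) + rng.normal(0, 0.05, 200), 0, 1)

# Stage 1: random search samples a few combinations from a wide, hypothetical space.
coarse_space = {
    "n_estimators": [10, 20, 40],
    "max_depth": [2, 4, 6, 8],
    "min_samples_split": [2, 5, 10],
}
coarse = RandomizedSearchCV(
    RandomForestRegressor(random_state=0),
    param_distributions=coarse_space,
    n_iter=5, cv=3, random_state=0,
)
coarse.fit(X_demo, y_demo)
best = coarse.best_params_

# Stage 2: grid search tries every combination in a narrow window around stage 1.
fine_grid = {
    "n_estimators": [best["n_estimators"]],
    "max_depth": [max(1, best["max_depth"] - 1), best["max_depth"], best["max_depth"] + 1],
    "min_samples_split": [
        max(2, best["min_samples_split"] - 1),
        best["min_samples_split"],
        best["min_samples_split"] + 1,
    ],
}
fine = GridSearchCV(RandomForestRegressor(random_state=0), param_grid=fine_grid, cv=3)
fine.fit(X_demo, y_demo)

print("random search:", coarse.best_params_)
print("grid search:  ", fine.best_params_)
```

Fixing `random_state` on the estimator is a design choice for reproducibility here; it is not required by either search class.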
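Now, to answer your questions:

A random forest is a combination of multiple trees, so there is no single tree to plot. Instead, you can plot one or more of the individual trees used by the random forest. This can be done with the `plot_tree` function. Read the documentation and the related question to understand it better.

Have you tried a simple linear regression first?

The skew will affect which accuracy metric you use to evaluate the fit/accuracy of your model. Precision, recall, and F1 score come to mind when dealing with imbalanced/skewed data.

Yes, you need to convert the categorical variables into dummy variables before fitting a random forest.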
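One possible way to do the dummy-variable conversion mentioned above is `pandas.get_dummies`. The sketch below is illustrative only: the column names (`is_member`, `segment`) and the target values are made up, since the question's data is not shown.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Hypothetical toy data: one binary column and one categorical column (A, B, C).
df = pd.DataFrame({
    "is_member": [0, 1, 1, 0, 1, 0],
    "segment": ["A", "B", "C", "A", "C", "B"],
})
ratio = [0.9, 0.66, 0.87, 0.9, 0.5, 0.9]  # made-up target values in [0, 1]

# Expand only the categorical column; the binary column is already numeric.
X_encoded = pd.get_dummies(df, columns=["segment"], dtype=int)
print(list(X_encoded.columns))

# The encoded frame can be passed straight to the regressor.
model = RandomForestRegressor(n_estimators=10, random_state=0).fit(X_encoded, ratio)
preds = model.predict(X_encoded)
```

Each level of `segment` becomes its own 0/1 column, so the forest never sees the string labels.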
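Regarding the plot (I am afraid your other questions are too broad; the general idea is to avoid asking multiple questions at once):

Fitting your `RandomizedSearchCV` has produced an `rf_random.best_estimator_`, which is itself a random forest with the parameters shown in your question (including `'n_estimators': 1000`).

According to the docs, a fitted `RandomForestRegressor` includes an attribute:

    estimators_ : list of DecisionTreeRegressor
        The collection of fitted sub-estimators.
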
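So, to plot any individual tree of your random forest, you should use either

    from sklearn import tree
    tree.plot_tree(rf_random.best_estimator_.estimators_[k])

or

    from sklearn import tree
    tree.export_graphviz(rf_random.best_estimator_.estimators_[k])

for the desired `k` in `[0, 999]` in your case (`[0, n_estimators-1]` in the general case).

The suggestion you made above for plotting the tree works fine with a random forest classifier, but not with a regressor.

@MAC According to the scikit-learn documentation, the `plot_tree` function can be used for both classifiers and regressors, although I must admit I have never applied it to a regressor.

I have written `grid = GridSearchCV(estimator=xgb, param_grid=params, scoring='neg_mean_squared_error', n_jobs=4, verbose=3)` and `grid.fit(X_train, y_train)`. Now, how do I plot the tree based on the best estimator?

@MAC XGBoost and random forest are both ensembles of multiple decision trees. There is no single tree that represents the best parameters. However, you can plot a specific tree of a trained XGBoost model with `plot_tree(grid.best_estimator_, num_trees=0)`. Replace 0 with the index of the decision tree you want to visualize. To find out the number of trees in the fitted model, check the `n_estimators` of `grid.best_estimator_`.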
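To illustrate the approach from the answer on a regressor, here is a minimal end-to-end sketch. The data, the tiny parameter space, and the names `search`, `forest`, and `single_tree` are assumptions made for the example; the headless `Agg` backend is used only so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend (assumption: no display available)
import matplotlib.pyplot as plt
import numpy as np
from sklearn import tree
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in data (assumption): binary predictors, target in [0, 1].
rng = np.random.default_rng(0)
X_demo = rng.integers(0, 2, size=(150, 5))
y_demo = np.clip(X_demo[:, :2].mean(axis=1) + rng.normal(0, 0.05, 150), 0, 1)

# Small, hypothetical search space so the example finishes quickly.
search = RandomizedSearchCV(
    RandomForestRegressor(random_state=0),
    param_distributions={"n_estimators": [5, 10], "max_depth": [2, 3]},
    n_iter=3, cv=3, random_state=0,
)
search.fit(X_demo, y_demo)

# The refitted best model is a forest; pick one of its fitted trees by index.
forest = search.best_estimator_
k = 0  # any index in [0, forest.n_estimators - 1]
single_tree = forest.estimators_[k]

# Option 1: draw the tree with matplotlib.
fig, ax = plt.subplots(figsize=(10, 6))
annotations = tree.plot_tree(single_tree, filled=True, ax=ax)

# Option 2: get the tree as Graphviz DOT text (out_file=None returns a string).
dot_source = tree.export_graphviz(single_tree, out_file=None)
print(dot_source[:60])
```

Calling `fig.savefig(...)` afterwards would write the drawing to an image file; rendering `dot_source` requires a separate Graphviz installation.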