Feature importance with the R randomForest package


I want to use a random forest to find the most important features for a classification problem (I have two classes: 0 and 1).

I created the model:

rf = randomForest(y  ~ ., data = df, sampsize=100000,ntree=100, importance = TRUE, keep.forest = FALSE)

Then I check the importance with:

importance(rf, type = 1, class = 1)

I read that the class argument can be used for classification problems.

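My question is: do I have to sort the results by their absolute value of mean decrease in accuracy? When I use varImpPlot, should I also take the negative values into account? And what exactly does the argument class=1 mean?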
We can use the iris dataset, which has 3 species:
data(iris)
table(iris$Species)
We fit a random forest:

library(randomForest)
mdl = randomForest(Species ~ ., data = iris, importance = TRUE) # let's do it without options
importance(mdl)

                setosa versicolor virginica MeanDecreaseAccuracy MeanDecreaseGini
Sepal.Length  6.364533  6.2112640  7.632076            10.365371         9.223319
Sepal.Width   4.790211  0.4339124  5.500338             5.153676         2.189763
Petal.Length 22.027701 34.5777755 29.080648            35.215194        44.703684
Petal.Width  22.500729 31.1403378 30.714576            33.335003        43.163546
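To check which columns importance() actually returns, here is a minimal sketch on iris. The set.seed() call is my addition: the forest is random, so the numbers will differ from the table above on every run, but the column layout should not.

```r
library(randomForest)

set.seed(42)  # forests are random; a fixed seed makes this run repeatable
mdl <- randomForest(Species ~ ., data = iris, importance = TRUE)

# Full matrix: one column per class, then MeanDecreaseAccuracy and MeanDecreaseGini
imp_all <- importance(mdl)
print(colnames(imp_all))

# type = 1 without 'class': only the accuracy decrease averaged over all classes
imp_acc <- importance(mdl, type = 1)
print(colnames(imp_acc))

# type = 1 with 'class': only the column for that one class
imp_setosa <- importance(mdl, type = 1, class = "setosa")
print(imp_setosa)
```

The class-specific result is just one column of the full matrix, so comparing imp_setosa with imp_all[, "setosa"] is a quick way to see what class selects.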
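The table above contains all of the results. If you call importance(mdl, type=1), you get only the MeanDecreaseAccuracy column, i.e. the decrease in accuracy averaged over all classes. In addition, there is one separate column for each class that can be predicted (setosa, versicolor, virginica), so if you do: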

importance(mdl, type = 1, class = "setosa")

                setosa
Sepal.Length  6.364533
Sepal.Width   4.790211
Petal.Length 22.027701
Petal.Width  22.500729
you get the decrease in accuracy associated with that class only.
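So in your code, when you call importance(rf, type=1, class=1) on a model fitted as randomForest(y ~ ., data = df, ...), you get the importance of each variable for predicting the class labelled 1.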
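A minimal sketch of the two-class case. The data frame below is a hypothetical stand-in for the asker's df: the columns x1, x2 and noise, and the rule that generates y, are made up for illustration. As far as I can tell from the package source, class is matched against the column names of the importance matrix, so class = 1 should select the column for the factor level "1", not the first column.

```r
library(randomForest)

set.seed(1)
# Hypothetical two-class data standing in for the asker's 'df':
# x1 and x2 drive the label, 'noise' is unrelated to it.
n <- 500
df <- data.frame(x1 = rnorm(n), x2 = rnorm(n), noise = rnorm(n))
df$y <- factor(ifelse(df$x1 + 0.5 * df$x2 + rnorm(n, sd = 0.5) > 0, 1, 0))

# y must be a factor so that randomForest does classification, not regression
rf <- randomForest(y ~ ., data = df, ntree = 100, importance = TRUE)

imp_all <- importance(rf)                      # columns "0", "1", then the two overall measures
imp_1   <- importance(rf, type = 1, class = 1) # the column for level "1"
print(imp_1)
```

If the levels were named differently (say "neg"/"pos"), class = 1 would not match any column and importance() should stop with a "not found" error, which is another hint that the argument is a class label rather than a position.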
Finally, you can sort them as follows:

res = importance(mdl, type = 1, class = "setosa")
res = res[order(res[, 1], decreasing = TRUE), , drop = FALSE]
res
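On the absolute-value question: the accuracy-based measure is the drop in out-of-bag accuracy after permuting a variable, so a negative value only means that permuting it happened to help slightly. It is usually read as "no evidence this variable matters" rather than as a strong effect, which suggests sorting by the signed value (as above) rather than by absolute value. A sketch that also lists such variables; the name uninformative is my own, and the seed is only there for repeatability.

```r
library(randomForest)

set.seed(42)
mdl <- randomForest(Species ~ ., data = iris, importance = TRUE)

res <- importance(mdl, type = 1, class = "setosa")
# Sort by the signed value: largest decrease in accuracy first
res <- res[order(res[, 1], decreasing = TRUE), , drop = FALSE]
print(res)

# Variables whose permutation did not lower accuracy (value <= 0)
# give no evidence of being useful for this class
uninformative <- rownames(res)[res[, 1] <= 0]
print(uninformative)
```

drop = FALSE keeps res as a one-column matrix with its row names, so the variable names survive the reordering.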

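Hi Sara, if your data is prepared correctly, your code looks correct. Let me check class again... So whether to sort or not depends on what you want to do with the results?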