
Scikit: understanding the number of features RFECV selects based only on grid scores

algorithm python-2.7 machine-learning scikit-learn


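From the scikit-learn documentation: the algorithm considers successively smaller sets of features and keeps only the features with the highest weights. Features with low weights are removed, and the process repeats until the number of remaining features matches the number specified by the user (or, by default, half the original number of features).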
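The elimination loop described above can be sketched in plain Python. This is not scikit-learn's implementation; `weight_fn` is a hypothetical stand-in for "fit the estimator on the surviving features and return one weight per feature" (for example absolute linear coefficients). The sketch only mirrors the documented behaviour: drop the `step` lowest-weighted features per round until `n_features_to_select` remain, defaulting to half (integer division, so 12 of 25).

```python
def rfe(feature_names, weight_fn, n_features_to_select=None, step=1):
    """Recursive feature elimination sketch (not scikit-learn's code).

    weight_fn is a hypothetical stand-in for "fit the estimator on these
    features and return one weight per feature", e.g. |coef_|.
    """
    remaining = list(feature_names)
    if n_features_to_select is None:
        # Documented default: keep half of the features (integer division).
        n_features_to_select = len(remaining) // 2
    while len(remaining) > n_features_to_select:
        weights = weight_fn(remaining)  # "refit" on the surviving subset
        # Drop `step` features, but never go below the target size.
        n_drop = min(step, len(remaining) - n_features_to_select)
        lowest_first = sorted(range(len(remaining)), key=lambda i: weights[i])
        dropped = set(lowest_first[:n_drop])
        remaining = [f for i, f in enumerate(remaining) if i not in dropped]
    return remaining


# Toy usage: 25 hypothetical features whose weight is simply their index.
features = ["f%d" % i for i in range(25)]
toy_weights = dict((f, i) for i, f in enumerate(features))
selected = rfe(features, lambda feats: [toy_weights[f] for f in feats])
```

With the default target, `selected` holds the 12 highest-weighted toy features, `f13` through `f24`. Note that the loop never looks at a validation score: the final count is fixed in advance, which is exactly why plain RFE reports 12 here.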

The results show the features ranked with RFE and with KFCV (k-fold cross-validation).

The code uses a set of 25 features.

Here is the output I get:

Original number of features is 25
RFE final number of features : 12
RFECV final number of features : 3

Printing RFECV results:
1. Number of features: 3; Grid_Score: 0.818041
2. Number of features: 4; Grid_Score: 0.816065
3. Number of features: 5; Grid_Score: 0.816053
4. Number of features: 6; Grid_Score: 0.799107
5. Number of features: 7; Grid_Score: 0.797047
6. Number of features: 8; Grid_Score: 0.783034
7. Number of features: 10; Grid_Score: 0.783022
8. Number of features: 9; Grid_Score: 0.781992
9. Number of features: 11; Grid_Score: 0.778028
10. Number of features: 12; Grid_Score: 0.774052
11. Number of features: 14; Grid_Score: 0.762015
12. Number of features: 13; Grid_Score: 0.760075
13. Number of features: 15; Grid_Score: 0.752003
14. Number of features: 16; Grid_Score: 0.750015
15. Number of features: 18; Grid_Score: 0.750003
16. Number of features: 22; Grid_Score: 0.748039
17. Number of features: 17; Grid_Score: 0.746003
18. Number of features: 19; Grid_Score: 0.739105
19. Number of features: 20; Grid_Score: 0.739021
20. Number of features: 21; Grid_Score: 0.738003
21. Number of features: 23; Grid_Score: 0.729068
22. Number of features: 25; Grid_Score: 0.725056
23. Number of features: 24; Grid_Score: 0.725044
24. Number of features: 2; Grid_Score: 0.506952
25. Number of features: 1; Grid_Score: 0.272896
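The RFECV printout is the mean cross-validation score for each subset size, sorted from best to worst. The sketch below uses the scores copied from that output to show how the reported count of 3 falls out: the subset size with the highest mean score is kept. The tie-break toward fewer features is an assumption of this sketch and does not matter here, since no two scores are equal. In scikit-learn versions of that era these per-size scores were exposed as `rfecv.grid_scores_`; I believe newer versions moved them to `rfecv.cv_results_["mean_test_score"]`, so check the documentation for your version.

```python
# Mean CV scores from the output above; grid_scores[k - 1] is the score for
# k features (this layout assumes step=1 and min_features_to_select=1).
grid_scores = [
    0.272896, 0.506952, 0.818041, 0.816065, 0.816053,  # 1-5 features
    0.799107, 0.797047, 0.783034, 0.781992, 0.783022,  # 6-10
    0.778028, 0.774052, 0.760075, 0.762015, 0.752003,  # 11-15
    0.750015, 0.746003, 0.750003, 0.739105, 0.739021,  # 16-20
    0.738003, 0.748039, 0.729068, 0.725044, 0.725056,  # 21-25
]

# Pick the subset size with the highest mean score (ties -> fewer features).
best_index = max(range(len(grid_scores)), key=lambda i: (grid_scores[i], -i))
n_best = best_index + 1

# Rebuild the ranking printed in the question, best score first.
ranked = sorted(range(1, len(grid_scores) + 1),
                key=lambda k: grid_scores[k - 1], reverse=True)
for rank, k in enumerate(ranked, 1):
    print("%d. Number of features: %d; Grid_Score: %f" % (rank, k, grid_scores[k - 1]))

# Mean CV score recorded for 12-feature subsets (RFE's default of 25 // 2).
score_at_12 = grid_scores[12 - 1]
```

`score_at_12` (0.774052) is the cross-validated score RFECV recorded for 12-feature subsets found inside each fold. That is only a rough stand-in for scoring the exact 12 features plain RFE selected on the full data, but it already shows the 3-feature subsets scoring higher.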

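In this particular example:

For RFE: the code always returns 12 features (about half of the 25, as the documentation says).

For RFECV: the code returns a number somewhere between 1 and 25 (not half the number of features).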

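It seems to me that with RFECV the number of features is chosen solely from the KFCV scores, i.e. the cross-validation score takes precedence over RFE's successive pruning of features.

Is this true? If one wants to use the native recursive feature elimination algorithm, does RFECV use that algorithm or a hybrid version of it?

In RFECV, is cross-validation performed on the subset of features remaining after pruning? If so, how many features are kept after each pruning step in RFECV?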

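In the cross-validated version, the features are re-ranked at each step and the lowest-ranked feature is removed; this is what the documentation calls "recursive feature elimination".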
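A plain-Python sketch of the procedure as I understand it: inside each CV fold, RFE runs on the training part, re-ranking after every elimination, and every intermediate subset is scored on the held-out part. Scores are averaged per subset size across folds and the best size wins. `fit_weights`, `score`, the fold placeholders and the toy data are all hypothetical, not scikit-learn APIs, and the final step (scikit-learn refitting RFE on all the data with the chosen size) is omitted.

```python
def rfecv(features, folds, fit_weights, score, step=1, min_features_to_select=1):
    """Sketch of RFE with cross-validation (not scikit-learn's code).

    folds: list of (train, test) pairs understood by the helpers below.
    fit_weights(train, feats) -> one weight per feature in feats (hypothetical).
    score(train, test, feats) -> held-out score of a model fit on train (hypothetical).
    """
    totals = {}  # subset size -> summed held-out score over folds
    for train, test in folds:
        remaining = list(features)
        while True:
            # Score the current subset on this fold's held-out data.
            k = len(remaining)
            totals[k] = totals.get(k, 0.0) + score(train, test, remaining)
            if k <= min_features_to_select:
                break
            # Re-rank using the training data only, then drop the weakest.
            weights = fit_weights(train, remaining)
            n_drop = min(step, k - min_features_to_select)
            lowest_first = sorted(range(k), key=lambda i: weights[i])
            dropped = set(lowest_first[:n_drop])
            remaining = [f for i, f in enumerate(remaining) if i not in dropped]
    mean_scores = dict((k, s / len(folds)) for k, s in totals.items())
    # Highest mean score wins; ties go to the smaller subset.
    best_k = max(mean_scores, key=lambda k: (mean_scores[k], -k))
    return best_k, mean_scores


# Toy problem: f0-f2 are "informative", f3-f5 are noise (all hypothetical).
informative = set(["f0", "f1", "f2"])
features = ["f%d" % i for i in range(6)]
folds = [("train-0", "test-0"), ("train-1", "test-1")]  # opaque placeholders


def fit_weights(train, feats):
    # Pretend the model puts weight 1 on informative features, 0 on noise.
    return [1.0 if f in informative else 0.0 for f in feats]


def score(train, test, feats):
    # Pretend score: +1 per informative feature, -0.1 per noise feature.
    hits = sum(1 for f in feats if f in informative)
    return hits - 0.1 * (len(feats) - hits)


best_k, mean_scores = rfecv(features, folds, fit_weights, score)
```

In the toy run the noise features are dropped first, the mean score peaks once only the three informative features remain, and `best_k` is 3. The number of features is therefore driven by the held-out scores, while the order of elimination still comes from RFE's re-ranking.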

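If you want to compare it with the original version, you need to compute the cross-validation score of the features RFE selected. My guess is that the RFECV answer is correct: judging by the sharp increase in model performance as features are removed, you probably have some highly correlated features that are hurting the model's performance.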

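1. Is the cross-validation done after the features are pruned, i.e. on the truncated feature set? Also, why does it remove only one feature at a time; is that a rule? 2. By correlated features, do you mean linearly correlated features, or can there be other kinds of correlation between features?

Yes, we should expect 3 features. The code I used is just the documentation example, whose aim is classification with 3 informative features. But clearly native RFE does not give this: if we did not know in advance how many features matter, we would use RFE and keep the default number of features at the end (12 in this case). It seems to me that only RFECV gives the correct answer... Perhaps this is simply a major limitation of native RFE, namely that it cannot eliminate correlated features, so the grid scores are needed in this case?

Yes, the cross-validation is done after pruning. RFECV has a parameter called step that specifies how many features to remove at each step; it defaults to 1. I think it could be any kind of correlation, although I'm not 100% sure how different kinds of correlation would affect the model. As for your other question, cross-validation is basically the "modern" way of doing statistics; other heuristic methods like standard RFE exist because the computing power was not available in the past. But now that computers are cheap, you should almost always prefer cross-validation (or some other train/test split) to evaluate model performance!

That answers my question.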
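To see what the `step` parameter changes, here is a small sketch that lists the subset sizes an RFE-style loop visits for an integer `step`, clipping the last step so it never goes below the minimum. The helper name `elimination_schedule` is hypothetical, not part of scikit-learn. With 25 features and `step=1` it visits all 25 sizes, which matches the 25 grid scores in the question's output.

```python
def elimination_schedule(n_features, step=1, min_features_to_select=1):
    """Subset sizes visited by an RFE-style loop, largest first (integer step)."""
    sizes = [n_features]
    while sizes[-1] > min_features_to_select:
        # Remove `step` features, but never drop below the minimum.
        sizes.append(max(sizes[-1] - step, min_features_to_select))
    return sizes


one_at_a_time = elimination_schedule(25)            # 25, 24, ..., 1
five_at_a_time = elimination_schedule(25, step=5)   # 25, 20, 15, 10, 5, 1
plain_rfe = elimination_schedule(25, min_features_to_select=25 // 2)  # stops at 12
```

A larger `step` means fewer refits per fold but a coarser search: with `step=5` the selected count can only be one of 25, 20, 15, 10, 5 or 1, so an optimum at 3 features could not be found exactly.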
