Sensitivity vs. positive predictive value: which one is best? (Statistics, Classification, Regression, Ensemble Learning)


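I am trying to build a model on a class-imbalanced dataset (binary: 25% are 1 and 75% are 0). I have tried classification algorithms and ensemble techniques. I am a bit confused about the following points, since I am mainly interested in predicting the 1s.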

1. Should I give preference to sensitivity or to positive predictive value?
Some ensemble techniques give at most 45% sensitivity and a low positive predictive value.
Others give 62% positive predictive value and low sensitivity.


2. My dataset has around 450K observations and 250 features.
After a power test I took 10K observations by simple random sampling. When I rank
variable importance using ensemble techniques, the important features
differ from the ones I got when I tried 150K observations.
Based on my intuition and domain knowledge, the features that came up as important in
the 150K-observation sample seem more relevant. What is the best practice?

3. Lastly, can I use the variable importance generated by RF in other ensemble
techniques to predict the accuracy?

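Can you help me with this? I am a bit confused.

The preference between sensitivity and positive predictive value depends on the ultimate goal of your analysis. The difference between the two values is explained well here: in general, they are two metrics that look at the results from two different perspectives. Sensitivity gives you the probability that the test finds the "condition" in subjects who actually have it. Positive predictive value looks at how prevalent the "condition" is among the subjects the test flags as positive.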
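To make the two perspectives concrete, here is a minimal sketch in plain Python. The labels and scores are made up for illustration (4 positives out of 16, i.e. 25%), and the helper names are hypothetical. It shows how moving the decision threshold trades one metric against the other.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def sensitivity(tp, fn):
    """Share of actual positives that were predicted positive."""
    return tp / (tp + fn) if (tp + fn) else float("nan")


def ppv(tp, fp):
    """Share of predicted positives that are actually positive."""
    return tp / (tp + fp) if (tp + fp) else float("nan")


# Hypothetical data: 4 positives, 12 negatives, with made-up model scores.
y_true = [1, 1, 1, 1] + [0] * 12
scores = [0.9, 0.7, 0.4, 0.2,
          0.8, 0.5, 0.3, 0.3, 0.2, 0.1, 0.1, 0.05, 0.05, 0.05, 0.02, 0.01]


def metrics_at(threshold):
    """Return (sensitivity, ppv) when scores >= threshold are labelled 1."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return sensitivity(tp, fn), ppv(tp, fp)


strict = metrics_at(0.6)    # few predicted positives
lenient = metrics_at(0.25)  # many predicted positives
print(strict, lenient)
```

With the strict threshold the sketch catches fewer of the 1s but is right more often when it predicts 1; with the lenient threshold it is the other way round. Which side to favour depends on whether a missed 1 or a false alarm costs you more.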

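Accuracy depends on the classification results: it is defined as (true positives + true negatives) / (total), and is not determined by the variable importance that RF produces.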
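As a small worked example of that definition, here is a sketch with made-up counts for 2000 observations at the 25%/75% split from the question. The 45% sensitivity echoes the question; the 50% positive predictive value is purely an assumption chosen for illustration.

```python
def accuracy(tp, tn, fp, fn):
    """(true positives + true negatives) / total."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical counts: 500 actual positives, 1500 actual negatives.

# A "model" that always predicts 0 never finds a positive.
acc_always_zero = accuracy(tp=0, tn=1500, fp=0, fn=500)

# A model with 45% sensitivity (225 of 500 positives found) and an
# assumed 50% positive predictive value (225 false positives).
acc_model = accuracy(tp=225, tn=1275, fp=225, fn=275)

print(acc_always_zero, acc_model)
```

Both come out at the same accuracy, although only the second one finds any 1s. On an imbalanced dataset accuracy alone can hide exactly the behaviour you care about.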

In addition, you can compensate for the imbalance in your dataset; see
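One common way to compensate, which is not necessarily the one the link describes, is to weight each class inversely to its frequency so that errors on the rare class count more during training. A minimal sketch in plain Python follows; the function name is hypothetical, and the formula n / (k * count) is the usual "balanced" heuristic.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Hypothetical labels with the 25% / 75% split from the question.
labels = [1] * 25 + [0] * 75
weights = balanced_class_weights(labels)
print(weights)
```

Here each 1 gets three times the weight of each 0. Weights like these can be passed to learners that accept per-class or per-observation weights; resampling the training set is an alternative with a similar effect.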

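@Julian, thank you for your suggestion. The link fits my situation very well. Also, for RF with class imbalance, we could use MeanDecreaseGini to select variable importance, and stop once the Gini decrease has nearly stabilized. Please correct me if I am wrong.

"Our simulation studies show that for the randomForest function all three variable importance measures are unreliable, and the Gini importance is most strongly biased." From Strobl et al., "Bias in random forest variable importance measures: Illustrations, sources and a solution", BMC Bioinformatics 2007, 8:25. Consider using permutation importance: Altmann et al., "Permutation importance: a corrected feature importance measure", Bioinformatics, Vol. 26, No. 10, 15 May 2010, pp. 1340-1347.

Sure, Julian. Thank you very much for your reply. After going through the links, I feel I was completely wrong!!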
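For anyone reading later, here is a rough sketch of the basic idea behind permutation importance: shuffle one feature at a time and see how much the score drops. This is the plain feature-shuffling version, not the corrected procedure from the Altmann et al. paper. The toy model, the synthetic data, and all names are assumptions made up for illustration.

```python
import random


def accuracy(model, X, y):
    """Fraction of rows where the model's prediction matches the label."""
    return sum(1 for row, t in zip(X, y) if model(row) == t) / len(y)


def permutation_importance(model, X, y, n_repeats=5, seed=0):
    """Mean drop in accuracy after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and y
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Synthetic data: feature 0 determines the label (about 25% ones),
# feature 1 is pure noise.
data_rng = random.Random(42)
X = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y = [1 if row[0] > 0.75 else 0 for row in X]


def toy_model(row):
    """Stand-in for a fitted model; it only looks at feature 0."""
    return 1 if row[0] > 0.75 else 0


importances = permutation_importance(toy_model, X, y)
print(importances)
```

Shuffling the noise feature leaves the score unchanged, while shuffling the informative one hurts it. In practice you would compute the drop on held-out data and with a metric that suits the imbalance, rather than plain accuracy.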