Weka ThresholdSelector and CostSensitiveClassifier for stream learning
Are Weka's ThresholdSelector and/or CostSensitiveClassifier compatible with stream learning (updateable classifiers)? My goal is to use them with weka.classifiers.meta.MOA in order to focus learning on a specific class and minimize false negatives (FN) on some imbalanced data.

Many thanks.

According to the answer below, neither ThresholdSelector nor CostSensitiveClassifier supports updateable classifiers, so it is currently not possible to use these meta-classifiers for stream learning.

I have therefore drafted code to create updateable versions of both classifiers. Any comments or suggestions are welcome.

weka.classifiers.meta.CostSensitiveClassifier: code update to create an updateable version (this one "seems" to be the simplest).

weka.classifiers.meta.ThresholdSelector: code update to create an updateable version (awaiting your comments/suggestions).

Thanks
/*
weka.classifiers.meta.CostSensitiveClassifier: draft code update and questions to make it compatible with updateable classifiers
*/
import weka.classifiers.UpdateableClassifier;
....
implements ... UpdateableClassifier;
...
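protected boolean classifierAlreadyUpdated = false;

public void updateClassifier(Instance instance) throws Exception {
  if (instance.classIsMissing()) {
    return; // an instance without a class value cannot be used for training
  }
  if (m_Classifier == null)
    throw new Exception("No base classifier has been set!");
  if (!(m_Classifier instanceof UpdateableClassifier))
    throw new Exception("Base classifier must implement UpdateableClassifier!");
  // not sure how to properly check whether m_CostMatrix has already been fully initialized here or elsewhere (i.e. by an external call to buildClassifier)
  if (m_CostMatrix == null || (m_CostMatrix.size() == 1 && !classifierAlreadyUpdated)) {
    // re-use the initialization process of buildClassifier on a one-instance dataset
    Instances data = new Instances(instance.dataset(), 1);
    data.add(instance);
    buildClassifier(data);
    classifierAlreadyUpdated = true;
  }
  else {
    // two-class rule copied from CostMatrix.applyCostMatrix: class 0 uses cell (0,1), class 1 uses cell (1,0)
    double factor = 1.0;
    int classValIndex = (int) instance.classValue();
    Object element = (classValIndex == 0) ? m_CostMatrix.getCell(classValIndex, 1) : m_CostMatrix.getCell(classValIndex, 0);
    if (element instanceof Double) {
      factor = ((Double) element).doubleValue();
    } else {
      // non-fixed, per-instance cost expression
      factor = ((AttributeExpression) element).evaluateExpression(instance);
    }
    if (!m_MinimizeExpectedCost) {
      // setWeight() returns void, so reweight a copy instead of the caller's instance
      Instance weighted = (Instance) instance.copy();
      weighted.setWeight(instance.weight() * factor);
      ((UpdateableClassifier) m_Classifier).updateClassifier(weighted);
    } else {
      // when minimizing expected cost, the cost matrix is applied at prediction time, so train on the unweighted instance
      ((UpdateableClassifier) m_Classifier).updateClassifier(instance);
    }
  }
}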
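The reweighting rule in the draft can be checked in isolation. Below is a minimal, Weka-free sketch, assuming two classes and a fixed numeric cost matrix indexed as cost[actual][predicted] (no per-instance cost expressions); the class StreamCostWeighting and its methods are hypothetical names for illustration only. As in the draft, an instance of class 0 is scaled by cost[0][1] and an instance of class 1 by cost[1][0], so making cost[1][0] large up-weights the positives whose misclassification produces false negatives.

```java
import java.util.Arrays;

/**
 * Hypothetical, Weka-free sketch of per-instance cost weighting for a
 * two-class stream. Assumes a fixed 2x2 cost matrix with a zero diagonal,
 * where cost[actual][predicted] is the cost of predicting 'predicted'
 * when the true class is 'actual'.
 */
class StreamCostWeighting {

    private final double[][] cost;

    StreamCostWeighting(double[][] cost) {
        if (cost.length != 2 || cost[0].length != 2 || cost[1].length != 2) {
            throw new IllegalArgumentException("Expected a 2x2 cost matrix");
        }
        this.cost = cost;
    }

    /** Weight factor for an instance whose true class index is 0 or 1. */
    double factor(int classIndex) {
        // Same rule as the draft: class 0 -> cost[0][1], class 1 -> cost[1][0]
        return (classIndex == 0) ? cost[0][1] : cost[1][0];
    }

    /** New weight for an incoming instance; nothing is mutated. */
    double reweight(int classIndex, double originalWeight) {
        return originalWeight * factor(classIndex);
    }

    public static void main(String[] args) {
        // Missing a positive (actual 1 predicted as 0) costs 5x more than a false alarm
        double[][] cost = { {0.0, 1.0}, {5.0, 0.0} };
        StreamCostWeighting w = new StreamCostWeighting(cost);
        int[] labels = {0, 1, 0, 1};
        double[] weights = new double[labels.length];
        for (int i = 0; i < labels.length; i++) {
            weights[i] = w.reweight(labels[i], 1.0);
        }
        System.out.println(Arrays.toString(weights)); // [1.0, 5.0, 1.0, 5.0]
    }
}
```

One design caveat, as far as I can tell from the batch code path: for fixed costs, CostMatrix.applyCostMatrix also rescales the weights using per-class weight totals over the whole training set. For a two-class matrix with a zero diagonal the ratio between the class weights is the same as here, cost[0][1] : cost[1][0], but the absolute weights differ; a stream has no complete dataset to normalize over, which is why this sketch skips that step.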
/*
weka.classifiers.meta.ThresholdSelector: draft code update and questions to make it compatible with updateable classifiers
I've got the big picture, but I would need some help with findThreshold and the evaluation mode.
findThreshold:
  double low, high, maxValue and Instance maxInst => should become protected class fields so that
  they stay up to date across buildClassifier and all subsequent updates, and can be reset when buildClassifier is called
Evaluation mode and getPredictions: should I create a new evaluation mode?
  EVAL_TRAINING_SET does not seem a good option, as it would skip the updateClassifier call.
  I could then modify toString and add the code below to getPredictions?
    case EVAL_STREAM:
      return eu.getTrainTestPredictions(m_Classifier, instances, instances);
For updateClassifier, please find a draft below.
*/
import weka.classifiers.UpdateableClassifier;
....
implements ... UpdateableClassifier;
...
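protected boolean classifierAlreadyUpdated = false;
// hypothetical field: running class counts, replacing the local 'stats' that only exists inside buildClassifier
protected int[] m_ClassCounts;

public void updateClassifier(Instance instance) throws Exception {
  if (instance.classIsMissing()) {
    return; // an instance without a class value cannot be used for training
  }
  if (m_Classifier == null)
    throw new Exception("No base classifier has been set!");
  if (!(m_Classifier instanceof UpdateableClassifier))
    throw new Exception("Base classifier must implement UpdateableClassifier!");
  UpdateableClassifier base = (UpdateableClassifier) m_Classifier;
  // don't know how to properly check whether the classifier has already been initialized here or elsewhere
  if (!classifierAlreadyUpdated) {
    // re-use the initialization process of buildClassifier on a one-instance dataset
    Instances data = new Instances(instance.dataset(), 1);
    data.add(instance);
    buildClassifier(data);
    m_ClassCounts = new int[instance.numClasses()];
    m_ClassCounts[(int) instance.classValue()]++;
    classifierAlreadyUpdated = true;
    return;
  }
  m_ClassCounts[(int) instance.classValue()]++;
  int distinctCount = 0;
  for (int count : m_ClassCounts) {
    if (count > 0) distinctCount++;
  }
  // if the stream has not yet shown examples of both classes, do not adjust the threshold
  if (distinctCount != 2) {
    System.err.println("Couldn't find examples of both classes. No adjustment.");
    base.updateClassifier(instance);
    return;
  }
  // m_DesignatedClass: assumed to be initialized via buildClassifier (called if needed during the first update)
  // caveat: batch buildClassifier appears to set it only when both classes are present, which a one-instance build never satisfies
  if (m_manualThreshold) {
    base.updateClassifier(instance);
    return;
  }
  Instances single = new Instances(instance.dataset(), 1);
  single.add(instance);
  // caveat: getPredictions only sees this one instance, and cross-validation needs at least as many instances as folds
  if (m_ClassCounts[m_DesignatedClass] == 1) {
    System.err.println("Only 1 positive found: optimizing on training data");
    findThreshold(getPredictions(single, EVAL_TRAINING_SET, 0));
  } else {
    int numFolds = Math.min(m_NumXValFolds, m_ClassCounts[m_DesignatedClass]);
    findThreshold(getPredictions(single, m_EvalMode, numFolds));
    if (m_EvalMode != EVAL_TRAINING_SET) {
      base.updateClassifier(instance);
    }
  }
}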
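On the findThreshold and evaluation-mode questions: the draft calls getPredictions on a one-instance dataset, so each threshold search sees a single prediction. Also, as far as I can tell, EvaluationUtils.getTrainTestPredictions calls buildClassifier on the training set before predicting, so an EVAL_STREAM case built on it would retrain from scratch at every update. One alternative, sketched below under my own assumptions and not how ThresholdSelector currently works, is prequential (test-then-train) evaluation: before each updateClassifier call, record the base classifier's probability for the designated class together with the true label, and re-run the threshold search over a sliding window of those pairs. The class PrequentialThreshold, the window size and the choice of F-measure on the designated class as the criterion are all hypothetical; the window is the state that has to persist across updates, as suggested for low, high and maxValue in the comment above.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical sketch of prequential (test-then-train) threshold selection.
 * The caller records the probability the model gave the designated class
 * BEFORE updating the model with the same instance; the threshold is then
 * re-chosen to maximise F-measure over a sliding window of recent pairs.
 */
class PrequentialThreshold {

    /** One prequential prediction: probability of the designated class and true label. */
    private static final class Pred {
        final double prob;
        final boolean positive;

        Pred(double prob, boolean positive) {
            this.prob = prob;
            this.positive = positive;
        }
    }

    private final Deque<Pred> window = new ArrayDeque<>();
    private final int capacity;
    private double threshold = 0.5; // kept until both classes are in the window

    PrequentialThreshold(int capacity) {
        if (capacity < 1) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    /** Records one prediction and re-selects the threshold. */
    void record(double probPositive, boolean isPositive) {
        if (window.size() == capacity) {
            window.removeFirst(); // forget the oldest prediction
        }
        window.addLast(new Pred(probPositive, isPositive));
        threshold = findThreshold();
    }

    double threshold() {
        return threshold;
    }

    /** Tries each recorded probability t as a cut-off (predict positive iff prob >= t). */
    private double findThreshold() {
        boolean seenPos = false, seenNeg = false;
        for (Pred p : window) {
            if (p.positive) seenPos = true; else seenNeg = true;
        }
        if (!seenPos || !seenNeg) {
            return threshold; // no adjustment without examples of both classes
        }
        double bestT = threshold;
        double bestF = -1.0;
        for (Pred cand : window) {
            int tp = 0, fp = 0, fn = 0;
            for (Pred p : window) {
                boolean predictedPos = p.prob >= cand.prob;
                if (predictedPos && p.positive) tp++;
                else if (predictedPos) fp++;
                else if (p.positive) fn++;
            }
            // F1 = 2TP / (2TP + FP + FN); denominator > 0 because a positive is present
            double f = (2.0 * tp) / (2.0 * tp + fp + fn);
            if (f > bestF) {
                bestF = f;
                bestT = cand.prob;
            }
        }
        return bestT;
    }

    public static void main(String[] args) {
        PrequentialThreshold sel = new PrequentialThreshold(100);
        // Made-up (probability of positive class, true label) pairs, as if produced
        // by the base classifier just before each updateClassifier call
        double[] probs = {0.9, 0.2, 0.35, 0.1, 0.4, 0.05};
        boolean[] labels = {true, false, true, false, true, false};
        for (int i = 0; i < probs.length; i++) {
            sel.record(probs[i], labels[i]);
        }
        System.out.println(sel.threshold()); // 0.35
    }
}
```

With these made-up scores, the selected threshold drops from the default 0.5 to 0.35, so the positives scored 0.35 and 0.4 are no longer false negatives. The search tries every recorded probability as a cut-off, which is quadratic in the window size; that is fine for a few hundred pairs, and sorting the window would bring it down to O(n log n).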