
Weka ThresholdSelector and CostSensitiveClassifier on stream learning

Stack Overflow user
Asked on 2014-07-18 16:20:47

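Are Weka's ThresholdSelector and/or CostSensitiveClassifier compatible with stream learning (updateable classifiers)? My goal is to combine them with weka.classifiers.meta.MOA to focus learning on a specific class and to minimize false negatives (FN) on imbalanced data.

Many thanks!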


1 Answer

Stack Overflow user

Posted on 2014-07-19 20:36:10

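Following my post on the Weka Pentaho forum, the answer is no: neither ThresholdSelector nor CostSensitiveClassifier supports updateable classifiers. Stream learning with these meta-classifiers is therefore not currently possible.

So I have put together draft code to create updateable versions of these classifiers. Any comments or suggestions would be very welcome.

Code update for weka.classifiers.meta.CostSensitiveClassifier to create an updateable version (this one "seems" to be the simplest):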

```java
/*
   weka.classifiers.meta.CostSensitiveClassifier: draft code update and questions
   to make it compatible with updateable classifiers
*/

import weka.classifiers.UpdateableClassifier;
import weka.core.AttributeExpression;
....
implements ... UpdateableClassifier
...
protected boolean classifierAlreadyUpdated = false;

public void updateClassifier(Instance instance) throws Exception {
    if (!instance.classIsMissing()) {

        if (m_Classifier == null)
            throw new Exception("No base classifier has been set!");

        // Not sure how to properly check whether m_CostMatrix has already been fully
        // initialized here or from elsewhere (i.e. an external call to buildClassifier)
        if (m_CostMatrix == null || (m_CostMatrix.size() == 1 && !classifierAlreadyUpdated)) {
            // buildClassifier expects an Instances object, so wrap the single instance
            Instances data = new Instances(instance.dataset(), 1);
            data.add(instance);
            buildClassifier(data); // re-use the initialization process from buildClassifier
            classifierAlreadyUpdated = true;
        }
        else {
            // Two-class case: the factor is the cost of misclassifying this instance's class
            double factor = 1.0;
            int classValIndex = (int) instance.classValue();
            Object element = (classValIndex == 0) ? m_CostMatrix.getCell(classValIndex, 1) : m_CostMatrix.getCell(classValIndex, 0);

            if (element instanceof Double) {
                factor = ((Double) element).doubleValue();
            } else {
                factor = ((AttributeExpression) element).evaluateExpression(instance);
            }

            double weightOfInstance = instance.weight() * factor;

            if (!m_MinimizeExpectedCost) {
                // setWeight returns void, so re-weight a copy rather than passing its result
                Instance weighted = (Instance) instance.copy();
                weighted.setWeight(weightOfInstance);
                ((UpdateableClassifier) m_Classifier).updateClassifier(weighted);
            } else {
                ((UpdateableClassifier) m_Classifier).updateClassifier(instance);
            }
        }
    }
}
```
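The re-weighting step in the draft above can be tried in isolation. The following is only a minimal stand-alone sketch that does not use Weka: the class `CostWeightSketch` and its methods are hypothetical names, the cost matrix is a plain `double[][]`, and it assumes a two-class problem where rows index the actual class and columns the predicted class, as the draft's `getCell(classValIndex, ...)` lookup implies.

```java
/** Hypothetical stand-alone illustration of the draft's weighting step; not part of Weka. */
final class CostWeightSketch {

    /**
     * Returns the cost of misclassifying an instance of actualClass as the
     * other class, read from a 2x2 matrix laid out as costs[actual][predicted].
     */
    static double weightFactor(double[][] costs, int actualClass) {
        if (costs.length != 2 || costs[0].length != 2 || costs[1].length != 2) {
            throw new IllegalArgumentException("This sketch only handles 2x2 cost matrices");
        }
        if (actualClass != 0 && actualClass != 1) {
            throw new IllegalArgumentException("actualClass must be 0 or 1");
        }
        int otherClass = (actualClass == 0) ? 1 : 0;
        return costs[actualClass][otherClass];
    }

    /** Scales an instance weight by the misclassification cost of its class. */
    static double reweight(double weight, double[][] costs, int actualClass) {
        return weight * weightFactor(costs, actualClass);
    }

    public static void main(String[] args) {
        // Missing a class-1 instance (a false negative) costs five times more
        // than raising a false alarm on a class-0 instance.
        double[][] costs = { { 0.0, 1.0 }, { 5.0, 0.0 } };
        System.out.println("class 0 weight: " + reweight(1.0, costs, 0));
        System.out.println("class 1 weight: " + reweight(1.0, costs, 1));
    }
}
```

With such a matrix, every class-1 instance reaches the base learner with five times the weight of a class-0 instance, which is how the draft tries to push an updateable classifier toward fewer false negatives.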

Code update for weka.classifiers.meta.ThresholdSelector to create an updateable version (awaiting your comments/suggestions):

```java
/*
   weka.classifiers.meta.ThresholdSelector draft code update and questions to make it compatible with updateable classifiers

   I've got the big picture but I would need some help on findThreshold and the evaluation mode

   findThreshold:
       double low, high, maxValue and Instance maxInst => should become protected class fields so that
       they stay up to date across the build and all updates, and can be reset when calling buildClassifier

   Evaluation mode and getPredictions: should I create a new evaluation mode?
   EVAL_TRAINING_SET does not seem a good option as it would skip the updateClassifier

   I could then modify toString and add the code below to getPredictions?
     case EVAL_STREAM:
       return eu.getTrainTestPredictions(m_Classifier, instances, instances);

   For updateClassifier, please find below a draft code
*/

import weka.classifiers.UpdateableClassifier;
....
implements ... UpdateableClassifier
...
protected boolean classifierAlreadyUpdated = false;

public void updateClassifier(Instance instance) throws Exception {
    if (!instance.classIsMissing()) {

        if (m_Classifier == null)
            throw new Exception("No base classifier has been set!");

        // buildClassifier and getPredictions expect Instances, so wrap the single instance
        Instances single = new Instances(instance.dataset(), 1);
        single.add(instance);
        // m_Classifier is declared as Classifier, so a cast is needed to update it
        UpdateableClassifier updateable = (UpdateableClassifier) m_Classifier;

        // Don't know how to properly check whether the classifier has already been initialized here or elsewhere
        if (!classifierAlreadyUpdated) {
            buildClassifier(single); // re-use the initialization process from buildClassifier
            classifierAlreadyUpdated = true;
        }
        else {
            // NOTE: `stats` is assumed here to be kept as a field and refreshed on each
            // update; this draft does not show where it is declared or maintained.

            // If data contains only one instance of positive data
            // optimize on training data
            if (stats.distinctCount != 2) {
                System.err.println("Couldn't find examples of both classes. No adjustment.");
                updateable.updateClassifier(instance);
            }
            else {
                // m_DesignatedClass: already initialized via buildClassifier (called if needed during first update)

                if (m_manualThreshold) {
                    updateable.updateClassifier(instance);
                    return;
                }

                if (stats.nominalCounts[m_DesignatedClass] == 1) {
                    System.err.println("Only 1 positive found: optimizing on training data");
                    findThreshold(getPredictions(single, EVAL_TRAINING_SET, 0));
                } else {
                    int numFolds = Math.min(m_NumXValFolds, stats.nominalCounts[m_DesignatedClass]);
                    findThreshold(getPredictions(single, m_EvalMode, numFolds));
                    if (m_EvalMode != EVAL_TRAINING_SET) {
                        updateable.updateClassifier(instance);
                    }
                }
            }
        }
    }
}
```
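To make the idea of keeping threshold state across updates more concrete, here is a minimal stand-alone sketch. It does not reproduce Weka's findThreshold: it is a deliberately simplified substitute that stores every (score, label) pair seen so far and, on each update, picks the seen score that maximizes the F-measure of the positive class. The class `StreamingThresholdSketch` and its methods are hypothetical names, and the choice of F-measure as the objective is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of re-selecting a threshold after each streamed instance. */
final class StreamingThresholdSketch {

    // Each entry is {score for the designated class, 1.0 if actually positive else 0.0}.
    private final List<double[]> seen = new ArrayList<>();
    private double bestThreshold = 0.5;

    /** Records one prediction and re-selects the threshold over everything seen so far. */
    void update(double score, boolean positive) {
        seen.add(new double[] { score, positive ? 1.0 : 0.0 });
        double bestF = -1.0;
        for (double[] candidate : seen) {
            double f = fMeasureAt(candidate[0]);
            if (f > bestF) {
                bestF = f;
                bestThreshold = candidate[0];
            }
        }
    }

    /** F-measure of the positive class when predicting positive for score >= threshold. */
    private double fMeasureAt(double threshold) {
        int tp = 0, fp = 0, fn = 0;
        for (double[] p : seen) {
            boolean predictedPositive = p[0] >= threshold;
            boolean actualPositive = p[1] == 1.0;
            if (predictedPositive && actualPositive) tp++;
            else if (predictedPositive) fp++;
            else if (actualPositive) fn++;
        }
        double denominator = 2.0 * tp + fp + fn;
        return denominator == 0.0 ? 0.0 : (2.0 * tp) / denominator;
    }

    double threshold() {
        return bestThreshold;
    }
}
```

This sketch rescans all stored predictions on every update, so its cost grows quadratically and its memory use is unbounded; a real streaming version would need a sliding window or incremental counts, which is exactly the state the draft proposes to promote to class fields.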

Thanks

Original content provided by Stack Overflow. Original link:

https://stackoverflow.com/questions/24820360
