
Q: ClearTK - Mallet classifier trains on 0 instances even though training data is there

Stack Overflow user

Asked 2014-10-03 11:45:39

1 answer, 305 views, 0 followers, 0 votes

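I am using the ClearTK (v2.0) simple pipeline to develop a binary classifier for individual sentences in a CAS. However, even though the training data is generated, the classifier does not pick it up during training; see below.

I am following this example, in particular the snippet below: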

AnalysisEngineFactory.createPrimitiveDescription(
    <name-of-your-cleartk-annotator>.class,
    CleartkAnnotator.PARAM_IS_TRAINING, true,
    DirectoryDataWriterFactory.PARAM_OUTPUT_DIRECTORY,
    <your-output-directory-file>,
    DefaultSequenceDataWriterFactory.PARAM_DATA_WRITER_CLASS_NAME,
    <name-of-your-selected-classifier's-data-writer>.class);

Accordingly, my initialization code looks like this:

AnalysisEngine trainClassifier = AnalysisEngineFactory.createPrimitive(MyClassifier.class, 
        CleartkAnnotator.PARAM_IS_TRAINING, true,
        DirectoryDataWriterFactory.PARAM_OUTPUT_DIRECTORY, "target/classifier-data/",
        DefaultSequenceDataWriterFactory.PARAM_DATA_WRITER_CLASS_NAME, MalletCrfStringOutcomeDataWriter.class.getName());

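When I run my pipeline, the data is created and stored in target/classifier-data/training-data.malletcrf, where each line is a feature vector whose entries have the form <featurename>_<value>, followed by my boolean target attribute. I can open the file in a text editor and inspect it.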
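A quick programmatic check of that file can complement looking at it in an editor. The following is a minimal sketch using only the Java standard library. It assumes one instance per non-blank line and a blank line between sequences, which is an assumption about the layout of the .malletcrf file and not something stated in the question. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

class TrainingDataInspector {

    // Counts non-blank lines; under the stated assumption each one is an instance.
    static long countInstances(List<String> lines) {
        long count = 0;
        for (String line : lines) {
            if (!line.trim().isEmpty()) {
                count++;
            }
        }
        return count;
    }

    // Counts runs of non-blank lines; under the stated assumption each run is a sequence.
    static long countSequences(List<String> lines) {
        long count = 0;
        boolean insideSequence = false;
        for (String line : lines) {
            boolean blank = line.trim().isEmpty();
            if (!blank && !insideSequence) {
                count++;
            }
            insideSequence = !blank;
        }
        return count;
    }

    // Reads the whole file into memory; fine for a sanity check on small data.
    static List<String> readLines(Path file) {
        try {
            return Files.readAllLines(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Path taken from the question; pass another path as the first argument if needed.
        Path file = Paths.get(args.length > 0 ? args[0]
                : "target/classifier-data/training-data.malletcrf");
        List<String> lines = readLines(file);
        System.out.println("instances: " + countInstances(lines));
        System.out.println("sequences: " + countSequences(lines));
    }
}
```

Running the check at the moment training starts, and not only after the pipeline has finished, shows what the trainer can actually read at that point.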

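I use the string-outcome classifier because my target-variable annotator inherits from CleartkSequenceAnnotator and, as I understood from an earlier answer on the ClearTK mailing list, there does not seem to be a boolean classifier that can handle multiple classification tasks per CAS.

A rough outline of my classifier code: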

public class MyClassifier extends CleartkSequenceAnnotator<String> {

@Override
public void process(JCas jCas) throws AnalysisEngineProcessException {

    // retrieve sentences in the cas
    for (Sentence sentence : sentences) {
        // apply feature extractors here to add features
        // add target variable
    }

    if (this.isTraining()) {

        // write the features and outcomes as training instances
        this.dataWriter.write(Instances.toInstances(targets, featureLists));

        try {
            System.out.println("training the classifier ... ");
            Train.main("target/classifier-data/");
            System.out.println("done training classifier");
        } catch (Exception e) {
            System.out.println("ERROR while training the classifier.");
            e.printStackTrace();
        }

    } else /* Classification */ {...}
}
}

Here is the pipeline code:

SimplePipeline.runPipeline(reader,
        trainClassifier,
        XmiWriter);

When I run the pipeline, I get the following console output, even though the training data has been written:

... reader initialization ...
Couldn't open cc.mallet.util.MalletLogger resources/logging.properties file.
Perhaps the 'resources' directories weren't copied into the 'class' directory.
Continuing.
starting pipeline
training the classifier ... 
Okt 02, 2014 11:19:48 PM cc.mallet.fst.SimpleTagger main
INFORMATION: Number of features in training data: 0
Okt 02, 2014 11:19:48 PM cc.mallet.fst.SimpleTagger main
INFORMATION: Number of predicates: 0
Okt 02, 2014 11:19:48 PM cc.mallet.fst.SimpleTagger main
INFORMATION: Labels: O
Okt 02, 2014 11:19:48 PM cc.mallet.fst.CRF addOrderNStates
INFORMATION: Preparing O
Okt 02, 2014 11:19:48 PM cc.mallet.fst.CRF addOrderNStates
INFORMATION: O->O(O) O,O
State #0 "O"
initialWeight=0.0, finalWeight=0.0
#destinations=1
-> O
Okt 02, 2014 11:19:48 PM cc.mallet.fst.SimpleTagger train
INFORMATION: Training on 0 instances
Okt 02, 2014 11:19:48 PM cc.mallet.fst.CRF setWeightsDimensionAsIn
INFORMATION: CRF weights[O,O] num features = 0
Okt 02, 2014 11:19:48 PM cc.mallet.fst.CRF setWeightsDimensionAsIn
INFORMATION: Number of weights = 1
Okt 02, 2014 11:19:48 PM cc.mallet.fst.CRFTrainerByLabelLikelihood train
INFORMATION: CRF about to train with 1 iterations
Okt 02, 2014 11:19:48 PM cc.mallet.fst.CRFOptimizableByLabelLikelihood getValue
INFORMATION: getValue() (loglikelihood, optimizable by label likelihood) = 0.0
Okt 02, 2014 11:19:48 PM cc.mallet.optimize.LimitedMemoryBFGS optimize
INFORMATION: L-BFGS initial gradient is zero; saying converged
Okt 02, 2014 11:19:48 PM cc.mallet.fst.CRFTrainerByLabelLikelihood train
INFORMATION: CRF finished one iteration of maximizer, i=0
Okt 02, 2014 11:19:48 PM cc.mallet.fst.CRFTrainerByLabelLikelihood train
INFORMATION: CRF training has converged, i=0
done training classifier

... To me, this suggests that the classifier is somehow not picking up the training data from the file.
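One mechanism that could produce this symptom, offered as an assumption and not as a confirmed diagnosis: if a writer buffers its output and only flushes when it is closed, anything that reads the file earlier sees it as empty, even though the same file looks complete afterwards. The sketch below demonstrates only that general effect with the Java standard library. It makes no ClearTK or Mallet calls, and whether the data writer in the question behaves this way is not established by this thread. The class name and the sample feature lines are hypothetical.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

class BufferedWriteDemo {

    // Returns {lines readable while the writer is still open, lines readable after close}.
    static int[] linesVisibleBeforeAndAfterClose() {
        try {
            Path file = Files.createTempFile("training-data", ".malletcrf");
            try {
                int before;
                try (BufferedWriter writer = Files.newBufferedWriter(file)) {
                    // Two short lines stay in the in-memory buffer until flush or close.
                    writer.write("f_1 g_2 true");
                    writer.newLine();
                    writer.write("f_0 g_1 false");
                    writer.newLine();
                    before = Files.readAllLines(file).size();
                }
                // Leaving the try-with-resources block closed the writer and flushed the buffer.
                int after = Files.readAllLines(file).size();
                return new int[] {before, after};
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] counts = linesVisibleBeforeAndAfterClose();
        System.out.println("visible before close: " + counts[0]);
        System.out.println("visible after close:  " + counts[1]);
    }
}
```

If this effect is involved, checking the size of training-data.malletcrf at the exact moment training is started, and not after the run, would show it.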

What am I doing wrong? Thanks in advance!

uima

cleartk

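1 Answer

Stack Overflow user

Answered 2015-02-04 20:06:48

My guess is that you imported the wrong Sentence class. You can easily find out whether I am right by debugging the for loop in the process method of MyClassifier.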
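To illustrate why a wrong import would lead to zero training instances, here is a minimal self-contained sketch. It does not use UIMA or ClearTK: the select method is a simplified, hypothetical stand-in for a type-based lookup of annotations, and TypeSystemA and TypeSystemB are invented names for two unrelated classes that both happen to be called Sentence.

```java
import java.util.ArrayList;
import java.util.List;

class SentenceTypeCheck {

    // Two unrelated types that share the simple name "Sentence" (hypothetical).
    static class TypeSystemA {
        static class Sentence {
            final String text;
            Sentence(String text) { this.text = text; }
        }
    }

    static class TypeSystemB {
        static class Sentence {
            final String text;
            Sentence(String text) { this.text = text; }
        }
    }

    // Simplified stand-in for a type-based select: keeps only instances of the given class.
    static <T> List<T> select(List<Object> annotations, Class<T> type) {
        List<T> result = new ArrayList<>();
        for (Object annotation : annotations) {
            if (type.isInstance(annotation)) {
                result.add(type.cast(annotation));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // The upstream component created sentences of type A only.
        List<Object> annotations = new ArrayList<>();
        annotations.add(new TypeSystemA.Sentence("First sentence."));
        annotations.add(new TypeSystemA.Sentence("Second sentence."));

        // Selecting with the matching class finds both sentences.
        System.out.println(select(annotations, TypeSystemA.Sentence.class).size()); // 2
        // Selecting with the same-named but wrong class finds nothing,
        // so a for loop over the result would never execute its body.
        System.out.println(select(annotations, TypeSystemB.Sentence.class).size()); // 0
    }
}
```

In the question's code, a breakpoint or a print statement inside the for loop over sentences would show directly whether the loop body is ever entered.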

Votes: 0

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/26173018
