
Input format for the OpenNLP MaxEnt implementation?

Stack Overflow user
Asked 2017-07-25 13:01:31

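I'm trying to use the OpenNLP implementation of the maximum entropy classifier, but its documentation seems quite sparse. Although the library is apparently designed to be easy to use, I cannot find a single example or specification of the input file format (i.e. the training set).

Does anyone know where to find this, or a minimal working example of training?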


1 Answer

Stack Overflow user

Accepted answer

Answered 2017-07-25 15:15:20

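The input format for OpenNLP is very flexible. To use the MaxEnt classifier in OpenNLP, you need to go through several steps: read the data file line by line, turn each line into an Event (a context of features plus an outcome), initialize the trainer, and train.

Here is sample code with comments: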

package example;

import java.io.File;
import java.io.IOException;
import java.nio.charset.Charset;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

import opennlp.tools.ml.maxent.GISTrainer;
import opennlp.tools.ml.model.Event;
import opennlp.tools.ml.model.MaxentModel;
import opennlp.tools.tokenize.WhitespaceTokenizer;
import opennlp.tools.util.FilterObjectStream;
import opennlp.tools.util.MarkableFileInputStreamFactory;
import opennlp.tools.util.ObjectStream;
import opennlp.tools.util.PlainTextByLineStream;
import opennlp.tools.util.TrainingParameters;

public class ReadData {


    public static void main(String[] args) throws Exception{

        // this is the data file ...
        // the format is <LIST of FEATURES separated by spaces> <outcome>
        // change the file to fit your needs
        File f=new File("football.dat");

        // we need to create an ObjectStream of events for the trainer..
        //   First create an InputStreamFactory -- given a file we can create an InputStream, required for resetting...
        MarkableFileInputStreamFactory factory=new MarkableFileInputStreamFactory(f);
        // create a PlainTextByLineStream -- Note: you can write your own stream to handle binary files or
        //                                 --       records that span more than one line...
        ObjectStream<String> stream=new PlainTextByLineStream(factory, Charset.defaultCharset());
        //  Now you have a stream of string you need to convert it to a stream of events...
        //  I use a custom FilterObjectStream which simply takes a line, breaks it up into tokens,
        //  uses all except the last as the features [context] and the last token as the outcome class
        ObjectStream<Event> eventStream=new FilterObjectStream<String, Event>(stream) {
            @Override
            public Event read() throws IOException {
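                String line;
                String[] parts;
                // skip blank lines: they have no outcome token and would make the copy below fail
                do {
                    line=samples.read();
                    if (line==null) return null;
                    parts=WhitespaceTokenizer.INSTANCE.tokenize(line);
                } while (parts.length==0);
                // all tokens except the last are the features [context]
                String[] context=Arrays.copyOf(parts, parts.length-1);

                // debug output: the outcome followed by its context
                System.out.println(parts[parts.length-1]+" "+Arrays.toString(context));
                return new Event(parts[parts.length-1], context);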
            }
        };


        TrainingParameters parameters=new TrainingParameters();
        // By default OpenNLP uses a cutoff of 5 (a feature has to occur 5 times before it is used)
        // use 1 for my small dataset
        parameters.put(GISTrainer.CUTOFF_PARAM, 1);

        GISTrainer trainer=new GISTrainer();
        // the report map is supposed to mark when default values are assigned...
        Map<String,String> reportMap=new HashMap<>();
        // DON'T FORGET TO INITIALIZE THE TRAINER!!!
        trainer.init(parameters, reportMap);
        MaxentModel model=trainer.train(eventStream);

        // Now we have a model -- you should test on a test set, but 
        // this is a toy example... so I am just resetting the eventstream.
        eventStream.reset();
        Event evt=null;
        while ( (evt=eventStream.read())!=null ){
            System.out.print(Arrays.toString(evt.getContext())+":  ");
            // Evaluate the context from the event using our model.
            // you would want to calculate summary statistics..
            double[] p=model.eval(evt.getContext());
            System.out.print(model.getBestOutcome(p)+"  ");
            if (model.getBestOutcome(p).equals(evt.getOutcome())){
                System.out.println("CORRECT");
            }else{
                System.out.println("INCORRECT");                
            }
        }

    }

}
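
The evaluation loop above only prints CORRECT or INCORRECT per event, and its comment notes that you would want summary statistics. As a minimal sketch of what that could look like, the plain-Java class below computes accuracy and a flattened confusion count from two label arrays. The class name AccuracySketch, its method names, and the label arrays in main are hypothetical and made up for illustration; they are not part of OpenNLP and not output of the model above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (not part of OpenNLP): summary statistics over gold vs. predicted outcomes. */
final class AccuracySketch {

    /** Fraction of positions where the predicted label equals the gold label. */
    static double accuracy(String[] gold, String[] predicted) {
        if (gold.length != predicted.length) {
            throw new IllegalArgumentException("gold and predicted must have the same length");
        }
        if (gold.length == 0) {
            return 0.0; // nothing to score
        }
        int correct = 0;
        for (int i = 0; i < gold.length; i++) {
            if (gold[i].equals(predicted[i])) {
                correct++;
            }
        }
        return (double) correct / gold.length;
    }

    /** Counts each "gold->predicted" pair, i.e. a flattened confusion matrix. */
    static Map<String, Integer> confusion(String[] gold, String[] predicted) {
        if (gold.length != predicted.length) {
            throw new IllegalArgumentException("gold and predicted must have the same length");
        }
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (int i = 0; i < gold.length; i++) {
            counts.merge(gold[i] + "->" + predicted[i], 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up labels for illustration only.
        String[] gold = {"arsenal", "tie", "man_united", "arsenal"};
        String[] predicted = {"arsenal", "arsenal", "man_united", "arsenal"};
        System.out.println("accuracy  = " + accuracy(gold, predicted));
        System.out.println("confusion = " + confusion(gold, predicted));
    }
}
```

In the loop above, one way to use this would be to collect evt.getOutcome() and model.getBestOutcome(p) into two lists while iterating and pass them in afterwards, ideally on a held-out test set rather than the training data.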

football.dat:

home=man_united Beckham=false Scholes=true Neville=true Henry=true Kanu=true Parlour=false Ferguson=confident Wengler=tense arsenal_lost_previous man_united_won_previous arsenal
home=man_united Beckham=true Scholes=false Neville=true Henry=false Kanu=true Parlour=false Ferguson=tense Wengler=confident arsenal_won_previous man_united_lost_previous man_united
home=man_united Beckham=false Scholes=true Neville=true Henry=true Kanu=true Parlour=false Ferguson=tense Wengler=tense arsenal_lost_previous man_united_won_previous tie
home=man_united Beckham=true Scholes=true Neville=false Henry=true Kanu=false Parlour=false Ferguson=confident Wengler=confident arsenal_won_previous man_united_won_previous tie
home=man_united Beckham=false Scholes=true Neville=true Henry=true Kanu=true Parlour=false Ferguson=confident Wengler=tense arsenal_won_previous man_united_won_previous arsenal
home=man_united Beckham=false Scholes=true Neville=true Henry=false Kanu=true Parlour=false Ferguson=confident Wengler=confident arsenal_won_previous man_united_won_previous man_united
home=man_united Beckham=true Scholes=true Neville=false Henry=true Kanu=true Parlour=false Ferguson=confident Wengler=tense arsenal_won_previous man_united_won_previous man_united
home=arsenal Beckham=false Scholes=true Neville=true Henry=true Kanu=true Parlour=false Ferguson=confident Wengler=tense arsenal_lost_previous man_united_won_previous arsenal
home=arsenal Beckham=true Scholes=false Neville=true Henry=false Kanu=true Parlour=false Ferguson=tense Wengler=confident arsenal_won_previous man_united_lost_previous arsenal
home=arsenal Beckham=false Scholes=true Neville=true Henry=true Kanu=true Parlour=false Ferguson=tense Wengler=tense arsenal_lost_previous man_united_won_previous tie
home=arsenal Beckham=true Scholes=true Neville=false Henry=true Kanu=false Parlour=false Ferguson=confident Wengler=confident arsenal_won_previous man_united_won_previous man_united
home=arsenal Beckham=false Scholes=true Neville=true Henry=true Kanu=true Parlour=false Ferguson=confident Wengler=tense arsenal_won_previous man_united_won_previous arsenal
home=arsenal Beckham=false Scholes=true Neville=true Henry=false Kanu=true Parlour=false Ferguson=confident Wengler=confident arsenal_won_previous man_united_won_previous man_united
home=arsenal Beckham=true Scholes=true Neville=false Henry=true Kanu=true Parlour=false Ferguson=confident Wengler=tense arsenal_won_previous man_united_won_previous arsenal
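
To make the layout of each line concrete: every whitespace-separated token except the last is a feature, and the last token is the outcome. The sketch below mirrors that split in plain Java without OpenNLP, so it can be run on its own to check a data file's lines. The class name LineFormatSketch and its methods are hypothetical, and it uses String.split on whitespace as a stand-in for the WhitespaceTokenizer used in the answer's code.

```java
import java.util.Arrays;

/** Hypothetical illustration (no OpenNLP needed) of the line layout used in football.dat. */
final class LineFormatSketch {

    /** Returns the feature tokens: every whitespace-separated token except the last. */
    static String[] context(String line) {
        String[] parts = line.trim().split("\\s+");
        return Arrays.copyOf(parts, parts.length - 1);
    }

    /** Returns the outcome label: the last whitespace-separated token. */
    static String outcome(String line) {
        String[] parts = line.trim().split("\\s+");
        return parts[parts.length - 1];
    }

    public static void main(String[] args) {
        // A shortened line in the same layout as the data file.
        String line = "home=arsenal Henry=true arsenal_won_previous arsenal";
        System.out.println("context = " + Arrays.toString(context(line)));
        System.out.println("outcome = " + outcome(line));
    }
}
```

Features are plain strings, so a token such as Henry=true and a token such as arsenal_won_previous are both just feature names to the trainer; the name=value form is a naming convention rather than something the format requires.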

Hope it helps.

Votes: 3
Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/45304181
