NLTK conll2002_ned_IIS.pickle not found

Stack Overflow user
Asked 2015-01-11 17:20:44
1 answer, 642 views, 0 followers, score 1

I am trying to use NLTK with the conll2002 corpus, using the code from:

How to improve dutch NER chunkers in NLTK

I ran the following command from the NLTK directory that I had unzipped.

python train_chunker.py conll2002 --fileids ned.train --classifier NaiveBayes --filename /nltk_data/chunkers/conll2002_ned_NaiveBayes.pickle

I found the resulting pickle file (conll2002_ned_NaiveBayes.pickle) and copied it into the chunkers directory, which is also where nltk.download() places downloaded packages.

Then I tried to run the following code:

import nltk

from nltk.corpus import conll2002

tokenizer = nltk.data.load('tokenizers/punkt/dutch.pickle')
tagger = nltk.data.load('taggers/conll2002_ned_IIS.pickle')
chunker = nltk.data.load('chunkers/conll2002_ned_NaiveBayes.pickle')

test_sents = conll2002.tagged_sents(fileids="ned.testb")[0:1000]

print "tagger accuracy on test-set: " + str(tagger.evaluate(test_sents))

test_sents = conll2002.chunked_sents(fileids="ned.testb")[0:1000]

print chunker.evaluate(test_sents)

But after running this code I get the following error:

LookupError: Resource u'taggers/conll2002_ned_IIS.pickle' not found. Please ....

I have tried loading all packages and models through the nltk.download() GUI, but the same error still occurs.

Does anyone know how to solve this? Many thanks.

Erik

1 Answer

Stack Overflow user (accepted answer)
Posted 2016-06-12 10:28:25

You need to train both the tagger and the chunker.

python train_chunker.py conll2002 --fileids ned.train --classifier NaiveBayes --filename ~/nltk_data/chunkers/conll2002_ned_NaiveBayes.pickle

This gives:

loading conll2002
using chunked sentences from ned.train
15806 chunks, training on 15806
training ClassifierChunker with ['NaiveBayes'] classifier
Constructing training corpus for classifier.
Training classifier (202644 instances)
training NaiveBayes classifier
evaluating ClassifierChunker
ChunkParse score:
    IOB Accuracy:  95.4%
    Precision:     66.9%
    Recall:        71.9%
    F-Measure:     69.3%
dumping ClassifierChunker to /home/hugo/nltk_data/chunkers/conll2002_ned_NaiveBayes.pickle

Now train the tagger:

python train_tagger.py conll2002 --fileids ned.train --classifier IIS --filename ~/nltk_data/chunkers/conll2002_ned_IIS.pickle

This gives:

loading conll2002
using tagged sentences from ned.train
15806 tagged sents, training on 15806
training AffixTagger with affix -3 and backoff <DefaultTagger: tag=-None->
training <class 'nltk.tag.sequential.UnigramTagger'> tagger with backoff <AffixTagger: size=3988>
training <class 'nltk.tag.sequential.BigramTagger'> tagger with backoff <UnigramTagger: size=7799>
training <class 'nltk.tag.sequential.TrigramTagger'> tagger with backoff <BigramTagger: size=1451>
training ['IIS'] ClassifierBasedPOSTagger
Constructing training corpus for classifier.
Training classifier (202644 instances)
training IIS classifier
  ==> Training (10 iterations)
evaluating ClassifierBasedPOSTagger
accuracy: 0.980666
dumping ClassifierBasedPOSTagger to /home/hugo/nltk_data/chunkers/conll2002_ned_IIS.pickle
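
If the tagger has already been trained into chunkers/ as in the log above, one way to make it loadable as taggers/conll2002_ned_IIS.pickle without retraining is to copy the file into a taggers/ directory. The following is a minimal sketch using only the standard library; the helper name `copy_tagger_to_taggers_dir` is hypothetical, and it assumes the data directory has the usual chunkers/ and taggers/ layout.

```python
import shutil
from pathlib import Path


def copy_tagger_to_taggers_dir(nltk_data_dir, filename="conll2002_ned_IIS.pickle"):
    """Copy <nltk_data>/chunkers/<filename> to <nltk_data>/taggers/<filename>.

    Hypothetical helper: it only moves bytes around and does not import NLTK.
    """
    root = Path(nltk_data_dir).expanduser()
    source = root / "chunkers" / filename
    if not source.is_file():
        raise FileNotFoundError(f"expected trained tagger at {source}")

    # Create taggers/ if it does not exist yet, then copy with metadata.
    target_dir = root / "taggers"
    target_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(source, target_dir / filename))
```

For the paths used in this answer, the call would be `copy_tagger_to_taggers_dir("~/nltk_data")`.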

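It takes some time, but after that you should be good to go. Note that the tagger command above writes to chunkers/, while the code in the question loads taggers/conll2002_ned_IIS.pickle, so the load path has to match wherever the pickle was actually saved.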
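
Before loading, it can help to check where each pickle actually sits. NLTK resolves a name such as taggers/conll2002_ned_IIS.pickle relative to each directory listed in `nltk.data.path`. The sketch below mimics that lookup with plain file checks; it is a simplification (NLTK's own lookup also handles zip archives), and the helper names `candidate_data_dirs`, `locate` and `report` are hypothetical.

```python
import os


def candidate_data_dirs():
    """Use nltk.data.path when NLTK is importable, else assume ~/nltk_data."""
    try:
        import nltk
        return list(nltk.data.path)
    except ImportError:
        return [os.path.expanduser("~/nltk_data")]


def locate(resource, search_dirs):
    """Return the first existing file for a name like 'taggers/x.pickle', or None."""
    for base in search_dirs:
        path = os.path.join(base, *resource.split("/"))
        if os.path.isfile(path):
            return path
    return None


def report(resources, search_dirs):
    """Map each resource name to the path where it was found (None if missing)."""
    return {name: locate(name, search_dirs) for name in resources}
```

Calling `report(["taggers/conll2002_ned_IIS.pickle", "chunkers/conll2002_ned_IIS.pickle", "chunkers/conll2002_ned_NaiveBayes.pickle"], candidate_data_dirs())` shows which of the names from this thread resolve to a file; a `None` entry corresponds to a name that would raise `LookupError` in `nltk.data.load`.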

Score: 1

Original content from Stack Overflow. Source:

https://stackoverflow.com/questions/27889882
