NLP NER processing error
Stack Overflow user
Asked on 2015-07-16 17:04:45

Here is the TSV file, c2is2r3.tsv:

The    O
fate    O
of  O
Lehman  ORGANIZATION
Brothers    ORGANIZATION

. . .

New ORGANIZATION
York    ORGANIZATION
Fed ORGANIZATION
,   O
and O
Treasury    TITLE
Secretary   TITLE
Henry   PERSON
M.  PERSON
Paulson PERSON
Jr. PERSON
.   O

Here is the properties file, c2is2r3.prop:

trainFile = c2is2r3.tsv
serializeTo = c2is2r3-ner-model.ser.gz
map = word=0,answer=1

useClassFeature=true
useWord=true
useNGrams=true
noMidNGrams=true
maxNGramLeng=6
usePrev=true
useNext=true
useSequences=true
usePrevSequences=true
maxLeft=1
useTypeSeqs=true
useTypeSeqs2=true
useTypeySequences=true
wordShape=chris2useLC
useDisjunctive=true

Here is the initial sequence of commands, followed by the error from the last one:

java -cp  stanford-ner-3.5.2.jar edu.stanford.nlp.ie.crf.CRFClassifier -prop c2is2r3.prop


java -cp stanford-ner-3.5.2.jar -mx2g edu.stanford.nlp.ie.NERClassifierCombiner -ner.model c2is2r3-ner-model.ser.gz,classifiers/english.muc.7class.distsim.crf.ser.gz -ner.useSUTime false -ner.combinationMode HIGH_RECALL -serializeTo c2is2.serialized.ncc.ncc.ser.gz


java -cp stanford-ner-3.5.2.jar -mx1g edu.stanford.nlp.ie.crf.CRFClassifier -loadClassifier c2is2.serialized.ncc.ncc.ser.gz -textFile c2is2r3.txt


CRFClassifier invoked on Fri Jul 17 09:51:13 EDT 2015 with arguments:
   -loadClassifier c2is2.serialized.ncc.ncc.ser.gz -textFile c2is2r3.txt
loadClassifier=c2is2.serialized.ncc.ncc.ser.gz
textFile=c2is2r3.txt
Loading classifier from /mnt/hgfs/share/nlp/stanford-ner-2015-04-20/c2is2.serialized.ncc.ncc.ser.gz ... Error deserializing /mnt/hgfs/share/nlp/stanford-ner-2015-04-20/c2is2.serialized.ncc.ncc.ser.gz
Exception in thread "main" java.lang.RuntimeException: java.lang.ClassCastException: java.util.Properties cannot be cast to [Ledu.stanford.nlp.util.Index;
    at edu.stanford.nlp.ie.AbstractSequenceClassifier.loadClassifierNoExceptions(AbstractSequenceClassifier.java:1572)
    at edu.stanford.nlp.ie.AbstractSequenceClassifier.loadClassifierNoExceptions(AbstractSequenceClassifier.java:1523)
    at edu.stanford.nlp.ie.crf.CRFClassifier.main(CRFClassifier.java:2987)
Caused by: java.lang.ClassCastException: java.util.Properties cannot be cast to [Ledu.stanford.nlp.util.Index;
    at edu.stanford.nlp.ie.crf.CRFClassifier.loadClassifier(CRFClassifier.java:2613)
    at edu.stanford.nlp.ie.AbstractSequenceClassifier.loadClassifier(AbstractSequenceClassifier.java:1451)
    at edu.stanford.nlp.ie.AbstractSequenceClassifier.loadClassifier(AbstractSequenceClassifier.java:1558)
    at edu.stanford.nlp.ie.AbstractSequenceClassifier.loadClassifierNoExceptions(AbstractSequenceClassifier.java:1569)
    ... 2 more

Here is the attempt using NERClassifierCombiner instead:

java -cp stanford-ner-3.5.2.jar -mx1g edu.stanford.nlp.ie.NERClassifierCombiner  -loadClassifier c2is2.serialized.ncc.ncc.ser.gz -testFile c2is2r3.txt

Here is the error stack trace:

NERClassifierCombiner invoked on Fri Jul 17 10:11:17 EDT 2015 with arguments:
   -loadClassifier c2is2.serialized.ncc.ncc.ser.gz -testFile c2is2r3.txt
testFile=c2is2r3.txt
loadClassifier=c2is2.serialized.ncc.ncc.ser.gz
testFile=c2is2r3.txt
ner.useSUTime=false
ner.model=c2is2r3-ner-model.ser.gz,classifiers/english.muc.7class.distsim.crf.ser.gz
serializeTo=c2is2.serialized.ncc.ncc.ser.gz
loadClassifier=c2is2.serialized.ncc.ncc.ser.gz
ner.combinationMode=HIGH_RECALL
loading CRF...
loading CRF...
Error on line 1: The fate of Lehman Brothers, the beleaguered investment bank, hung in the balance on Sunday as Federal Reserve officials and the leaders of major financial institutions continued to gather in emergency meetings trying to complete a plan to rescue the stricken bank.  Several possible plans emerged from the talks, held at the Federal Reserve Bank of New York and led by Timothy R. Geithner, the president of the New York Fed, and Treasury Secretary Henry M. Paulson Jr.
Exception in thread "main" java.lang.UnsupportedOperationException: Argument array lengths differ: [word, tag, answer] vs. [The, fate, of, Lehman, Brothers,, the, beleaguered, investment, bank,, hung, in, the, balance, on, Sunday, as, Federal, Reserve, officials, and, the, leaders, of, major, financial, institutions, continued, to, gather, in, emergency, meetings, trying, to, complete, a, plan, to, rescue, the, stricken, bank., Several, possible, plans, emerged, from, the, talks,, held, at, the, Federal, Reserve, Bank, of, New, York, and, led, by, Timothy, R., Geithner,, the, president, of, the, New, York, Fed,, and, Treasury, Secretary, Henry, M., Paulson, Jr.]
    at edu.stanford.nlp.ling.CoreLabel.initFromStrings(CoreLabel.java:153)
    at edu.stanford.nlp.ling.CoreLabel.<init>(CoreLabel.java:133)
    at edu.stanford.nlp.sequences.ColumnDocumentReaderAndWriter$ColumnDocParser.apply(ColumnDocumentReaderAndWriter.java:85)
    at edu.stanford.nlp.sequences.ColumnDocumentReaderAndWriter$ColumnDocParser.apply(ColumnDocumentReaderAndWriter.java:60)
    at edu.stanford.nlp.objectbank.DelimitRegExIterator.parseString(DelimitRegExIterator.java:67)
    at edu.stanford.nlp.objectbank.DelimitRegExIterator.setNext(DelimitRegExIterator.java:60)
    at edu.stanford.nlp.objectbank.DelimitRegExIterator.<init>(DelimitRegExIterator.java:54)
    at edu.stanford.nlp.objectbank.DelimitRegExIterator$DelimitRegExIteratorFactory.getIterator(DelimitRegExIterator.java:122)
    at edu.stanford.nlp.sequences.ColumnDocumentReaderAndWriter.getIterator(ColumnDocumentReaderAndWriter.java:54)
    at edu.stanford.nlp.objectbank.ObjectBank$OBIterator.setNextObject(ObjectBank.java:436)
    at edu.stanford.nlp.objectbank.ObjectBank$OBIterator.<init>(ObjectBank.java:415)
    at edu.stanford.nlp.objectbank.ObjectBank.iterator(ObjectBank.java:253)
    at edu.stanford.nlp.sequences.ObjectBankWrapper.iterator(ObjectBankWrapper.java:52)
    at edu.stanford.nlp.ie.AbstractSequenceClassifier.classifyAndWriteAnswers(AbstractSequenceClassifier.java:1160)
    at edu.stanford.nlp.ie.AbstractSequenceClassifier.classifyAndWriteAnswers(AbstractSequenceClassifier.java:1111)
    at edu.stanford.nlp.ie.AbstractSequenceClassifier.classifyAndWriteAnswers(AbstractSequenceClassifier.java:1071)
    at edu.stanford.nlp.ie.NERClassifierCombiner.main(NERClassifierCombiner.java:382)

So I'm not sure what to do next. Is there any other combination I should try?

2 Answers

Stack Overflow user

Accepted answer

Posted on 2015-07-16 22:19:23

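In the serialization step, you are using:

edu.stanford.nlp.ie.NERClassifierCombiner

In the loading step, you are loading with:

edu.stanford.nlp.ie.crf.CRFClassifier

So in the load command, use edu.stanford.nlp.ie.NERClassifierCombiner instead and the error should go away. You serialized an NERClassifierCombiner but are trying to load it as a CRFClassifier. Let me know if you run into any other trouble!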
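A minimal sketch of the corrected load step, wrapped in a hypothetical shell function `load_ncc` (the function name and its two arguments are illustrative, not from the question). It assumes NERClassifierCombiner accepts `-textFile` for plain-text input the same way CRFClassifier does; the jar name and JVM option are copied from the question.

```shell
# Hypothetical helper: load a serialized NERClassifierCombiner with the
# same class that wrote it, and tag a plain-text file.
#   $1 = serialized combiner (.ser.gz), $2 = plain-text input file
load_ncc() {
  model="$1"
  text="$2"
  # Assumption: -textFile is accepted here for raw text, as in CRFClassifier.
  java -cp stanford-ner-3.5.2.jar -mx1g \
    edu.stanford.nlp.ie.NERClassifierCombiner \
    -loadClassifier "$model" -textFile "$text"
}
```

With the files from the question this would be called as `load_ncc c2is2.serialized.ncc.ncc.ser.gz c2is2r3.txt`. Note that `-testFile` goes through the column-format reader (ColumnDocumentReaderAndWriter appears in the question's second stack trace), which would explain the "Argument array lengths differ" error when it is given raw text.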


Stack Overflow user

Posted on 2016-03-17 11:19:23

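The second file, c2is2r3.txt, needs to be converted to a TSV file before it is passed to the command.

You can simply assign the label O to every token (if you are unsure of the labels, or want to save the time of tagging by hand) and then test with your model.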
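The conversion described above could be sketched as follows. The function name `text_to_tsv` is hypothetical, and the sketch splits on whitespace only, so punctuation stays attached to the neighbouring word (for example `Brothers,`), unlike the tokenization in the training file shown in the question.

```shell
# Hypothetical helper: emit one token per line followed by a tab and the
# placeholder label O. Tokens are whitespace-separated fields, so this is
# only a rough stand-in for a real tokenizer.
#   $1 = plain-text input file; TSV is written to stdout
text_to_tsv() {
  awk '{ for (i = 1; i <= NF; i++) print $i "\tO" }' "$1"
}
```

For example, `text_to_tsv c2is2r3.txt > c2is2r3.test.tsv` (the output file name is illustrative). The stack trace in the question suggests the loaded classifier expects the columns `word, tag, answer`, so passing `-map word=0,answer=1` along with `-testFile` may also be needed for a two-column file; that is an assumption, not something confirmed in this thread.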

Original content provided by Stack Overflow. Original link:

https://stackoverflow.com/questions/31460407
