
Error when parsing an EDIFACT file with ANTLR 2 if an attribute value contains a keyword

Stack Overflow user
Asked 2019-08-07 06:42:03
1 answer · 68 views · 0 followers · 0 votes

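I have the thankless task of fixing a bug in an old ANTLR 2 parser that is used to parse EDIFACT files. Unfortunately, I am not at all familiar with ANTLR 2 or with parsers in general, and I cannot get it to work.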

The EDIFACT file looks like this:

ABC+Name+Surname+zip+city+street+country+1961219++0037141008'
XYZ+Company+++XYZ+zip+street'
LMN+20081010+1100'

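There are several different segments, each starting with a keyword such as XYZ or ABC. The keyword is followed by attribute values, all separated by '+'. An attribute value may be empty. Each segment ends with a single quote (').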
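To make the segment layout concrete, here is a minimal Java sketch that splits one segment into its keyword and attribute values without any lexer. The class `SegmentSplitter` and its `split` method are hypothetical names, not part of the original parser, and the sketch simplifies escaping by treating any character after '?' as literal (the grammar distinguishes ESCAPED from WRONGLY_ESCAPED).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of the original grammar.
class SegmentSplitter {

    // Splits one segment into its parts: index 0 is the keyword,
    // the remaining entries are the attribute values (possibly empty).
    static List<String> split(String segment) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < segment.length(); i++) {
            char c = segment.charAt(i);
            if (c == '?' && i + 1 < segment.length()) {
                // Simplification: keep whatever follows '?' as a literal character.
                current.append(segment.charAt(++i));
            } else if (c == '+') {
                // Element separator: close the current value, even if it is empty.
                parts.add(current.toString());
                current.setLength(0);
            } else if (c == '\'') {
                // Segment terminator: close the last value and stop.
                parts.add(current.toString());
                return parts;
            } else {
                current.append(c);
            }
        }
        // Unterminated input: return what has been read so far.
        parts.add(current.toString());
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(split("XYZ+Company+++XYZ+zip+street'"));
    }
}
```

For the XYZ sample line, the list has seven entries: the keyword at index 0, two empty values at indexes 2 and 3, and the data value "XYZ" at index 4. Only the position tells the keyword and the data value apart, since their text is identical.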

The problem is that the parser throws an error whenever a data attribute contains a keyword:

unexpected token: XYZ

XYZ+Company+++XYZ+zip+street'
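A likely cause, assuming the lexer runs with ANTLR 2's default literal testing (the excerpt does not show that option either way): string literals used in parser rules, such as "XYZ", end up in a literals table, and any token whose text matches an entry is given the literal's token type regardless of where it appears. The toy model below only illustrates that behaviour. `LiteralsTableDemo`, `typeOf` and the numeric type constants are made-up names, not ANTLR API.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: none of these names come from ANTLR or from the original grammar.
class LiteralsTableDemo {
    static final int ANUM = 1;        // stand-in for the ANUM token type
    static final int LITERAL_XYZ = 2; // stand-in for the type generated for "XYZ"

    // Stand-in for the literals table built from string literals in parser rules.
    static final Map<String, Integer> LITERALS = new HashMap<>();
    static {
        LITERALS.put("XYZ", LITERAL_XYZ);
    }

    // Mimics a literals check: the text alone decides the type, position is ignored.
    static int typeOf(String text) {
        return LITERALS.getOrDefault(text, ANUM);
    }

    public static void main(String[] args) {
        String[] values = {"XYZ", "Company", "", "", "XYZ", "zip", "street"};
        for (int i = 0; i < values.length; i++) {
            // Empty values produce no token in the real lexer; they are skipped here.
            if (values[i].isEmpty()) continue;
            System.out.println(i + ": " + values[i] + " -> type " + typeOf(values[i]));
        }
    }
}
```

Under this model the fifth element gets the keyword type, so a rule alternative that accepts only ANUM or NUM at that position has nothing to match, which is consistent with the "unexpected token: XYZ" message.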

Here is an excerpt from the grammar file:

// $ANTLR 2.7.6


xyz: "XYZ"       ELT_SEP! 
     (xyz1_1a:ANUM|xyz1_1b:NUM)          {lq(90,xyz1_1a,xyz1_1b,"XYZ1-1"+LQ90)}?  ELT_SEP!
     (xyz1_2a:ANUM|xyz1_2b:NUM)?         {lq_(90,xyz1_2a,xyz1_2b,"XYZ1-2"+LQ90)}? ELT_SEP!
     (xyz1_3a:ANUM|xyz1_3b:NUM)?         {lq_(90,xyz1_3a,xyz1_3b,"XYZ1-3"+LQ90)}? ELT_SEP! 
     (xyz2a:ANUM|xyz2b:NUM)?             {lq_(3,xyz2a,xyz2b,"XYZ2"+LQ3)}?         ELT_SEP! 
     (xyz3a:ANUM|xyz3b:NUM)?             {lq_(6,xyz3a,xyz3b,"XYZ3"+LQ6)}?         ELT_SEP! 
     (xyz4a:ANUM|xyz4b:NUM)              {lq(30,xyz4a,xyz4b,"XYZ4"+LQ30)}?
     (ELT_SEP! (xyz5a:ANUM|xyz5b:NUM)?)?  {lq_(46,xyz5a,xyz5b,"XYZ5"+LQ46)}?       SEG_TERM!
     {
        if (skipNachricht()) return;
        Xyz xyz = new Xyz();
        xyz.xyz1_1 = getText(nn(xyz1_1a, xyz1_1b));
        xyz.xyz1_2 = getText(nn(xyz1_2a, xyz1_2b));
        xyz.xyz1_3 = getText(nn(xyz1_3a, xyz1_3b));
        xyz.xyz2 = getText(nn(xyz2a, xyz2b));
        xyz.xyz3 = getText(nn(xyz3a, xyz3b));
        xyz.xyz4 = getText(nn(xyz4a, xyz4b));
        xyz.xyz5 = getText(nn(xyz5a, xyz5b));
        handleXyz(xyz);
     }
   ;  



/*
 * Lexer
 */
class EdifactLexer extends Lexer;

options { 
          k=2; 
          filter=true; 
          charVocabulary = '\3'..'\377'; // Latin
}

DEZ_SEP: ',' 
    {
          //System.out.println("Found dez_sep: " + getText()); 
        }
    ;

ELT_SEP: '+' 
    {
          //System.out.println("Found elt_sep: " + getText()); 
        }
    ;

SEG_TERM: '\''
    {
          // System.out.println("Found seg_term: " + getText()); 
        }
    ;

NUM:   (('0'..'9')+ (',' ('0'..'9')+)? ('+' | '\'')) 
          => ('0'..'9')+ (',' ('0'..'9')+)? 
            {
                //System.out.println("num_: " + getText());
            }
       | 
       ((ESCAPED | ~('?' | '+' | '\'' | ',' | '\r' | '\n'))+ ) 
          => ( ESCAPED | ~('?' | '+' | '\'' | ',' | '\r' | '\n'))+
                {
                        $setType(ANUM); 
            //System.out.println("anum: " + getText());
        } 
       |
       (WRONGLY_ESCAPED) => WRONGLY_ESCAPED 
                {$setType(WRONGLY_ESCAPED); }
       ;

protected
WRONGLY_ESCAPED: '?' ~('?' | ':' | '+' | '\'' | ',') 
    {
          //System.out.println("Found wrong_escaped: " + getText()); 
        }
        ;

protected
ESCAPED: '?' 
      ( ','  {$setText(","); }
      | '?'  {$setText("?"); }
          | '\'' {$setText("'"); }
          | ':'  {$setText(":"); }
          | '+'  {$setText("+"); }
      ) 
    {
          //System.out.println("Found escaped: " + getText()); 
        }
    ;

NEWLINE   :  ( "\r\n" // DOS
               | '\r'   // MAC
               | '\n'   // Unix
             )
             { newline(); 
               $setType(Token.SKIP);
             }
          ;

Any help is appreciated :).


1 Answer

Stack Overflow user

Accepted answer

Posted 2019-08-20 14:24:40

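This may not be the best solution, but I finally found a way to fix the problem. In case anyone else is struggling with something similar, here is what I did:

I wrote a method that changes the type of the current lookahead token to ANUM if that type is one of my keyword types:
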
void ckt() throws TokenStreamException, SemanticException {
    if (mKeywordList.contains(LT(1).getType())) {
        LT(1).setType(ANUM);
    }
}

The method is called in the parser rule before each attempt to match an ANUM token:

xyz: "XYZ"       ELT_SEP! 
     {ckt();}(xyz1_1a:ANUM|xyz1_1b:NUM)          {lq(90,xyz1_1a,xyz1_1b,"XYZ1-1"+LQ90)}?  ELT_SEP!
     {ckt();}(xyz1_2a:ANUM|xyz1_2b:NUM)?         {lq_(90,xyz1_2a,xyz1_2b,"XYZ1-2"+LQ90)}? ELT_SEP!
     {ckt();}(xyz1_3a:ANUM|xyz1_3b:NUM)?         {lq_(90,xyz1_3a,xyz1_3b,"XYZ1-3"+LQ90)}? ELT_SEP! 
     {ckt();}(xyz2a:ANUM|xyz2b:NUM)?             {lq_(3,xyz2a,xyz2b,"XYZ2"+LQ3)}?         ELT_SEP! 
     {ckt();}(xyz3a:ANUM|xyz3b:NUM)?             {lq_(6,xyz3a,xyz3b,"XYZ3"+LQ6)}?         ELT_SEP! 
     {ckt();}(xyz4a:ANUM|xyz4b:NUM)              {lq(30,xyz4a,xyz4b,"XYZ4"+LQ30)}?
     (ELT_SEP! {ckt();}(xyz5a:ANUM|xyz5b:NUM)?)?  {lq_(46,xyz5a,xyz5b,"XYZ5"+LQ46)}?       SEG_TERM!
     {
        if (skipNachricht()) return;
        Xyz xyz = new Xyz();
        xyz.xyz1_1 = getText(nn(xyz1_1a, xyz1_1b));
        xyz.xyz1_2 = getText(nn(xyz1_2a, xyz1_2b));
        xyz.xyz1_3 = getText(nn(xyz1_3a, xyz1_3b));
        xyz.xyz2 = getText(nn(xyz2a, xyz2b));
        xyz.xyz3 = getText(nn(xyz3a, xyz3b));
        xyz.xyz4 = getText(nn(xyz4a, xyz4b));
        xyz.xyz5 = getText(nn(xyz5a, xyz5b));
        handleXyz(xyz);
     }
   ;  
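
The retagging idea can be tried outside ANTLR with a small stand-alone model. This is a sketch under assumptions, not the answer's actual code: `RetagDemo`, `Tok`, `retagIfKeyword` and the numeric type constants are invented for illustration, and `KEYWORD_TYPES` plays the role of `mKeywordList`.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Stand-alone model of the ckt() workaround; all names here are hypothetical.
class RetagDemo {
    static final int ANUM = 1;
    static final int NUM = 2;
    static final int LITERAL_XYZ = 10; // stand-in for the keyword type of "XYZ"
    static final int LITERAL_ABC = 11; // stand-in for the keyword type of "ABC"

    // Plays the role of mKeywordList: every token type that denotes a keyword.
    static final Set<Integer> KEYWORD_TYPES =
            new HashSet<>(Arrays.asList(LITERAL_XYZ, LITERAL_ABC));

    // Minimal mutable token, standing in for the parser's lookahead token.
    static final class Tok {
        int type;
        final String text;

        Tok(int type, String text) {
            this.type = type;
            this.text = text;
        }
    }

    // Same idea as ckt(): a keyword-typed token in a data position becomes ANUM.
    // The text is left untouched, so the value "XYZ" is preserved.
    static void retagIfKeyword(Tok lookahead) {
        if (KEYWORD_TYPES.contains(lookahead.type)) {
            lookahead.type = ANUM;
        }
    }

    public static void main(String[] args) {
        Tok dataValue = new Tok(LITERAL_XYZ, "XYZ");
        retagIfKeyword(dataValue);
        System.out.println(dataValue.text + " now has type " + dataValue.type);
    }
}
```

The approach depends on where the call is placed. In the rule above, ckt() runs only before the data alternatives, so the leading "XYZ" of the segment keeps its keyword type and still selects the rule, while a keyword appearing as a value is accepted as ANUM.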
The original content of this page is provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/57388351
