Syntax error in a Python PLY parser
Stack Overflow user
Asked 2016-03-11 15:44:12
1 answer, 2.2K views, 0 followers, 1 vote

I am writing a simplified MODULA-2 grammar in Python with PLY.

But I get a syntax error:

$ python3 m2.py
Syntax error at 'MODULE'

I cannot see what is wrong with the rules.

Here is the grammar:

import ply.lex as lex
import ply.yacc as yacc

# =============================================================================
# Lexer rules
# =============================================================================

tokens = (
    # Keywords
    'RETURN', 'IF', 'THEN', 'VAR', 'MODULE', 'BEGIN', 'END',
    # Constants
    'NUMBER',
    # Operators
    'PLUS', 'MINUS', 'TIMES', 'DIV', 'MOD', 'ASSIGN_OP',
    # Separators
    'LPAR', 'RPAR', 'PERIOD', 'COLON', 'SEMICOLON',
    # Identifier
    'IDENT',
    )

# Tokens

t_NUMBER        = r'\d+'
t_PLUS          = r'\+'
t_MINUS         = r'-'
t_TIMES         = r'\*'
t_LPAR          = r'\('
t_RPAR          = r'\)'
t_PERIOD        = r'\.'
t_COLON         = r':'
t_SEMICOLON     = r';'
t_ASSIGN_OP     = r':='
t_IDENT         = r'[a-zA-Z][a-zA-Z0-9]*'

# Ignored characters
t_ignore = ' \t'

def t_error(t):
    print("Illegal character '%s'" % t.value[0])
    t.lexer.skip(1)

# Build the lexer
lexer = lex.lex()


# =============================================================================
# Parser rules
# =============================================================================

precedence = (
    ('left', 'PLUS', 'MINUS'),
    ('left', 'TIMES', 'DIV'),
)

def p_add_operator(t):
    """ add_operator : PLUS
                     | MINUS
    """
    pass

def p_mul_operator(t):
    """ mul_operator : TIMES
                     | DIV
                     | MOD
    """
    pass

def p_simple_expression(t):
    """ expression : term
                   | expression add_operator term
    """
    pass

def p_term(t):
    """ term : factor
             | term mul_operator factor
    """
    pass

def p_factor(t):
    """ factor : NUMBER
               | IDENT
               | LPAR expression RPAR
    """
    pass

def p_statement(t):
    """ statement : IDENT
                  | IDENT ASSIGN_OP expression
                  | IF expression THEN statement_sequence END
                  | RETURN expression
    """
    pass

def p_statement_sequence(t):
    """ statement_sequence : statement
                           | statement_sequence SEMICOLON statement
    """
    pass

def p_block(t):
    """ block : declaration_list BEGIN statement_sequence END
    """
    pass

def p_declaration_list(t):
    """ declaration_list : declaration
                         | declaration_list declaration
    """
    pass

def p_declaration(t):
    """ declaration : VAR IDENT COLON IDENT SEMICOLON
    """
    pass

def p_program_module(t):
    """ program_module : MODULE IDENT SEMICOLON block IDENT PERIOD
    """
    pass

def p_error(t):
    print("Syntax error at '%s'" % t.value)

parser = yacc.yacc(start='program_module')

if __name__ == "__main__":
    s = "MODULE test; VAR x: INTEGER; BEGIN x := 10 END test."
    parser.parse(s)

Interestingly, the same grammar rules written for lex/yacc work fine. Can anyone help?

1 Answer

Stack Overflow user

Accepted answer

Answered 2016-03-11 17:15:38

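AFAIK, ply.lex has no magic that tells it you want the special word MODULE to become the token MODULE. Listing a name in tokens does not create a rule for it, and since there is no rule for MODULE, the t_IDENT pattern matches the keyword instead.

A simple test with your definitions: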

lexer.input("MODULE test; VAR x: INTEGER; BEGIN x := 10 END test.")
for tok in lexer:
    print(tok)

outputs:

LexToken(IDENT,'MODULE',1,0)
LexToken(IDENT,'test',1,7)
LexToken(SEMICOLON,';',1,11)
LexToken(IDENT,'VAR',1,13)
LexToken(IDENT,'x',1,17)
LexToken(COLON,':',1,18)
LexToken(IDENT,'INTEGER',1,20)
LexToken(SEMICOLON,';',1,27)
LexToken(IDENT,'BEGIN',1,29)
LexToken(IDENT,'x',1,35)
LexToken(ASSIGN_OP,':=',1,37)
LexToken(NUMBER,'10',1,40)
LexToken(IDENT,'END',1,43)
LexToken(IDENT,'test',1,47)
LexToken(PERIOD,'.',1,51)

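The correct way to handle keywords is to recognize them inside the IDENT token rule, replacing the t_IDENT string with a function: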

# =============================================================================
# Lexer rules
# =============================================================================
# Keywords
keywords = ('RETURN', 'IF', 'THEN', 'VAR', 'MODULE', 'BEGIN', 'END')
tokens = keywords + (
    # Constants
    'NUMBER',
    ...

def t_IDENT(t):
    r'[a-zA-Z][a-zA-Z0-9]*'
    if t.value in keywords:  # is this a keyword
        t.type = t.value
    return t
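
The reclassification idea can also be tried without PLY installed. The following is only a minimal sketch built on the standard `re` module; `KEYWORDS`, `TOKEN_SPEC`, `MASTER` and `tokenize` are hypothetical names chosen for illustration, it covers just the tokens needed for the test string, and it is not meant to describe how PLY works internally.

```python
import re

# Keywords share the identifier pattern, so they are recognised after matching.
KEYWORDS = frozenset({'RETURN', 'IF', 'THEN', 'VAR', 'MODULE', 'BEGIN', 'END'})

# Order matters in this sketch: ':=' has to be tried before ':'.
TOKEN_SPEC = [
    ('NUMBER',    r'\d+'),
    ('IDENT',     r'[a-zA-Z][a-zA-Z0-9]*'),
    ('ASSIGN_OP', r':='),
    ('COLON',     r':'),
    ('SEMICOLON', r';'),
    ('PERIOD',    r'\.'),
    ('SKIP',      r'[ \t]+'),
]
MASTER = re.compile('|'.join('(?P<%s>%s)' % pair for pair in TOKEN_SPEC))


def tokenize(text):
    """Yield (type, value) pairs, promoting keyword-shaped identifiers."""
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError("Illegal character %r at %d" % (text[pos], pos))
        pos = m.end()
        kind, value = m.lastgroup, m.group()
        if kind == 'SKIP':
            continue  # whitespace produces no token
        if kind == 'IDENT' and value in KEYWORDS:
            kind = value  # same trick as `t.type = t.value` above
        yield kind, value


if __name__ == "__main__":
    for tok in tokenize("MODULE test; VAR x: INTEGER; BEGIN x := 10 END test."):
        print(tok)
```

Removing the `value in KEYWORDS` branch reproduces the original symptom: every keyword comes out as IDENT.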

The same lexer check now correctly gives:

LexToken(MODULE,'MODULE',1,0)
LexToken(IDENT,'test',1,7)
LexToken(SEMICOLON,';',1,11)
LexToken(VAR,'VAR',1,13)
LexToken(IDENT,'x',1,17)
LexToken(COLON,':',1,18)
LexToken(IDENT,'INTEGER',1,20)
LexToken(SEMICOLON,';',1,27)
LexToken(BEGIN,'BEGIN',1,29)
LexToken(IDENT,'x',1,35)
LexToken(ASSIGN_OP,':=',1,37)
LexToken(NUMBER,'10',1,40)
LexToken(END,'END',1,43)
LexToken(IDENT,'test',1,47)
LexToken(PERIOD,'.',1,51)

Parsing then reports no errors.
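
A related debugging aid, offered as a sketch and not as part of the original answer: printing the token type in p_error would have shown at once that MODULE reached the parser as IDENT. PLY also calls p_error with None when the error is at end of input, in which case the question's `t.value` would raise AttributeError. The helper name `format_syntax_error` and the message wording are assumptions made for illustration.

```python
def format_syntax_error(tok):
    """Build an error message; tok is a PLY LexToken, or None at end of input."""
    if tok is None:
        return "Syntax error at end of input"
    # Showing tok.type exposes lexer misclassifications such as IDENT for MODULE.
    return "Syntax error at %r (token type %s)" % (tok.value, tok.type)


def p_error(tok):
    # Drop-in replacement for the p_error in the question (assumed usage).
    print(format_syntax_error(tok))
```

With the original lexer, this would report `Syntax error at 'MODULE' (token type IDENT)`, pointing straight at the lexer instead of the grammar rules.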

Votes: 2

Original content provided by Stack Overflow.
Original link: https://stackoverflow.com/questions/35944387
