
How can wildcards, character classes, negated character classes, etc. be implemented in a model for a regular grammar?

Stack Overflow user
Asked 2015-12-01 22:27:23
Answers: 1 | Views: 350 | Followers: 0 | Votes: 8

TL;DR:

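How can the productions of a grammar be modeled computationally when the same left-hand side may have an indefinite number of productions?

I am working on a project about formal language theory and am trying to write a class for building regular grammar objects that can be passed to a finite state machine. My naive attempt was an API that adds one production for each permitted input. A stripped-down version of that attempt follows (based on the formal definition of a grammar G = (N, Σ, P, S)):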

class ContextFreeGrammar:
    def __init__(self, variables, alphabet, production_rules, start_variable):
        self.variables = variables
        self.alphabet = alphabet
        self.production_rules = production_rules
        self.start_variable = start_variable

    def __repr__(self):
        return '{}({}, {}, {}, {})'.format(
            self.__class__.__name__,
            self.variables,
            self.alphabet,
            self.production_rules,
            self.start_variable
        )


class RegularGrammar(ContextFreeGrammar):
    _regular_expression_grammar = None # TODO

    @classmethod
    def from_regular_expression(cls, regular_expression):
        raise NotImplementedError()

I have not yet reached the point of actually writing the finite state automaton or the pushdown automaton.
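For reference, one common way to store P is a dictionary from each left-hand side to a set of right-hand-side tuples, which lets any number of productions share a left-hand side. The sketch below assumes a right-linear grammar in which every right-hand side is either empty or one terminal followed by one variable; the names `productions`, `add_production`, and `accepts` are hypothetical and not part of the code above.

```python
from collections import defaultdict

# Hypothetical production store: left-hand side -> set of right-hand-side tuples.
productions = defaultdict(set)


def add_production(lhs, *rhs):
    """Record one production; duplicates collapse because the value is a set."""
    productions[lhs].add(tuple(rhs))


# Toy right-linear grammar for the language (ab)*:
add_production("S")            # S -> epsilon
add_production("S", "a", "B")  # S -> a B
add_production("B", "b", "S")  # B -> b S


def accepts(start, text):
    """Simulate the grammar as an NFA: variables are states, terminals label edges."""
    states = {start}
    for ch in text:
        next_states = set()
        for state in states:
            for rhs in productions.get(state, ()):
                # Assumed rule shape: (terminal, variable).
                if len(rhs) == 2 and rhs[0] == ch:
                    next_states.add(rhs[1])
        states = next_states
    # A state is accepting if it has an empty production.
    return any(() in productions.get(state, ()) for state in states)
```

Because each variable behaves like an automaton state here, a grammar stored this way maps fairly directly onto a nondeterministic finite automaton.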

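The grammar of regular expressions is itself context-free, so I have included my definition of it in WSN (Wirth syntax notation) below: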

syntax = expression .
expression = term "|" expression .
expression = term .
term = factor repetition term .
term = factor term .
term = .
repetition = "*" .
repetition = "+" .
repetition = "?" .
repetition = "{" nonnegative_integer "," nonnegative_integer "}" .
repetition = "{" nonnegative_integer ",}" .
repetition = "{," nonnegative_integer "}" .
nonnegative_integer = nonzero_arabic_numeral arabic_numerals .
nonnegative_integer = arabic_numeral .
nonzero_arabic_numeral = "1" .
nonzero_arabic_numeral = "2" .
nonzero_arabic_numeral = "3" .
nonzero_arabic_numeral = "4" .
nonzero_arabic_numeral = "5" .
nonzero_arabic_numeral = "6" .
nonzero_arabic_numeral = "7" .
nonzero_arabic_numeral = "8" .
nonzero_arabic_numeral = "9" .
arabic_numeral = nonzero_arabic_numeral .
arabic_numeral = "0" .
arabic_numerals = arabic_numeral .
arabic_numerals = arabic_numeral arabic_numerals .
factor = "(" expression ")" .
factor = character_class .
factor = character .
escaped_character = "\\." .
escaped_character = "\\(" .
escaped_character = "\\)" .
escaped_character = "\\+" .
escaped_character = "\\*" .
escaped_character = "\\?" .
escaped_character = "\\[" .
escaped_character = "\\]" .
escaped_character = "\\\\" .
escaped_character = "\\{" .
escaped_character = "\\}" .
escaped_character = "\\|" .
character = TODO .
character_class = TODO .

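As you can see, I explicitly split alternation into separate productions. I did this for ease of implementation. However, I am stuck on how to handle character classes and the like. I wanted production_rules to be a map from each left-hand side to the set of its corresponding right-hand sides, but that now looks infeasible.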
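One possible way to keep such a map finite is to treat a whole character class as a single terminal symbol that knows how to test membership, instead of adding one production per character. The following is only a sketch under that assumption; `CharClass`, `char`, `char_class`, `ANY`, `DIGIT`, and `NON_DIGIT` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CharClass:
    """Hypothetical terminal symbol standing for a whole set of characters."""
    chars: frozenset = frozenset()
    negated: bool = False

    def matches(self, ch):
        # A negated class matches every character that is NOT listed.
        return (ch in self.chars) != self.negated


def char(c):
    """Terminal for a single literal character."""
    return CharClass(frozenset(c))


def char_class(chars, negated=False):
    """Terminal for [chars] or, when negated, [^chars]."""
    return CharClass(frozenset(chars), negated)


# The wildcard is the negation of the empty class: it matches anything.
ANY = CharClass(frozenset(), negated=True)

DIGIT = char_class("0123456789")
NON_DIGIT = char_class("0123456789", negated=True)

# Frozen dataclasses are hashable, so these terminals can sit inside the
# tuples of a map from left-hand side to a set of right-hand sides.
production_rules = {
    "number": {(DIGIT,), (DIGIT, "number")},
    "word": {(NON_DIGIT,), (NON_DIGIT, "word")},
}
```

If the alphabet Σ is finite, an alternative is to expand each class into one production per member, with a negated class expanding to Σ minus the listed characters. The symbolic form avoids that blow-up at the cost of making terminals predicates rather than plain characters.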


1 Answer

Stack Overflow user

Answered 2016-03-14 13:04:22

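I do not fully understand your question, but judging from the comments, you seem to be working within a predefined character set that excludes miscellaneous Unicode and ASCII characters.

Here is an approach I recently implemented to handle a similar constraint:

[RegEx] Character Groups

Here is an example that builds on the definitions above: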

global rx_Trim_FromAlphaNumeric
rx_Trim_FromAlphaNumeric =                          \
    "[" + rx_AlphaNumeric                  + "]+" + \
    "[" + rx_ValidCharacters_WithLineSpace + "]*"

global rx_StartsWithSymbol
rx_StartsWithSymbol =                                \
    "[^" + rx_AlphaNumeric                  + "]"  + \
    "["  + rx_Symbols                       + "]+" + \
    "["  + rx_LineSpace + rx_Symbols        + "]*" + \
    "["  + rx_AlphaNumeric                  + "]+" + \
    "["  + rx_ValidCharacters_WithLineSpace + "]*"

global rx_StartsWithLetter
rx_StartsWithLetter =                                \
    "^[" + rx_Alphabetic                    + "]+" + \
    "["  + rx_ValidCharacters_WithLineSpace + "]+"

global rx_StartsWithNumber
rx_StartsWithNumber =                                \
    "^[" + rx_Numeric                       + "]+" + \
    "["  + rx_ValidCharacters_WithLineSpace + "]+"

global rx_WordSegments
rx_WordSegments =                  \
    "([" + rx_Symbols    + "]+|" + \
    "["  + rx_Numeric    + "]+|" + \
    "["  + rx_Alphabetic + "]+|" + \
    "["  + rx_LineSpace  + "]+)"

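Note: I like to escape all symbols, because some characters, such as ^, have context-dependent escaping requirements. If they are always escaped, you are less likely to run into problems.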
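To illustrate the escaping point, here is a minimal sketch using Python's `re.escape` to build character-group strings. The `example_*` values are assumptions chosen for illustration; they are not the character-group definitions used in the answer.

```python
import re

# Assumed building blocks (illustrative values only).
example_alphabetic = "A-Za-z"
example_numeric = "0-9"

# re.escape backslash-escapes characters that may be special, including
# ^, - and ], so they stay literal anywhere inside [...].
example_symbols = re.escape("^-]_.")

# Negated character class: one character that is neither a letter nor a digit.
not_alphanumeric = re.compile("[^" + example_alphabetic + example_numeric + "]")

# Character class made only of the escaped symbols.
symbols_only = re.compile("[" + example_symbols + "]+")

# Without escaping, the same symbols change meaning by position:
# "[^-]]" parses as the negated class [^-] followed by a literal "]".
unescaped = re.compile("[" + "^-]" + "]")
```

With the escaped version, `symbols_only` treats `^`, `-`, and `]` as ordinary members of the class, whereas the unescaped pattern silently becomes a different expression.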

Votes: 0
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/34031423
