文章/答案/技术大牛

发布

社区首页 >问答首页 >google protobuf自定义解析器

问google protobuf自定义解析器
EN

Code Review用户

提问于 2018-04-19 09:19:39

回答 1查看 309关注 0票数 4

我已经编写了一个解析德沃德游戏引擎使用的格式的模块，它本质上是google格式，但是有很多定制的东西来支持它们如何处理内部事务。

我希望对整个代码进行检查，但是看到它相当大(800行)，我认为这是否定的吗？如果有人能给它一个快速阅读，这将是很好的，如果不是仅仅是实际的解析？

该代码获取一个文档，然后将其变成一棵“树”，其中包含元素和属性，然后您可以更改这些值并再次将其写下来。

多年来，我一直在编写python，但我都是自学成才，从来没有对代码进行过审查，所以我想我还有很多需要学习的地方。

读取的数据如下所示

name: "default"
scale_along_z: 0
embedded_instances {
  id: "go"
  data: "components {\n"
  "  id: \"script\"\n"
  "  component: \"/main/main.script\"\n"
  "  position {\n"
  "    x: 0.0\n"
  "    y: 0.0\n"
  "    z: 0.0\n"
  "  }\n"
  "  rotation {\n"
  "    x: 0.0\n"
  "    y: 0.0\n"
  "    z: 0.0\n"
  "    w: 1.0\n"
  "  }\n"
  "}\n"
  "components {\n"
  "  id: \"mockup\"\n"
  "  component: \"/main/mockup.gui\"\n"
  "  position {\n"
  "    x: 0.0\n"
  "    y: 0.0\n"
  "    z: 0.0\n"
  "  }\n"
  "  rotation {\n"
  "    x: 0.0\n"
  "    y: 0.0\n"
  "    z: 0.0\n"
  "    w: 1.0\n"
  "  }\n"
  "}\n"
  ""
  position {
    x: 0.0
    y: 0.0
    z: 0.0
  }
  rotation {
    x: 0.0
    y: 0.0
    z: 0.0
    w: 1.0
  }
  scale3 {
    x: 1.0
    y: 1.0
    z: 1.0
  }
}

不幸的是，这方面的代码是巨大的，可能是太不可读了？整个代码都在github https://github.com/Jerakin/DefTree/blob/master/deftree/init.py上

class BaseDefParser:  # pragma: no cover
    _pattern = ''
    _regex = re_compile(_pattern)
    file_path = None

    def __init__(self, root_element):
        self.root = root_element
        self._element_chain = [self.root]

    def _raise_parse_error(self):
        if self.file_path:
            raise ParseError("Error when parsing file: {}".format(self.file_path))
        raise ParseError("Error when parsing supplied document")

    def parse(self, source) -> 'DefTree':
        """Loads an external Defold section into this DefTree

        :param source: path to the file.
        :returns Element: root Element"""
        self.file_path = source
        document = self._open(self.file_path)
        return self._parse(document)

    def from_string(self, source) -> 'DefTree':
        """Parses an Defold section from a string constant

        :param source: string to parse.
        :returns Element: root Element"""

        return self._parse(source)

    def _parse(self, input_doc):
        document = input_doc
        last_index = True
        while last_index:
            last_index = self._tree_builder(document)
            if last_index:
                document = document[last_index:]
        return self.root

    def _tree_builder(self, document):
        """Searches the document for a match and builds the tree"""
        return False

    @staticmethod
    def _get_level(child):
        element_level = -1

        def _count_up(_child, count):
            parent = _child.get_parent()
            if not parent:
                return count

            return _count_up(parent, count+1)
        return _count_up(child, element_level)

    @staticmethod
    def _open(_path):
        """Returns the documents data as a string"""

        with open(_path, "r") as document:
            current_document = document.read()
        return current_document

    @classmethod
    def serialize(cls, element) -> str:
        """Returns a string of the element"""
        return ""


class DefParser(BaseDefParser):
    _pattern = r'(?:data:)|(?:^|\s)(\w+):\s+(.+(?:\s+".*)*)|(\w*)\W{|(})'
    _regex = re_compile(_pattern)

    def __init__(self, root_element):
        super().__init__(root_element)

    def _tree_builder(self, document):
        """Searches the document for a match and builds the tree"""
        regex_match = self._regex.search(document)
        if not regex_match and len(document) > 25:
            self._raise_parse_error()
        if regex_match:
            element_name = regex_match.group(3)
            attribute_name, attribute_value = regex_match.group(1, 2)
            element_exit = regex_match.group(4)

            if element_name:
                if self._element_chain:
                    last_element = self._element_chain[-1]
                else:
                    self._raise_parse_error()  # pragma: no cover
                element = last_element.add_element(element_name)
                self._element_chain.append(element)
            elif attribute_name and attribute_value:
                if attribute_name == "data":
                    attribute_value = bytes(attribute_value, "utf-8").decode("unicode_escape").replace('\n"\n  "',
                                                                                                       "\n")[1:-1]
                    last_element = self._element_chain[-1]
                    element = last_element.add_element("data")
                    self._element_chain.append(element)
                    self._parse(attribute_value)
                    self._element_chain.pop()
                else:
                    if self._element_chain:
                        last_element = self._element_chain[-1]
                    else:
                        self._raise_parse_error()  # pragma: no cover
                    last_element.add_attribute(attribute_name, attribute_value)

            elif element_exit:
                if self._element_chain:
                    self._element_chain.pop()
                else:
                    self._raise_parse_error()  # pragma: no cover

            return regex_match.end()
        return False

    @classmethod
    def serialize(cls, element, internal=False):
        """Returns a string of the element"""
        assert_is_element(element)

        def construct_string(node):
            """Recursive function that formats the text"""
            nonlocal output_string
            nonlocal level
            for child in node:
                element_level = cls._get_level(child)
                if is_element(child):
                    if child.name == "data" and not internal:
                        value = cls._escape_element(child)
                        output_string += "{}{}: {}\n".format("  " * element_level, child.name, value)
                    else:
                        level += 1
                        output_string += "{}{} {{\n".format("  " * element_level, child.name)
                        construct_string(child)
                elif is_attribute(child):
                    output_string += "{}{}: {}\n".format("  " * element_level, child.name,
                                                         child.string)
                if level > element_level and not child.name == "data":
                    level -= 1
                    output_string += "{}{}".format("  " * level, "}\n")

        level = 0
        output_string = ""
        construct_string(element)
        return output_string

    @classmethod
    def _escape_element(cls, ele):
        def yield_attributes(element_parent):
            for child in element_parent:
                if is_attribute(child):
                    yield child
                else:
                    yield from yield_attributes(child)
        data_elements = dict()
        data_elements[cls._get_level(ele)] = [ele]

        for x in ele.iter_elements():
            if is_element(x) and x.name == "data":
                lvl = cls._get_level(x)
                if lvl not in data_elements:
                    data_elements[lvl] = []
                data_elements[lvl].append(x)

        while data_elements:
            for x in data_elements[max(data_elements)]:
                for a in yield_attributes(x):
                    if isinstance(a, DefTreeString) and a.string.startswith('"') and a.string.endswith('"'):
                        a._value = a.string.replace('"', '\\"')
                _root = DefTree().get_root()
                attr = _root.add_attribute("data", "")
                parent = x.get_parent()
                index = parent.index(x)
                parent.remove(x)
                parent.insert(index, attr)
                x._parent = None
                text = cls.serialize(x, True)
                _root.set_attribute("data", '"{}"'.format(text.replace('\"', '\\\"').replace('\n', "\\\n")))
                del data_elements[max(data_elements)]

        return '"{}"'.format(cls.serialize(ele).replace("\n", '\\n\"\n  \"'))

formatting

python

performance

回答 1

Code Review用户

回答已采纳

发布于 2018-04-19 15:23:27

我绝对不会说那是不可读的！但我认为还有一些改进的余地，主要是风格。这是我想出的，没有什么特别的顺序：

在名称

中的下划线

引导下划线应该用来表示对象是“私有的”，尽管在Python中没有真正的意义:这只是一个约定，它说“如果你使用这个，它可能会改变它的名称或它的实现，而不是公共API的一部分”。

->，您应该删除它们，至少对于方法参数；它们在这里没有添加任何意义，这无助于可读性。

来源

非局部变量

递归

在construct_string中，您可以使用函数作用域以外的变量进行递归。我建议您不要这样做，因为它不是非常Pythonic的，通常情况下最好避免副作用，这会使代码更难理解。

->只是将它们作为参数传递给递归调用。

类/实例属性混淆

在BaseDefParser中，您将file_path声明为类属性，并在方法中像实例属性一样对其进行操作。这可能会导致很多混乱，特别是对于可变对象，因为更改将反映在类的所有实例上。

->在__init__中声明实例属性。

`_raise_parse_error`

分解这是一个好主意，但是您在途中丢失了一些东西:您不会向这个方法传递任何可能帮助理解错误所在的上下文(除了文档，但这太宽泛了)。

->为您的异常添加上下文，您(或您未来的用户和贡献者)稍后将为此感谢您自己！

复杂性

您的代码注释不多，也不太容易理解，您可以到处添加注释。

您还可以将一些更复杂的部分拆分为更小的方法。

我会更新，如果我想其他东西，但这应该是一个很好的起点。不管怎么说，你的代码很好，如果你是自学成才，就跟得上！

票数 1

页面原文内容由Code Review提供。腾讯云小微IT领域专用引擎提供翻译支持

原文链接：

https://codereview.stackexchange.com/questions/192449

复制

相似问题

问google protobuf自定义解析器
EN

回答 1

Code Review用户

在名称

递归

类/实例属性混淆

`_raise_parse_error`

复杂性

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问google protobuf自定义解析器EN

回答 1

Code Review用户

在名称

递归

类/实例属性混淆

_raise_parse_error

复杂性

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问google protobuf自定义解析器
EN

`_raise_parse_error`