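I've run into a slight parsing problem that I can't seem to solve. I'd like to write a rule that parses a multi-line paragraph for me. The end goal is a recursive grammar that will parse the following: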
Heading: awesome
    This is a paragraph and then
    a line break is inserted
    then we have more text

    but this is also a different line
    with more lines attached

    Other: cool
        This is another indented block
        possibly with more paragraphs

        This is another way to keep this up
        and write more things

    But then we can keep writing at the old level
    and get this

The result might then look like this (of course, once I have a parse tree I can convert it to whatever format I like):
<Heading class="awesome">
<p> This is a paragraph and then a line break is inserted and then we have more text </p>
<p> but this is also a different line with more lines attached</p>
<Other class="cool">
<p> This is another indented block possibly with more paragraphs</p>
<p> This is another way to keep this up and write more things</p>
</Other>
<p> But then we can keep writing at the old level and get this</p>
</Heading>

Progress
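I have managed to get to the stage where I can parse the heading line and an indented block using pyparsing. But I can't:

An example

Following the approach here, I can get the paragraphs to output on a single line, but there doesn't seem to be a way to turn this into a parse tree without removing the line break characters.

I believe a paragraph should be: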
words = ## I've defined words to allow a set of characters I need
lines = OneOrMore(words)
paragraph = OneOrMore(lines) + lineEnd

But this doesn't seem to work for me. Any ideas would be great :)
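Posted on 2017-06-15 08:39:29

So I managed to solve this, for anybody who stumbles upon it in the future. You can define the paragraph as shown here, although it is certainly not ideal and does not exactly match the grammar I described. The relevant code is: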
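from pyparsing import CharsNotIn, OneOrMore, Suppress, lineEnd

# a line is everything up to the newline; the newline itself is dropped
line = OneOrMore(CharsNotIn('\n')) + Suppress(lineEnd)
# negative lookahead: succeeds only where another line cannot start
emptyline = ~line
paragraph = OneOrMore(line) + emptyline
paragraph.setParseAction(join_lines)

where join_lines (which has to exist before the setParseAction call runs) is defined as:

def join_lines(tokens):
    # strip each line and join the pieces with single spaces
    stripped = [t.strip() for t in tokens]
    joined = " ".join(stripped)
    return joined

This should point you in the right direction if it fits your needs :) I hope this helps!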
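To make it easier to try this out, here is a minimal runnable sketch that puts the definitions above in a working order and runs them over a small sample. The sample text and the use of searchString to collect every paragraph are my own assumptions for illustration, not part of the original answer, and the blank line in the sample is truly empty (no spaces).

```python
from pyparsing import CharsNotIn, OneOrMore, Suppress, lineEnd


def join_lines(tokens):
    # Strip each physical line and join them into one paragraph string.
    return " ".join(t.strip() for t in tokens)


# Same definitions as in the answer.
line = OneOrMore(CharsNotIn("\n")) + Suppress(lineEnd)
emptyline = ~line  # negative lookahead: no further line may start here
paragraph = OneOrMore(line) + emptyline
paragraph.setParseAction(join_lines)

# Hypothetical sample: two indented paragraphs separated by an empty line.
sample = (
    "    This is a paragraph and then\n"
    "    a line break is inserted\n"
    "\n"
    "    but this is also a different line\n"
    "    with more lines attached\n"
)

# searchString scans the input and collects every match of `paragraph`.
paragraphs = [match[0] for match in paragraph.searchString(sample)]
for p in paragraphs:
    print(p)
```

Each paragraph comes back as a single joined string. Note that a "blank" line containing spaces would be picked up by CharsNotIn as an ordinary line here, which is the weakness the next section is about.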
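A better empty line

The definition of the empty line given above is definitely not ideal and can be improved a lot. The best way I have found is: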
empty_line = Suppress(LineStart() + ZeroOrMore(" ") + LineEnd())
empty_line.setWhitespaceChars("")

This lets empty lines be padded with spaces without breaking the match.
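If whitespace handling gets in the way, one alternative is to describe both kinds of line with Regex and switch off whitespace skipping explicitly. The sketch below is a swapped-in technique, not the definition given in this answer: the names blank_line, text_line and document, the patterns and the sample text are all assumptions made for illustration.

```python
from pyparsing import OneOrMore, Regex, ZeroOrMore

# Assumption: a blank line is optional spaces/tabs followed by a newline.
blank_line = Regex(r"[ \t]*\n").leaveWhitespace().suppress()

# A text line must contain at least one non-whitespace character.
text_line = Regex(r"[ \t]*\S[^\n]*\n?").leaveWhitespace()

# A paragraph is a run of text lines, joined into one string.
paragraph = OneOrMore(text_line)
paragraph.setParseAction(lambda tokens: " ".join(t.strip() for t in tokens))

# Paragraphs separated (and optionally preceded) by blank lines.
document = ZeroOrMore(blank_line) + ZeroOrMore(paragraph + ZeroOrMore(blank_line))

sample = (
    "First line\n"
    "  second line\n"
    "   \n"   # blank line padded with spaces
    "\t\n"    # blank line containing only a tab
    "Next paragraph\n"
)

result = document.parseString(sample, parseAll=True).asList()
print(result)
```

Because every element leaves whitespace alone, newlines are never skipped silently, so lines that hold only spaces or tabs still separate paragraphs.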
https://stackoverflow.com/questions/44534653