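I'm having some difficulty working out the right regex for the following problem.
I have an input file that I am trying to chunk into groups based on a keyword expression. Here is a sample of the file (let's call it example 1):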
Foo: B
"This is instance B of type Foo"
Bar: X
etc.
Foo: C
"This is instance C of type Foo"
Bar: Y
etc.
The following regex:
#"(?s)(Foo:)(?:(?!Foo:).)*"
works like a charm and produces the result I expect:
(["Foo: B\n \"This is instance B of type Foo\"\n Bar: X\n etc.\n\n"
"Foo:"]
["Foo: C\n \"This is instance C of type Foo\"\n Bar: Y\n etc.\n\n\n"
"Foo:"])
However, if someone adds a colon after "Foo" inside the annotation, things get weird, and the result is:
(["Foo: B\n \"This is instance B of type " "Foo:"]
["Foo:\"\n Bar: X\n etc.\n\n" "Foo:"]
["Foo: C\n \"This is instance C of type Foo\"\n Bar: Y\n etc.\n\n\n"
"Foo:"])
If, while testing, I remove Foo: C and its content from the input and change the regex to:
"(?s)(Foo:)(?:(?!\"Foo:\").)*"
I get the expected result:
(["Foo: B\n \"This is instance B of type Foo:\"\n Bar: X\n etc.\n\n\n\n"
"Foo:"])
However, once I add Foo: C back into the mix, the regex no longer respects the boundary, and the result is:
(["Foo: B\n \"This is instance B of type Foo:\"\n Bar: X\n etc.\n\nFoo: C\n \"This is instance C of type Foo:\"\n Bar: Y\n etc.\n\n\n\n"
"Foo:"])
I have also tried this, to no avail:
#"(?s)(Foo:)(?:(?!Foo:|\"Foo:\").)*"
among a thousand other unsuccessful gyrations.
I appreciate your help. The objective is to perform file chunking with a regex.
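A minimal reproduction of the behaviour described above, using only clojure.core and clojure.string. The names `clean`, `with-colon` and `chunks` are illustrative, and the two-space indentation of the sample is an assumption; only the regex itself is taken from the text above.

```clojure
(require '[clojure.string :as str])

;; Example 1. The two-space indentation is an assumption.
(def clean
  (str "Foo: B\n"
       "  \"This is instance B of type Foo\"\n"
       "  Bar: X\n"
       "\n"
       "Foo: C\n"
       "  \"This is instance C of type Foo\"\n"
       "  Bar: Y\n"))

;; Same input, with a colon added after the quoted Foo in group B.
(def with-colon
  (str/replace-first clean "type Foo\"" "type Foo:\""))

;; The regex from above: the lookahead stops at every position where
;; "Foo:" begins, whether or not it is inside a quoted annotation.
(def foo-chunks #"(?s)(Foo:)(?:(?!Foo:).)*")

;; re-seq returns [whole-match group-1] vectors; keep the whole match.
(defn chunks [text]
  (mapv first (re-seq foo-chunks text)))

(println (count (chunks clean)))      ; 2
(println (count (chunks with-colon))) ; 3
```

With the colon present, the second chunk starts at the quoted Foo: rather than at a group header, which is the spurious split shown in the output above.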
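The current solution has moved away from regex, since it turned out to be too nuanced for the simple chunking I need. The first solution was a loop/recur with a few (too many) conditionals and mutable atoms serving as the accumulation map.
I had been itching to do something concrete with reduce. It may not be the best application of it, but I learned about reduce through this exercise, and it removed a lot of superfluous lines of code.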
(def owl-type-map
  {"Prefix:"               :prefixes
   "AnnotationProperty:"   :annotation-properties
   "Ontology:"             :ontology
   "Datatype:"             :data-types
   "DataProperty:"         :data-properties
   "ObjectProperty:"       :object-properties
   "Class:"                :classes
   "Individual:"           :individuals
   "EquivalentClasses:"    :miscellaneous
   "DisjointClasses:"      :miscellaneous
   "EquivalentProperties:" :miscellaneous
   "DisjointProperties:"   :miscellaneous
   "SameIndividual:"       :miscellaneous
   "DifferentIndividuals:" :miscellaneous})
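(require '[clojure.string :as s])

;; Initial accumulator: one nil entry per section keyword, plus :current,
;; which tracks the section the most recent keyword line belongs to.
(def owl-control (reduce #(assoc %1 (second %2) nil) {:current nil} owl-type-map))
;; Split a value (coerced to a string) on single spaces.
(def space-split #(s/split (str %) #" "))
(defn owl-chunk
  "Reduce ready function to accumulate a series of strings associated to
  particular instaparse EBNF productions (e.g. Class:, Prefix:, Ontology:).
  owl-type-map refers to the association between owl-type (string) and EBNF production"
  [acc v]
  (let [odex  (:current acc)
        stip  ((comp first space-split) v)
        ;; A line that does not start with a known keyword stays in the current section.
        index (get owl-type-map stip odex)
        imap  (if (= index odex) acc (assoc-in acc [:current] index))]
    (assoc-in imap [index] (str (get imap index) v "\n"))))
;; Calling (s is assumed to be the sequence of input lines)
(reduce owl-chunk owl-control s)
Posted on 2015-03-12 19:13:54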
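You may want to consider using a parser generator. Mark Engelberg's Instaparse is an excellent parsing library for Clojure that aims to make this an easy choice: "What if context-free grammars were as easy to use as regular expressions?" is the first line of its README.
Here is an example of how you could use it to parse your sample input: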
;; [instaparse "1.3.5"]
(require '[instaparse.core :as insta])
(def p (insta/parser "
S = Group*
Group = GroupHeader GroupComment GroupBody
GroupHeader = #'[A-Za-z]+' ': ' #'[A-Za-z]+' '\n'
GroupComment = ws? '\"' #'[^\"]+' '\"\n'
GroupBody = Line*
Line = #'.*' '\n'
ws = #'\\s+'
"))
(p "Foo: B
 \"This is instance B of type Foo\"
 Bar: X
Foo: C
 \"This is instance C of type Foo\"
 Bar: Y
")
;;=>
[:S
 [:Group
  [:GroupHeader "Foo" ": " "B" "\n"]
  [:GroupComment [:ws " "] "\"" "This is instance B of type Foo" "\"\n"]
  [:GroupBody
   [:Line " Bar: X" "\n"]]]
 [:Group
  [:GroupHeader "Foo" ": " "C" "\n"]
  [:GroupComment [:ws " "] "\"" "This is instance C of type Foo" "\"\n"]
  [:GroupBody
   [:Line " Bar: Y" "\n"]]]]
Adding a colon after "Foo" inside the annotation string will not be a problem. (Of course, the grammar above is very simple; I suppose you might want to start nesting groups at Bar: and so on.)
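If staying with a regex is preferable for now, one option is to tie the boundary to the start of a line. The sketch below rests on an assumption that the question does not state: group headers always begin a line, and annotations never do. The names `input` and `anchored` are illustrative, and the two-space indentation is likewise assumed.

```clojure
;; Sample input with "Foo:" inside the first annotation.
;; The two-space indentation is an assumption.
(def input
  (str "Foo: B\n"
       "  \"This is instance B of type Foo:\"\n"
       "  Bar: X\n"
       "\n"
       "Foo: C\n"
       "  \"This is instance C of type Foo\"\n"
       "  Bar: Y\n"))

;; (?m) makes ^ match at the start of every line, so both the opening
;; "Foo:" and the lookahead boundary only fire at a line start.
(def anchored #"(?ms)^(Foo:)(?:(?!^Foo:).)*")

;; re-seq returns [whole-match group-1] vectors; keep the whole match.
(def anchored-chunks (mapv first (re-seq anchored input)))

(println (count anchored-chunks)) ; 2
```

This keeps the quoted Foo: inside the first chunk, but it breaks again as soon as an annotation or body line can itself start with Foo:, which is where a grammar becomes the more robust choice.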
https://stackoverflow.com/questions/29012801