
Clojure re-seq regex with unexpected results

Stack Overflow user

Asked 2015-03-12 14:43:55

1 answer, 113 views, 0 followers, 1 vote

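I'm having some difficulty figuring out the right regex for the following problem:

I have an input file that I'm trying to split into groups based on a keyword expression. Here is a sample of the file (let's call it Example 1):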

Foo: B
  "This is instance B of type Foo"
  Bar: X
  etc.

Foo: C
  "This is instance C of type Foo"
  Bar: Y
  etc.

The following regex:

#"(?s)(Foo:)(?:(?!Foo:).)*"

works like a charm and produces the result I expected:

(["Foo: B\n  \"This is instance B of type Foo\"\n  Bar: X\n  etc.\n\n"
  "Foo:"]
 ["Foo: C\n  \"This is instance C of type Foo\"\n  Bar: Y\n  etc.\n\n\n"
  "Foo:"])

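However, if someone adds a colon after "Foo" in the comment (so that it reads "This is instance B of type Foo:"), things get weird and the result is: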

(["Foo: B\n  \"This is instance B of type " "Foo:"]
 ["Foo:\"\n  Bar: X\n  etc.\n\n" "Foo:"]
 ["Foo: C\n  \"This is instance C of type Foo\"\n  Bar: Y\n  etc.\n\n\n"
  "Foo:"])

If, for testing, I remove Foo: C and its content from the input and change the regex to:

"(?s)(Foo:)(?:(?!\"Foo:\").)*"

I get the expected result:

(["Foo: B\n  \"This is instance B of type Foo:\"\n  Bar: X\n  etc.\n\n\n\n"
  "Foo:"])

But once Foo: C is added back into the mix, the match no longer respects the boundary, and the result is:

(["Foo: B\n  \"This is instance B of type Foo:\"\n  Bar: X\n  etc.\n\nFoo: C\n  \"This is instance C of type Foo:\"\n  Bar: Y\n  etc.\n\n\n\n"
  "Foo:"])

I also tried the following, without success (not to mention a few thousand other unsuccessful permutations):

#"(?s)(Foo:)(?:(?!Foo:|\"Foo:\").)*"
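A minimal sketch of what is going on, plus one possible fix that is not from the original thread. It assumes the file has already been read into a string (the sample string below is a hypothetical stand-in; the real file's trailing newlines are unknown) and that chunk headers always start at column 0. The tempered dot (?:(?!Foo:).)* refuses every "Foo:", so the one inside the quoted comment ends chunk B early. The quoted lookahead (?!"Foo:") only refuses the literal text "Foo:" with quotes around it, which never occurs in the sample (the comment ends in type Foo:"), so nothing stops the match at all. In the alternation (?!Foo:|"Foo:"), the bare Foo: branch still fires inside the comment. Anchoring the boundary to the start of a line with the (?m) flag sidesteps all three:

```clojure
;; Hypothetical input modelled on the question: only the first comment ends in "Foo:".
(def sample
  (str "Foo: B\n  \"This is instance B of type Foo:\"\n  Bar: X\n  etc.\n\n"
       "Foo: C\n  \"This is instance C of type Foo\"\n  Bar: Y\n  etc.\n"))

;; Original regex: the tempered dot stops before ANY "Foo:", including the one
;; in the comment. The capture group makes re-seq return [match group] vectors.
(def original-re #"(?s)(Foo:)(?:(?!Foo:).)*")

;; Sketch of a fix: (?m) lets ^ match at the start of every line, so only a
;; "Foo:" at column 0 starts a chunk or ends the previous one; (?s) still lets
;; . match newlines. No capture group, so re-seq returns plain strings.
(def line-start-re #"(?sm)^Foo:(?:(?!^Foo:).)*")

(map first (re-seq original-re sample))
;; three matches: chunk B is cut in two at the comment's "Foo:"

(re-seq line-start-re sample)
;; two matches, one per "Foo:" header
```

This still assumes headers are never indented; an indented "Foo:" line would be treated as ordinary chunk content.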

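I'd appreciate any help. The goal is to chunk the file using a regex.

My current solution has moved away from regex, because it was too subtle for the simple chunking I need. My first attempt was a loop/recur with several (too many) conditionals and a mutable atom as the accumulating map.

I had been itching to do something concrete with reduce. This may not be the best application for it, but I learned it through this exercise, and it removed a lot of lines of code.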

(require '[clojure.string :as s])

(def owl-type-map
  {"Prefix:"               :prefixes
   "AnnotationProperty:"   :annotation-properties
   "Ontology:"             :ontology
   "Datatype:"             :data-types
   "DataProperty:"         :data-properties
   "ObjectProperty:"       :object-properties
   "Class:"                :classes
   "Individual:"           :individuals
   "EquivalentClasses:"    :miscellaneous
   "DisjointClasses:"      :miscellaneous
   "EquivalentProperties:" :miscellaneous
   "DisjointProperties:"   :miscellaneous
   "SameIndividual:"       :miscellaneous
   "DifferentIndividuals:" :miscellaneous})

;; Initial accumulator: {:current nil, :prefixes nil, :ontology nil, ...}
(def owl-control (reduce #(assoc %1 (second %2) nil) {:current nil} owl-type-map))

;; Split a line on single spaces; the first element is the candidate keyword.
(def space-split #(s/split (str %) #" "))

(defn owl-chunk
  "Reduce-ready function that accumulates a series of strings associated with
  particular Instaparse EBNF productions (e.g. Class:, Prefix:, Ontology:).
  owl-type-map maps each OWL type (string) to its EBNF production."
  [acc v]
  (let [odex  (:current acc)                ; production currently being filled
        stip  ((comp first space-split) v)  ; first token of the line
        index (get owl-type-map stip odex)  ; new production on a keyword, else keep the current one
        imap  (if (= index odex) acc (assoc acc :current index))]
    ;; Append the line (plus a newline) to the text collected for this production.
    (assoc imap index (str (get imap index) v "\n"))))

;; Calling: s here is the input as a sequence of lines (not the clojure.string alias)
(reduce owl-chunk owl-control s)
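For comparison, here is a minimal, self-contained sketch of the same line-based reduce idea applied to the Foo: sample from the question. It is not the asker's code: the header-line? and chunk-lines names are hypothetical, and it collects each chunk as a vector of lines instead of one concatenated string per OWL type. Because only the start of each line is tested, a "Foo:" inside a quoted comment cannot open a new chunk.

```clojure
(require '[clojure.string :as str])

(defn header-line?
  "Hypothetical predicate: true when the line itself begins with \"Foo:\".
  A \"Foo:\" later in the line, e.g. inside a quoted comment, does not count."
  [line]
  (str/starts-with? line "Foo:"))

(defn chunk-lines
  "Groups a sequence of lines into a vector of chunks (each a vector of lines),
  starting a new chunk at every header line. Lines before the first header
  form a leading chunk of their own."
  [lines]
  (reduce (fn [chunks line]
            (if (or (header-line? line) (empty? chunks))
              (conj chunks [line])                             ; open a new chunk
              (conj (pop chunks) (conj (peek chunks) line))))  ; append to the last chunk
          []
          lines))

;; Hypothetical input: both comments contain "Foo:".
(def sample
  (str "Foo: B\n  \"This is instance B of type Foo:\"\n  Bar: X\n  etc.\n\n"
       "Foo: C\n  \"This is instance C of type Foo:\"\n  Bar: Y\n  etc.\n"))

(chunk-lines (str/split-lines sample))
```

Unlike owl-chunk, this keeps chunks in input order instead of merging them by type; if a per-type map is needed, the chunks could afterwards be grouped with group-by on the first token of each chunk's first line.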

Tags: regex, clojure

1 Answer

Stack Overflow user

Answered 2015-03-12 19:13:54

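You might want to consider using a parser generator. Mark Engelberg's Instaparse is an excellent parsing library for Clojure whose stated aim is to make that an easy choice; the first line of its README asks, "What if context-free grammars were as easy to use as regular expressions?"

Here's an example of how you could use it to parse your sample input: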

;; [instaparse "1.3.5"]
(require '[instaparse.core :as insta])

(def p (insta/parser "

S = Group*
Group = GroupHeader GroupComment GroupBody
GroupHeader = #'[A-Za-z]+' ': ' #'[A-Za-z]+' '\n'
GroupComment = ws? '\"' #'[^\"]+' '\"\n'
GroupBody = Line*
Line = #'.*' '\n'
ws = #'\\s+'

"))

(p "Foo: B
  \"This is instance B of type Foo\"
  Bar: X
Foo: C
  \"This is instance C of type Foo\"
  Bar: Y
")
;;=>
[:S
 [:Group
  [:GroupHeader "Foo" ": " "B" "\n"]
  [:GroupComment [:ws "  "] "\"" "This is instance B of type Foo" "\"\n"]
  [:GroupBody
   [:Line "  Bar: X" "\n"]]]
 [:Group
  [:GroupHeader "Foo" ": " "C" "\n"]
  [:GroupComment [:ws "  "] "\"" "This is instance C of type Foo" "\"\n"]
  [:GroupBody
   [:Line "  Bar: Y" "\n"]]]]

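Adding a colon after "Foo" inside the quoted string won't be a problem. (Of course, the grammar above is very simplistic; I imagine you'd want nested groups starting at Bar: and so on.)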

Votes: 0

Original page content provided by Stack Overflow; translation support provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/29012801
