The xml-conduit documentation only lists examples in which a ConduitM consumes the entire XML tree, for example:
<people>
<person age="25">Michael</person>
<person age="2">Eliezer</person>
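</people>
I'm trying to parse a tree that, besides the <person> tags above, also contains deeply nested subtrees I'm not interested in (their exact schema may not even be known), for example: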
<people>
<person age="25">Michael</person>
<tagImNotInterestedIn><!-- deeply nested complex subtree --></tagImNotInterestedIn>
<person age="2">Eliezer</person>
</people>
When I parse this with the people.hs example from the docs, I get the following exception:
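people.hs: XmlException {xmlErrorMessage = "Expected end tag for: Name {nameLocalName = \"people\", nameNamespace = Nothing, namePrefix = Nothing}", xmlBadInput = Just (EventBeginElement (Name {nameLocalName = "tagImNotInterestedIn", nameNamespace = Nothing, namePrefix = Nothing}) [])}
Basically, I'm looking for a way to ignore any tag (including all of its children and attributes) except the specific tags I have written parsers for. With a DOM-based parser such as HXT this is obviously easy, but the documentation for tag explicitly states that it will fail unless all child elements are consumed.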
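The only (hypothetical) way I can think of is to use functions from Control.Exception to build a conduit with a Maybe a result (returning Nothing on an exception), and then combine it with the parser itself using orE.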
Although it has been stated that the xml-conduit API needs some updates, I think there must be a less cumbersome way to ignore entire subtrees. Any ideas are appreciated!
Posted on 2018-08-02 04:01:44
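Since version 1.5.0, Text.XML.Stream.Parse provides the function takeTree, which can be used for this purpose: it streams the events of a matching subtree on to the next conduit, so every other subtree can be dropped with ignoreAnyTreeContent before the actual parsing happens.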
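To see what takeTree contributes on its own, here is a sketch that runs only the filtering stage and collects the local names of the elements that get through. The names keepOnly, survivingElements and sample are hypothetical, introduced only for this illustration. As an assumption made for easy inspection, the pipeline runs in Either SomeException (which has a MonadThrow instance) instead of ResourceT IO.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Control.Exception (SomeException)
import Control.Monad.Trans.Resource (MonadThrow)
import Data.ByteString.Lazy (ByteString)
import qualified Data.ByteString.Lazy.Char8 as BL
import Data.Conduit (ConduitT, runConduit, (.|))
import qualified Data.Conduit.List as CL
import Data.Text (Text)
import Data.XML.Types (Event (..), Name (..))
import Text.XML.Stream.Parse (NameMatcher, choose, def, ignoreAnyTreeContent,
                              ignoreAttrs, many_, parseLBS, tagNoAttr, takeTree)

-- Hypothetical helper: re-emit every subtree whose root matches `name`
-- unchanged, and silently drop every other subtree with all its children.
keepOnly :: MonadThrow m => NameMatcher Name -> ConduitT Event Event m ()
keepOnly name = many_ (choose [takeTree name ignoreAttrs, ignoreAnyTreeContent])

-- Hypothetical: local names of all elements that survive the filter
-- inside <people>, collected as a plain list.
survivingElements :: ByteString -> Either SomeException [Text]
survivingElements input = runConduit $
    parseLBS def input
        .| (() <$ tagNoAttr "people" (keepOnly "person"))
        .| CL.mapMaybe startName
        .| CL.consume
  where
    startName (EventBeginElement n _) = Just (nameLocalName n)
    startName _                       = Nothing

-- Hypothetical input with an unknown, nested subtree between the persons.
sample :: ByteString
sample = BL.concat
    [ "<people><foo/>"
    , "<person age=\"25\">Michael</person>"
    , "<tagImNotInterestedIn><a><b>x</b></a></tagImNotInterestedIn>"
    , "<person age=\"2\">Eliezer</person>"
    , "</people>"
    ]

main :: IO ()
main = either print print (survivingElements sample)
```

Here main prints ["person","person"]: the foo element and the nested tagImNotInterestedIn subtree never reach the downstream stage, so a parser placed after the filter only ever sees person elements.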
{-# LANGUAGE OverloadedStrings #-}
import Control.Monad (void)
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.Resource (MonadThrow, runResourceT)
import Data.ByteString.Lazy (ByteString)
import Data.ByteString.Lazy.Char8 (concat)
import Data.Conduit (ConduitT, runConduit, (.|))
import Data.Conduit.List (mapM_)
import Data.Text (Text, unpack)
import Data.XML.Types (Event)
import Prelude hiding (concat, mapM_)
import Text.XML.Stream.Parse (choose, content, def,
                              ignoreAnyTreeContent,
                              ignoreAttrs, manyYield, many_,
                              parseLBS, requireAttr, tag',
                              tagNoAttr, takeTree)

data Person = Person Int Text deriving Show

parsePerson :: MonadThrow m => ConduitT Event o m (Maybe Person)
parsePerson = tag' "person" (requireAttr "age") $ \age -> do
    name <- content
    return $ Person (read $ unpack age) name
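
parsePeople :: MonadThrow m => ConduitT Event Person m ()
parsePeople = void $ tagNoAttr "people" $
    -- Stage 1: pass the events of every <person> subtree through unchanged
    -- (takeTree) and drop any other subtree (ignoreAnyTreeContent).
    many_ (choose [takeTree "person" ignoreAttrs, ignoreAnyTreeContent])
    -- Stage 2: parse the filtered, person-only event stream as usual.
    .| manyYield parsePerson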

persons :: ByteString
persons = concat [
    "<people>"
  , "<foo/>"
  , "<person age=\"25\">Michael</person>"
  , "<bar/>"
  , "<person age=\"2\">Eliezer</person>"
  , "<tagImNotInterestedIn>x</tagImNotInterestedIn>"
  , "</people>"
  ]

main :: IO ()
main = runResourceT $
    runConduit $ parseLBS def persons .| parsePeople .| mapM_ (lift . print)

The code above is based on the xml-conduit sample; only parsePeople has been changed.
λ> main
Person 25 "Michael"
Person 2 "Eliezer"

https://stackoverflow.com/questions/24336541
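Later xml-conduit releases also export many', which repeats a parser and skips, via ignoreAnyTreeContent, any sibling subtree that the parser does not match. Assuming your xml-conduit version provides many', the same result can be obtained without a separate takeTree stage. This is a variant sketch, not the answer's code: peopleFrom and sample are hypothetical names, and running in Either SomeException rather than ResourceT IO is an assumption made so the result can be inspected as a value.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Control.Exception (SomeException)
import Control.Monad.Trans.Resource (MonadThrow)
import Data.ByteString.Lazy (ByteString)
import qualified Data.ByteString.Lazy.Char8 as BL
import Data.Conduit (ConduitT, runConduit, (.|))
import Data.Text (Text, unpack)
import Data.XML.Types (Event)
-- Assumes an xml-conduit version that exports many'.
import Text.XML.Stream.Parse (content, def, many', parseLBS, requireAttr,
                              tag', tagNoAttr)

data Person = Person Int Text deriving (Show, Eq)

-- Same person parser as in the answer.
parsePerson :: MonadThrow m => ConduitT Event o m (Maybe Person)
parsePerson = tag' "person" (requireAttr "age") $ \age -> do
    name <- content
    return $ Person (read $ unpack age) name

-- many' keeps retrying parsePerson and skips every other sibling subtree,
-- so <foo/>, <bar/> and <tagImNotInterestedIn> are consumed silently.
parsePeople :: MonadThrow m => ConduitT Event o m (Maybe [Person])
parsePeople = tagNoAttr "people" (many' parsePerson)

-- Hypothetical pure runner: Either SomeException satisfies MonadThrow.
peopleFrom :: ByteString -> Either SomeException (Maybe [Person])
peopleFrom input = runConduit $ parseLBS def input .| parsePeople

sample :: ByteString
sample = BL.concat
    [ "<people><foo/>"
    , "<person age=\"25\">Michael</person>"
    , "<bar/>"
    , "<person age=\"2\">Eliezer</person>"
    , "<tagImNotInterestedIn>x</tagImNotInterestedIn>"
    , "</people>"
    ]

main :: IO ()
main = either print print (peopleFrom sample)
```

With this input, main prints Just [Person 25 "Michael",Person 2 "Eliezer"]. Unlike the manyYield version, many' collects the persons into a list, so use the takeTree pipeline instead when the results should be streamed downstream one at a time.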