Ignoring subtrees in a streaming xml-conduit pipeline
Tags: xml, haskell, xml-conduit

The available examples only cover the case where the entire XML tree is consumed by the ConduitM, for example:

<people>
    <person age="25">Michael</person>
    <person age="2">Eliezer</person>
</people>

How can input like the following be handled instead?

<people>
    <person age="25">Michael</person>
    <tagImNotInterestedIn><!-- deeply nested complex subtree --></tagImNotInterestedIn>
    <person age="2">Eliezer</person>
</people>
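Basically, I'm looking for a way to ignore every tag (including all of its children and attributes) except the specific tags I have written a parser for. This is obviously easy with a DOM-based parser such as HXT, but the streaming parser is explicitly documented to fail unless all children are consumed.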

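The only hypothetical way I can think of to do this is to use functions from Control.Exception to build a conduit with a Maybe result (returning Nothing on an exception), and then combine it with the actual parser using orE.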


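Even if the xml-conduit API needs some updating for this, I think there must be a less hacky way to ignore an entire subtree. Any ideas would be appreciated.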

Since version 1.5.0, Text.XML.Stream.Parse provides a function that might be usable for this purpose:

{-# LANGUAGE OverloadedStrings #-}

import           Control.Monad                (void)
import           Control.Monad.Trans.Class    (lift)
import           Control.Monad.Trans.Resource (MonadThrow, runResourceT)
import           Data.ByteString.Lazy         (ByteString)
import           Data.ByteString.Lazy.Char8   (concat)
import           Data.Conduit                 (ConduitT, runConduit, (.|))
import           Data.Conduit.List            (mapM_)
import           Data.Text                    (Text, unpack)
import           Data.XML.Types               (Event)
import           Prelude                      hiding (concat, mapM_)
import           Text.XML.Stream.Parse        (choose, content, def,
                                               ignoreAnyTreeContent,
                                               ignoreAttrs, manyYield, many_,
                                               parseLBS, requireAttr, tag',
                                               tagNoAttr, takeTree)

data Person = Person Int Text deriving Show

parsePerson :: MonadThrow m => ConduitT Event o m (Maybe Person)
parsePerson = tag' "person" (requireAttr "age") $ \age -> do
    name <- content
    return $ Person (read $ unpack age) name

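-- Inside <people>: pass every <person> subtree through unchanged (takeTree),
-- skip any other subtree (ignoreAnyTreeContent), then parse what is left.
parsePeople :: MonadThrow m => ConduitT Event Person m ()
parsePeople = void $ tagNoAttr "people" $
  many_ (choose [takeTree "person" ignoreAttrs, ignoreAnyTreeContent]) .| manyYield parsePerson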

persons :: ByteString
persons = concat [
    "<people>"
  , "<foo/>"
  , "<person age=\"25\">Michael</person>"
  , "<bar/>"
  , "<person age=\"2\">Eliezer</person>"
  , "<tagImNotInterestedIn>x</tagImNotInterestedIn>"
  , "</people>"
  ]

main :: IO ()
main = runResourceT $
  runConduit $ parseLBS def persons .| parsePeople .| mapM_ (lift . print)
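
When nothing but the matched tags is needed, the filtering stage can probably be dropped altogether. The following is only a minimal sketch, assuming the installed xml-conduit exports many' from Text.XML.Stream.Parse (a variant of many that skips tags and content the given parser does not match). The names sample and readPeople are made up for this illustration.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Control.Monad.Trans.Resource (MonadThrow)
import Data.ByteString.Lazy         (ByteString)
import Data.Conduit                 (ConduitT, runConduit, (.|))
import Data.Text                    (Text, unpack)
import Data.XML.Types               (Event)
import Text.XML.Stream.Parse        (content, def, force, many', parseLBS,
                                     requireAttr, tag', tagNoAttr)

data Person = Person Int Text deriving (Eq, Show)

-- Same parser as in the answer: a <person> tag with a required age attribute.
parsePerson :: MonadThrow m => ConduitT Event o m (Maybe Person)
parsePerson = tag' "person" (requireAttr "age") $ \age -> do
  name <- content
  return $ Person (read $ unpack age) name

-- Assumption: many' collects the results of parsePerson and skips
-- whatever parsePerson does not match, including nested subtrees.
parsePeople :: MonadThrow m => ConduitT Event o m (Maybe [Person])
parsePeople = tagNoAttr "people" $ many' parsePerson

-- Hypothetical sample input with two subtrees that should be ignored.
sample :: ByteString
sample =
  "<people><foo/><person age=\"25\">Michael</person>\
  \<bar><baz>x</baz></bar><person age=\"2\">Eliezer</person></people>"

-- Hypothetical helper: parse a document and fail if <people> is missing.
readPeople :: ByteString -> IO [Person]
readPeople input =
  runConduit $ parseLBS def input .| force "people required" parsePeople
```

Unlike the pipeline above, this variant collects all persons into a list instead of yielding them one by one, so it suits small documents better than large streams.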

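With a streaming parser, you have to parse everything up to the close event of the tag that contains the current subtree. When you leave the inner parser of a tag, it expects to parse the closing tag immediately. Unfortunately, xml-conduit has no function for this. One approach is to write a custom sink that counts opening tags (decrementing on each closing tag) and discards everything in between until the counter reaches zero.

@Pinoid I have submitted a pull request that adds a function for this task. By using recursion I did not need a counter, although one would also work. I'll post an answer once it is merged. In the meantime, thanks for the help!

@UliKöhler Did the PR ever land? Can you link to it?

@Flip There was an original PR, but I didn't have time to finish it, so pavelkogan stepped in and created one that was merged.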
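
The counting idea from the first comment can be sketched roughly like this. It is only an illustration built on conduit and xml-types, not the function that was eventually merged into xml-conduit; skipSubtree and skipFirst are hypothetical names.

```haskell
{-# LANGUAGE OverloadedStrings #-}  -- only for writing element names as literals

import           Data.Conduit      (ConduitT, await, leftover, runConduitPure,
                                    (.|))
import qualified Data.Conduit.List as CL
import           Data.XML.Types    (Event (..))

-- | Hypothetical helper: if the next event opens an element, discard all
-- events up to and including the matching end tag and return True.
-- Otherwise put the event back and return False.
skipSubtree :: Monad m => ConduitT Event o m Bool
skipSubtree = do
  next <- await
  case next of
    Just (EventBeginElement _ _) -> go (1 :: Int) >> return True
    Just other                   -> leftover other >> return False
    Nothing                      -> return False
  where
    -- depth counts the elements that are currently open in the skipped subtree
    go 0     = return ()
    go depth = do
      ev <- await
      case ev of
        Nothing                      -> return ()       -- input ended early
        Just (EventBeginElement _ _) -> go (depth + 1)
        Just (EventEndElement _)     -> go (depth - 1)
        Just _                       -> go depth        -- text, comments, etc.

-- | Run skipSubtree on a list of events and return its result together
-- with the events that remain afterwards.
skipFirst :: [Event] -> (Bool, [Event])
skipFirst evs = runConduitPure $
  CL.sourceList evs .| ((,) <$> skipSubtree <*> CL.consume)
```

Trying skipFirst in GHCi on a nested element followed by a sibling should leave only the sibling's events. A real parser would call something like skipSubtree for every child it has no parser for, before expecting the parent's closing tag.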
Running main from the example in GHCi prints:
λ> main
Person 25 "Michael"
Person 2 "Eliezer"