How do I remove empty XML tags using clojure.data.xml?
Tags: xml, clojure

Given a namespaced XML document (the namespace is ignored in this example), my solution so far is incomplete; it looks like some recursion might help:
(require '[clojure.data.xml :as xml]
         '[clojure.string :refer [blank?]])

(defn- child-element?
  "True when every item in the element's :content is itself an Element."
  [e]
  (let [content (:content e)]
    (= (count content)
       (count (filter #(instance? clojure.data.xml.node.Element %) content)))))

(defn remove-empty-tags
  [xml-data]
  (let [empty-tags? #(or (empty? %) (-> % .toString blank?))]
    ;; Only drops empty elements at the top level; children are not visited.
    (reduce (fn [col e]
              (if-not (empty-tags? (:content e))
                (conj col e)
                col))
            []
            xml-data)))

(def body (slurp "sample.xml")) ;; the above xml
(def xml-data (-> (xml/parse (java.io.StringReader. body)) :content))
(remove-empty-tags xml-data)
Once converted back to XML, this returns:

<foo>
  <name>John</name>
  <address>1 hacker way</address>
  <school>
    <name/>
    <state/>
  </school>
  <college>
    <name>mit</name>
    <address/>
    <state/>
  </college>
</foo>
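Obviously, this function needs to recurse, using child-element?, to remove the empty child nodes as well. Suggestions?

Here is a very simple solution using clojure.walk/postwalk: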
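(require 'clojure.walk)

;; The Element class name depends on the data.xml version in use;
;; the question's code refers to clojure.data.xml.node.Element instead.
(defn remove-empty-elements [xml-data]
  (clojure.walk/postwalk
   (fn [v]
     (cond
       (and (instance? clojure.data.xml.Element v)
            (every? empty? (:content v)))
       nil ;; nil out elements with no content

       (instance? clojure.data.xml.Element v)
       (update v :content #(filter some? %)) ;; filter nils out of the content

       :else v))
   xml-data))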
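It works by first walking the XML data depth-first, replacing every element that has no :content with nil, and then filtering those nils out of the :content collections of the remaining elements.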
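To see the two steps in isolation, here is a minimal sketch of the same postwalk idea that runs without clojure.data.xml. It assumes elements are plain maps with :tag and :content keys; element?, prune-empty and sample are hypothetical names used only for illustration, not part of any library.

```clojure
(require '[clojure.walk :as walk])

;; Hypothetical stand-in for an XML element: a plain map with :tag and :content.
(defn element? [v]
  (and (map? v) (contains? v :tag)))

(defn prune-empty
  "Depth-first: empty elements become nil, then parents drop those nils."
  [tree]
  (walk/postwalk
   (fn [v]
     (cond
       ;; children were visited first, so pruned children are already nil here
       (and (element? v) (every? empty? (:content v))) nil
       (element? v) (update v :content #(vec (remove nil? %)))
       :else v))
   tree))

(def sample
  {:tag :foo
   :content [{:tag :name :content ["John"]}
             {:tag :phone :content []}
             {:tag :school :content [{:tag :name :content []}
                                     {:tag :state :content [""]}]}
             {:tag :college :content [{:tag :name :content ["mit"]}
                                      {:tag :address :content []}]}]})

(prune-empty sample)
;; => {:tag :foo
;;     :content [{:tag :name :content ["John"]}
;;               {:tag :college :content [{:tag :name :content ["mit"]}]}]}
```

Because postwalk visits children before parents, :school is removed as a whole once both of its children have been replaced by nil.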
Note: the second (instance? clojure.data.xml.Element v) clause in the cond can be omitted, because xml/emit-str ignores nils in :content collections; that is, it emits the same string either way.
(println (xml/emit-str (remove-empty-elements xml-data)))
Formatted output:

<foo>
  <name>John</name>
  <address>1 hacker way</address>
  <college>
    <name>mit</name>
  </college>
</foo>
You can do this easily using the tupelo.forest library. For your problem:
(let [xml-data "<foo>
<name>John</name>
<address>1 hacker way</address>
<phone></phone>
<school>
<name></name>
<state></state>
<type></type>
</school>
<college>
<name>mit</name>
<address></address>
<state></state>
</college>
</foo> "]
Result:
(hid->hiccup root-hid) =>
[:foo
[:name "John"]
[:address "1 hacker way"]
[:phone]
[:school [:name] [:state] [:type]]
[:college [:name "mit"] [:address] [:state]]]
We can walk the tree and remove the empty nodes like this:

(walk-tree root-hid {:leave (fn [hid]
                              (when (empty-leaf-hid? hid)
                                (remove-hid hid)))})

Result:
(hid->hiccup root-hid) =>
[:foo
[:name "John"]
[:address "1 hacker way"]
[:college
[:name "mit"]]]
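For readers who want to reproduce the before and after trees without the library, the following is a rough plain-Clojure sketch of the same pruning on hiccup vectors. It is not the tupelo.forest API; prune-hiccup is a hypothetical helper, and it assumes hiccup nodes of the form [tag child ...] with no attribute maps.

```clojure
(require '[clojure.string :as str])

(defn prune-hiccup
  "Returns node without empty descendants, or nil if node itself ends up empty."
  [node]
  (cond
    ;; an element: prune the children first, keep it only if something survives
    (vector? node) (let [[tag & children] node
                         kept (keep prune-hiccup children)]
                     (when (seq kept)
                       (into [tag] kept)))
    ;; a text node: blank strings count as empty
    (string? node) (when-not (str/blank? node) node)
    :else node))

(prune-hiccup
 [:foo
  [:name "John"]
  [:address "1 hacker way"]
  [:phone]
  [:school [:name] [:state] [:type]]
  [:college [:name "mit"] [:address] [:state]]])
;; => [:foo [:name "John"] [:address "1 hacker way"] [:college [:name "mit"]]]
```

Like the :leave handler, the check runs after the children have been processed, so a parent whose children were all removed is removed too.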
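Update: a live code example is available.

Update #2: if you want to run the code, you need something like the following in your ns form (see the live code example above):

(ns tst.tupelo.forest-examples
  (:use tupelo.core tupelo.forest tupelo.test)
  ...)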
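I was able to do this with a combination of recursion and reduce (my initial partial answer, now complete). The key is to pass each node's head into the recursion, so that reduce can attach the transformed child nodes to that head.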
(require '[clojure.data.xml :as xml]
         '[clojure.string :refer [blank?]])

(defn- child-element?
  "True when every item in the element's :content is itself an Element."
  [e]
  (let [content (:content e)]
    (= (count content)
       (count (filter #(instance? clojure.data.xml.node.Element %) content)))))

(defn- empty-element? [e]
  (or (empty? e) (-> e .toString blank?)))

(defn element? [e]
  (and (instance? clojure.lang.LazySeq e)
       (instance? clojure.data.xml.node.Element (first e))))

(defn remove-empty-elements!
  "Remove empty elements (and child elements) in an xml"
  [head xml-data]
  (let [data (if (seq? xml-data) xml-data (:content xml-data))
        rs (reduce (fn [col e]
                     (let [content (:content e)]
                       (cond
                         ;; no content at all: drop the element
                         (empty-element? content)
                         col
                         ;; text content that is not entirely blank: keep as-is
                         (and (not (element? content)) (not (every? empty-element? content)))
                         (conj col e)
                         ;; only child elements, all of them empty: drop
                         (and (element? content) (every? #(empty-element? (:content %)) content))
                         col
                         ;; only child elements, some non-empty: recurse with a fresh head
                         (child-element? e)
                         (let [_head (xml/element (:tag e) {})]
                           (conj col (remove-empty-elements! _head content)))
                         :else col)))
                   []
                   data)]
    (assoc head :content rs)))

;; test: xml-data is assumed here to be the parsed root element,
;; e.g. (xml/parse (java.io.StringReader. body))
(remove-empty-elements! (xml/element (:tag xml-data) {}) xml-data)
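The head-passing idea can be tried without clojure.data.xml. The sketch below is a simplified illustration, not the code above: it assumes elements are plain maps with :tag and :content, and that :content holds only strings and such maps. prune-into and sample are hypothetical names.

```clojure
(require '[clojure.string :as str])

(defn prune-into
  "Attaches the non-empty children to head; nested elements are pruned recursively."
  [head children]
  (assoc head :content
         (reduce (fn [acc child]
                   (if (string? child)
                     ;; text node: keep it unless it is blank
                     (if (str/blank? child) acc (conj acc child))
                     ;; element: recurse with a copy of the child as the new head
                     (let [pruned (prune-into (assoc child :content [])
                                              (:content child))]
                       (if (seq (:content pruned)) (conj acc pruned) acc))))
                 []
                 children)))

(def sample
  {:tag :foo
   :content [{:tag :name :content ["John"]}
             {:tag :phone :content []}
             {:tag :school :content [{:tag :name :content []}
                                     {:tag :state :content [""]}]}
             {:tag :college :content [{:tag :name :content ["mit"]}
                                      {:tag :address :content []}]}]})

(prune-into (assoc sample :content []) (:content sample))
;; => {:tag :foo
;;     :content [{:tag :name :content ["John"]}
;;               {:tag :college :content [{:tag :name :content ["mit"]}]}]}
```

Using a copy of the child as the head, rather than a fresh element built from the tag alone, keeps any other keys the child carries, such as :attrs.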
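Comments:

Can you specify how the function is called? What is in xml-data? And so on.

Sure, I added the calling code above.

Thanks for sharing this code. However, when I run it, it outputs the same XML as the input, without any changes. Can you confirm that it works? I tried using postwalk, but it walks every keyword inside the Element record rather than the element as a whole, so the logic is not straightforward.

Note that (empty? (filter some? x)) is better written as (every? nil? x).

A nil check can't be used, because empty strings need to be stripped as well; this is ideal: #(or (empty? %) (-> % .toString blank?))

@pri I used the XML from your question verbatim. The walk function traverses every element in the input, all the way down to the individual primitives. That is why the cond has clauses that look for specific values and leaves everything else unchanged.

Thanks for pointing this out. I looked at this library before, but it has a dependency nightmare with a dozen unrelated libs, including reagent, re-frame, schema and datomic, and the docs are unclear. That said, can you edit the post so the code is in a single function that I can test? Code spliced with commentary is hard to read and breaks on copy/paste.

Got it. tupelo.forest could be a good standard library without the deps. I can't resolve the symbol walk-tree, despite (:use tupelo.core tupelo.forest).

The reagent/re-frame stuff is gone (it was just a temporary merge problem). There is no datomic dependency. Tupelo works with Clojure 1.8 or later; I just adjusted the settings. Regarding dotest, see the top of the tst.tupelo.forest-examples namespace (Update #2 above). dotest is an enhancement of clojure.test/deftest.

One last note: the function does not work on production data, because it blows up on enlive with a long stack trace:

No matching method found: getBytes for class clojure.lang.PersistentArrayMap
Reflector.java:53 clojure.lang.Reflector/invokeMatchingMethod
Reflector.java:28 clojure.lang.Reflector/invokeInstanceMethod
string.cljc:279 tupelo.string$eval23128$string_GT_stream_23133$fn_23134/invoke
string.cljc:276 tupelo.string$eval23128$string_GT_stream_23133/invoke
forest.cljc:538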