Clojure: how to group consecutive elements of a list using start/stop predicates?
Suppose I have a list like the following:
(def data [:a :b :c :d :e :f :g :h :b :d :x])
and predicates such as:
(defn start? [x] (= x :b))
(defn stop? [x] (= x :d))
that mark the first and last elements of a subsequence. I want to return a list that contains the subgroups, like this:
(parse data) => [:a [:b :c :d] :e :f :g :h [:b :d] :x]
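How can I accomplish this in Clojure?
The Clojure function split-with can do most of the work. The only tricky part is making the subgroup also include the stop? value. Here is one solution: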
(ns tst.demo.core
  (:use tupelo.core demo.core tupelo.test))

(def data [:a :b :c :d :e :f :g :h :b :d :x])

(defn start? [x] (= x :b))
(defn stop? [x] (= x :d))

(defn parse [vals]
  (loop [result []
         vals   vals]
    (if (empty? vals)
      result
      (let [;; everything before the next start? element stays ungrouped
            [singles group-plus] (split-with #(not (start? %)) vals)
            ;; the group runs up to, but not including, the stop? element
            [grp* others*]       (split-with #(not (stop? %)) group-plus)
            ;; move the stop? element (if any) into the group
            grp                  (glue grp* (take 1 others*))
            others               (drop 1 others*)
            ;; only append the group when it is not empty
            result-out           (cond-it-> (glue result singles)
                                   (not-empty? grp) (append it grp))]
        (recur result-out others)))))
Result:
(parse data) => [:a [:b :c :d] :e :f :g :h [:b :d] :x]
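We use t/glue and t/append so that new items are always added at the end of the result (unlike conj, which adds at the beginning of a list).
Update: that final use of cond-it-> to avoid gluing on an empty [] vector is a bit ugly. It occurred to me later that this is a form of mutual recursion, which is a good fit for the trampoline function: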
(ns tst.demo.core
  (:use tupelo.core demo.core tupelo.test))

(def data [:a :b :c :d :e :f :g :h :b :d :x])

(defn start? [x] (= x :b))
(defn stop? [x] (= x :d))

(declare parse-singles parse-group)

(defn parse-singles [result vals]
  (if (empty? vals)
    result
    (let [[singles groupies] (split-with #(not (start? %)) vals)
          result-out         (glue result singles)]
      #(parse-group result-out groupies))))

(defn parse-group [result vals]
  (if (empty? vals)
    result
    (let [[grp-1 remaining] (split-with #(not (stop? %)) vals)
          grp               (glue grp-1 (take 1 remaining))
          singlies          (drop 1 remaining)
          result-out        (append result grp)]
      #(parse-singles result-out singlies))))

(defn parse [vals]
  (trampoline parse-singles [] vals))

(dotest
  (spyx (parse data)))

(parse data) => [:a [:b :c :d] :e :f :g :h [:b :d] :x]
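Note that for any reasonably sized parsing task (say, fewer than a few thousand calls to parse-singles and parse-group) you really don't need trampoline. In that case, just remove the # from the two calls to parse-singles and parse-group, and remove trampoline from the definition of parse.
Clojure CheatSheet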
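Both versions above depend on the tupelo library. For readers who don't have it on the classpath, here is a minimal sketch of the same split-with loop using only clojure.core. The name parse-core and the choice to pass the predicates as arguments are my own assumptions, not part of the answer above; into and conj stand in for glue and append.

```clojure
;; Hypothetical clojure.core-only variant of the loop-based parse.
;; start? and stop? are passed in; keyword sets work as predicates.
(defn parse-core [start? stop? coll]
  (loop [result []
         xs     (seq coll)]
    (if (empty? xs)
      result
      (let [;; ungrouped elements before the next start? element
            [singles group-plus] (split-with (complement start?) xs)
            ;; group members up to, but not including, the stop? element
            [grp-1 remaining]    (split-with (complement stop?) group-plus)
            ;; pull the stop? element (if present) into the group
            grp                  (into (vec grp-1) (take 1 remaining))
            ;; vectors grow at the end, so into/conj keep the input order
            result               (cond-> (into result singles)
                                   (seq grp) (conj grp))]
        (recur result (drop 1 remaining))))))

(parse-core #{:b} #{:d} [:a :b :c :d :e :f :g :h :b :d :x])
;; => [:a [:b :c :d] :e :f :g :h [:b :d] :x]
```

With this sketch an unterminated group is simply closed at the end of the input, e.g. (parse-core #{:b} #{:d} [:a :b :c]) gives [:a [:b :c]].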
As always, don't forget that you can use a custom stateful transducer:
(defn subgroups [start? stop?]
  (let [subgroup (volatile! nil)]
    (fn [rf]
      (fn
        ([] (rf))
        ([result] (rf result))
        ([result item]
         (let [sg @subgroup]
           (cond
             (and (seq sg) (stop? item))
             (do (vreset! subgroup nil)
                 (rf result (conj sg item)))
             (seq sg)
             (do (vswap! subgroup conj item)
                 result)
             (start? item)
             (do (vreset! subgroup [item])
                 result)
             :else (rf result item))))))))

(into []
      (subgroups #{:b} #{:d})
      [:a :b :c :d :e :f :g :h :b :d :x])
; => [:a [:b :c :d] :e :f :g :h [:b :d] :x]
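Here is a version that uses lazy-seq and split-with.
The key is to think about what needs to be produced for each element of the sequence. In this case the pseudocode looks like this: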
;; for each element (e) in the input sequence
if (start? e)
  (produce values up to and including (stop? e))
else
  e
The Clojure code that implements it is not much longer than the description above:
(def data [:a :b :c :d :e :f :g :h :b :d :x])
(def start? #(= :b %))
(def stop? #(= :d %))

(defn parse [vals]
  (when-let [e (first vals)]
    (let [[val rst] (if (start? e)
                      (let [[run remainder] (split-with (complement stop?) vals)]
                        ;; take 1 (rather than first) avoids appending nil
                        ;; when no stop element is found
                        [(concat run (take 1 remainder)) (rest remainder)])
                      [e (rest vals)])]
      (cons val (lazy-seq (parse rst))))))

;; this produces the following output
(parse data) ;; => (:a (:b :c :d) :e :f :g :h (:b :d) :x)
It looks like split-with should be a good choice here, but meh:
(loop [data data
       res  []]
  (let [[left tail]               (split-with (comp not start?) data)
        [group [stop & new-data]] (split-with (comp not stop?) tail)
        group                     (cond-> (vec group) stop (into [stop]))
        new-res                   (cond-> (into res left)
                                    (seq group) (into [group]))]
    (if (seq new-data)
      (recur new-data new-res)
      new-res)))
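I like the stateful transducer answer, but note that the question doesn't say what should happen if a start element is found and no stop element follows. If a subgroup is left open, the transducer truncates the input sequence, which may be unexpected or undesirable. Consider the example with the stop elements removed: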
(into [] (subgroups #{:b} #{:d}) [:a :b :c :e :f :g :h :b :x])
=> [:a] ;; drops inputs from before (last) subgroup opens
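For this case, transducers have a completion arity that can be used to flush any open subgroup:
Completion (arity 1) - some processes will not end, but for those that do (like transduce), the completion arity is used to produce a final value and/or flush state. This arity must call the xf completion arity exactly once.
The only difference between this example and the original transducer example is the completion arity: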
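The definition of subgroups-all does not appear in the text here. As a minimal sketch, assuming it is the subgroups transducer from the earlier answer with only the 1-arity changed, it could look like this (the body of the completion arity is my reconstruction, modeled on how partition-by in clojure.core flushes its buffer):

```clojure
;; Sketch only: same as subgroups, except that the completion arity
;; flushes a subgroup that is still open when the input ends.
(defn subgroups-all [start? stop?]
  (let [subgroup (volatile! nil)]
    (fn [rf]
      (fn
        ([] (rf))
        ([result]
         (let [sg @subgroup]
           (if (seq sg)
             (do (vreset! subgroup nil)
                 ;; emit the dangling group, then complete exactly once
                 (rf (unreduced (rf result sg))))
             (rf result))))
        ([result item]
         (let [sg @subgroup]
           (cond
             (and (seq sg) (stop? item))
             (do (vreset! subgroup nil)
                 (rf result (conj sg item)))
             (seq sg)
             (do (vswap! subgroup conj item)
                 result)
             (start? item)
             (do (vreset! subgroup [item])
                 result)
             :else (rf result item))))))))
```

The unreduced call guards against a downstream step such as (take n) having already terminated the reduction when the last group is emitted.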
Dangling, open groups are then flushed:
(into [] (subgroups-all #{:b} #{:d}) [:a :b :c :d :e :f :g :h :b :x])
=> [:a [:b :c :d] :e :f :g :h [:b :x]]
(into [] (subgroups-all #{:b} #{:d}) [:a :b :c :e :f :g :h :b :x])
=> [:a [:b :c :e :f :g :h :b :x]]
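Note that in the last example a nested start/open does not produce a nested grouping, which got me thinking about another solution.
Nested Groups and Zippers
When I thought about this more generally as "unflattening" a sequence, zippers came to mind: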
(require '[clojure.zip :as z])

(defn unflatten [open? close? coll]
  (when (seq coll)
    (z/root
      (reduce
        (fn [loc elem]
          (cond
            ;; open a new group and move into it
            (open? elem)
            (-> loc (z/append-child (list elem)) z/down z/rightmost)
            ;; close the current group, unless we are already at the root
            (and (close? elem) (z/up loc))
            (-> loc (z/append-child elem) z/up)
            :else (z/append-child loc elem)))
        (z/seq-zip ())
        coll))))
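This creates a zipper over an empty list and builds it up with reduce over the input sequence. It takes a pair of predicates for opening/closing groups and allows arbitrarily nested groups: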
(unflatten #{:b} #{:d} [:a :b :c :b :d :d :e :f])
=> (:a (:b :c (:b :d) :d) :e :f)
(unflatten #{:b} #{:d} [:a :b :c :b :d :b :b :d :e :f])
=> (:a (:b :c (:b :d) (:b (:b :d) :e :f)))
(unflatten #{:b} #{:d} [:b :c :e :f])
=> ((:b :c :e :f))
(unflatten #{:b} #{:d} [:d :c :e :f])
=> (:d :c :e :f)
(unflatten #{:b} #{:d} [:c :d])
=> (:c :d)
(unflatten #{:b} #{:d} [:c :d :b])
=> (:c :d (:b))
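To make the zipper moves inside unflatten easier to follow, here is a minimal step-by-step sketch for a single group, using only functions from clojure.zip. The names loc0 through loc3 are just for illustration.

```clojure
(require '[clojure.zip :as z])

;; start with a zipper over an empty list; the location is the root
(def loc0 (z/seq-zip ()))

;; a plain element is appended as the last child of the current node
(def loc1 (z/append-child loc0 :a))

;; opening a group: append a new one-element list, then move into it
(def loc2 (-> loc1 (z/append-child (list :b)) z/down z/rightmost))
(z/node loc2) ;; => (:b)

;; closing a group: append the closing element, then move back up
(def loc3 (-> loc2 (z/append-child :d) z/up))
(z/root loc3) ;; => (:a (:b :d))

;; at the root there is no parent, which is why unflatten checks
;; (z/up loc) before treating an element as a group closer
(z/up loc0) ;; => nil
```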
Just because I like FSMs and trampolines:
(let [start? #(= % :b)
      stop?  #(= % :d)
      data   [:a :b :c :d :e :f :g :h :b :d :x]]
  (letfn [(start [result [x & xs]]
            #(collect-vec (conj result [x]) xs))
          (collect-vec [result [x & xs]]
            #(if (nil? x)
               result
               ((if (stop? x) initial collect-vec)
                (conj (subvec result 0 (dec (count result)))
                      (conj (last result) x))
                xs)))
          (collect [result [x & xs]]
            #(initial (conj result x) xs))
          (initial [result [x & xs :as v]]
            (cond (nil? x)   result
                  (start? x) #(start result v)
                  :else      (fn [] (collect result v))))]
    (trampoline initial [] data)))
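For readers unfamiliar with trampoline: it calls a function and, as long as the result is itself a function, keeps calling that result with no arguments, so mutually recursive steps that return thunks do not grow the stack. A minimal sketch with two hypothetical functions (my-even? and my-odd? are illustrative names, unrelated to the parser above):

```clojure
(declare my-odd?)

;; each step returns either a final value or a thunk for the next step
(defn my-even? [n]
  (if (zero? n) true #(my-odd? (dec n))))

(defn my-odd? [n]
  (if (zero? n) false #(my-even? (dec n))))

;; trampoline keeps invoking the returned thunks until it gets a
;; non-function value, so the depth of the mutual recursion is not
;; limited by the stack
(trampoline my-even? 100000) ;; => true
```

This is the same shape as the state functions above: each state returns either the final result vector or a thunk that enters the next state.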
If performance is not a concern, I would use clojure.spec. First, define a grammar for the data you want to parse:
(ns playground.startstop
  (:require [clojure.spec.alpha :as spec]))

(defn start? [x] (= x :b))
(defn stop? [x] (= x :d))

(spec/def ::not-start-stop #(and (not (start? %))
                                 (not (stop? %))))

(spec/def ::group (spec/cat :start start?
                            :contents (spec/* ::not-start-stop)
                            :stop stop?))

(spec/def ::element (spec/alt :group ::group
                              :primitive ::not-start-stop))

(spec/def ::elements (spec/* ::element))
Now you can parse the data with the conform function:
(def data [:a :b :c :d :e :f :g :h :b :d :x])
(spec/conform ::elements data)
;; => [[:primitive :a] [:group {:start :b, :contents [:c], :stop :d}] [:primitive :e] [:primitive :f] [:primitive :g] [:primitive :h] [:group {:start :b, :stop :d}] [:primitive :x]]
The output above is not quite what we want, so we define a function to render the result:
(defn render [[type data]]
  (case type
    :primitive data
    :group `[~(:start data) ~@(:contents data) ~(:stop data)]))
and map it over the parsed data:
(mapv render (spec/conform ::elements data))
;; => [:a [:b :c :d] :e :f :g :h [:b :d] :x]
This spec-based solution may not be the fastest code, but it is easy to understand, maintain, extend, and debug.
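One consequence of describing the input as a grammar is that data which does not fit it is rejected rather than silently truncated or regrouped. A minimal sketch, repeating the spec definitions in a hypothetical namespace (playground.startstop-check) so that it runs on its own:

```clojure
(ns playground.startstop-check
  (:require [clojure.spec.alpha :as spec]))

(defn start? [x] (= x :b))
(defn stop? [x] (= x :d))

(spec/def ::not-start-stop #(and (not (start? %))
                                 (not (stop? %))))
(spec/def ::group (spec/cat :start start?
                            :contents (spec/* ::not-start-stop)
                            :stop stop?))
(spec/def ::element (spec/alt :group ::group
                              :primitive ::not-start-stop))
(spec/def ::elements (spec/* ::element))

;; well-formed input conforms
(spec/valid? ::elements [:a :b :c :d :e])   ;; => true

;; a group that is opened but never closed does not match the grammar
(spec/conform ::elements [:a :b :c])        ;; => :clojure.spec.alpha/invalid

;; a stop element outside any group is rejected as well
(spec/valid? ::elements [:a :d])            ;; => false
```

Whether rejecting such input is the right behavior depends on the use case; spec/explain can be called with the same arguments to print why the data does not conform.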