Clojure: partitioning a lazy sequence after a predicate's truth value changes

clojure functional-programming

Consider a sentence stored as a lazy sequence: each word is one entry, and punctuation stays attached to the word it follows:

("It's" "time" "when" "it's" "time!" "What" "did" "you" "say?" "Nothing!")

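Now this should be partitioned into sentences. I wrote a helper function, last-punctuated?, that checks whether the last character of a word is a non-alphabetic character. (That part works fine.)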

Expected result:

(("It's" "time" "when" "it's" "time!") ("What" "did" "you" "say?") ("Nothing!"))

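Everything should stay lazy. Unfortunately, I cannot use partition-by: it splits before the element at which the predicate's result changes, so the punctuated entry is not treated as the last entry of its subsequence.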
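To make the problem concrete, here is a small sketch of what partition-by does with this input. The name ends-sentence? is a hypothetical stand-in for the helper described above, and it assumes that only ., ! and ? end a sentence.

```clojure
(def words ["It's" "time" "when" "it's" "time!" "What" "did" "you" "say?" "Nothing!"])

;; Hypothetical stand-in for the asker's helper: true when the word
;; ends with one of the assumed sentence-ending characters.
(defn ends-sentence? [word]
  (contains? #{\. \! \?} (last word)))

;; partition-by starts a new group whenever the predicate's value changes,
;; so each punctuated word is cut off from the words that precede it.
(def naive (partition-by ends-sentence? words))
;; naive is equal to:
;; (("It's" "time" "when" "it's") ("time!") ("What" "did" "you") ("say?" "Nothing!"))
```

The punctuated words end up in groups of their own, and two consecutive punctuated words ("say?" "Nothing!") are merged, which is not the desired grouping.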

When the output has a different number of elements than the input, the usual answer is to use reduce:

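(defn last-word?
  "True if word ends a sentence, i.e. ends with ! or ?."
  [word]
  (assert word)
  (or (.endsWith word "!")
      (.endsWith word "?")))

(defn make-sentence
  "Eagerly groups the words of in into a vector of sentences (vectors of words)."
  [in]
  (reduce (fn [acc ele]
            (let [;; every sentence except the one currently being built
                  up-to-current-sentence (vec (butlast acc))
                  ;; the last word added so far (nil on the first step)
                  last-word-last-sentence (-> acc last last)
                  new-sentence? (when last-word-last-sentence (last-word? last-word-last-sentence))
                  current-sentence (vec (last acc))]
              (if new-sentence?
                ;; the previous word closed a sentence: start a new one
                (conj acc [ele])
                ;; otherwise append to the sentence in progress
                (conj up-to-current-sentence (conj current-sentence ele)))))
          [] in))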

Unfortunately, reduce has to run to completion, so it cannot be used with lazy input. There is some discussion of this.

I would suggest using lazy-seq. I could not come up with anything better (and maybe this is not the best):
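The lazy-seq version itself is not reproduced in this copy of the answer. As a rough sketch only (an assumption about its shape, not necessarily the answerer's exact code), a lazy-seq based parts with the same [items pred] signature as in the REPL session could look like this:

```clojure
;; Sketch under assumptions: splits items into groups, each ending with
;; the first element for which pred is truthy. Nothing is computed until
;; the result is consumed.
(defn parts [items pred]
  (lazy-seq
    (when (seq items)
      (let [[l r] (split-with (complement pred) items)]
        ;; l: elements before the next match; (take 1 r): the match itself
        (cons (concat l (take 1 r))
              (parts (rest r) pred))))))

;; Same predicate as in the REPL session: the last character of the word
;; must be one of these punctuation characters.
(def ends-sentence? (comp #{\? \! \. \,} last))
```

Because the recursive call sits inside lazy-seq, each group is only built when it is requested, so this shape also works on infinite input such as (cycle ["Hi!" "Who" "me?"]).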

In the REPL:

user> (let [items '("It's" "time" "when" "it's"
                    "time!" "What" "did" "you"
                    "say?" "Nothing!")]
        (parts items (comp #{\? \! \. \,} last)))

(("It's" "time" "when" "it's" "time!") ("What" "did" "you" "say?") ("Nothing!"))

user> (let [items '("what?" "It's" "time" "when" "it's"
                    "time!" "What" "did" "you"
                    "say?" "Nothing!")]
        (parts items (comp #{\? \! \. \,} last)))

(("what?") ("It's" "time" "when" "it's" "time!") ("What" "did" "you" "say?") ("Nothing!"))

user> (let [items '("what?" "It's" "time" "when" "it's"
                    "time!" "What" "did" "you"
                    "say?" "Nothing!")]
        (realized? (parts items (comp #{\? \! \. \,} last))))

false

Update: the same approach with iterate might be better:
(defn parts [items pred]
  (->> [nil items]
       (iterate (fn [[_ items]]
                  (let [[l r] (split-with (complement pred) items)]
                    [(concat l (take 1 r)) (rest r)])))
       rest
       (map first)
       (take-while seq)))
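To see why this version needs the nil initial value, rest, (map first) and (take-while seq), it may help to look at the intermediate states that iterate produces. The sketch below pulls the anonymous function out under the hypothetical name step and fixes the predicate to a hypothetical ends-sentence?, purely for illustration.

```clojure
(def ends-sentence? (comp #{\? \! \. \,} last))

;; One iteration: takes a pair [previous-sentence remaining-words] and
;; returns [next-sentence remaining-words-after-it].
(defn step [[_ items]]
  (let [[l r] (split-with (complement ends-sentence?) items)]
    [(concat l (take 1 r)) (rest r)]))

(def states (take 4 (iterate step [nil ["Hi!" "Who" "me?"]])))
;; states is equal to:
;; [[nil ["Hi!" "Who" "me?"]]   ; seed pair, dropped by rest
;;  [["Hi!"] ["Who" "me?"]]     ; first sentence
;;  [["Who" "me?"] []]          ; second sentence
;;  [[] []]]                    ; empty sentence: take-while seq stops here
```

The first element of every pair after the seed is a sentence, which is what (map first) extracts, and the first empty sentence marks the end of the input.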

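This problem can actually be expressed quite easily by generating a new sequence that contains "split markers" and then partitioning on a different predicate: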
(def punctuation? #{\. \! \?})

(def words ["It's" "time" "when" "it's" "time!" "What" "did" "you" "say?" "Nothing!"])

(defn partition-sentences [ws]
  (->> ws
    (mapcat #(if (punctuation? (last %)) [% :br] [%]))
    (partition-by #(= :br %))
    (take-nth 2)))


(println (take 20 (partition-sentences (repeatedly #(rand-nth words)))))
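The pipeline is easier to follow when each stage is looked at separately. The sketch below uses the same punctuation? set and the same steps, split into named values (mark-breaks, marked, grouped and sentences are hypothetical names introduced only for this illustration).

```clojure
(def punctuation? #{\. \! \?})

;; Stage 1: emit a :br marker after every word that ends a sentence.
(defn mark-breaks [ws]
  (mapcat #(if (punctuation? (last %)) [% :br] [%]) ws))

(def marked (mark-breaks ["time!" "What" "did" "you" "say?"]))
;; equal to ("time!" :br "What" "did" "you" "say?" :br)

;; Stage 2: group runs of words and runs of markers.
(def grouped (partition-by #(= :br %) marked))
;; equal to (("time!") (:br) ("What" "did" "you" "say?") (:br))

;; Stage 3: keep every second group, which drops the marker groups.
(def sentences (take-nth 2 grouped))
;; equal to (("time!") ("What" "did" "you" "say?"))
```

Since every :br follows exactly one word, word groups and marker groups strictly alternate, which is what makes take-nth 2 safe here.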

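Comments:

reduce is not lazy at all... so I guess this is not what the OP wants. I was just thinking about that fact. I am about to look into whether reduce can be made lazy.

As you probably know, reduce seems to be the natural "target" for this problem. It would be nice if it could be made lazy.

Looks good... I can't dig into it right now... but, just checking: are you sure it is still lazy in the end? Oh, and one improvement: there is mapcat, which would make the flatten go away.

I have switched to mapcat as you suggested. The updated example shows that this never forces the whole sequence to be realized.

Thanks for your effort, I like the creativity of this approach. It works well. In the end I decided to use one of the solutions suggested by leetwinski (iterate/lazy-seq), because I find the use of a "stopper" a bit too hacky, but that is my personal taste.

In terms of simplicity, I like the first (lazy-seq) version best. To me it is clear, and it has none of the overhead the iterate version needs, such as initializing with nil, rest, and (map first). The question is: why do you think the iterate approach would be better? Does it have to do with recursion and stack traces? (Similar to loop/recur.)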