Split a string by spaces functionally, grouping by quotes!


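When writing idiomatic functional code in Clojure[1], how would one write a function that splits a string on spaces but keeps quoted phrases intact? A quick solution is of course to use regular expressions, but it should be possible without them. At first glance it seems quite hard! I have written something similar in imperative languages, but I would like to see how a functional, recursive approach works.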

A quick check of what our function should do:

"Hello there!"  -> ["Hello", "there!"]
"'A quoted phrase'" -> ["A quoted phrase"]
"'a' 'b' c d" -> ["a", "b", "c", "d"]
"'a b' 'c d'" -> ["a b", "c d"]
"Mid'dle 'quotes do not concern me'" -> ["Mid'dle", "quotes do not concern me"]
I don't mind if the spacing inside the quotes changes (so a simple split on spaces can be used as the first step).

[1] This question could be answered at a general level, but I guess a functional approach in Clojure can be translated to Haskell, ML, etc. with ease.

Using a regular expression:

 (defn my-split [string]
  (let [criterion " +(?=([^']*'[^']*')*[^']*$)"]
   (for [s (into [] (.split string criterion))] (.replace s "'" ""))))
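The first part of the regular expression is the character to split the string on, here at least one space. The lookahead after it only lets a run of spaces match when it is followed by an even number of quotes, that is, when it lies outside any quoted phrase.

If you want to change the quote character, just change every ' to something else, such as ".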
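To make the "change every '" remark concrete, here is a minimal sketch that builds the same lookahead pattern for an arbitrary quote character. The names `quote-aware-pattern` and `split-outside-quotes` are hypothetical helpers introduced only for illustration, and the sketch assumes the quote character is not a regex metacharacter.

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper: the regex from the answer above, with the quote
;; character `q` substituted in. Assumes `q` needs no regex escaping.
(defn quote-aware-pattern [q]
  (re-pattern (str " +(?=([^" q "]*" q "[^" q "]*" q ")*[^" q "]*$)")))

;; Split on spaces that are followed by an even number of quote characters,
;; then remove every quote character from each piece.
(defn split-outside-quotes [s q]
  (mapv #(str/replace % (str q) "")
        (str/split s (quote-aware-pattern q))))

(split-outside-quotes "'a b' 'c d'" \')   ;; => ["a b" "c d"]
(split-outside-quotes "\"a b\" c" \")     ;; => ["a b" "c"]
```

Note that, like the original, this removes every quote character, so a word such as Mid'dle comes back as Middle rather than being left untouched.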


Edit: I just saw that you explicitly said you don't want to use regular expressions. Sorry!

There are, for example, approaches that let you write a parser in a functional style.

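This solution is in Haskell, but the main idea should apply to Clojure as well.
The two states of the parser (inside or outside of quotes) are represented by two mutually recursive functions.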

splitq = outside [] . (' ':)

add c res = if null res then [[c]] else map (++[c]) res

outside res xs = case xs of
    ' '  : ' '  : ys -> outside res $ ' ' : ys
    ' '  : '\'' : ys -> res ++ inside [] ys
    ' '  : ys        -> res ++ outside [] ys
    c    : ys        -> outside (add c res) ys
    _                -> res

inside res xs = case xs of
    ' '  : ' ' : ys -> inside res $ ' ' : ys
    '\'' : ' ' : ys -> res ++ outside [] (' ' : ys)
    '\'' : []       -> res
    c    : ys       -> inside (add c res) ys
    _               -> res

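Here is a Clojure version. It will probably blow the stack on very large inputs. A regular expression or a real parser generator would be more concise.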

(declare parse*)
(defn slurp-word [words xs terminator]
  (loop [res "" xs xs]
    (condp = (first xs)
      nil  ;; end of string after this word
      (conj words res)

      terminator ;; end of word
      (parse* (conj words res) (rest xs))

      ;; else
      (recur (str res (first xs)) (rest xs)))))

(defn parse* [words xs]
  (condp = (first xs)
    nil ;; end of string
    words

    \space  ;; skip leading spaces
    (parse* words (rest xs))

    \' ;; start quoted part
    (slurp-word words (rest xs) \')

    ;; else slurp until space
    (slurp-word words xs \space)))

(defn parse [s]
  (parse* [] s))
Your inputs:

user> (doseq [x ["Hello there!"
                 "'A quoted phrase'"
                 "'a' 'b' c d"
                 "'a b' 'c d'"
                 "Mid'dle 'quotes do not concern me'"
                 "'lots    of   spacing' there"]]
        (prn (parse x)))

["Hello" "there!"]
["A quoted phrase"]
["a" "b" "c" "d"]
["a b" "c d"]
["Mid'dle" "quotes do not concern me"]
["lots    of   spacing" "there"]
nil

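I was able to modify Brian's version to use trampoline so that it doesn't run out of stack space. Basically, make slurp-word and parse* return functions instead of calling them directly, then change parse to use trampoline.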

(defn slurp-word [words xs terminator]
  (loop [res "" xs xs]
    (condp = (first xs)
        nil  ;; end of string after this word
      (conj words res)

      terminator ;; end of word
      #(parse* (conj words res) (rest xs))

      ;; else
      (recur (str res (first xs)) (rest xs)))))

(defn parse* [words xs]
  (condp = (first xs)
      nil ;; end of string
    words

    \space  ;; skip leading spaces (returned as a thunk so long runs of spaces don't grow the stack)
    #(parse* words (rest xs))

    \' ;; start quoted part
    #(slurp-word words (rest xs) \')

    ;; else slurp until space
    #(slurp-word words xs \space)))

(defn parse [s]
  (trampoline #(parse* [] s)))


(defn test-parse []
  (doseq [x ["Hello there!"
             "'A quoted phrase'"
             "'a' 'b' c d"
             "'a b' 'c d'"
             "Mid'dle 'quotes do not concern me'"
             "'lots    of   spacing' there"
             (apply str (repeat 30000 "'lots    of   spacing' there"))]]
    (prn (parse x))))
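
To see the trampolining used above in isolation, here is a minimal sketch with two mutually recursive functions. `my-even?` and `my-odd?` are hypothetical names made up for this illustration; only `trampoline` itself comes from clojure.core. Each function returns a zero-argument function instead of calling its partner, and `trampoline` keeps invoking the result until it gets something that is not a function.

```clojure
(declare my-odd?)

;; Returns a final answer (true/false) or a thunk for the next step.
(defn my-even? [n]
  (if (zero? n)
    true
    #(my-odd? (dec n))))

(defn my-odd? [n]
  (if (zero? n)
    false
    #(my-even? (dec n))))

;; A direct mutual call would need a million stack frames here;
;; with trampoline the stack depth stays constant.
(trampoline my-even? 1000000) ;; => true
```

The same pattern is what lets parse handle the 30000-fold repeated input in test-parse: the stack unwinds after every step because each step merely returns the next one.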

Here is a version that returns a lazy seq of words / quoted strings:

(defn splitter [s]
  (lazy-seq
   (when-let [c (first s)]
     (cond
      (Character/isSpace c)
      (splitter (rest s))
      (= \' c)
      (let [[w* r*] (split-with #(not= \' %) (rest s))]
        (if (= \' (first r*))
          (cons (apply str w*) (splitter (rest r*)))
          (cons (apply str w*) nil)))
      :else
      (let [[w r] (split-with #(not (Character/isSpace %)) s)]
        (cons (apply str w) (splitter r)))))))
Sample run:

user> (doseq [x ["Hello there!"
                 "'A quoted phrase'"
                 "'a' 'b' c d"
                 "'a b' 'c d'"
                 "Mid'dle 'quotes do not concern me'"
                 "'lots    of   spacing' there"]]
        (prn (splitter x)))
("Hello" "there!")
("A quoted phrase")
("a" "b" "c" "d")
("a b" "c d")
("Mid'dle" "quotes do not concern me")
("lots    of   spacing" "there")
nil

If the single quotes in the input don't match up properly, everything from the last opening single quote onward is taken to constitute one "word":

user> (doseq [x ["'asdf"]]
        (prn (splitter x)))
("asdf")
nil


Update: another version in response to edbond's comment, with better handling of quote characters inside words:

(defn splitter [s]
  ((fn step [xys]
     (lazy-seq
      (when-let [c (ffirst xys)]
        (cond
         (Character/isSpace c)
         (step (rest xys))
         (= \' c)
         (let [[w* r*]
               (split-with (fn [[x y]]
                             (or (not= \' x)
                                 (not (or (nil? y)
                                          (Character/isSpace y)))))
                           (rest xys))]
           (if (= \' (ffirst r*))
             (cons (apply str (map first w*)) (step (rest r*)))
             (cons (apply str (map first w*)) nil)))
         :else
         (let [[w r] (split-with (fn [[x y]] (not (Character/isSpace x))) xys)]
           (cons (apply str (map first w)) (step r)))))))
   (partition 2 1 (lazy-cat s [nil]))))
Sample run:

user> (doseq [x ["Hello there!"
                 "'A quoted phrase'"
                 "'a' 'b' c d"
                 "'a b' 'c d'"
                 "Mid'dle 'quotes do not concern me'"
                 "'lots    of   spacing' there"]]
        (prn (splitter x)))
("Hello" "there!")
("A quoted phrase")
("a" "b" "c" "d")
("a b" "c d")
("Mid'dle" "quotes do not concern me")
("lots    of   spacing" "there")
nil
user> (doseq [x ["Hello there!"
                 "'A quoted phrase'"
                 "'a' 'b' c d"
                 "'a b' 'c d'"
                 "Mid'dle 'quotes do not concern me'"
                 "'lots    of   spacing' there"
                 "Mid'dle 'quotes do no't concern me'"
                 "'asdf"]]
        (prn (splitter x)))
("Hello" "there!")
("A quoted phrase")
("a" "b" "c" "d")
("a b" "c d")
("Mid'dle" "quotes do not concern me")
("lots    of   spacing" "there")
("Mid'dle" "quotes do no't concern me")
("asdf")
nil

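Oh my, now that I finally got my tests to pass, the answers already given seem to beat mine. Anyway, I'm posting it here to ask for comments on how idiomatic the code is.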

I sketched some Haskell-ish pseudocode:

pl p w:ws = | if w:ws empty
               => p
            | if w begins with a quote
               => pli p w:ws
            | otherwise
               => pl (p ++ w) ws

pli p w:ws = | if w:ws empty
                => p
             | if w begins with a quote
                => pli (p ++ w) ws
             | if w ends with a quote
                => pl (init p ++ (last p ++ w)) ws
             | otherwise
                => pli (init p ++ (last p ++ w)) ws
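OK, the names are bad. Here is what they mean:

  • Function pl processes the words that are not quoted
  • Function pli (i as in inner) processes the quoted phrases
  • Parameter (list) p is the information already processed (done)
  • Parameter (list) w:ws is the information still to be processed

I translated the pseudocode like this: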

(def quote-chars '(\" \')) ;'

; rewrite .startsWith and .endsWith to support multiple choices
(defn- starts-with?
  "See if given string begins with selected characters."
  [word choices]
  (some #(.startsWith word (str %)) choices))

(defn- ends-with?
  "See if given string ends with selected characters."
  [word choices]
  (some #(.endsWith word (str %)) choices))

(declare pli)
(defn- pl [p w:ws]
    (let [w (first w:ws)
          ws (rest w:ws)]
     (cond
        (nil? w)
            p
        (starts-with? w quote-chars)
            #(pli p w:ws)
        true
            #(pl (concat p [w]) ws))))

(defn- pli [p w:ws]
    (let [w (first w:ws)
          ws (rest w:ws)]
     (cond
        (nil? w)
            p
        (starts-with? w quote-chars)
            #(pli (concat p [w]) ws)
        (ends-with? w quote-chars)
            #(pl (concat 
                  (drop-last p)
                  [(str (last p) " " w)])
                ws)
        true
            #(pli (concat 
                  (drop-last p)
                  [(str (last p) " " w)])
                ws))))

(defn split-line
    "Split a line by spaces, leave quoted groups intact."
    [input]
    (let [splt (.split input " +")]
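        ;; strip-input is not defined in this post; per the note below,
        ;; it strips the quotes from each resulting word
        (map strip-input 
            (trampoline pl [] splt))))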

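The details are not very Clojuresque. I also rely on regexps for the splitting and for stripping the quotes, so I probably deserve some downvotes for that.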

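In the example with the quote in the middle of a word, I notice that one quote is left out of the grouping entirely. Is that intentional?

Intentional. My take on the problem is that only the beginnings and ends of words matter. But I don't know whether that is practical...

I think a good approach would be to first split the string on whitespace, like Python's split. That should be trivial. Then you could look through the list for any word that starts with an apostrophe and, if you find one, keep going until you find a word that ends with an apostrophe, then merge the elements you passed over.

Dennis: my imperative approach does exactly that. I'm sketching a recursive solution, but I don't know whether it will work out...

That's OK. It is more concise than my current regex solution anyway :)

You fail this test. You give ["'a b'" "'c d'"], and it should be ["a b", "c d"].

Indeed. I just amended it with a quick fix.

This is very close to what I was sketching! And, very cool, it avoids the initial split!

It is fairly easy to make a version that doesn't blow the stack using trampoline.

I couldn't edit this one, so I copied yours, modified it slightly, and added an example that blew up on my machine before the modification.

Great... I'm not very good with Clojure's lazy seqs... Should splitter be used with recur? But the implementation looks very idiomatic, and it preserves the spacing! Very nice :)

user=> (splitter "Mid'dle 'quotes do not concern me'") ("Mid'dle" "quotes do not concern me") I would leave a quote alone if it has characters on both sides of it.

@progo: Happy to hear that. :-) As for recur: no, lazy seqs don't mix with tail recursion. See the related answers for more details (they may serve as a general introduction to lazy seqs; as for lazy seqs versus tail recursion, I tried to address that in my answer).

@edbond: Yeah, I was being lazy. The version I just edited in should handle that case better.

There's something going on... The last character may get eaten, as in "Hello there!" and "'lots    of   spacing' there".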
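
The split-first-then-merge idea suggested in the comments can be sketched with a plain reduce. This is only an illustration under stated assumptions, not any poster's actual code: the names `unquote-group`, `step`, and `split-then-merge` are hypothetical, the initial split still uses a trivial " +" pattern, spacing inside quotes collapses to single spaces (which the question allows), and `clojure.string/starts-with?` and `ends-with?` require Clojure 1.8 or later.

```clojure
(require '[clojure.string :as str])

(defn- unquote-group
  "Join the collected words with single spaces and drop the outer quotes.
  An unterminated group only has a leading quote to drop."
  [words closed?]
  (let [s (str/join " " words)]
    (subs s 1 (if closed? (dec (count s)) (count s)))))

(defn- step
  "Reducing step. :done holds finished words, :open the words of the
  quoted group currently being collected (nil when outside quotes)."
  [{:keys [done open] :as state} w]
  (cond
    ;; inside a quoted group: collect, and close on a trailing quote
    open
    (let [open (conj open w)]
      (if (str/ends-with? w "'")
        {:done (conj done (unquote-group open true)) :open nil}
        (assoc state :open open)))

    ;; a single word quoted on both sides, e.g. 'a'
    (and (str/starts-with? w "'") (str/ends-with? w "'") (> (count w) 1))
    (update state :done conj (unquote-group [w] true))

    ;; a leading quote opens a new group
    (str/starts-with? w "'")
    (assoc state :open [w])

    ;; ordinary word; quotes in the middle are left alone
    :else
    (update state :done conj w)))

(defn split-then-merge [s]
  (let [words (remove str/blank? (str/split s #" +"))
        {:keys [done open]} (reduce step {:done [] :open nil} words)]
    (if open
      (conj done (unquote-group open false)) ; flush an unterminated group
      done)))

(split-then-merge "Mid'dle 'quotes do not concern me'")
;; => ["Mid'dle" "quotes do not concern me"]
```

Because reduce carries the state explicitly, there is no recursion depth to worry about, at the cost of not being lazy and not preserving the original spacing inside quoted phrases.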