Does anyone have an efficient R3 function that mimics the behaviour of find/any in R2?

Rebol2 has an /ANY refinement on the FIND function that can do wildcard searches:

>> find/any "here is a string" "s?r"
== "string"

I use this extensively in tight loops that need to perform well. But the refinement has been removed in Rebol3.

What would be the most efficient way to do this in Rebol3? (I'm guessing some kind of PARSE solution.)

Here's an attempt at handling the "*" case:

Usage is as follows:

>> like "abcde" "b*d"
== "bcde"

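I edited your question, changing "clearly" to "removed". That sounds like a deliberate decision, but it turns out the refinement may simply not have been implemented.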

But if you ask me, I don't think it should be there out of the box... and not just because "any" is a poor choice of word. Here's why:

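You're looking for patterns in strings... so if you're limited to using a string to specify the pattern, you run into "meta" problems. Let's say I want to extract the word *Rebol* or ?Red?: now there has to be escaping, and things get ugly all over again. Back to RegEx :-/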

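So what you really want isn't a STRING! pattern like s?r but a BLOCK! pattern like ["s" ? "r"]. That would allow constructions like ["?" ? "?"] or [{?} ? {?}], which is better than rehashing the string hacks other languages use.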

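That's what PARSE does, though it's a bit less declarative about it. It also uses words instead of symbols, as Rebol likes to do: [{?} skip {?}] is a match rule in which skip is an instruction that moves the parse position past any single element of the parse series between the question marks. It does the same when parsing a block as input, where it would match [{?} 12-Dec-2012 {?}].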
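Here is a quick check of that description. It assumes Rebol 3 PARSE, which returns TRUE only when the rule consumes the whole input:

```rebol
; SKIP matches exactly one element of the input, whatever its type
parse "?x?" [{?} skip {?}]                  ; == true  (the "x" is skipped)
parse [{?} 12-Dec-2012 {?}] [{?} skip {?}]  ; == true  (block input: the date! is skipped)
parse "?xy?" [{?} skip {?}]                 ; == false (two characters between the marks)
```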

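I don't know exactly what the behavior of /ANY would or should be for something like "ab??cd e?*f"... whether it offers alternate-pattern logic or what. I assume the Rebol2 implementation is simple? Most likely it only matches one pattern.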

To set a baseline, here's a possibly lame PARSE solution for the s?r intent:

>> parse "here is a string" [
       some [                ; match rule repeatedly
           to "s"            ; advance to *before* "s"
           pos:              ; save position as potential match
           skip              ; now skip the "s"
           [                 ;     [sub-rule]
               skip          ; ignore any single character (the "?")
               "r"           ; match the "r", and if we do...
               return pos    ; return the position we saved
           |                 ;     | (otherwise)
               none          ; no-op, keep trying to match
           ]
       ]
       fail                  ; have PARSE return NONE
   ]
== "string"

If you wanted it to be

s*r

you would change

skip "r" return pos

to

to "r" return pos
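Below is a minimal sketch of that s*r variant, wrapped in a hypothetical helper named find-s*r (not part of the original answer). It assumes Rebol 3 PARSE and uses character matches. POS is declared /local so it does not leak, and the match position is recorded in a variable instead of relying on RETURN:

```rebol
; Hypothetical helper: the "s*r" form of the baseline rule, with POS kept local
find-s*r: func [
    "Return TEXT at the first s*r match (an s, then anything, then an r), or NONE"
    text [string!]
    /local pos found
][
    found: none
    parse text [
        any [
            pos: #"s" thru #"r" (found: pos) to end  ; record the start, then stop scanning
            | skip                                   ; otherwise advance one character
        ]
    ]
    found
]

find-s*r "here is a string"  ; == "s a string"
```

Using THRU rather than TO consumes the "r". That doesn't matter here, since only the start position is kept.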

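In terms of efficiency, I'll mention that matching characters really is faster than matching strings. So using to #"s" and #"r" instead of the string forms will usually make a measurable speed difference when parsing strings. Beyond that, I'm sure others can do better.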

The rule is certainly longer than "s?r", but it isn't that long without the comments:

[some [to #"s" pos: skip [skip #"r" return pos | none]] fail]

(Note: it does leak pos:. Is there a USE in PARSE, implemented or planned?)

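Still, one advantage is that it offers hook points at every decision moment, and it doesn't have the escaping flaws of a naive string solution. (I'm tempted to give my usual speech.)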

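But if you don't want to code directly in PARSE, the real answer seems to be some kind of PARSE compiler. That might be Rebol's best interpretation of glob, since you could do it in one go: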

 >> parse "here is a string" glob "s?r"
 == "string"

Or, if you're going to match often, cache the compiled expression. Also, let's imagine our block form uses words for literacy:

 s?r-rule: glob ["s" one "r"]

 pos-1: parse "here is a string" s?r-rule
 pos-2: parse "reuse compiled RegEx string" s?r-rule
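No such glob compiler exists here, but the caching idea can still be sketched. Write the "compiled" rule by hand and pass it to a small search helper. Both find-rule and the hand-written s?r-rule are hypothetical, for illustration only, and assume Rebol 3 PARSE:

```rebol
; Hypothetical helper: search TEXT with any precompiled match RULE
find-rule: func [
    "Return TEXT at the first position where RULE matches, or NONE"
    text [any-string!]
    rule [block!]
    /local pos found
][
    found: none
    parse text [
        any [
            pos: rule (found: pos) to end  ; first match wins: record it, stop scanning
            | skip                         ; otherwise advance one character
        ]
    ]
    found
]

; hand-written stand-in for what glob "s?r" might produce
s?r-rule: [#"s" skip #"r"]

pos-1: find-rule "here is a string" s?r-rule             ; == "string"
pos-2: find-rule "reuse compiled RegEx string" s?r-rule  ; == "string"
```

The rule block is built once and reused across searches. Only the cheap scanning loop runs per call.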

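It might also be interesting to see such a compiler for regex. It could accept block input as well as string input, so that both "s.r" and ["s" . "r"] would be legal... With the block form no escaping is needed: you could write ["."] to match ".".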

Fairly interesting things become possible. Given that in regex:

(abc|def)=\g{1}
matches abc=abc or def=def
but not abc=def or def=abc

In Rebol this could either be kept in string form or compiled into a PARSE rule, with a format like:

regex [("abc" | "def") "=" (1)]

Then you get a dialect variant that doesn't need escaping. Designing and writing such a compiler is left as an exercise for the reader. :-)
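For comparison, the back-reference example can already be written by hand in PARSE. COPY plays the role of the capture group, and reusing the captured word as a rule plays the role of \g{1}. This is a sketch: backref-match? is a hypothetical name, and PARSE/CASE is used because PARSE is case-insensitive on strings by default, while the regex is not:

```rebol
; Hypothetical helper: hand-written PARSE equivalent of the regex (abc|def)=\g{1}
backref-match?: func [
    "TRUE if TEXT is abc=abc or def=def"
    text [string!]
    /local group-1
][
    parse/case text [
        copy group-1 ["abc" | "def"]  ; capture, like regex group 1
        "="
        group-1                       ; match the captured text again (the back-reference)
    ]
]

backref-match? "abc=abc"  ; == true
backref-match? "abc=def"  ; == false
```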

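I'd split this into two functions: one that creates a rule matching the given search value, and another that performs the search. Separating the two lets you reuse the same generated PARSE block when one search value is applied over many iterations: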

expand-wildcards: use [literal][
    literal: complement charset "*?"

    func [
        {Creates a PARSE rule matching VALUE expanding * (any characters) and ? (any one character)}
        value [any-string!] "Value to expand"
        /local part
    ][
        collect [
            parse value [
                ; empty search string FAIL
                end (keep [return (none)])
                |

                ; only wildcard return HEAD
                some #"*" end (keep [to end])
                |

                ; everything else...
                some [
                    ; single char matches
                    #"?" (keep 'skip)
                    |

                    ; textual match
                    copy part some literal (keep part)
                    |

                    ; indicates the use of THRU for the next string
                    some #"*"

                    ; but first we're going to match single chars
                    any [#"?" (keep 'skip)]

                    ; it's optional in case there's a "*?*" sequence
                    ; in which case, we're going to ignore the first "*"
                    opt [
                        copy part some literal (
                            keep 'thru keep part
                        )
                    ]
                ]
            ]
        ]
    ]
]

like: func [
    {Finds a value in a series and returns the series at the start of it.}
    series [any-string!] "Series to search"
    value [any-string! block!] "Value to find"
    /local skips result
][
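    ; shortens the search a little where a string value starts with a regular char
    ; (a block value may start with a keyword such as SKIP or THRU, so just step)
    skips: either block? value ['skip][
        switch/default first value [
            #[none] #"*" #"?" ['skip]
        ][
            reduce ['skip 'to first value]
        ]
    ]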

    any [
        block? value
        value: expand-wildcards value
    ]

    parse series [
        some [
            ; we have our match
            result: value

            ; and return it
            return (result)
            |

            ; step through the string until we get a match
            skips
        ]

        ; at the end of the string, no matches
        fail
    ]
]

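Splitting the functions also provides a basis for optimizing two separate concerns: finding the starting point and matching the value.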

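I used PARSE because, even though * and ? seem like simple rules, nothing is as expressive, fast and effective as PARSE for implementing a search like this.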

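Perhaps, along the lines of the dialect suggested above, one could consider a dialect rather than a string with wildcards. Taken far enough, RegEx itself would be replaced by a dialect that compiles to PARSE, but that's probably beyond the scope of the question.

A question related to efficiency and PARSE might offer some general insight. I guess you already have a solution; you can answer your own question, and other users can vote on it. FYI: wildcards are not covered in the dictionary definition of FIND for … or …, but are covered in ….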
