Negating a named regex, or character class interpolation in Raku

I'm trying to parse a quoted string. Something like this:

say '"in quotes"' ~~ / '"' <-[ " ]> * '"'/;

$ ./multi-token.raku
quoted_string
* FAIL
(Any)

I'm still learning Raku, so I may be doing something wrong.

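Update 2

Oh! Thanks to @raiph for pointing this out. I forgot to put a quantifier on the negated character class.

There are a few different approaches you can take; which one is best will probably depend on the rest of the structure you're employing.

But first, an observation on your current solution, and why opening it up to other quote characters won't work this way. Consider the string 'value". Should that parse? The structure you laid out would actually match it! That's because each <quote> token matches either a single or a double quote.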
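To make that observation concrete, here is a minimal sketch. The grammar name `LooseQuotes` and its `TOP` token are made up for illustration; only the `quote` token comes from the thread.

```raku
# Hypothetical grammar: the same <quote> token is used for both delimiters,
# so nothing ties the closing quote to the opening one.
grammar LooseQuotes {
    token TOP   { <quote> <-['"]>* <quote> }
    token quote { <['"]> }
}

say so LooseQuotes.parse(q{"value"});   # True
say so LooseQuotes.parse(q{'value"});   # True: mismatched quotes are accepted too
```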
Dealing with the inner part
The simplest solution is to make the inner part a non-greedy wildcard:
<quote> (.*?) <quote>
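A small sketch of the non-greedy form in use. The grammar name `LazyInner` is an assumption for illustration. Note that the sketch declares `TOP` with `regex` rather than `token`: a `token` implies ratcheting, which turns off the backtracking that a frugal quantifier such as `.*?` relies on to grow.

```raku
# Hypothetical grammar showing the lazy wildcard between two <quote> calls.
grammar LazyInner {
    regex TOP   { <quote> (.*?) <quote> }   # regex, not token: .*? must be able to backtrack
    token quote { <['"]> }
}

my $m = LazyInner.parse(q{"in quotes"});
say ~$m[0];   # in quotes
```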

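(There is no need to use an actual token for this; it could be written inline.)
Assuming you only want to match the same quote character again: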
token attribute-value { <string> }

token string {
  # match <quote> and expect to end with "$<quote>"
  <quote> ~ "$<quote>"

  [
    # update match structure in $/ otherwise "$<quote>" won't work
    {}

    <!before "$<quote>"> # next character isn't the same as $<quote>

    .    # any character

  ]*     # any number of times
}

token quote { <["']> }
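The tokens above can be dropped into a grammar to try them out. This is only a sketch: the grammar name `AttributeValue` and the `TOP` token are assumptions added for illustration, and the rejection case assumes the default behaviour where a failed `~` goal simply fails the match.

```raku
grammar AttributeValue {
    token TOP             { <attribute-value> }   # hypothetical entry point
    token attribute-value { <string> }

    token string {
        <quote> ~ "$<quote>"          # the closer must equal the captured opener
        [
            {}                        # empty block so that $<quote> is visible below
            <!before "$<quote>"> .    # any character that is not the opening quote
        ]*
    }

    token quote { <["']> }
}

say so AttributeValue.parse(q{"it's fine"});   # True: the other quote type may appear inside
say so AttributeValue.parse(q{'value"});       # False: closer differs from opener
```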

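They wrote: "I want more than one type of quote. Something like this made-up syntax that doesn't work: ...". Note that they used <quote> as the opener and a $-prefixed capture for the negation and the closer. The latter clearly conveys to me the notion that the closer is a capture of the opener, and so is the same as the opener. So I think your "the structure you laid out would actually match" 'value" example is "unfair".

@BradGilbert You're quite right! I knew someone would have a cleaner way; I've updated my answer. @raiph That's what I get for answering late at night, lol. I did in fact miss the $.

First of all, thanks for your detailed answer, and thanks to @BradGilbert for the negative lookahead suggestion! The negative lookahead solution is exactly what I was looking for. Still, I couldn't get the "multi token" solution to work; see the update for details. Thanks for the detailed explanation of the negative lookahead solution!

You missed the * at the end of each negated character class. In fact, in the update you'll find that the quote token is now not even needed. Fwiw, using + as the quantifier means the string must contain at least one character, so '' or "" will fail to parse.

@raiph, for my use case, empty strings aren't allowed.

Your final update shows a classic trade-off. The negative lookahead is definitely faster, but less readable (what's happening doesn't jump out immediately, though it becomes clear after about 15-20 seconds). With the multi, what's happening jumps out immediately (especially if you know what ~ does). Good job, and welcome to Raku!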
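The point about + versus * can be checked with a small sketch. The grammar names `NonEmpty` and `MaybeEmpty` are made up for illustration; the tokens follow the multi-token style used in this thread.

```raku
# With +, at least one character is required between the quotes.
grammar NonEmpty {
    proto token quoted_string (|) {*}
    multi token quoted_string:sym<'> { <sym> ~ <sym> <-[']>+ }
    multi token quoted_string:sym<"> { <sym> ~ <sym> <-["]>+ }
}

# With *, an empty quoted string is accepted as well.
grammar MaybeEmpty {
    proto token quoted_string (|) {*}
    multi token quoted_string:sym<'> { <sym> ~ <sym> <-[']>* }
    multi token quoted_string:sym<"> { <sym> ~ <sym> <-["]>* }
}

say so NonEmpty.parse(q{""},      :rule<quoted_string>);   # False
say so MaybeEmpty.parse(q{""},    :rule<quoted_string>);   # True
say so NonEmpty.parse(q{"foo"},   :rule<quoted_string>);   # True
```

Which quantifier is right depends on whether empty strings are valid input for the use case.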
#!/usr/bin/env raku

use Grammar::Tracer;

grammar QuotedString {
  proto token quoted_string (|) {*}
  multi token quoted_string:sym<'> { <sym> ~ <sym> <-[']>+ }
  multi token quoted_string:sym<"> { <sym> ~ <sym> <-["]>+ }
  token quote         { <["']> }
}

my $string = '"foo"';

my $quoted-string = QuotedString.parse($string, :rule<quoted_string>);
say $quoted-string;

#!/usr/bin/env raku

grammar NegativeLookahead {
  token quoted_string { <quote> $<string>=([<!quote> .]+) $<quote> }
  token quote         { <["']> }
}

grammar MultiToken {
  proto token quoted_string (|) {*}
  multi token quoted_string:sym<'> { <sym> ~ <sym> $<string>=(<-[']>+) }
  multi token quoted_string:sym<"> { <sym> ~ <sym> $<string>=(<-["]>+) }
}

use Bench;

my $string = "'foo'";

my $bench = Bench.new;
$bench.cmpthese(10000, {
  negative-lookahead =>
    sub { NegativeLookahead.parse($string, :rule<quoted_string>); },
  multi-token        =>
    sub { MultiToken.parse($string, :rule<quoted_string>); },
});

<quote> ~ <quote> (.*?)

[<!quote> .]*
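A sketch of the lookahead form inside a grammar, assuming a hypothetical grammar named `Lookahead`. The assertion `<!quote>` only checks that no quote follows and consumes nothing, so the `.` after it does the actual consuming. Because `<quote>` matches either quote character, the content stops at the first quote of any kind.

```raku
grammar Lookahead {
    token TOP   { <quote> $<string>=( [ <!quote> . ]* ) <quote> }
    token quote { <['"]> }
}

my $m = Lookahead.parse(q{"in quotes"});
say ~$m<string>;                        # in quotes

# The content ends at the apostrophe, so the whole input is not consumed.
say so Lookahead.parse(q{"it's"});      # False
```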

proto token attribute_value (|) { * }
# The negated character class needs a quantifier to match more than one character
multi token attribute_value:sym<'> { <sym> ~ <sym> <-[']>* }
multi token attribute_value:sym<"> { <sym> ~ <sym> <-["]>* }

token attribute_value {
    $<start-quote>=<quote>      # your actual start quote
    :my $*end-quote;            # define the variable in the regex scope
    { $*end-quote = ... }       # determine the requisite end quote (e.g. ” for “)
    <attribute_value_contents>  # handle actual content
    $*end-quote                 # fancy end quote
}

token attribute_value_contents {
    # We have access to $*end-quote here, so we can use
    # either of the techniques we've described before
    # (a) using a look ahead
    [<!before $*end-quote> .]*
    # (b) being lazy (the easier)
    .*?
    # (c) using another token (described below)
    <attr_value_content_char>+
}

proto token attr_value_content_char (|) { * }
multi token attr_value_content_char:sym<escaped> { \\ $*end-quote }
multi token attr_value_content_char:sym<literal> { . <?{ $/ ne $*end-quote }> }
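One way the dynamic-variable approach might be filled in, as a rough sketch only. The `%closer` lookup table, the grammar name `Fancy`, and the `contents` token are all assumptions made for illustration; the thread leaves the choice of end quote unspecified. The sketch uses the lookahead technique (a) for the contents.

```raku
# Hypothetical opener-to-closer table; not part of the original answer.
my %closer = '"' => '"', "'" => "'", '“' => '”';

grammar Fancy {
    token TOP {
        $<start-quote>=<quote>                          # the actual start quote
        :my $*end-quote;                                # visible to tokens called from here
        { $*end-quote = %closer{ ~$<start-quote> } }    # pick the matching end quote
        <contents>
        $*end-quote
    }
    token contents { [ <!before $*end-quote> . ]* }
    token quote    { <["'“]> }
}

say ~Fancy.parse(q{“it's "fine"”})<contents>;   # it's "fine"
```

Because the end quote is looked up per opener, the other quote characters can appear freely inside the string.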
