Parsing a file with Perl 6 regexes (Regex, Grammar, Raku)


As an exercise, I am trying to parse some standard text that is the output of a shell command:

  pool: thisPool
 state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
    still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
    the pool may no longer be accessible by software that does not support
    the features. See zpool-features(5) for details.
  scan: none requested
config:

    NAME                                                STATE     READ WRITE CKSUM
    homePool                                            ONLINE       0     0     0
      mirror-0                                          ONLINE       0     0     0
        ata-WDC_WD5000AZLX-00CL5A0_WD-WCC3F7NUE93C      ONLINE       0     0     0
        ata-WDC_WD5000AZLX-00CL5A0_WD-WCC3F7RE2A4F      ONLINE       0     0     0
    cache
      ata-KINGSTON_SV300S37A60G_50026B7261025D7E-part3  ONLINE       0     0     0

errors: No known data errors

I want to use a Perl 6 grammar, and I want to capture each field in a separate token or regex. So I came up with the following grammar:

grammar zpool {
        regex TOP { \s+ [ <keyword> <collection> ]+ }
        token keyword { "pool: " | "state: " | "status: " | "action: " | "scan: " | "config: " | "errors: " }
        regex collection { [<:!keyword>]*  }
}

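I do not know how to make it stop eating characters when it reaches a keyword, so that the keyword is then treated as the start of the next field.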

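Problem 1

You wrote <:!keyword> rather than <!keyword>. That is not what you want: you need to remove the :.

In a P6 regex, the <:foo> syntax matches a single character that has the specified Unicode property, in this case the property :foo, which in turn means :foo(True).

Likewise, <:!keyword> matches a single character with the Unicode property :keyword(False).

But there is no Unicode property :keyword. So the negated assertion is always true, and it matches a single character of input every time.

So, as you found, the pattern just munches its way through the rest of the text.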
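As a small illustration of the Unicode-property syntax described above, here is a sketch that uses the real property L (Letter) in place of the non-existent :keyword. The choice of L is only an assumption made for the example.

```raku
# <:L> consumes exactly one character that has the Unicode property L (Letter).
say so 'a' ~~ / ^ <:L>  $ /;   # True
say so '1' ~~ / ^ <:L>  $ /;   # False

# <:!L> consumes exactly one character that does NOT have that property.
say so '1' ~~ / ^ <:!L> $ /;   # True
say so 'a' ~~ / ^ <:!L> $ /;   # False
```

Either way, one character of input is consumed per match, which is why a quantified property assertion can run through a whole text.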
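Problem 2

Once you fix problem 1, a second problem appears.

<:!keyword> matches a single character with the Unicode property :keyword(False). Every time it matches, it automatically munches some input (a single character).

In contrast, <!keyword> does not consume any input when it matches. You have to make sure that the pattern using it munches the input.

After fixing these two problems you will get the output you expect. (The next problem you will see is that the config keyword does not work, because the : in config: in the sample input is not followed by a space.)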
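The zero-width behaviour can be seen in isolation. This sketch uses <!before 'state: '> as a stand-in for <!keyword> (an assumption made so the snippet needs no grammar); both are negative lookaheads.

```raku
# A negative lookahead succeeds here, but the match ends where it began.
my $zero-width = 'ONLINE' ~~ / ^ <!before 'state: '> /;
say $zero-width.to;    # 0

# Pairing the lookahead with . consumes one character per successful check,
# and the loop stops as soon as the keyword is next in the input.
my $collected = 'ON state: x' ~~ / ^ [ <!before 'state: '> . ]* /;
say $collected.Str;    # "ON " (stops just before "state: ")
```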

So, with a few clean-ups:
my @keywords = <pool state status action scan config errors> ;

say grammar zpool {
    token TOP        { \s+ [ <keyword> <collection> ]* }
    token keyword    { @keywords ': ' }
    token collection { [ <!keyword> . ]* }
}


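I have switched all the patterns to token declarations. In general, always use token unless you know you need something else. (regex enables backtracking, which can slow things down massively if you are not careful. rule makes whitespace in the rule significant.)

I have extracted the keywords into an array. Inside a regex, @keywords means @keywords[0] | @keywords[1] | ...

In the last pattern I added a . after <!keyword> (to consume one character's worth of input, avoiding the infinite loop that would occur if <!keyword> matched without consuming any input).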
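To see the cleaned-up grammar at work, here is a minimal sketch that runs it on a shortened input. The three-line $input and the loop that prints the pairs are assumptions added for illustration; the grammar itself is the one given above.

```raku
my @keywords = <pool state status action scan config errors>;

grammar zpool {
    token TOP        { \s+ [ <keyword> <collection> ]* }
    token keyword    { @keywords ': ' }
    token collection { [ <!keyword> . ]* }
}

# Shortened, hypothetical input; TOP requires the leading whitespace.
my $input = "  pool: thisPool\n state: ONLINE\nerrors: No known data errors\n";

my $match = zpool.parse($input);

# Walk the keyword and collection captures in lockstep.
for $match<keyword>.list Z $match<collection>.list -> ($k, $v) {
    say $k.Str.trim, ' ', $v.Str.trim;
}
```

Each collection stops exactly where the next keyword begins, because <!keyword> is checked before every character is consumed.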
In case you have not seen them yet, note that the available grammar debugging options are your friend.
Hth
As much as I like a good grammar now and then, this is easier to solve with a single call than with a grammar.
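A minimal sketch of how a single call could do this, assuming Str.split with a regex delimiter and the :v adverb. The shortened $input and the %fields hash are hypothetical names used only for this example.

```raku
my $input = q:to/END/;
  pool: thisPool
 state: ONLINE
errors: No known data errors
END

my @delimiter = <pool state status action scan config errors>;
my %fields;

# :v keeps each delimiter (as a Match) in the result, so keys and values
# alternate. [1..*] skips the piece that precedes the first key.
for $input.split(/ ^^ \h* (@delimiter) ':' \h* /, :v)[1..*] -> $key, $text {
    %fields{ ~$key[0] } = $text.trim;
}

say %fields<state>;
```

The ^^ anchor restricts keys to the start of a line, so a word such as "pool" inside a value is not mistaken for a delimiter.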
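Using a grammar instead (the ZPool grammar listed at the end), we can extract the results from the nested match tree, or cheat with a stateful actions object: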
class ZPool::Actions {
    has $!last-key;
    has %.contents;
    method key($m)   { $!last-key = $m.Str                }
    method value($m) { %!contents{ $!last-key } = $m.trim }
}

And then use it:
my $actions = ZPool::Actions.new;
ZPool.parse($input, :$actions);
say $actions.contents.perl;

key and keychunk do not need backtracking, so you can change them from regex to token.
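Of course, using .*? and backtracking might be considered cheating, so you could use the trick raiph mentioned and put a negative look-ahead in the value regex.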
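A minimal sketch of that variant, under a few assumptions: the grammar name ZPoolLookahead and the shortened $input are hypothetical, and the results are read from the match tree rather than through a stateful actions object, because a look-ahead that calls keychunk may also trigger the key action.

```raku
my @delimiter = <pool state status action scan config errors>;

grammar ZPoolLookahead {
    token key      { @delimiter }
    token keychunk { ^^ \h* <key> ':' }
    # Consume characters one at a time until the next key line begins.
    token value    { [ <!keychunk> . ]* }
    token chunks   { <keychunk> \h* <value> }
    token TOP      { <chunks>+ }
}

my $input = q:to/END/;
  pool: thisPool
 state: ONLINE
errors: No known data errors
END

my $match = ZPoolLookahead.parse($input);

# Build a hash straight from the nested match tree.
my %fields = $match<chunks>.map(-> $c { ~$c<keychunk><key> => $c<value>.Str.trim });
say %fields<state>;
```

With every pattern declared as a token, the grammar never backtracks; the look-ahead alone decides where each value ends.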
The ZPool grammar used with the actions class above:
my @delimiter = <pool state status action scan config errors>;
grammar ZPool {
    regex key      { @delimiter             }
    regex keychunk { ^^ \h* <key> ':'       }
    regex value    { .*?                    }
    regex chunks   { <keychunk> \h* <value> }
    regex TOP      { <chunks>+              }
}
