How can I use Perl 6 to extract some data from the middle of a noisy file?

Tags: parsing, text, raku, flip-flop

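I would like to do this using idiomatic Perl 6.

I found a wonderful contiguous chunk of data in a noisy output file.

I just want to print the header line, which starts with `Cluster Unique`, and then every line after it up to, but not including, the first empty line. Here is what the file looks like: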

</path/to/projects/projectname/ParameterSweep/1000.1.7.dir> was used as the working directory.
....

Cluster Unique Sequences    Reads   RPM
1   31  3539    3539
2   25  2797    2797
3   17  1679    1679
4   21  1636    1636
5   14  1568    1568
6   13  1548    1548
7   7   1439    1439

Input file: "../../filename.count.fa"
...

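In English: print every line of the input file, starting with the one that contains the phrase `Cluster Unique` and ending just before the next empty line.

The answer gives this as a one-liner, as the same code with comments, and as an expanded version; the code for all three is listed further down this page, after the comments.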

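`lines()` is similar to `<>` in Perl 5. Each line of each file listed on the command line is read in, one at a time. Because the call sits in a `for` loop, each line is placed in the default variable `$_`.

`say` is like `print`, except that it also appends a newline. Written with a leading `.`, it acts directly on the default variable `$_`.

`$_` is the default variable; in this case it holds one line of the file.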


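`~~` is the match operator; here it compares `$_` with a regular expression.

`/ /` creates a regular expression between the two forward slashes.

`\s+` matches one or more whitespace characters.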
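`ff` is the flip-flop operator. It is false as long as the expression on its left is false, and becomes true when that expression evaluates to true. It becomes false again once the expression on its right becomes true, and it stays false unless the left-hand expression matches again later. In this case, if we used `^ff^` instead of `ff^`, the header would not be included in the output.

When `^` comes before (or after) `ff`, it modifies `ff` so that it is also false on the iteration in which the expression on its left (or right) becomes true.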
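`/^\s*$/` matches an empty line: `^` matches the beginning of the string, `\s*` matches zero or more whitespace characters, and `$` matches the end of the string.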
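The difference between the flip-flop variants described above can be seen on a small in-memory list. This is only a minimal sketch: the `@lines` array is a shortened, made-up stand-in for the noisy file, and the variable names are illustrative.

```raku
# Shortened, hypothetical stand-in for the noisy file.
my @lines = (
    'noise before the table',
    '',
    'Cluster Unique Sequences    Reads   RPM',
    '1   31  3539    3539',
    '2   25  2797    2797',
    '',
    'Input file: "../../filename.count.fa"',
);

# ff^  : keep the line that switches the operator on,
#        drop the line that switches it off.
my @header-and-rows = @lines.grep({ /Cluster \s+ Unique/ ff^ /^\s*$/ });

# ^ff^ : drop both end points, so the header is lost as well.
my @rows-only = @lines.grep({ /Cluster \s+ Unique/ ^ff^ /^\s*$/ });

# ff   : keep both end points, so the terminating empty line is kept.
my @with-blank = @lines.grep({ /Cluster \s+ Unique/ ff /^\s*$/ });

.say for @header-and-rows;
```

With this sample, `ff^` should keep the header plus the two data rows, `^ff^` only the two data rows, and plain `ff` the header, the rows and the closing empty line.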


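By the way, the flip-flop operator in Perl 5 is `..` when used in scalar context (in list context it is the range operator). Of course, it is not as feature-rich as the Perl 6 version.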
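"I would like to do this using idiomatic Perl 6"

In Perl, the idiomatic way to locate a chunk in a file is to read the file in paragraph mode and stop reading as soon as you find the chunk you are interested in. If you are reading a 10GB file and the chunk is at the top of the file, it is inefficient to keep reading the rest of the file, let alone to run an if test on every line of it.

In Perl 6, you can read a paragraph at a time like this:
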
my $fname = 'data.txt';

my $infile = open(
    $fname, 
    nl => "\n\n",   #Set what perl considers the end of a line.
);  #Removed die() per Brad Gilbert's comment. 

for $infile.lines() -> $para {  
    if $para ~~ /^ 'Cluster Unique'/ {
        say $para.chomp;
        last;   #Quit reading the file.
    }
}

$infile.close;

#    ^                   Match start of string.
#   'Cluster Unique'     By default, whitespace is insignificant in a perl6 regex. Quotes are one way to make whitespace significant.   

However, in Perl 6 on Rakudo/MoarVM, the `open()` function does not read the `nl` argument correctly, so it is currently not possible to set paragraph mode.
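For comparison, later Rakudo releases document the input line separator of a file handle as `nl-in` rather than `nl`. Assuming such a release, paragraph mode might be sketched as below. The scratch file name and its shortened contents are hypothetical, and the sketch only works when the separating lines are truly empty (no stray spaces).

```raku
# Hypothetical scratch file holding a shortened copy of the sample data.
my $path = $*TMPDIR.add('paragraph-mode-sketch.txt');
$path.spurt: (
    'noise before the table',
    '....',
    '',
    'Cluster Unique Sequences    Reads   RPM',
    '1   31  3539    3539',
    '2   25  2797    2797',
    '',
    'Input file: "../../filename.count.fa"',
).join("\n") ~ "\n";

# Assumption: :nl-in sets the input record separator, so that
# .lines yields one blank-line-delimited paragraph at a time.
my $infile = open($path, :nl-in("\n\n"));

my $block;
for $infile.lines -> $para {
    if $para ~~ /^ 'Cluster Unique'/ {
        $block = $para;
        last;               # stop reading the rest of the file
    }
}

$infile.close;
$path.unlink;               # remove the scratch file again

say $block;
```

Because `.lines` is lazy, the loop stops pulling data from the handle as soon as `last` is reached, which is the point of reading by paragraph.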
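Also, some people consider certain idioms bad practice, for example:

Postfix if statements, e.g. `say "hello" if $y == 0`.

Relying on the implicit `$_` variable in your code, e.g. `.say`.

So, depending on which side of the fence you live on, this is considered bad practice in Perl.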
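For readers on the other side of that fence, here is a minimal sketch that avoids both idioms: it names the loop variable, uses block-form `if`, and replaces the flip-flop operator with a plain Boolean flag (a swapped-in technique, not the code from either answer). The helper name `extract-block` and the sample array are hypothetical.

```raku
# Hypothetical helper: return the block that starts at the header line
# and ends just before the first empty (or whitespace-only) line.
sub extract-block(@lines) {
    my @kept;
    my $inside = False;     # explicit state instead of the flip-flop operator
    for @lines -> $line {
        if $inside {
            if $line ~~ /^\s*$/ {
                last;       # first empty line after the header: stop
            }
            @kept.push($line);
        }
        elsif $line ~~ /Cluster \s+ Unique/ {
            $inside = True;
            @kept.push($line);   # keep the header line itself
        }
    }
    return @kept;
}

# Shortened, made-up stand-in for the noisy file.
my @sample = (
    'noise before the table',
    '',
    'Cluster Unique Sequences    Reads   RPM',
    '1   31  3539    3539',
    '2   25  2797    2797',
    '',
    'Input file: "../../filename.count.fa"',
);

my @block = extract-block(@sample);
for @block -> $line {
    say $line;
}
```

The `last` also stops the loop at the end of the block, so nothing after the first empty line is examined.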
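Your one-liner uses bare `say` rather than `.say`; you can also get rid of more parentheses by writing it as `.say if /Cluster \s+ Unique/ ff^ /^\s*$/ for lines`.

Thanks! Could you post that as an answer?

It is only a small improvement on your answer, so I don't think it deserves an answer of its own.

You don't need `or die` in Perl 6, and it is pointless because it would never run.

@BradGilbert, I added that after doing some research and looking at some specs, but now I read that autodie is the default, which is nice. "Still playing tennis?"

Christopher Bottoms, I don't consider slurping a 10GB file into memory to be a solution, so I removed your edit. There are better ways to locate a chunk in a file.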

One-liner version:

.say if /Cluster \s+ Unique/ ff^ /^\s*$/ for lines;

Same code with comments:

.say                    # print the default variable $_
if                      # do the previous action (.say) "if" the following term is true
/Cluster \s+ Unique/    # Match $_ if it contains "Cluster Unique"
ff^                     # Flip-flop operator: false until the preceding term becomes true,
                        #                     false again once the term after it becomes true
/^\s*$/                 # Match $_ if it is empty or contains only whitespace
for                     # Create a loop placing each element of the following list into $_
lines                   # Create a list of all of the lines in the file
;                       # End of statement

Expanded version:

for lines() {
    .say if (
        $_ ~~ /Cluster \s+ Unique/  ff^  $_ ~~ /^\s*$/
    )
}
