Parsing lines that start with a number in Perl

I have the following file:

something in this line  2
something in this line  3
something in this lin 4
something in this line 5
something in this line 6
something in this line 6
something in this line 7


value   text   Read      Write
------------------------------------------------
1        1      82090    62337
2        2      27177    39042
3        3      73       5708
4        4      170      749

Now I need to parse this file and get the lines that start with a number. I am using $_ =~ m/^\d+/, but it doesn't seem to work.

#!/usr/bin/perl
use strict;
use warnings;

my $data = do {local $/; <INFILE>};
my $hash = ();

foreach (split(/\n/, $data)) {
    print "printing $_\n";
    if ($_ =~ m/^\d+/) {
        my @temp = split('[\s\t]+', $_);
        $hash->{$temp[0]}->{read} = $temp[2];
        $hash->{$temp[0]}->{write} = $temp[3];
    }
}
return ($hash);

#/usr/bin/perl
严格使用；
使用警告；
my$data=do{local$/；}；
我的$hash=（）；
foreach（拆分（/\n/，$data））{
打印“打印$\n”；
如果（$\ux=~m/^\d+/）{
my@temp=split（'[\s\t]+'，$）；
$hash->{$temp[0]}->{read}=$temp[2]；
$hash->{$temp[0]}->{write}=$temp[3]；
}
}
返回（$hash）；

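It's hard to say why it "doesn't work", since I don't know how you are checking whether it works. But this is how the code should look.

Use a lexical filehandle. Use an array instead of a hash (you can mix and match). $_ is used automatically by split and by /../. Don't build complicated hash references; just assign an anonymous hash.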

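# parse_file is an illustrative wrapper; "return" only works inside a sub.
sub parse_file {
    my ($file) = @_;
    open my $infile, '<', $file or die "Cannot open $file: $!";

    my @array;
    while (<$infile>) {
        if (/^[0-9]/) {
            my @data = split;    # bare split: splits $_ on whitespace
            $array[$data[0]] = { 'read'  => $data[2], 'write' => $data[3] };
        }
    }
    close $infile;

    return \@array;    # not \@data, which is scoped to the if block
}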
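To try the array version without a separate input file, here is a minimal sketch that wraps the same loop in a sub and feeds it from an in-memory filehandle (open on a scalar reference, available since Perl 5.8). The sub name parse_fh and the sample text are hypothetical, for illustration only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: collect read/write counts from lines that start with a digit.
sub parse_fh {
    my ($fh) = @_;
    my @array;
    while (<$fh>) {
        next unless /^[0-9]/;      # the match runs against $_ by default
        my @data = split;          # bare split: splits $_ on runs of whitespace
        $array[ $data[0] ] = { read => $data[2], write => $data[3] };
    }
    return \@array;
}

# Sample input: a shortened copy of the file from the question.
my $text = <<'END';
something in this line 7
value   text   Read      Write
------------------------------------------------
1        1      82090    62337
2        2      27177    39042
END

open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
my $rows = parse_fh($fh);
close $fh;

# Element 0 is never assigned, so skip undefined slots.
for my $i (grep { defined $rows->[$_] } 0 .. $#$rows) {
    print "$i: read=$rows->[$i]{read} write=$rows->[$i]{write}\n";
}
```

Because the value column is used as an array index, element 0 stays undef; if the values could be sparse or non-numeric, a hash keyed on the value is probably the safer choice.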

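There is no need to read the whole file into memory only to split it back into lines and iterate over them. Doing so makes your program's memory footprint proportional to the size of the entire input file.

Reading the file line by line, on the other hand, keeps the footprint proportional to the size of the longest line, which tends to be much smaller. When you intend to process a file line by line anyway, reading it line by line also makes the code simpler.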

#!/usr/bin/perl

use warnings; use strict;
use YAML;

print Dump process_file(\*DATA);

sub process_file {
    my ($fh) = @_;

    my %hash;

    while ( my $line = <$fh> ) {
        next unless $line =~ /^[0-9]/;
        my ($val, undef, $read, $write) = split ' ', $line;
        @{ $hash{ $val } }{qw( read write )} = ($read, $write);
    }

    return \%hash;
}

__DATA__
something in this line  2
something in this line  3
something in this lin 4
something in this line 5
something in this line 6
something in this line 6
something in this line 7


value   text   Read      Write
------------------------------------------------
1        1      82090    62337
2        2      27177    39042
3        3      73       5708
4        4      170      749
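YAML is a CPAN module, not part of core Perl. If it isn't installed, a sketch of the same process_file that prints the result with the core Data::Dumper module could look like this; the input here is a shortened, in-memory copy of the sample file so the example runs on its own.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Sortkeys = 1;   # stable key order in the dump

sub process_file {
    my ($fh) = @_;
    my %hash;
    while ( my $line = <$fh> ) {
        next unless $line =~ /^[0-9]/;
        my ($val, undef, $read, $write) = split ' ', $line;
        # Hash slice through the autovivified inner hash reference.
        @{ $hash{ $val } }{qw( read write )} = ($read, $write);
    }
    return \%hash;
}

my $input = <<'END';
value   text   Read      Write
------------------------------------------------
1        1      82090    62337
3        3      73       5708
END

open my $fh, '<', \$input or die "Cannot open in-memory file: $!";
my $result = process_file($fh);
close $fh;
print Dumper($result);
```

To process a real file, pass the handle from open my $fh, '<', $path instead of the in-memory one.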

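/[\s\t]/ is redundant: \s already includes \t as part of "all whitespace characters".

@Chris All of it is redundant, because split splits $_ on (runs of) whitespace when no arguments are given.

@TLP True. I only glanced at the code, and that just happened to stand out to me.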
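Since \s already covers \t, /[\s\t]+/ is the same as /\s+/. The practical difference from a bare split (which uses the special ' ' pattern) shows up on lines with leading whitespace: the regex pattern yields an empty first field, while ' ' skips leading whitespace. A small sketch, with a made-up sample line:

```perl
use strict;
use warnings;

my $line = "   1        1      82090    62337\n";

# Regex pattern: leading whitespace produces an empty first field.
my @with_regex = split /[\s\t]+/, $line;

# Special ' ' pattern: leading whitespace is skipped, like awk.
my @awk_style = split ' ', $line;

# Bare split works on $_ with the ' ' pattern.
$_ = $line;
my @bare = split;

print scalar(@with_regex), " fields: (@with_regex)\n";
print scalar(@awk_style),  " fields: (@awk_style)\n";
print scalar(@bare),       " fields: (@bare)\n";
```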
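Don't repeat the OP's mistake of slurping the file; use a while loop. Also, by default \d is not the same as [0-9]: it matches any Unicode decimal digit character. So unless you want to match something like ٣ (ARABIC-INDIC DIGIT THREE), you should use [0-9], or, if you have Perl 5.14 or later, you can write /^\d/a.

@Chas. Owens: Not necessarily a mistake; it could easily be a simplified version of more complex code in which some completely different lines are kept in memory for a reason.

@Chas 1) Good point, careless mistake. 2) Never heard of that before. perldoc perlre says \d matches "a decimal digit character". Where does it support your claim?

The latest version of the documentation is very clear about it: "Similarly, all the decimal digit characters from anywhere in the world will match \d; this is hundreds, not 10, possible matches. And some of those digits look like some of the 10 ASCII digits, but mean a different number, so a human could easily think a number is a different quantity than it really is." The new documentation covers it, and the old documentation did too, but not prominently.

@Chas Good to know. Still, I have to wonder how big a problem this is; in the several years I've been using it, I've never run into any trouble.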
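The \d versus [0-9] point can be checked directly. A minimal sketch, assuming Perl 5.14 or later for the /a modifier, using U+0663 ARABIC-INDIC DIGIT THREE as the non-ASCII digit:

```perl
use 5.014;   # the /a modifier needs Perl 5.14 or later; also enables say
use strict;
use warnings;

# U+0663 ARABIC-INDIC DIGIT THREE: a decimal digit (category Nd), but not ASCII.
my $arabic_three = "\x{0663}";

my $by_d     = $arabic_three =~ /^\d/    ? 1 : 0;  # Unicode rules: matches
my $by_class = $arabic_three =~ /^[0-9]/ ? 1 : 0;  # ASCII range only: no match
my $by_d_a   = $arabic_three =~ /^\d/a   ? 1 : 0;  # /a restricts \d to 0-9

say "\\d: $by_d, [0-9]: $by_class, \\d with /a: $by_d_a";
```

For input like the file in the question, /^[0-9]/ (or /^\d/a) states the intent exactly: the line starts with an ASCII digit.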