Best way to skip a header when reading in from a text file in Perl?

performance perl input io

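I'm grabbing a few columns from a tab-delimited file in Perl. The first line of the file is completely different from the other lines, so I'd like to skip that line as quickly and efficiently as possible.

This is what I have so far: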

my $firstLine = 1;

while (<INFILE>){
    if($firstLine){
        $firstLine = 0;
    }
    else{
        my @columns = split (/\t+/);
        print OUTFILE "$columns[0]\t\t$columns[1]\t$columns[2]\t$columns[3]\t$columns[11]\t$columns[12]\t$columns[15]\t$columns[20]\t$columns[21]\n";
    }
}

Is there a better way to do this, perhaps without $firstLine? Or is there a way to start reading INFILE from line 2 directly?

Thanks in advance!

You can just read the first line into a dummy variable before the loop:

#!/usr/bin/perl
use strict;
use warnings;

open my $fh, '<','a.txt' or die $!;

my $dummy=<$fh>;   #First line is read here
while(<$fh>){
        print ;
}
close($fh);

You can read the file through a filehandle and then use either an array or a while loop to iterate over the lines. For the while loop, @Guru has the solution for you. For an array, it would look like this:
#!/usr/bin/perl
use strict;
use warnings;

open (my $fh, '<','a.txt')  or die "can't open the file: $! \n";
my @array = <$fh>;

my $dummy = shift (@array);   # this is where the headers are stored.

foreach (@array)
{
   print $_;   # each line still ends with its own newline
}
close ($fh);

I always use $. (the current line number) to achieve this:
#!/usr/bin/perl
use strict;
use warnings;

open my $fh, '<', 'myfile.txt' or die "$!\n";

while (<$fh>) {
    next if $. < 2; # Skip first line

    # Do stuff with subsequent lines
}

Your code would probably be more elegant in this form:
my $first;
while (...) {
    $first++ or next; 

    # do whatever you want
};

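But it's still fine. @Guru's answer is better in terms of CPU cycles, but I/O usually consumes several orders of magnitude more CPU cycles than a single if.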
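
For reference, a runnable version of the $first++ or next idiom might look like the sketch below. The helper name lines_after_header and the choice to return the remaining lines as a list are assumptions made for illustration; they are not part of the answer above.

```perl
use strict;
use warnings;

# Hypothetical helper: returns every line from $fh except the first one.
sub lines_after_header {
    my ($fh) = @_;

    my @lines;
    my $seen;    # undef until the first line has been read

    while (my $line = <$fh>) {
        # Post-increment yields the old value: 0 (false) on the first
        # pass, so the header is skipped; 1, 2, ... (true) afterwards.
        $seen++ or next;
        push @lines, $line;
    }

    return @lines;
}
```

The counter costs one increment and one truth test per line, which is the per-iteration overhead this answer considers negligible next to the I/O.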

Let's get some data. I benchmarked everyone's techniques:
#!/usr/bin/env perl

sub flag_in_loop {
    my $file = shift;

    open my $fh, $file;

    my $first = 1;
    while(<$fh>) {
        if( $first ) {
            $first = 0;
        }
        else {
            my $line = $_;
        }
    }

    return;
}

sub strip_before_loop {
    my $file = shift;

    open my $fh, $file;

    my $header = <$fh>;
    while(<$fh>) {
        my $line = $_;
    }

    return;
}

sub line_number_in_loop {
    my $file = shift;

    open my $fh, $file;

    while(<$fh>) {
        next if $. < 2;

        my $line = $_;
    }

    return;
}

sub inc_in_loop {
    my $file = shift;

    open my $fh, $file;

    my $first;
    while(<$fh>) {
        $first++ or next;

        my $line = $_;
    }

    return;
}

sub slurp_to_array {
    my $file = shift;

    open my $fh, $file;

    my @array = <$fh>;
    shift @array;

    return;
}


my $Test_File = "/usr/share/dict/words";
print `wc $Test_File`;

use Benchmark;

timethese shift || -10, {
    flag_in_loop        => sub { flag_in_loop($Test_File); },
    strip_before_loop   => sub { strip_before_loop($Test_File); },
    line_number_in_loop => sub { line_number_in_loop($Test_File); },
    inc_in_loop         => sub { inc_in_loop($Test_File); },
    slurp_to_array      => sub { slurp_to_array($Test_File); },
};

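I was shocked to find that my @array = <$fh> is the slowest by a huge margin. I would have expected it to be the fastest, given that all the work happens inside the perl interpreter. However, it is the only technique that allocates memory to hold all the lines, and that probably accounts for the performance lag.

Using $. is another surprise. Perhaps it's the cost of accessing a magic global, or perhaps it's the cost of the numeric comparison.

And, as predicted by algorithmic analysis, putting the header check code outside the loop is the fastest. But not by much. Probably not enough to worry about if you're using either of the next two fastest techniques.
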
I had a similar problem. My solution is as follows, for either uncompressed or gzipped files:
print STDERR "\nReading input file...\n";
if ($file =~ /\.gz$/) {
    open(IN, "gunzip -c $file | grep -v '##' |") or die " *** ERROR ***     Cannot open pipe to [ $file ]!\n";
} else {
    open(IN, "cat $file | grep -v '##' |") or die " *** ERROR ***     Cannot open [ $file ]!\n";
}

I don't know about benchmarking, but it works fine for me.
Best,
Sander
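
The snippet above only opens the pipe. As a rough sketch of the same idea in pure Perl, the version below avoids the shell entirely. It assumes the core module IO::Uncompress::Gunzip is available, and the helper name read_without_hash_lines is made up for illustration. Like grep -v '##', it drops every line that contains ##, not just the first line.

```perl
use strict;
use warnings;
use IO::Uncompress::Gunzip qw($GunzipError);

# Hypothetical helper: returns the lines of $file that do not contain '##'.
# Files ending in .gz are decompressed on the fly; others are read as-is.
sub read_without_hash_lines {
    my ($file) = @_;

    my $in;
    if ($file =~ /\.gz$/) {
        $in = IO::Uncompress::Gunzip->new($file)
            or die "Cannot gunzip [ $file ]: $GunzipError\n";
    }
    else {
        open($in, '<', $file) or die "Cannot open [ $file ]: $!\n";
    }

    my @kept;
    while (my $line = <$in>) {
        next if index($line, '##') >= 0;    # same filter as grep -v '##'
        push @kept, $line;
    }
    close $in;

    return @kept;
}
```

Because the file name is never interpolated into a shell command, names containing spaces or shell metacharacters are handled safely.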
To me, using splice seems to be the simplest and cleanest way:
open FILE, "<$ARGV[0]";
my @file = <FILE>;
splice(@file, 0, 1);

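Comments:

As a side note, an array slice and join would eliminate a lot of the repetitive code: print OUTFILE "$columns[0]\t\t"; print OUTFILE join("\t", @columns[1,2,3,11,12,15,20,21]); print OUTFILE "\n";

I'll have to look into join. I'm new to Perl. Thanks!

Tidied up a little: print OUTFILE "$columns[0]\t\t" . join("\t", @columns[1,2,3,11,12,15,20,21]) . "\n";

fh shouldn't have a $, since it's a filehandle. But this looks like the most efficient solution. Thanks!

That is a lexical filehandle; it is indeed the preferred form these days.

@JimDavis Those days are gone.

As a general technique, it's less performant because your loop now has to do an extra check on every iteration. It's also cluttered.

The performance loss is a given, but it's negligible, and since it looks tidier it's worth it. If you find it cluttered inside the loop, that must be a matter of taste.

"Cluttered" meaning it increases the amount of code you have to understand to know what's going on in the loop, yet it only applies to the first iteration. I'm comparing it against Guru's best case, which puts the code outside the loop, not against the OP.

By storing the whole file in an array, this could consume a lot of memory.

This is more efficient than reading the file sequentially from disk. 15 years ago, "a lot of memory" mattered.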
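
Putting the slice-and-join suggestion from the comments together with reading the header before the loop, a minimal sketch might look like the following. The helper name copy_columns and the use of lexical filehandles passed in as arguments are assumptions for illustration; the column indices are the ones from the question, so each data row is assumed to have at least 22 columns.

```perl
use strict;
use warnings;

# Hypothetical helper: copies selected columns from $in to $out,
# skipping the first (header) line of $in.
sub copy_columns {
    my ($in, $out) = @_;

    my $header = <$in>;    # read and discard the header line

    while (my $line = <$in>) {
        chomp $line;
        my @columns = split /\t+/, $line;

        # Same layout as the question: two tabs after column 0, then the
        # remaining columns joined by single tabs via an array slice.
        print {$out} "$columns[0]\t\t",
            join("\t", @columns[1, 2, 3, 11, 12, 15, 20, 21]), "\n";
    }

    return;
}
```

With the filehandles from the question, a call along the lines of copy_columns(\*INFILE, \*OUTFILE) should work as well, since the helper only needs something it can read lines from and print to.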