How to quickly read data from a .gz file in Perl

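I am reading a .gz file of about 3 GB and searching it for a pattern with a Perl program. The pattern matching works, but processing takes far too long. Can anyone help me make it faster?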

use strict;
use warnings;
use Compress::Zlib;

my $file = "test.gz";
my $gz = gzopen($file, "rb") or die "Error Reading $file: $gzerrno";
while ($gz->gzreadline($_) > 0) {
    if (/pattern/) {
        print "$_----->PASS\n";
    }
}
die "Error reading $file: $gzerrno" if $gzerrno != Z_STREAM_END;
$gz->gzclose();

Is there anything you can suggest?

Thanks in advance.

I wrote a script that times different ways of reading a gz file. I also found Compress::Zlib to be very slow.

use strict;
use warnings;
use autodie ':all';
use Compress::Zlib;
use Time::HiRes 'time';

my $file = '/home/con/Documents/snp150.txt.gz';

# Time reading through an external zcat process
my $start_zcat = Time::HiRes::time();
open my $zcat, '-|', 'zcat', $file;    # list form: no shell, safe for odd file names
while (<$zcat>) {
    # print $_;
}
close $zcat;
my $end_zcat = Time::HiRes::time();

# Time reading through Compress::Zlib
# See http://blog-en.openalfa.com/how-to-read-and-write-compressed-files-in-perl
my $start_zlib = Time::HiRes::time();
my $gz = gzopen($file, 'r') or die "Error reading $file: $gzerrno";
while ($gz->gzreadline($_) > 0) {
    # print "$_";    # Process the line read in $_
}
$gz->gzclose();
my $end_zlib = Time::HiRes::time();

printf("zlib took %f seconds.\n", $end_zlib - $start_zlib);
printf("zcat took %f seconds.\n", $end_zcat - $start_zcat);

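With this script, I found that reading through zcat is about 7 times faster (!) than reading through Compress::Zlib, though this will of course vary with the computer and the file.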
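
Applied to the search in the question, the piped approach might look like the sketch below. It assumes a `gzip` binary is on the PATH (`gzip -dc` does the same job as `zcat`). The helper name `grep_gz` and the sample file are hypothetical and only there so the example runs on its own.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use IO::Compress::Gzip qw(gzip $GzipError);

# Hypothetical helper: return the lines of a .gz file that match $regex,
# decompressing through an external gzip process instead of Compress::Zlib.
sub grep_gz {
    my ($file, $regex) = @_;
    # List-form pipe open: no shell is involved, so odd file names are safe.
    open(my $fh, '-|', 'gzip', '-dc', $file)
        or die "Cannot start gzip for $file: $!";
    my @hits;
    while (my $line = <$fh>) {
        chomp $line;
        push @hits, $line if $line =~ $regex;
    }
    close($fh) or die "gzip failed on $file (exit status $?)";
    return @hits;
}

# Build a tiny sample file so the sketch is self-contained.
my $dir    = tempdir(CLEANUP => 1);
my $sample = "$dir/sample.txt.gz";
my $text   = "alpha\nsome pattern here\nbeta\npattern again\n";
gzip(\$text => $sample) or die "gzip failed: $GzipError";

my @hits = grep_gz($sample, qr/pattern/);
print "$_----->PASS\n" for @hits;
```

Because decompression runs in a separate process, it can overlap with the pattern matching in Perl. For a 3 GB input with many matches, printing inside the loop rather than collecting every hit in an array would keep memory use flat.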

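Have you tried using zgrep? See the related answer.

Sorry, Robby, I am creating a script that automatically processes a few files for data manipulation. Processing the .gz file is one of its tasks, so I can't use the grep command here.

No, but at least you can compare the speed and, based on that, determine whether your Perl file reading can be optimized further.

Another idea: the pattern itself could be simplified so that it is a little gentler on the regex engine.

Do you need to keep reading once the pattern has been found?

Does it make any difference whether this runs on a single-core or a multi-core machine?

@Shawn I don't know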
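
Two of the suggestions in these comments can be sketched together: stop reading at the first match, and use a plain substring test when the pattern is a fixed string. This is only a sketch, assuming a `gzip` binary on the PATH and a literal (non-regex) search string. `first_match_gz` and the sample data are hypothetical. Whether `index` actually beats a simple regex on a given file is worth timing rather than assuming.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use IO::Compress::Gzip qw(gzip $GzipError);

# Hypothetical helper: return the first line containing the literal
# string $needle, or undef if no line contains it.
sub first_match_gz {
    my ($file, $needle) = @_;
    open(my $fh, '-|', 'gzip', '-dc', $file)
        or die "Cannot start gzip for $file: $!";
    my $found;
    while (my $line = <$fh>) {
        # index() is a plain substring search; the regex engine is not used.
        if (index($line, $needle) >= 0) {
            chomp($found = $line);
            last;    # stop reading as soon as one match is found
        }
    }
    # Closing early may end gzip with SIGPIPE, so the close status is ignored.
    close($fh);
    return $found;
}

# Build a tiny sample file so the sketch is self-contained.
my $dir    = tempdir(CLEANUP => 1);
my $sample = "$dir/sample.txt.gz";
my $text   = "alpha\nfirst pattern line\nsecond pattern line\n";
gzip(\$text => $sample) or die "gzip failed: $GzipError";

my $first = first_match_gz($sample, 'pattern');
print defined $first ? "$first----->PASS\n" : "no match\n";
```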