Performance: reading from the DATA filehandle

performance perl

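My Perl module needs to use a lookup table that is about 309,000 lines long.

Currently, the part that loads the table into an array looks (roughly) like this: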

use strict;
use warnings;

# load all the data from below
my @ref_data; 
while (<DATA>) {
    push @ref_data, $_
}
close DATA;

__DATA__
00004f15ed000023f2
00005015fc000623ec
000051160a000b23e7
000052161d001523e2
0000531631002223de
0000541645002e23da
... etc ...

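Both of these approaches are much faster than simply wrapping all the data lines in qw(...) or editing the source so that the array is loaded one item at a time.

I would guess that most of the roughly 100 ms is disk time, but is the while loop the fastest way to initialize the array, or could some other Perl construct be quicker?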

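DATA is a special filehandle embedded in the script. Reading from it is not much different from reading from an ordinary file. That said, I don't think 300k lines of inline data is an ideal approach.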

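Have you looked at Storable? You may find that you can store and retrieve your data structure, although you would probably need to keep your text file for the initial load.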
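A minimal sketch of the store-and-retrieve idea, assuming the table lives in a plain text file and a Storable cache is rebuilt whenever it is older than that file. The helper name load_table, the file names and the three demo lines are hypothetical; only nstore and retrieve come from Storable.

```perl
use strict;
use warnings;
use Storable qw(nstore retrieve);
use File::Temp qw(tempdir);
use File::Spec;

# Hypothetical helper: return an array ref of lines, using the Storable
# cache when it is at least as new as the text file.
sub load_table {
    my ($text_file, $cache_file) = @_;

    # -M is the file's age in days relative to script start,
    # so a smaller value means a newer file.
    if (-e $cache_file && (-M $cache_file) <= (-M $text_file)) {
        return retrieve($cache_file);    # retrieve() returns a reference
    }

    open my $fh, '<', $text_file
        or die "Can't open $text_file: $!";
    my @lines = <$fh>;
    close $fh;

    nstore(\@lines, $cache_file);        # portable (network order) format
    return \@lines;
}

# Demo with a tiny made-up table in a temporary directory.
my $dir   = tempdir(CLEANUP => 1);
my $text  = File::Spec->catfile($dir, 'table.txt');
my $cache = File::Spec->catfile($dir, 'table.sto');

open my $out, '>', $text or die "Can't write $text: $!";
print {$out} "00004f15ed000023f2\n00005015fc000623ec\n000051160a000b23e7\n";
close $out;

my $first  = load_table($text, $cache);   # parses the text, writes the cache
my $second = load_table($text, $cache);   # served from the cache
print scalar(@$second), " lines loaded\n";
```

The text file stays the source of truth here; the cache is only a faster copy that can be deleted at any time.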
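Alternatively, do you really need to hold all of the reference data in memory? Direct memory access is fast, but if you are not processing the keys sequentially, you may find that an on-demand, database-style lookup works better.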
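One way to picture an on-demand lookup without a database is to seek straight to a record in the file. This is only a sketch under a strong assumption: every record is exactly 18 characters plus a newline, as in the sample lines, and lookups are by line index. The helper name record_at and the temporary demo file are hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Assumption: every record is 18 characters plus "\n", as in the sample.
my $RECORD_LEN = 19;

# Hypothetical helper: fetch record number $index (0-based) without
# loading the rest of the file.
sub record_at {
    my ($fh, $index) = @_;
    seek $fh, $index * $RECORD_LEN, 0
        or die "Can't seek: $!";
    my $got = read $fh, my $record, $RECORD_LEN;
    die "Can't read record $index: $!" unless defined $got;
    return if $got < $RECORD_LEN;    # past the end of the table
    chomp $record;
    return $record;
}

# Demo: write a few of the sample lines to a temporary file.
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out;
print {$out} "$_\n" for qw(
    00004f15ed000023f2
    00005015fc000623ec
    000051160a000b23e7
    000052161d001523e2
);
close $out;

open my $fh, '<:raw', $path or die "Can't open $path: $!";
print record_at($fh, 2), "\n";    # prints 000051160a000b23e7
close $fh;
```

Whether this beats loading everything depends on how many lookups a run performs; for a handful of lookups it avoids reading 309,000 lines at all.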
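Failing that, a separate "loader" thread that processes the file asynchronously may also be an option: loading into memory still takes time, but your program can carry on running while the data loads.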
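Fundamentally, though, you are reading a large amount of data sequentially from disk. That will always be limited by disk speed: faster disks mean faster loads. The workaround is to move from disk to memory (as a database does).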
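I benchmarked three methods. I used an external file to read from (rather than __DATA__). The file contained 3 million lines of the exact data you are using.

The methods were slurping the file, reading it line by line, and using Storable as mentioned above. Each task was run 100 times. The results showed that, once the data has been stored with Storable, retrieving it is far faster than the other two methods (118% faster than line by line, 45% faster than slurping).

Here is the code I used: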
#!/usr/bin/perl

use warnings;
use strict;

use Benchmark qw(cmpthese timethese);
use Storable;

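my $file = 'in.txt';

# Write store.dat once, up front, so by_store only times the retrieve.
storeit();

# Run each sub 100 times and print a comparison table.
cmpthese(100, {
    'by_line' => \&by_line,
    'by_slurp' => \&by_slurp,
    'by_store' => \&by_store,
});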

sub by_line {

    open my $fh, '<', $file
      or die "Can't open $file: $!";

    my @ref_data;

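    # Note: <$fh> is in list context here, so the whole file is read
    # into a temporary list before the loop body runs.
    for my $line (<$fh>){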
        push @ref_data, $line;
    }
}

sub by_slurp {

    open my $fh, '<', $file
      or die "Can't open $file: $!";

    my @ref_data = <$fh>;
}  

sub storeit {
    open my $fh, '<', $file
      or die "Can't open $file: $!";

    my @ref_data = <$fh>;
    close $fh;

    store \@ref_data, 'store.dat';
}

sub by_store{

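    # Note: retrieve() returns a reference, so @ref_data receives one
    # element (the array ref), not the individual lines.
    my @ref_data = retrieve('store.dat');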
}
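
Two details of the code above affect what is being timed: for my $line (<$fh>) evaluates the filehandle in list context, so the whole file is read before the loop starts, and retrieve returns a reference, so @ref_data ends up with a single element. Below is a sketch of how the two subs could be written to read one line per iteration and to end up with the lines themselves. The sub names, the made-up input and the temporary files are assumptions, not part of the original benchmark, and the percentages quoted above do not apply to these variants.

```perl
use strict;
use warnings;
use Storable qw(store retrieve);
use File::Temp qw(tempdir);
use File::Spec;

my $dir   = tempdir(CLEANUP => 1);
my $file  = File::Spec->catfile($dir, 'in.txt');
my $cache = File::Spec->catfile($dir, 'store.dat');

# Small made-up input so the sketch runs on its own.
open my $out, '>', $file or die "Can't write $file: $!";
printf {$out} "%018x\n", $_ for 1 .. 1000;
close $out;

# Read one line per iteration instead of building a temporary list.
sub by_line_while {
    open my $fh, '<', $file or die "Can't open $file: $!";
    my @ref_data;
    while (my $line = <$fh>) {
        push @ref_data, $line;
    }
    close $fh;
    return \@ref_data;
}

# Dereference the result so the array holds the lines, not one reference.
sub by_store_deref {
    my @ref_data = @{ retrieve($cache) };
    return \@ref_data;
}

store(by_line_while(), $cache);
my $from_text  = by_line_while();
my $from_store = by_store_deref();
print scalar(@$from_text), " lines from text, ",
      scalar(@$from_store), " lines from Storable\n";
```

Dereferencing copies the array, so a benchmark of by_store_deref measures that copy as well as the retrieve itself.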

@Borodin I'll update the question.

Sorry, my arithmetic was wrong: that's 60 ms!

Do you really need it to run faster than 100 ms?

For what it's worth, I reproduced your results (perl 5.16, Windows 7). my @values = <DATA>; averaged 192 ms; the while loop averaged 119 ms. Interestingly, my $values = [<DATA>]; took longer: 240 ms. I thought it might be faster (no need to copy the array), but it wasn't.
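
The three forms compared in the comment above can be timed side by side with the Benchmark module. This is a rough sketch, not the commenter's setup: it reads from an in-memory filehandle over made-up data instead of DATA, so disk time is excluded, and the sub names and the 20,000-line table are assumptions. Relative timings will vary by Perl version and platform.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Made-up stand-in for the real table: 20,000 fixed-width lines.
my $text = join '', map { sprintf "%018x\n", $_ } 1 .. 20_000;

# Open a fresh read handle on the in-memory string for each run.
sub fresh_handle {
    open my $fh, '<', \$text or die "Can't open in-memory file: $!";
    return $fh;
}

# List assignment: my @values = <FH>;
sub list_assign {
    my $fh     = fresh_handle();
    my @values = <$fh>;
    return scalar @values;
}

# while loop with push, as in the question.
sub while_push {
    my $fh = fresh_handle();
    my @values;
    while (my $line = <$fh>) { push @values, $line }
    return scalar @values;
}

# Anonymous array: my $values = [<FH>];
sub array_ref {
    my $fh     = fresh_handle();
    my $values = [<$fh>];
    return scalar @$values;
}

# Run each form 200 times and print a comparison table.
cmpthese(200, {
    list_assign => \&list_assign,
    while_push  => \&while_push,
    array_ref   => \&array_ref,
});
```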