Processing large files in Perl


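I have a Perl program that takes a long time to run. Can anyone suggest tuning options?
Requirement
After retrieving data from the database, the Perl program does some file processing and then further processing based on the values stored in the database. The logic is: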

my %tagdata;
my $sql = "select KEY,VALUE from TABLEA";
my $sth = $dbh->prepare($sql);
$sth->execute;
while ( my @row = $sth->fetchrow_array ) {
    # $row[0] (a single element), not the one-element slice @row[0]
    $tagdata{ $row[0] } = $row[1];
}

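TABLEA contains 3 million rows. Later in the Perl program, after a lot of file processing, I need to find the keys for a given value. Keys are unique, but values are not.
So the keys are found with the following logic: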

my @keysfind = grep { $tagdata{$_} eq $value } keys %tagdata;
foreach (@keysfind) {
    ...;    # process each matching key
}

and the processing is then done based on @keysfind. This takes a lot of time because the key lookup runs inside a loop (100,000 times).
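Options I have tried:
1) Using fetchall_hashref instead of fetchrow_array. It was slightly faster, but not by much.
2) Dropping the hash and moving the whole operation into the database, i.e. fetching the keys by value there. The problem is that this lookup runs 100,000 times in a loop, which means that many database calls, even though the query is simple.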

Can anyone suggest a better way to handle this?

If you can, let the database do the hard work:

my $sql = 'select KEY, VALUE from TABLEA where VALUE = ?';    
my $sth = $dbh->prepare($sql);
$sth->execute($value);
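
The question worries that looking up values one at a time means 100,000 database calls. One possible way to reduce the number of round trips, not taken from the answer above, is to batch several values into a single query with an IN list. The sketch below only builds the SQL text and bind lists, so it runs without a database. The helper name build_batched_queries and the chunk size are assumptions made for illustration, and the maximum IN-list size depends on your database.

```perl
use strict;
use warnings;

# Build one SELECT per chunk of values, using one placeholder per value.
# Returns a list of [ $sql, \@bind_values ] pairs.
# $chunk_size is an illustrative assumption; check your database's limit
# on the number of bind parameters or IN-list items.
sub build_batched_queries {
    my ($values, $chunk_size) = @_;
    my @pending = @$values;    # copy, so the caller's array is untouched
    my @queries;
    while ( my @chunk = splice @pending, 0, $chunk_size ) {
        my $placeholders = join ', ', ('?') x @chunk;
        my $sql = "select KEY, VALUE from TABLEA where VALUE in ($placeholders)";
        push @queries, [ $sql, [@chunk] ];
    }
    return @queries;
}

my @queries = build_batched_queries( [qw(a b c d e)], 2 );
for my $q (@queries) {
    my ( $sql, $bind ) = @$q;
    print "$sql  [@$bind]\n";
    # With a live handle this would be roughly:
    #   my $sth = $dbh->prepare($sql);
    #   $sth->execute(@$bind);
}
```

Because each result row still carries both KEY and VALUE, the rows can be grouped by value on the Perl side after each batch.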

The best solution is probably to delegate the key lookup to the database, as shown in choroba's answer.

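Just for academic purposes, here is a way to find the matching keys in constant time without using the database. All we need is a reverse hash that maps each value to an array of keys: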

my %tagdata;
my %reverse_tagdata;
my $sth = $dbh->prepare('select KEY,VALUE from TABLEA');
$sth->execute;
while ( my ($key, $value) = $sth->fetchrow_array ) {
    $tagdata{$key} = $value;
    push @{ $reverse_tagdata{$value} }, $key; # add key to matching values
}

...;

my $value = ...;
my @found_keys = @{ $reverse_tagdata{$value} // [] }; # one simple hash lookup; empty list if the value is absent (// needs Perl 5.10+)
for my $key (@found_keys) { 
  ...;
}
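
To try the idea without a database, here is a minimal self-contained sketch that builds the same reverse hash from an in-memory list of rows. The sample rows and the helper names build_reverse_index and keys_for_value are made up for illustration. In the real program the rows would come from $sth->fetchrow_array as above.

```perl
use strict;
use warnings;

# Hypothetical sample rows standing in for "select KEY,VALUE from TABLEA".
my @rows = (
    [ k1 => 'red' ],
    [ k2 => 'blue' ],
    [ k3 => 'red' ],
);

# Build the forward hash (key -> value) and the reverse index
# (value -> array of keys) in a single pass over the rows.
sub build_reverse_index {
    my (@input) = @_;
    my ( %forward, %reverse );
    for my $row (@input) {
        my ( $key, $value ) = @$row;
        $forward{$key} = $value;
        push @{ $reverse{$value} }, $key;
    }
    return ( \%forward, \%reverse );
}

# Return all keys recorded for a value. The "// []" fallback (Perl 5.10+)
# yields an empty list for unknown values instead of dying under strict.
sub keys_for_value {
    my ( $reverse, $value ) = @_;
    return @{ $reverse->{$value} // [] };
}

my ( $tagdata, $reverse_tagdata ) = build_reverse_index(@rows);

my @found = keys_for_value( $reverse_tagdata, 'red' );
print "red => @found\n";    # prints "red => k1 k3"
```

The trade-off is memory: the reverse index holds every key a second time, which matters with 3 million rows, but each lookup then costs one hash access instead of a scan over all keys.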

I doubt you can outperform a database query by looping over millions of values. But: when optimizing, write both solutions and then benchmark them. Performance is never obvious.

I changed the code to use a reverse hash instead of grep, and it is much faster. Thank you. I have also moved my business logic into the database and am evaluating the performance of both options.
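
The advice to benchmark both approaches can be followed with the core Benchmark module. The sketch below compares the grep scan with the reverse-hash lookup on synthetic data. The data sizes, the key and value names, and the sub names are assumptions chosen for illustration, and the numbers it prints will differ from machine to machine.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Synthetic data: 100_000 keys spread over 1_000 distinct values.
# These sizes are illustrative assumptions, far smaller than 3 million rows.
my ( %tagdata, %reverse_tagdata );
for my $i ( 1 .. 100_000 ) {
    my ( $key, $value ) = ( "key$i", 'value' . ( $i % 1_000 ) );
    $tagdata{$key} = $value;
    push @{ $reverse_tagdata{$value} }, $key;
}

my $value = 'value42';

# Linear scan: inspects every key on every lookup.
sub find_with_grep {
    return grep { $tagdata{$_} eq $value } keys %tagdata;
}

# Reverse hash: a single hash lookup per call.
sub find_with_reverse {
    return @{ $reverse_tagdata{$value} // [] };
}

# Run each candidate for at least 1 CPU second and print a comparison table.
cmpthese(
    -1,
    {
        'grep_scan'    => \&find_with_grep,
        'reverse_hash' => \&find_with_reverse,
    }
);
```

A database-backed variant could be added as a third entry in the same hash, so that all candidates are measured under the same conditions.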