Error when pushing values into a hash of arrays in Perl

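I am parsing the output reports from psiblast. I used COG alignments to search a gene database for matches (homologs). One of the things I want to do is find out which genes match more than one COG. Part of my script is below.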

I am having particular trouble building an array that holds all the COGs assigned to each gene that matches more than one COG.

I get the following error: "Can't use string ("COG0003") as an ARRAY ref while "strict refs" in use at parse_POG_reports.pl line 26, <IN> line 67."

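I have looked at other posts about pushing elements into a hash of arrays. But I suspect the error happens when a gene has two hits to the same COG and the script tries to push that same COG into the array again (i.e. the last two lines of the sample input). Does that make sense? If so, how can I avoid this problem?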

use strict;
use warnings;

my %maxBits;my %COGhit_count;
my $Hohits={};my %COGhits;

my $COG_psi_report=$ARGV[0];
open (IN, $COG_psi_report) or die "cannot open $COG_psi_report\n";
while (my $line=<IN>){
    next if ($line =~/^#/);
    chomp $line;
    my @columns = split(/\t/,$line);
    my $bits=$columns[11];
    my $COG=$columns[0];
    my $hit=$columns[1];
    my $Eval=$columns[10];
    next if ($Eval > 0.00001); # threshold for significant hits set by DK
    $COGhit_count{$hit}++; # count how many COGs each gene is homologous to
    $COGhits{$hit}=$COG;
    if ($COGhit_count{$hit}>1) {
            push @{$COGhits{$hit}}, $COG; #
    }
    ## for those that there are multiple hits we need to select top hit ##
    if (!exists $maxBits{$hit}){
            $maxBits{$hit}=$bits;
    }
    elsif (exists $maxBits{$hit} && $bits > $maxBits{$hit}){
            $maxBits{$hit}=$bits;
    }
    $Hohits->{$hit}->{$bits}=$COG;
}
close (IN);

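You need to get rid of line 24 of your script:

$COGhits{$hit}=$COG;

In it, you set $COGhits{$hit} to a scalar value ($COG). Later, on line 26, you try to dereference $COGhits{$hit} as an array in order to push into it. That doesn't work, because there is a scalar in there.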
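This means duplicate COGs are not what triggers the error. Below is a minimal, hypothetical reproduction of the question's loop body for one gene (`gene1`, a made-up ID) that is hit by two *different* COGs. On the second iteration the slot is overwritten with a string and then dereferenced as an array, so it dies under strict refs. The `eval` is only there so the message can be captured and printed:

```perl
use strict;
use warnings;

# Hypothetical reproduction: one gene hit by two different COGs,
# so no duplicate COG is involved.
my %COGhit_count;
my %COGhits;
my $error = '';

for my $COG ('COG0001', 'COG0003') {
    my $hit = 'gene1';
    $COGhit_count{$hit}++;
    $COGhits{$hit} = $COG;                  # slot now holds a plain string
    if ($COGhit_count{$hit} > 1) {
        # Using that string as an array ref dies under "strict refs";
        # eval traps the error so it can be printed.
        eval { push @{ $COGhits{$hit} }, $COG; 1 } or $error = $@;
    }
}

print "Error: $error";
```

The message names the COG that was *just* assigned. That is why it can look as if the duplicate is to blame.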

Just remove the if and change those lines to the following. Now the COGs for every $hit are stored in an array ref, so this should do it:

$COGhit_count{$hit}++; # count how many COGs each gene is homologous to
push @{$COGhits{$hit}}, $COG;
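For a self-contained check of the corrected logic, here is a sketch that runs the fixed loop over a few hypothetical tab-separated rows. Only the column positions (0 = COG, 1 = hit, 10 = E-value) and the threshold come from the question. The gene IDs, values and the `make_row` helper are made up:

```perl
use strict;
use warnings;

# Hypothetical helper: build a 12-column row with the COG, the hit,
# eight filler columns, the E-value (column 10) and the bit score (column 11).
sub make_row {
    my ($cog, $hit, $eval, $bits) = @_;
    return join "\t", $cog, $hit, ('.') x 8, $eval, $bits;
}

my @rows = (
    make_row('COG0002', 'geneA', '1e-30', '120.5'),
    make_row('COG0003', 'geneB', '1e-25', '98.0'),
    make_row('COG0003', 'geneB', '1e-12', '55.1'),   # same COG again
    make_row('COG0007', 'geneB', '1e-40', '150.2'),
    make_row('COG0009', 'geneC', '0.5',   '20.0'),   # fails the E-value cut
);

my (%COGhit_count, %COGhits);
for my $line (@rows) {
    my @columns = split /\t/, $line;
    my ($COG, $hit, $Eval) = @columns[0, 1, 10];
    next if $Eval > 0.00001;            # threshold from the question
    $COGhit_count{$hit}++;              # count hits per gene
    push @{ $COGhits{$hit} }, $COG;     # first push autovivifies the array ref
}

for my $hit (sort keys %COGhits) {
    print "$hit: @{ $COGhits{$hit} }\n";
}
```

A gene with a single hit (geneA) also ends up with a one-element array ref, so every value in %COGhits has the same shape.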

Output of %COGhits:

$VAR4 = {
      '158802708-stool2_revised_C1077267_1_gene26470' => [
                                                           'POG0002'
                                                         ],
      '764062976-stool2_revised_C999233_1_gene54902' => [
                                                          'POG0002'
                                                        ],
      '764184357-stool1_revised_scaffold22981_1_gene47608' => [
                                                                'POG0002'
                                                              ],
      '765701615-stool1_revised_C1349270_1_gene168522' => [
                                                            'POG0002'
                                                          ],
      '763901136-stool1_revised_scaffold39447_1_gene145241' => [
                                                                 'POG0002'
                                                               ],
      '160502038-stool1_revised_scaffold47906_2_gene161164' => [
                                                                 'POG0003',
                                                                 'POG0003'
                                                               ]
    };
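The last entry above lists POG0003 twice because the gene hit that COG twice. If the goal is to find genes that match more than one *distinct* COG, one option is a hash of hashes keyed by gene and then by COG. Neither answer suggests this; it is a different technique. The sketch below uses made-up gene IDs:

```perl
use strict;
use warnings;

# Hypothetical (gene, COG) pairs; geneB is hit twice by the same COG,
# like the last entry in the dump above.
my @pairs = (
    [ 'geneA', 'POG0002' ],
    [ 'geneB', 'POG0003' ],
    [ 'geneB', 'POG0003' ],
    [ 'geneC', 'POG0002' ],
    [ 'geneC', 'POG0005' ],
);

# Hash of hashes: $COGs_for{$gene}{$COG} counts hits, and each COG is a key only once.
my %COGs_for;
for my $pair (@pairs) {
    my ($gene, $COG) = @$pair;
    $COGs_for{$gene}{$COG}++;
}

# Genes homologous to more than one distinct COG.
my @multi = grep { scalar(keys %{ $COGs_for{$_} }) > 1 } sort keys %COGs_for;

for my $gene (@multi) {
    print "$gene: ", join(', ', sort keys %{ $COGs_for{$gene} }), "\n";
}
```

The inner counts still record how often each COG hit the gene, in case that is needed later.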

However, if you really need both a scalar and an array ref, try the following code. I don't recommend it, though.

$COGhit_count{$hit}++; # count how many COGs each gene is homologous to
if ($COGhit_count{$hit} == 1) {
  $COGhits{$hit}=$COG;             # Save as scalar
}
elsif ($COGhit_count{$hit} == 2) { # If we've just found the second hit,
  my $temp = $COGhits{$hit};       # save the first and convert $COGhits{$hit}
  $COGhits{$hit} = [];             # to an array ref, then push both the old and
  push @{$COGhits{$hit}}, $temp, $COG; # the new value in it.
} elsif ($COGhit_count{$hit} > 2) {
  push @{$COGhits{$hit}}, $COG;    # Just push the new value in
}

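My guess: you originally had $COGhits{$hit}=$COG, but then noticed that there can sometimes be more than one value, so you added the push line without realizing that you actually had to replace the old line.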

The error message tells you exactly what you are doing wrong:

$COGhits{$hit}=$COG;  # <--- scalar
if ($COGhit_count{$hit}>1) {
        push @{$COGhits{$hit}}, $COG; # <--- array
}

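But then you have to write branching logic whenever you want to access the mixed tree in a different way. Whatever performance you save by storing a scalar is, to some extent, eaten up by the tests and branches.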
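As a small illustration of that cost, the sketch below counts COGs per gene for two hypothetical layouts. One is "mixed", with a scalar for one COG and an array ref for several. The other is "uniform", which always uses an array ref. The gene and COG IDs are made up:

```perl
use strict;
use warnings;

# Mixed layout: one COG stored as a plain string, several as an array ref.
my %mixed   = (geneA => 'COG0002',     geneB => [ 'COG0003', 'COG0007' ]);
# Uniform layout: always an array ref, even for a single COG.
my %uniform = (geneA => [ 'COG0002' ], geneB => [ 'COG0003', 'COG0007' ]);

my (%n_mixed, %n_uniform);
for my $gene (sort keys %mixed) {
    my $v = $mixed{$gene};
    # Every read of the mixed layout has to test ref() first.
    $n_mixed{$gene} = ref($v) ? scalar @$v : 1;
}
for my $gene (sort keys %uniform) {
    # The uniform layout needs no branch.
    $n_uniform{$gene} = scalar @{ $uniform{$gene} };
}

print "$_: $n_mixed{$_} vs $n_uniform{$_}\n" for sort keys %n_mixed;
```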

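So I don't do much of this anymore. I decide up front whether the relationship is 1-1 or 1-n. The routines below can make handling these kinds of tables somewhat simpler.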

sub get_list_from_hash { 
    my ( $hash, $key ) = @_;
    # Read the slot directly; taking \$hash->{ $key } would autovivify the key.
    my $value = $hash->{ $key };
    return unless defined( $value );
    return ref( $value ) ? @$value : $value;
}

sub store_in_hash { 
    $_[0] = {} unless ref $_[0];
    my ( $hash, $key, @values ) = @_;
    my @defined = grep {; defined } @values;
    unless ( @defined ) { 
        delete $hash->{ $key };
        return;
    }

    my $ref = \$hash->{ $key };
    if ( ref $$ref ) { 
        push @$$ref, @defined;
    }
    elsif ( defined $$ref ) { 
        $$ref = [ $$ref, @defined ];
    }
    elsif ( @defined > 1 ) {    # count the defined values, not @values
        @$$ref = @defined;
    }
    else { 
        ( $$ref ) = @defined;
    }
}
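One way to apply "decide up front" to the question's data is to keep two hashes, each with a fixed shape. A 1-1 hash holds the best-scoring COG per gene, in the spirit of the question's %maxBits. A 1-n hash holds every COG per gene. This is a sketch; the hash names and the sample triples are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical (gene, COG, bit score) triples that already passed the E-value cut.
my @hits = (
    [ 'geneA', 'COG0002', 120.5 ],
    [ 'geneB', 'COG0003',  98.0 ],
    [ 'geneB', 'COG0007', 150.2 ],
    [ 'geneB', 'COG0003',  55.1 ],
);

my %best_COG;    # 1-1: gene => COG with the highest bit score (always a string)
my %best_bits;   # 1-1: gene => that highest bit score
my %all_COGs;    # 1-n: gene => array ref of every COG hit (always an array ref)

for my $h (@hits) {
    my ($gene, $COG, $bits) = @$h;
    push @{ $all_COGs{$gene} }, $COG;
    if (!exists($best_bits{$gene}) || $bits > $best_bits{$gene}) {
        $best_bits{$gene} = $bits;
        $best_COG{$gene}  = $COG;
    }
}

for my $gene (sort keys %all_COGs) {
    print "$gene: best $best_COG{$gene}; all @{ $all_COGs{$gene} }\n";
}
```

Because each hash has a single shape, code that reads it never has to check ref().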

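Thanks, I only recently became aware of the accept feature; will do.

I thought so too, but I had previously tried $COGhits{$hit}[0]=$COG; first and then removed line 24 as simbabque suggested (which results in empty arrays, e.g. ARRAY(0xaf9b58)).

Removing line 24 doesn't make sense unless the mapping of hits to COGs is 1-1.

Removing the assignment makes sense, especially if you ever expect the value of $COGhit_count{$hit} to be greater than 1. But assigning to the first element of the array doesn't make any sense.

Of course, you're right about the mapping. I hadn't thought of that.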

my $ref = \$hashref->{ $key }; # autovivifies slot as simple scalar.
                               # it starts out as undefined.
if ( ref $$ref ) {             # test ref $$ref; ref $ref is always true
    push @$$ref, $value;
}
else { 
    $$ref = defined( $$ref ) ? [ $$ref, $value ] : $value;
}
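To show the snippet above running, here it is wrapped in a hypothetical sub, `add_value`. The name and the sample data are illustrative only. The first value for a key is stored as a plain scalar, and the slot is upgraded to an array ref when a second value arrives:

```perl
use strict;
use warnings;

# Hypothetical wrapper around the snippet: store $value under $key.
sub add_value {
    my ($hashref, $key, $value) = @_;
    my $ref = \$hashref->{$key};          # autovivifies the slot as undef
    if (ref $$ref) {                      # already an array ref: append
        push @$$ref, $value;
    }
    else {                                # undef -> scalar; scalar -> [old, new]
        $$ref = defined($$ref) ? [ $$ref, $value ] : $value;
    }
    return;
}

my %COGhits;
add_value(\%COGhits, 'geneA', 'COG0002');
add_value(\%COGhits, 'geneB', 'COG0003');
add_value(\%COGhits, 'geneB', 'COG0007');
add_value(\%COGhits, 'geneB', 'COG0009');
# geneA now holds a plain string; geneB holds an array ref with three COGs.
```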