Is it possible in Perl to count duplicates across two columns using a single hash?
My input data is shown below. From it, I want to find the unique values of the first column and of p1, p2, ..., p5, and get their counts.
ID M N
cc1 1 p1
cc1 10 p2
cc1 10 p2
cc2 1 p1
cc2 2 p5
cc3 2 p1
cc3 2 p4
I expected the result to look like this:
ID M p1 p2 p3 p4 p5
cc1 3 1 2 0 0 0
cc3 2 1 0 0 1 0
cc2 2 1 0 0 0 1
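For this I used a hash of hashes plus a second hash, and I got the output I expected. But I wonder whether it can be done with a single hash, because the same data ends up stored in two different hashes.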
my (%hash, %hash2);
<$fh>;    # skip the header line
while (<$fh>)
{
    chomp;
    my ($first, $second, $third) = split("\t");
    $hash{$first}{$third}++;    # I tried $hash{$first}++{$third}++ but it throws a syntax error
    $hash2{$first}++;           # is it possible to get rid of this hash?
}
my @ar = qw(p1 p2 p3 p4 p5);
$, = "\t";
print "ID", "M", @ar, "\n";
foreach (keys %hash)
{
    print "$_\t$hash2{$_}\t";
    foreach my $ary (@ar)
    {
        if (!$hash{$_}{$ary})
        {
            print "0\t";
        }
        else
        {
            print "$hash{$_}{$ary}\t";
        }
    }
    print "\n";
}
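A minimal sketch of one way to keep everything in a single hash, assuming a reserved key name (the hypothetical `__total__`) that can never occur in the N column: the per-ID row count is stored in the same hash of hashes as the per-N counts, so the second hash goes away without any summing. It also shows why the `$hash{$first}++{...}++` form cannot work: a subscript cannot follow the postfix `++` operator, so the two counts need two separate increments.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical reserved key for the per-ID row count; it only has to be
# a string that never appears in the N column.
my $TOTAL = '__total__';

# Rows (ID, M, N) taken from the question's input.
my @rows = (
    [ 'cc1', 1,  'p1' ],
    [ 'cc1', 10, 'p2' ],
    [ 'cc1', 10, 'p2' ],
    [ 'cc2', 1,  'p1' ],
    [ 'cc2', 2,  'p5' ],
    [ 'cc3', 2,  'p1' ],
    [ 'cc3', 2,  'p4' ],
);

my %count_of;
for my $row (@rows) {
    my ( $id, undef, $n ) = @$row;    # M is not needed for the counts
    $count_of{$id}{$n}++;             # count of this N for this ID
    $count_of{$id}{$TOTAL}++;         # row count for this ID, same hash
}

# Walk a fixed column list, so the reserved key is never printed as a column
# and missing columns such as p3 come out as 0.
my @columns = qw(p1 p2 p3 p4 p5);
my @lines   = ( join( "\t", 'ID', 'M', @columns ) );
for my $id ( sort keys %count_of ) {
    push @lines, join( "\t",
        $id,
        $count_of{$id}{$TOTAL},
        map { $count_of{$id}{$_} // 0 } @columns );
}
print "$_\n" for @lines;
```

The trade-off of this design is that any loop over an ID's inner keys (for example, autodetecting the column headers) has to skip the reserved key.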
There is no need for two hashes; the hash of hashes on its own is enough. I just modified your code, see below:
use strict;
use warnings;
use List::Util qw(sum);

my %hash;
<DATA>;    # skip the header line (expects the tab-separated input in a __DATA__ section)
while (<DATA>)
{
    chomp;
    my ($first, $second, $third) = split("\t");
    $hash{$first}{$third}++;
}
my @ar = qw(p1 p2 p3 p4 p5);
$, = "\t";
print "ID", "M", @ar, "\n";
foreach (keys %hash)
{
    # The row count for this ID is the sum of its per-N counts,
    # so the second hash is no longer needed.
    my $cnt = sum(values %{ $hash{$_} });
    print "$_\t$cnt\t";
    foreach my $ary (@ar)
    {
        if (!$hash{$_}{$ary})
        {
            print "0\t";
        }
        else
        {
            print "$hash{$_}{$ary}\t";
        }
    }
    print "\n";
}
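You have a hash of hashes to store the data: the first key is the ID and the second key is N. Just sum the values stored under each ID, and that gives you the total you want.
I would probably do it like this: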
#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;
my %count_of;
#read the header row
chomp( my @header = split ' ', <DATA> );
while (<DATA>) {
    my ( $ID, $M, $N ) = split;
    $count_of{ $ID }{ $N }++;
}
#print Dumper \%count_of;
#setup the output headers. We could autodetect, but some of these (p3) are entirely empty.
my @p_headers = qw ( p1 p2 p3 p4 p5 );
#if you did want to:
#my @p_headers = sort keys %{ +{ map { $_ => 1 } map { keys %{ $count_of{$_} } } keys %count_of } };
#will give p1 p2 p4 p5.
print join( "\t", qw( ID M ), @p_headers ), "\n";
foreach my $ID ( sort keys %count_of ) {
    my $total = 0;
    $total += $_ for values %{ $count_of{$ID} };
    print join( "\t",
        $ID,
        $total,
        ( map { $count_of{$ID}{$_} // 0 } @p_headers ),
    ), "\n";
}
__DATA__
ID M N
cc1 1 p1
cc1 10 p2
cc1 10 p2
cc2 1 p1
cc2 2 p5
cc3 2 p1
cc3 2 p4
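Both answers build the M column by summing one ID's per-N counts. A minimal sketch of that step on its own, using `sum0` from the core List::Util module; the inner hash below is a hypothetical example shaped like the cc1 entry. `values` has to be given the dereferenced hash: applying it directly to a hash reference was an experimental feature that was removed in Perl 5.24.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use List::Util qw(sum0);

# Hypothetical counts for one ID, shaped like the cc1 entry after reading
# the question's input: N value => number of rows with that N.
my %count_of = ( cc1 => { p1 => 1, p2 => 2 } );

# values() needs an actual hash, so dereference the inner hash reference.
my $total = sum0 values %{ $count_of{cc1} };

# sum0 returns 0 for an empty list, whereas plain sum would return undef.
my %no_counts;
my $empty_total = sum0 values %no_counts;

print "$total $empty_total\n";
```

On Perl 5.24 and later, the postfix dereference `values $count_of{cc1}->%*` is an equivalent spelling.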
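I don't understand your last line. Why do p4 and p5 have the values 4 and 1? There is no row in the input with 10_coverage_Contigs_contig_3 and an SSR of p5. @Borodin Sorry, I fixed a small typo.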