Retrieve unique values from a column based on the values in another column

arrays perl data-structures

I have a table like this:

symbol    length    id
A         10        id_1
A         15        id_2
A         15        id_3
B         20        id_4
B         25        id_5
...       ...       ...

I want to print the following in a new table:

symbol    length    id
A         15        id_2; id_3
B         25        id_5
...       ...       ...

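So I want to loop over the symbol column. When this column contains duplicate values, I want to print the row with the largest numeric length value (example: symbol B). When the largest length values are equal, I want to merge the values in the id column (example: symbol A) and print this new row.

How should I do this in Perl?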

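The tool for merging duplicates in Perl is a hash. Hashes are key-value pairs, but the useful part is that the values can be array references.

I would suggest something like this: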

#!/usr/bin/perl
use strict;
use warnings;

my %length_of;
my %ids_of;

my $heading_row = <DATA>;
while (<DATA>) {
    my ( $symbol, $length, $id ) = split;
    if ( not defined $length_of{$symbol} or $length_of{$symbol} < $length ) {
        $length_of{$symbol} = $length;
    }
    push( @{ $ids_of{$symbol}{$length} }, $id );
}

print join( "\t", "symbol", "length", "ids" ), "\n";
foreach my $symbol ( sort keys %ids_of ) {
    my $length = $length_of{$symbol};
    print join( "\t",
        $symbol, 
        $length,
        join( "; ", @{ $ids_of{$symbol}{$length} } ) ),
        "\n";
}

__DATA__
symbol    length    id
A         10        id_1
A         15        id_2
A         15        id_3
B         20        id_4
B         25        id_5

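What this does is iterate over the data and save the highest length value for each symbol (in %length_of). It also stores every ID by symbol and length (in %ids_of). It keeps all of the data in memory, so it may not be very efficient if you have a large amount of data.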
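
If memory is a concern, a leaner variant is possible. The sketch below is only an illustration, not part of the answer above. It assumes whitespace-separated rows with the heading row already removed, and the helper name best_ids_per_symbol is hypothetical. It keeps only the IDs that belong to the largest length seen so far for each symbol, and it does not depend on the input being sorted.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: takes "symbol length id" rows and returns
# { symbol => { length => N, ids => [...] } } for the largest length only.
sub best_ids_per_symbol {
    my @rows = @_;
    my %best;
    for my $row (@rows) {
        my ( $symbol, $length, $id ) = split ' ', $row;
        next unless defined $id;    # skip blank or short rows
        my $seen = $best{$symbol};
        if ( not defined $seen or $length > $seen->{length} ) {
            # New maximum: discard the IDs collected for shorter lengths.
            $best{$symbol} = { length => $length, ids => [$id] };
        }
        elsif ( $length == $seen->{length} ) {
            # Tie with the current maximum: merge the ID.
            push @{ $seen->{ids} }, $id;
        }
    }
    return \%best;
}

# Sample rows taken from the question.
my @rows = ( 'A 10 id_1', 'A 15 id_2', 'A 15 id_3', 'B 20 id_4', 'B 25 id_5' );
my $best = best_ids_per_symbol(@rows);

print join( "\t", 'symbol', 'length', 'id' ), "\n";
for my $symbol ( sort keys %$best ) {
    print join( "\t",
        $symbol,
        $best->{$symbol}{length},
        join( '; ', @{ $best->{$symbol}{ids} } ) ),
        "\n";
}
```

The trade-off is that IDs for shorter lengths are thrown away as soon as a longer one appears, so this only suits the case where nothing but the maximum is needed.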

Just remember the last symbol and length and accumulate the IDs:

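#!/usr/bin/perl
use warnings;
use strict;

# Assumes the input is sorted by symbol, then by length (ascending).
my ($last_l, $last_s, @i);

sub out {
    print "$last_s\t$last_l\t", join("; ", @i), "\n";
}

# Print the heading row as-is instead of treating it as data.
my $header = <>;
print join("\t", split ' ', $header), "\n" if defined $header;

while (<>) {
    my ($s, $l, $i) = split;
    next unless defined $i;    # skip blank lines
    if (defined $last_s and $s ne $last_s) {
        out();
        @i = ();    # start collecting IDs for the new symbol
    }
    elsif (defined $last_l and $last_l < $l) {
        @i = ();    # a longer length supersedes the IDs seen so far
    }
    push @i, $i;
    $last_s = $s;
    $last_l = $l;
}
out() if defined $last_s;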

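This method builds a nested data structure (a hash of hashes whose values are array references) by using the values in the symbol and length columns as keys and adding the values in the id column to an array reference. Such a complex data structure isn't really needed for the simple data set you presented, but the approach shown below may be more flexible in cases where the data is not sorted.

I use the max function from List::Util (which is part of the core distribution) to get the largest length value for each symbol, and Data::Dumper to help visualize things.
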
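use v5.16;    # enables strict and say
use warnings;
use Data::Dumper;
use List::Util 'max';

my ( %hash, @lines );

while (<DATA>) {
    chomp;
    next if $. == 1;       # skip the heading row
    next unless /\S/;      # skip blank lines
    push @lines, [split];
}

# Build $hash{symbol}{length} = [ ids ]
for (@lines) {
    push @{ $hash{ $_->[0] }{ $_->[1] } }, $_->[2];
}

say "This is your %hash:\n", Dumper \%hash;

for my $symbol ( sort keys %hash ) {
    # Dereference explicitly: calling keys on a hash reference was an
    # experimental feature that newer Perls no longer accept.
    my $max = max( keys %{ $hash{$symbol} } );
    say "$symbol \t", "$max \t", join "; ", @{ $hash{$symbol}{$max} };
}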

__DATA__
symbol    length    id
A         10        id_1
A         15        id_2
A         15        id_3
B         20        id_4
B         25        id_5

Is the table sorted as shown in the example?

Yes, by symbol and then by length.

My table is 36,000 lines long.

That will keep most of the lines in memory and add some overhead, because that is how it works. 36,000 lines of around 10 bytes each is nothing to worry about.