Perl: sort multi-line records by field

I have multi-line records, and what I want to do is sort them by the type and by the 6-digit number in the HEADER1 line.

The records look like this:

HEADER1|TYPE1|123456|JOHN SMITH
INFO|M|34|SINGLE
INFO|SGT
STATUS|KIA
MSG|NONE
HEADER1|TYPE3|654123|DANICA CLYNE
INFO|F|20|SINGLE
STATUS|MIA
MSG|HELP
MSG1||
HEADER1|TYPE2|987456|NIDALEE LANE
INFO|F|26|MARRIED
STATUS|INJURED
MSG|NONE
HEADER1|TYPE1|123456|JOHN CONNOR
INFO|M|34|SINGLE
STATUS|KIA
MSG|NONE
HEADER1|TYPE4|123789|CAITLYN MIST
INFO|F|19|SINGLE
INFO|||
STATUS|NONE
MSG|NONE
HEADER1|TYPE2|987456|NIDALEE CROSS
INFO|F|26|MARRIED
STATUS|INJURED
MSG|NONE
The output should look like this, with the records sorted according to that rule:

HEADER1|TYPE1|123456|JOHN SMITH
INFO|M|34|SINGLE
INFO|SGT
STATUS|KIA
MSG|NONE
HEADER1|TYPE1|123456|JOHN CONNOR
INFO|M|34|SINGLE
STATUS|KIA
MSG|NONE
HEADER1|TYPE2|987456|NIDALEE LANE
INFO|F|26|MARRIED
STATUS|INJURED
MSG|NONE
HEADER1|TYPE2|987456|NIDALEE CROSS
INFO|F|26|MARRIED
STATUS|INJURED
MSG|NONE
HEADER1|TYPE3|654123|DANICA CLYNE
INFO|F|20|SINGLE
STATUS|MIA
MSG|HELP
MSG1||
HEADER1|TYPE4|123789|CAITLYN MIST
INFO|F|19|SINGLE
INFO|||
STATUS|NONE
MSG|NONE

If you don't care about performance, and each "record" consists of exactly 4 lines:

# Assume input comes from STDIN (or files named on the command line),
# since the question didn't say anything
use strict;
use warnings;

my $line_index = 0;
my @records;
# Slurp all records into an array of quadruplets (array refs of 4 lines)
while (<>) {
    push @records, [] if $line_index == 0;
    $records[-1]->[$line_index] = $_;    # -1 accesses the last element of the array
    $line_index = ($line_index + 1) % 4; # wrap around after every 4th line
}

# Sort the array. Since we sort by type+id, we can simply sort the header
# lines alphabetically (assumes single-digit types and 6-digit IDs; when
# type+id tie, the name field decides the order).
my @records_sorted = sort { $a->[0] cmp $b->[0] } @records;

foreach my $record (@records_sorted) {
    print join("", @$record); # newlines were never stripped, so no need to append
}
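Because the whole header line is compared, two records with the same type and ID (JOHN SMITH and JOHN CONNOR in the sample) come out in name order, whereas the expected output keeps them in input order. A minimal sketch, assuming the records are already array refs of lines as above: compare only the TYPE and ID fields and ask for a stable sort via the core sort pragma so that ties keep their input order. The data and the sort_key helper are illustrative, not part of the original answer.

```perl
use strict;
use warnings;
use sort 'stable';   # guarantee that records with equal keys keep their input order

# Illustrative records (header line only), taken from the question's sample
my @records = (
    ["HEADER1|TYPE2|987456|NIDALEE LANE\n"],
    ["HEADER1|TYPE1|123456|JOHN SMITH\n"],
    ["HEADER1|TYPE1|123456|JOHN CONNOR\n"],
);

# Hypothetical helper: key = "TYPEn|nnnnnn", i.e. fields 1 and 2 of the header
sub sort_key {
    my ($record) = @_;
    my (undef, $type, $id) = split /\|/, $record->[0];
    return "$type|$id";
}

my @records_sorted = sort { sort_key($a) cmp sort_key($b) } @records;
print map { $_->[0] } @records_sorted;
```

JOHN SMITH now stays ahead of JOHN CONNOR, matching the expected output.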
Another option for building @records from @lines is List::Gen:

use List::Gen qw/by/;
my @records;
foreach my $record (by 4 => @lines) {   # $record is an array ref of 4 lines
    push @records, $record;
}
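If List::Gen is not installed, the same grouping can be done with core Perl. A minimal sketch, assuming @lines holds the file's lines and every record is exactly 4 lines long: splice removes the first four elements on each pass until the list is empty. The sample lines are illustrative.

```perl
use strict;
use warnings;

# Illustrative input: two 4-line records
my @lines = map { "$_\n" } (
    'HEADER1|TYPE2|987456|NIDALEE LANE', 'INFO|F|26|MARRIED', 'STATUS|INJURED', 'MSG|NONE',
    'HEADER1|TYPE1|123456|JOHN CONNOR',  'INFO|M|34|SINGLE',  'STATUS|KIA',     'MSG|NONE',
);

my @records;
my @queue = @lines;                      # work on a copy so @lines stays intact
while (my @chunk = splice @queue, 0, 4) {
    push @records, [@chunk];             # one array ref per 4-line record
}

printf "%d records, first header: %s", scalar @records, $records[0][0];
```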

Note that the code above assumes all the numbers are 6 digits. If that is not the case, the code needs a small modification:

use File::Slurp qw/read_file/;
use List::Gen qw/by/;

my @lines = read_file("my_file.txt");
my @records;
foreach my $record (by 4 => @lines) {
    # Header fields: [0] HEADER1, [1] TYPEn, [2] ID, [3] name
    my @sort_by = split(/\|/, $record->[0]);
    push @records, [ $record, \@sort_by ];
}
my @records_sorted = sort {
                             $a->[1]->[1] cmp $b->[1]->[1]   # type
                          || $a->[1]->[2] <=> $b->[1]->[2]   # ID, numerically
                     } @records;
foreach my $record (@records_sorted) {
    print join("", @{$record->[0]});
}
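The switch from cmp to <=> for the ID is what makes IDs of different lengths work: cmp compares character by character, <=> compares numeric values. A quick illustration with made-up IDs:

```perl
use strict;
use warnings;

my @ids = (123456, 99999, 7);                 # hypothetical IDs of different lengths
my @as_strings = sort { $a cmp $b } @ids;     # character by character
my @as_numbers = sort { $a <=> $b } @ids;     # by numeric value

print "cmp: @as_strings\n";   # cmp: 123456 7 99999
print "<=>: @as_numbers\n";   # <=>: 7 99999 123456
```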

Update: since the OP has now said that a record in the input file can have any number of lines, here is the updated code:

my @records;
# Slurp all records into an array of [ \@lines, \@header_fields ] pairs
while (<>) {
    if (/^HEADER1\|/) {
        my @sort_by = split(/\|/);           # split the header line on "|"
        push @records, [[], \@sort_by];
    }
    push @{ $records[-1]->[0] }, $_;
}
my @records_sorted = sort {
                             $a->[1]->[1] cmp $b->[1]->[1]
                          || $a->[1]->[2] <=> $b->[1]->[2]
                     } @records;
foreach my $record (@records_sorted) {
    print join("", @{$record->[0]});
}
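The loop above assumes the very first line is a header; if anything precedes it, @records is still empty when the first push runs, so there is no last element to push onto and Perl dies. A sketch of a guarded variant, written as a hypothetical group_records helper so it can be tried on an in-memory list (the sample lines are illustrative):

```perl
use strict;
use warnings;

# Hypothetical helper: group lines into [ \@lines, \@header_fields ] pairs,
# skipping anything that appears before the first HEADER1 line.
sub group_records {
    my @records;
    for my $line (@_) {
        if ($line =~ /^HEADER1\|/) {
            push @records, [ [], [ split /\|/, $line ] ];
        }
        next unless @records;              # ignore stray leading lines
        push @{ $records[-1][0] }, $line;
    }
    return @records;
}

my @records = group_records(
    "junk before the first header\n",
    "HEADER1|TYPE3|654123|DANICA CLYNE\n", "STATUS|MIA\n", "MSG1||\n",
    "HEADER1|TYPE1|123456|JOHN SMITH\n",   "MSG|NONE\n",
);

my @sorted = sort { $a->[1][1] cmp $b->[1][1] || $a->[1][2] <=> $b->[1][2] } @records;
print map { @{ $_->[0] } } @sorted;
```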
Here is my solution:

#!/usr/bin/perl

use warnings;
use strict;

# Read in the file
open(my $fh, '<', "./record.txt") or die $!;
my @lines = <$fh>;
close $fh;
my @records;

# Populate @records so that each element holds 4 lines
for ( my $index = 0; $index < scalar @lines; $index += 4 ) {
    push @records, join("", @lines[$index .. $index + 3]);
}

# Sort by type and then by number
@records =  map { $_->[0] }
            sort { $a->[1] cmp $b->[1] || $a->[2] cmp $b->[2] }
            map { [ $_ , (split('\|', $_))[1], (split('\|', $_))[2] ] }
            @records;

print @records;   # not "@records", which would insert a space between records
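The map/sort/map chain is the Schwartzian transform: compute each record's keys once, sort on them, then strip them off again. A sketch that splits each record only once (with a field limit of 4) and compares the ID numerically, assuming one string per record as in that answer; the records are illustrative.

```perl
use strict;
use warnings;

# Illustrative records, one string per record (header line first)
my @records = (
    "HEADER1|TYPE2|987456|NIDALEE LANE\nMSG|NONE\n",
    "HEADER1|TYPE1|123456|JOHN SMITH\nMSG|NONE\n",
    "HEADER1|TYPE1|99999|JOHN CONNOR\nMSG|NONE\n",
);

my @sorted =
    map  { $_->[0] }                                     # 3. drop the keys again
    sort { $a->[1] cmp $b->[1] || $a->[2] <=> $b->[2] }  # 2. type, then numeric ID
    map  { [ $_, (split /\|/, $_, 4)[1, 2] ] }           # 1. [record, type, id]
    @records;

print @sorted;   # no quotes: "@sorted" would put a space between records
```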

Using apply from List::MoreUtils and setting the input record separator ($/) to "HEADER", the code could look like this:

#!/usr/bin/perl
use strict;
use warnings;
use List::MoreUtils qw/ apply /;

my $fname =  'dup_data.txt';

open (my $input_fh, '<', $fname) or die "Unable to read '$fname' because $!";
open (my $OUTPUTA, ">", $fname .".reformat")
    or die "$0: could not write to '$fname.reformat'. $!";

{
    local $/ = "HEADER";

    print $OUTPUTA map{ "HEADER$_->[0]"}
                  sort {$a->[1] <=> $b->[1] || $a->[2] <=> $b->[2]}
                   map {[$_, /TYPE(\d+)\|(\d+)/]}
                  grep $_, apply {chomp} <$input_fh>;
}
close $input_fh or die $!;
close $OUTPUTA or die $!;
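With $/ set to "HEADER", each read returns everything up to and including the next "HEADER", so chomp strips that separator and the very first chunk comes back empty. A sketch of the same idea that reads from an in-memory filehandle (open on a scalar reference) and uses plain chomp inside map instead of List::MoreUtils' apply; the input string stands in for dup_data.txt.

```perl
use strict;
use warnings;

# Illustrative input held in a string instead of a file
my $data = "HEADER1|TYPE2|987456|NIDALEE LANE\nMSG|NONE\n"
         . "HEADER1|TYPE1|123456|JOHN SMITH\nMSG|NONE\n";

open my $in, '<', \$data or die "Cannot open in-memory handle: $!";
my @chunks = do {
    local $/ = "HEADER";                  # chomp now removes a trailing "HEADER"
    map { chomp(my $c = $_); $c } <$in>;  # copy each chunk, then strip the separator
};
close $in;

my @sorted =
    map  { "HEADER$_->[0]" }                             # put the separator back
    sort { $a->[1] <=> $b->[1] || $a->[2] <=> $b->[2] }
    map  { [ $_, /TYPE(\d+)\|(\d+)/ ] }
    grep { length } @chunks;                             # skip the empty first chunk

print @sorted;
```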

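Comments:
Have you tried any code?
@kjprice I'm still working on the code.
Do the numbers correspond to the types? For example, does TYPE1 always have the number 123456, or do you want them sorted by type first and then by number?
@Matt I want them sorted by type and number.
Please note: this is not the most elegant or idiomatic Perl code possible. I tried to make it more newbie-friendly.
Thanks DVK. A record doesn't always have 4 lines; what should I adjust for that?
@Soncire It depends on what your file looks like. I answered based on your question. If your file format is different, it's better to post it as a new question, since changing the input file format after I posted a comprehensive answer isn't quite fair to me :)
Oh I see, thank you very much DVK.
What if there are two INFO lines, or two MSG lines? For example a JOHN SMITH record (HEADER1|TYPE1|123456|JOHN SMITH) with two INFO lines, STATUS|KIA, and both MSG and MSG1 lines.
You just need to change how you read (e.g. every 2 lines instead of 4 in the loop), but the sorting stays the same. Do you also want to sort by message?
No Matt, only by type and number. I just wanted to let you know that the records don't always consist of 4 lines.
Hey Matt, your updated solution works :) I wish I could understand how each line works!
Warning: for a big file this can get very slow :)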
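#!/usr/bin/perl

use warnings;
use strict;

open(my $fh, '<', "./record.txt") or die $!;
my @lines = <$fh>;
close $fh;
my $temp = join("", @lines);

# Split the whole file on "HEADER1"; split drops the delimiter,
# so put it back in front of every record.
my @records = split("HEADER1", $temp);
my @new_records;
for my $rec (@records) {
    push @new_records, "HEADER1" . $rec;
}
shift @new_records;   # the text before the first HEADER1 is empty; drop it

@records =  map { $_->[0] }
            sort { $a->[1] cmp $b->[1] || $a->[2] cmp $b->[2] }
            map { [ $_ , (split('\|', $_))[1], (split('\|', $_))[2] ] }
            @new_records;

print @records;   # not "@records", which would insert a space between records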
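Splitting on "HEADER1" throws the delimiter away, which is why that code re-adds it and then shifts off the empty first element. A zero-width lookahead keeps every header in place instead; a sketch, assuming each record starts at the beginning of a line with "HEADER1|" (the input string is illustrative):

```perl
use strict;
use warnings;

# Illustrative file contents, already slurped into one string
my $text = "HEADER1|TYPE3|654123|DANICA CLYNE\nMSG|HELP\nMSG1||\n"
         . "HEADER1|TYPE1|123456|JOHN SMITH\nINFO|SGT\nMSG|NONE\n";

# (?=...) matches the empty position just before each header and /m lets ^
# match after any newline, so nothing is removed; a zero-width match at the
# start of the string does not produce an empty leading field.
my @records = split /(?=^HEADER1\|)/m, $text;

print scalar(@records), " records\n";
print $records[1];
```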