Linux: sorting a file by an unordered multi-part key


Using any combination of Linux tools, but no full-featured programming language, how can I sort this list

A,C 1
C,B 2
B,A 3
into

A,B 3
A,C 1
B,C 2


This won't win any beauty contests, but it seems to come close:

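#!/bin/bash
# Sort the comma-separated parts of each key, then sort the resulting lines.
while read -r one two; do
    # Put each key part on its own line, sort the parts, then rejoin them with commas.
    one=$(echo "$one" | sed -e 's/,/\n/g' | sort | sed -e '
        1h; 1!H
        $!d
        g; s/\n/,/g
    ')
    echo "$one $two"
done | sort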
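
The key-normalising step can also be written without the sed hold-space logic. The following is only a sketch of that idea, assuming `tr` and `paste` are available alongside `sort`; the function names `normalize_key` and `sort_by_unordered_key` are made up for illustration and are not part of the answer above.

```shell
#!/bin/sh
# normalize_key (hypothetical helper): sort the comma-separated parts of
# one key and join them back together with commas.
normalize_key() {
    printf '%s\n' "$1" | tr ',' '\n' | sort | paste -sd, -
}

# sort_by_unordered_key (hypothetical helper): read "key value" lines on
# stdin, normalize each key, then sort the rewritten lines.
sort_by_unordered_key() {
    while read -r key value; do
        printf '%s %s\n' "$(normalize_key "$key")" "$value"
    done | sort
}

# Sample data from the question.
printf '%s\n' 'A,C 1' 'C,B 2' 'B,A 3' | sort_by_unordered_key
```

Because the parts are sorted as a list, the same helper should cope with keys that have one part or more than two parts.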

Change the internal field separator (IFS), then compare the first two letters with >:

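(
    # Split each line on spaces and commas: a and b are the key parts, n is the value.
    IFS=" ,"
    while read -r a b n; do
        # Emit the two key parts in ascending order.
        if [ "$a" \> "$b" ]; then
            echo "$b,$a $n"
        else
            echo "$a,$b $n"
        fi
    done
) <<EOF | sort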
A,C 1
C,B 2
B,A 3
EOF
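
To see what the IFS change does on its own, here is a small sketch that splits a single line and prints the three variables. The function name `split_line` is hypothetical and exists only for this demonstration; it sets IFS just for the one `read` call rather than for a whole subshell.

```shell
#!/bin/sh
# split_line (hypothetical helper): split one "X,Y value" line on spaces
# and commas, then show which variable received which piece.
split_line() {
    # The IFS assignment applies only to this read command.
    IFS=' ,' read -r a b n <<EOF
$1
EOF
    printf 'a=%s b=%s n=%s\n' "$a" "$b" "$n"
}

split_line 'C,B 2'
```

With both the space and the comma in IFS, `read` hands the two key parts to `a` and `b` and the value to `n`, which is what makes the `>` comparison in the loop possible. Note that this approach assumes exactly two key parts per line.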

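In case anyone is interested: I wasn't really satisfied with any of the suggestions, probably because I was hoping for a one-line solution, and as far as I know no such solution exists. Anyway, I did write a utility called ljoin, similar to a left join in databases, which does exactly what I want :D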

#!/usr/bin/perl
=head1 NAME

ljoin.pl - Utility to left join files by specified key column(s)

=head1 SYNOPSIS

ljoin.pl [OPTIONS] <INFILE1>..<INFILEN> <OUTFILE>

To join rows successfully, supply at least one input file and exactly one output file. Input files can be literal file names or a pattern such as [ABC].txt or *.in.


=head1 DESCRIPTION

This utility merges multiple files into one, using the specified column(s) as a key.

=head2 OPTIONS

=over 4

=item --field-separator=<separator>, -fs <separator>

Specifies the string used to separate columns in the plain-text input files. The default is a tab character.

=item --no-sort-fields, -no-sf

Do not sort the key columns when building the key used to merge files.

=item --complex-key-separator=<separator>, -ks <separator>

Specifies the string that separates multiple values inside a multi-value key column. For example, "A B" in one file may appear as "B A" in another, and both should be treated as the same key. The default is a space character.

=item --no-sort-complex-keys, -no-sk

Do not sort the values inside a complex key column when building the key used to merge files.

=item --include-primary-field, -i

Specifies whether the key columns used to match lines across files are also included in the output. The first column of the output file is always the key; for a complex column, its values appear sorted. The default is false.

=item --primary-field-index=<index>, -f <index>

Specifies the zero-based index of the column used to match lines. Repeat the option to build a key from more than one column, for example "-f 0 -f 1".

=item --help, -?

Show help and documentation.

=back

=cut


use strict;
use warnings;
use Getopt::Long;
use Pod::Usage;

my $fieldSeparator = "\t";
my $complexKeySeparator = " ";
my $includePrimaryField = 0;
my $containsTitles = 0;
my $sortFields = 1;
my $sortComplexKeys = 1;
my @primaryFieldIndexes;

GetOptions(
    "field-separator|fs=s" => \$fieldSeparator,
    "sort-fields|sf!" => \$sortFields,
    "complex-key-separator|ks=s" => \$complexKeySeparator,
    "sort-complex-keys|sk!" => \$sortComplexKeys,
    "contains-titles|t!" => \$containsTitles,
    "include-primary-field|i!" => \$includePrimaryField,
    "primary-field-index|f=i@" => \@primaryFieldIndexes,
    "help|?!" => sub { pod2usage(0) }
) or pod2usage(2);

pod2usage(0) if $#ARGV < 1;

push @primaryFieldIndexes, 0 if $#primaryFieldIndexes < 0;

my %primaryFieldIndexesHash;
foreach my $index (@primaryFieldIndexes)
{
    # Key the hash by the column index itself, not by its position in the option list
    $primaryFieldIndexesHash{$index} = 1;
}

print "fieldSeparator = $fieldSeparator\n";
print "complexKeySeparator = $complexKeySeparator \n";
print "includePrimaryField = $includePrimaryField\n";
print "containsTitles = $containsTitles\n";
print "primaryFieldIndexes = @primaryFieldIndexes\n";
print "sortFields = $sortFields\n";
print "sortComplexKeys = $sortComplexKeys\n";

my $fieldsCount = 0;
my %keys_hash = ();
my %files = ();
my %titles = ();


# Read columns into memory
foreach my $argnum (0 .. ($#ARGV - 1)) 
{
    # Find files with specified pattern
    my $filePattern = $ARGV[$argnum];
    my @matchedFiles = < $filePattern >;
    foreach my $inputPath (@matchedFiles) 
    {
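        open INPUT_FILE, '<', $inputPath or die "Cannot open $inputPath: $!";

        my %lines;
        my $lineNumber = -1;
        while (my $line = <INPUT_FILE>) 
        {
            # Count lines so that the title row (line 0) can be skipped with --contains-titles
            $lineNumber++;
            next if $containsTitles && $lineNumber == 0;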

            # Don't use chomp line. It doesn't handle unix input files on windows and vice versa
            $line =~ s/[\r\n]+$//g;

            # Skip lines that don't have columns
            next if $line !~ m/($fieldSeparator)/;

            # Split fields and count them (store maximum number of columns in files for later use)
            my @fields = split($fieldSeparator, $line);
            $fieldsCount = $#fields+1 if $#fields+1 > $fieldsCount;

            # Sort complex key
            my @multipleKey;
            for(my $i = 0; $i <= $#primaryFieldIndexes; $i++)
            {
                my @complexKey = split ($complexKeySeparator, $fields[$primaryFieldIndexes[$i]]);
                # Sorting the values inside a complex key is controlled by --[no-]sort-complex-keys
                @complexKey = sort(@complexKey) if $sortComplexKeys;
                push @multipleKey, join($complexKeySeparator, @complexKey);
            }

            # sort multiple keys and create key string
            @multipleKey = sort(@multipleKey) if $sortFields;
            my $fullKey = join $fieldSeparator, @multipleKey;

            $lines{$fullKey} = \@fields;
            $keys_hash{$fullKey} = 1;
        }
        close INPUT_FILE;

        $files{$inputPath} = \%lines;
    }
}

# Open output file
my $outputPath = $ARGV[$#ARGV];
open OUTPUT_FILE, '>', $outputPath or die "Cannot open $outputPath: $!";
my @keys = sort keys(%keys_hash); 

# Leave blank places for key columns
for(my $pf = 0; $pf <= $#primaryFieldIndexes; $pf++)
{
    print OUTPUT_FILE $fieldSeparator;
}

# Print column headers
foreach my $argnum (0 .. ($#ARGV - 1)) 
{
    my $filePattern = $ARGV[$argnum];
    my @matchedFiles = < $filePattern >;
    foreach my $inputPath (@matchedFiles) 
    {
        print OUTPUT_FILE $inputPath;

        for(my $f = 0; $f < $fieldsCount - $#primaryFieldIndexes - 1; $f++)
        {
            print OUTPUT_FILE $fieldSeparator;
        }
    }
}

# Print merged columns
print OUTPUT_FILE "\n";
foreach my $key ( @keys )
{
    print OUTPUT_FILE $key;

    foreach my $argnum (0 .. ($#ARGV - 1)) 
    {
        my $filePattern = $ARGV[$argnum];
        my @matchedFiles = < $filePattern >;
        foreach my $inputPath (@matchedFiles) 
        {
            my $lines = $files{$inputPath};

            for(my $i = 0; $i < $fieldsCount; $i++)
            {
                next if exists $primaryFieldIndexesHash{$i} && !$includePrimaryField;
                print OUTPUT_FILE $fieldSeparator;
                print OUTPUT_FILE $lines->{$key}->[$i] if exists $lines->{$key} && defined $lines->{$key}->[$i];
            }
        }
    }

    print OUTPUT_FILE "\n";
}
close OUTPUT_FILE;
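
For comparison, the left-join-on-a-normalised-key idea can be roughed out with standard tools only. This is a sketch under assumptions, not a reimplementation of ljoin.pl: it assumes space-separated columns with a comma-separated key in the first column, exactly two input files, and the coreutils `join` command. The function names `normalize_keys` and `left_join` are invented for this example.

```shell
#!/bin/sh
# normalize_keys (hypothetical helper): sort the comma-separated parts of
# the first column, then sort the lines on that column so join accepts them.
normalize_keys() {
    while read -r key rest; do
        key=$(printf '%s\n' "$key" | tr ',' '\n' | LC_ALL=C sort | paste -sd, -)
        printf '%s %s\n' "$key" "$rest"
    done | LC_ALL=C sort -k1,1
}

# left_join (hypothetical helper): keep every row of file $1 and append the
# matching columns of file $2 when the normalized keys are equal.
left_join() {
    lj_left=$(mktemp) && lj_right=$(mktemp) || return 1
    normalize_keys < "$1" > "$lj_left"
    normalize_keys < "$2" > "$lj_right"
    # -a 1 also prints rows of the first file that have no match.
    LC_ALL=C join -a 1 "$lj_left" "$lj_right"
    rm -f "$lj_left" "$lj_right"
}

# Demo with throwaway files; the second file's contents are invented.
demo_dir=$(mktemp -d)
printf '%s\n' 'A,C 1' 'C,B 2' 'B,A 3' > "$demo_dir/a.txt"
printf '%s\n' 'C,A x' 'A,B y' > "$demo_dir/b.txt"
left_join "$demo_dir/a.txt" "$demo_dir/b.txt"
rm -rf "$demo_dir"
```

Unlike the Perl utility, this sketch writes no header row and does not pad missing columns; rows without a match in the second file are simply printed as they are.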

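To try the bash script: put it in a file, say sort.sh, and make it executable with chmod a+x sort.sh. Then, if your input data is in input.txt, run ./sort.sh with input.txt as its standard input.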