How can I unique-sort column A based on the descending order of column B in Unix/Perl/Tcl?

perl unix tcl

I have a CSV file like the one below:

Column A, Column B
cat,30
cat,40
dog,10
elephant,23
dog,3
elephant,37

How can I get the unique values of column A, sorted by the maximum corresponding value in column B?

The result I want is:

Column A, Column B
cat,40
elephant,37
dog,10

Please help.

$ sort -t, -k1,1 -k2,2nr filename | awk -F, '!a[$1]++'
Column A, Column B
cat,40
dog,10
elephant,37
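
To see why this works, it can help to run the two stages separately. The sketch below is only an illustration: the function names `stage1` and `first_per_key` are made up here, the header row is left out, and `LC_ALL=C` is set so the ordering does not depend on the locale. The sort groups the rows by column A with the largest column B first inside each group, and `!a[$1]++` is true only the first time a given key is seen, so awk keeps just the first (largest) row per key.

```shell
#!/bin/sh
# Hypothetical helper names, used only for this illustration.

# Stage 1: group by column A; within a group, largest column B first.
stage1() {
    LC_ALL=C sort -t, -k1,1 -k2,2nr
}

# Stage 2: a[$1]++ is 0 (false) on the first sighting of a key, so
# !a[$1]++ is true exactly once per key and only that line is printed.
first_per_key() {
    awk -F, '!a[$1]++'
}

# Sample rows from the question, without the header.
printf '%s\n' cat,30 cat,40 dog,10 elephant,23 dog,3 elephant,37 | stage1
echo ---
printf '%s\n' cat,30 cat,40 dog,10 elephant,23 dog,3 elephant,37 | stage1 | first_per_key
```

The first command shows the grouped rows (cat,40 before cat,30, and so on); the second shows one row per animal.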

If you want your exact output, it takes a little more coding because of the header row:

$ sort -t, -k1,1 -k2nr filename | awk -F, 'NR==1{print "999999\t"$0;next} !a[$1]++{print $2"\t"$0}' | sort -k1nr | cut -f2-
Column A, Column B
cat,40
elephant,37
dog,10

Another option is to strip the header first and add it back at the end:

$ h=$(head -1 filename); sed 1d filename | sort -t, -k1,1 -k2nr | awk -F, '!a[$1]++' | sort -t, -k2nr | sed '1i'"$h"''
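
The one-line `sed '1i'` form is, as far as I know, a GNU sed extension. A variant of the same idea that avoids it is sketched below: print the header first, then sort only the data rows. The function name `unique_by_max` and the temporary file are assumptions made for this example.

```shell
#!/bin/sh
# Hypothetical helper: $1 is the path of a CSV file whose first line is a header.
unique_by_max() {
    head -n 1 "$1"                          # emit the header untouched
    tail -n +2 "$1" |                       # data rows only
        LC_ALL=C sort -t, -k1,1 -k2,2nr |   # group by A, largest B first
        awk -F, '!a[$1]++' |                # keep the first row of each group
        LC_ALL=C sort -t, -k2,2nr           # order the survivors by B, descending
}

# Demo with the data from the question.
tmp=$(mktemp)
printf '%s\n' 'Column A, Column B' cat,30 cat,40 dog,10 elephant,23 dog,3 elephant,37 > "$tmp"
unique_by_max "$tmp"
rm -f "$tmp"
```

Because the header never passes through `sort`, the result does not depend on how the locale collates it against the data.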

The obvious Perl version:

#!/usr/bin/env perl
use strict;
use warnings;

#print header row
print scalar <>;
my %seen;
#iterate the magic filehandle (file specified on command line or 
#stdin - e.g. like grep/sed)
while (<>) {
    chomp; #strip trailing linefeed
    #split this line on ','
    my ( $key, $value ) = split /,/;

    #save this value if previous is lower or nonexistent
    if ( not defined $seen{$key}
        or $seen{$key} < $value )
    {
        $seen{$key} = $value;
    }
}

#sort, comparing values in %seen 
foreach my $key ( sort { $seen{$b} <=> $seen{$a} } keys %seen ) {
    print "$key,$seen{$key}\n";
}
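
For comparison, the same single-pass idea (remember the largest value per key, then sort the keys by that value) can be sketched with awk. This is an illustration rather than a replacement for the script above; the function name `max_by_key` is made up here, and the input is assumed to arrive on stdin with the header as its first line.

```shell
#!/bin/sh
# Hypothetical helper: reads CSV from stdin, header on the first line.
max_by_key() {
    IFS= read -r header                 # consume and re-emit the header
    printf '%s\n' "$header"
    awk -F, '
        # store the value if the key is new or the stored value is smaller
        !($1 in max) || $2 + 0 > max[$1] { max[$1] = $2 + 0 }
        END { for (k in max) print k "," max[k] }
    ' | LC_ALL=C sort -t, -k2,2nr       # largest value first
}

printf '%s\n' 'Column A, Column B' cat,30 cat,40 dog,10 elephant,23 dog,3 elephant,37 | max_by_key
```

Unlike the sort-first pipelines, only one entry per key is sorted at the end, which may matter when the file has many rows but few distinct keys.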

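I've already +1'd karakfa's answer. It's simple and elegant.

My answer is an extension of karakfa's header handling. If you like it, feel free to +1 my answer, but the "best answer" should go to karakfa. (Unless, of course, you prefer one of the other answers! :)

If your input is as you described in the question, then we can recognise the header by seeing that $2 is not numeric. The following therefore does not take the header into account: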

$ sort -t, -k1,1 -k2,2nr filename | awk -F, '!a[$1]++'

Alternatively, you can strip the header using:

$ sort -t, -k1,1 -k2,2nr filename | awk -F, '$2~/^[0-9]+$/&&!a[$1]++'

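This slows things down considerably, since the regex may take longer to evaluate than a simple array assignment and numeric test. I use a regex for the numeric test in order to allow a value of 0, which would otherwise evaluate to "false".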
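
One case where a regex test and a plain arithmetic test differ is a value of 0. The sketch below assumes that `$2+0` is the kind of simple numeric test meant; the function names and the `ant,0` row are made up for the illustration.

```shell
#!/bin/sh
# Hypothetical helpers comparing two ways of testing "is column B numeric?".

# Arithmetic test: the header evaluates to 0 (false), but so does a real 0.
numeric_by_arith() {
    awk -F, '$2+0'
}

# Regex test: true for any run of digits, including "0".
numeric_by_regex() {
    awk -F, '$2~/^[0-9]+$/'
}

sample=$(printf '%s\n' 'Column A, Column B' ant,0 cat,40)
printf '%s\n' "$sample" | numeric_by_arith
echo ---
printf '%s\n' "$sample" | numeric_by_regex
```

The arithmetic version drops the `ant,0` row together with the header, while the regex version keeps it.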

Next, if you want to keep the header but print it first, you can process the output at the end of the stream:

$ sort -t, -k1,1 -k2,2nr filename | awk -F, '$2!~/^[0-9]+$/{print;next} !a[$1]++{b[$1]=$0} END{for(i in b){print b[i]}}'

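A final option, which achieves the same effect without storing the extra array in memory, is to process the input a second time. This costs more in terms of IO, but less in terms of memory: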

$ sort -t, -k1,1 -k2,2nr filename | awk -F, 'NR==FNR&&$2!~/^[0-9]+$/{print;nextfile} $2~/^[0-9]+$/&&!a[$1]++' filename -

One possible Tcl solution:

# read the contents of the file into a list of lines
set f [open data.csv]
set lines [split [string trim [chan read $f]] \n]
chan close $f

# detach the header
set lines [lassign $lines header]

# map the list of lines to a list of tuples
set tuples [lmap line $lines {split $line ,}]

# use an associative array to get unique tuples in a flat list
array set uniqueTuples [concat {*}[lsort -index 1 -integer $tuples]]

# reassemble the tuples, sorted by name
set tuples [lmap {a b} [lsort -stride 2 -index 0 [array get uniqueTuples]] {list $a $b}]

# map the tuples to csv lines and insert the header
set lines [linsert [lmap tuple $tuples {join $tuple ,}] 0 $header]

# convert the list of lines into a data string
set data [join $lines \n]

This solution assumes a simplified data set with no quoted elements. If there are quoted elements, the csv module should be used instead of the split command.
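
The limitation is easy to reproduce with any tool that splits naively on commas. A small sketch, using awk rather than Tcl and a made-up row containing a quoted comma (the function name `count_fields` is also hypothetical):

```shell
#!/bin/sh
# Hypothetical helper: print how many fields a plain comma split yields per line.
count_fields() {
    awk -F, '{ print NF }'
}

printf '%s\n' 'cat,30' | count_fields              # two fields, as intended
printf '%s\n' '"Smith, John",30' | count_fields    # the quoted comma adds a third
```

The second row is meant to have two columns, but a plain split sees three, which is why a real CSV parser is needed once quoting is involved.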

Another solution, inspired by the Perl solution:

puts [gets stdin]
set seen [dict create]

while {[gets stdin line] >= 0} {
    lassign [split $line ,] key value
    if {![dict exists $seen $key] || [dict get $seen $key] < $value} {
        dict set seen $key $value
    }
}

dict for {key val} [lsort -stride 2 -index 0 $seen] {
    puts $key,$val
}

Another Perl solution:

perl -MList::Util=max -F, -lane '
    if ($.==1) {print; next}
    $val{$F[0]} = max $val{$F[0]}, $F[1];
} {
    print "$_,$val{$_}" for reverse sort {$val{$a} <=> $val{$b}} keys %val;
' file

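No quoted fields, and no commas inside fields? Which language will you be using? Asking in several languages at once is a good reason to close this question…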