Linux: sorting repeated data blocks with header and tail lines using awk and sed


linux sorting awk


My file consists of multiple repeated data blocks, for example:

# header1
# header2
# header3
# header4
 5846 4 1 1579 0         0.943         0.944         2.004        -0.477
 6276 4 2 859 775 0         0.936         0.948         1.892         2.000        -0.836
 6311 4 3 5075 6225 5757 0         0.637         0.622         0.400         1.663         2.000        -0.729
 6381 4 2 2815 4471 0         0.934         0.925         1.861         2.000        -0.737
 159 4 2 2275 4444 0         0.928         0.936         1.867         2.000        -0.745
 442 5 4 504 1979 3483 584 0         0.910         0.937         0.945         0.931         3.898         0.000         1.420
 504 4 2 2230 442 0         0.895         0.910         1.815         2.000        -0.769
 584 4 2 442 7135 0         0.931         0.813         1.748         2.000        -0.666
 1549 5 4 6293 1979 2256 4130 0         0.908         0.924         0.948         0.932         3.847         0.000         1.407
 6329 4 3 6999 1927 5757 0         0.129         0.917         0.531         1.579         2.000        -0.739
# tail1
# tail2
# header1
# header2
# header3
# header4
 6104 4 2 2815 3250 0         0.933         0.926         1.866         2.000        -0.729
 7035 3 6 45 7395 7220 5576 7135 5046 0         0.320         0.182         0.586         0.721         0.295         0.759         2.864         0.000         1.239
 7220 4 3 5892 7035 454 0         0.566         0.586         0.704         1.856         2.000        -0.724
 7395 3 6 45 576 2060 3326 7035 7263 0         0.685         0.341         0.493         0.594         0.182         0.256         2.692         0.000         1.128
 454 5 4 7220 6363 1851 3638 0         0.704         0.913         0.941         0.935         3.575         0.000         1.150
 7146 4 2 838 2830 0         0.905         0.927         1.844         2.000        -0.729
 7135 3 5 584 7035 5576 887 5046 0         0.813         0.295         0.249         0.242         0.542         2.192         0.000         1.101
 838 5 4 7146 2723 7250 2816 0         0.905         0.937         0.923         0.926         3.814         0.000         1.481
 877 5 4 111 887 1884 6108 0         0.916         0.913         0.938         0.937         3.787         0.000         1.450
 887 4 3 7135 877 5372 0         0.242         0.913         0.622         1.780         2.000        -0.722
# tail1
# tail2
# header1
# header2
# header3
# header4
....

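Each data block has 4 header lines and 2 tail lines, with 10 data lines (with varying numbers of columns) between them. So each data block has 16 lines in total. Data in the same format repeats until the end of the file.

I want to sort the data lines by the first column, within each block only, so that the example looks like this: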

# header1
# header2
# header3
# header4
 159 4 2 2275 4444 0         0.928         0.936         1.867         2.000        -0.745
 442 5 4 504 1979 3483 584 0         0.910         0.937         0.945         0.931         3.898         0.000         1.420
 504 4 2 2230 442 0         0.895         0.910         1.815         2.000        -0.769
 584 4 2 442 7135 0         0.931         0.813         1.748         2.000        -0.666
 1549 5 4 6293 1979 2256 4130 0         0.908         0.924         0.948         0.932         3.847         0.000         1.407
 5846 4 1 1579 0         0.943         0.944         2.004        -0.477
 6276 4 2 859 775 0         0.936         0.948         1.892         2.000        -0.836
 6311 4 3 5075 6225 5757 0         0.637         0.622         0.400         1.663         2.000        -0.729
 6329 4 3 6999 1927 5757 0         0.129         0.917         0.531         1.579         2.000        -0.739
 6381 4 2 2815 4471 0         0.934         0.925         1.861         2.000        -0.737
# tail1
# tail2
# header1
# header2
# header3
# header4
 454 5 4 7220 6363 1851 3638 0         0.704         0.913         0.941         0.935         3.575         0.000         1.150
 838 5 4 7146 2723 7250 2816 0         0.905         0.937         0.923         0.926         3.814         0.000         1.481
 877 5 4 111 887 1884 6108 0         0.916         0.913         0.938         0.937         3.787         0.000         1.450
 887 4 3 7135 877 5372 0         0.242         0.913         0.622         1.780         2.000        -0.722
 6104 4 2 2815 3250 0         0.933         0.926         1.866         2.000        -0.729
 7035 3 6 45 7395 7220 5576 7135 5046 0         0.320         0.182         0.586         0.721         0.295         0.759         2.864         0.000         1.239
 7135 3 5 584 7035 5576 887 5046 0         0.813         0.295         0.249         0.242         0.542         2.192         0.000         1.101
 7146 4 2 838 2830 0         0.905         0.927         1.844         2.000        -0.729
 7220 4 3 5892 7035 454 0         0.566         0.586         0.704         1.856         2.000        -0.724
 7395 3 6 45 576 2060 3326 7035 7263 0         0.685         0.341         0.493         0.594         0.182         0.256         2.692         0.000         1.128
# tail1
# tail2
# header1
# header2
# header3
# header4
....

In other words, I want to sort lines 5 through 14 of every block while leaving the header and tail lines unchanged. For each data block:

Line 1~4 = header = just print
Line 5~14 = data = sort by column #1 and print 
Line 15~16 = tail = just print 
....

For a single data block, I can use something like this:

sort -gk1 data.txt > sorted_data.txt

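But for repeated data blocks with multiple header and tail lines, I am not sure what to do. I think I need to use awk with NR to select the target data lines from the input file and then apply the sort command, but I cannot find a way to do that.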

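You can do this in awk, but you have to take care to build two arrays, one holding the number at the start of each numeric line and the other holding the whole line. As each numbered record is found, it is inserted into the arrays in sorted order, and when the first tail line is found, the stored array holding the complete lines is output. All other records are simply output, e.g.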

awk '
    $1 ~ /^[0-9]+$/ {                        # rule1 - lines whose first field is a number
        for (i=1; i<=n && $1+0>a[i]; i++) {} # scan forward while $1 sorts after a[i]
        for (j=n; j>=i; j--) {               # move existing elements up by 1 from i
            a[j+1] = a[j]
            b[j+1] = b[j]
        }
        a[i] = $1+0                          # add current record in sort order (numeric key)
        b[i] = $0
        n++                                  # increment element count
        next                                 # skip to next record
    }
    n && /^# *tail/ {                        # rule2 - handle 1st tail line after numbers
        for (i=1; i<=n; i++)                 # loop outputting sorted lines
            print b[i]
        print $0                             # print current tail line
        n=0                                  # zero array element count
        delete a                             # delete both arrays
        delete b
        next                                 # skip to next record
    }
    { print }                                # rule3 - all other records, just print
' datablock

Look it over and let me know if you have questions.
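As a rough, self-contained way to try the insertion-sort idea on a tiny input, here is a minimal sketch. It is a variant, not the exact code above: it treats any line whose first field is all digits as data, and it flushes the pending data on any other line and at end of file, so it does not depend on what the tail lines look like. The names `sample.txt`, `sort_blocks`, and `flush` are made up for illustration.

```shell
# Build a tiny two-block sample (hypothetical file name: sample.txt).
cat > sample.txt <<'EOF'
# header1
 30 a
 4 b
 200 c
# tail1
# header1
 9 d
 1 e
# tail1
EOF

# sort_blocks FILE: sort the data lines of each block by the first column.
sort_blocks() {
    awk '
        $1 ~ /^[0-9]+$/ {                   # data line: first field is all digits
            key = $1 + 0                    # force a numeric comparison
            for (i = n; i >= 1 && a[i] > key; i--) {
                a[i+1] = a[i]               # shift larger keys up by one slot
                b[i+1] = b[i]
            }
            a[i+1] = key                    # insert the new key and its full line
            b[i+1] = $0
            n++
            next
        }
        { flush(); print }                  # any other line: emit pending data, then the line
        END { flush() }                     # data at end of file with no tail line after it
        function flush(   i) {              # print the stored lines in order, reset the count
            for (i = 1; i <= n; i++) print b[i]
            n = 0
        }
    ' "$1"
}

sort_blocks sample.txt
```

Because the flush happens on every non-data line, the same sketch should also cope with headers or tails that are arbitrary text, as long as their first field is not purely numeric.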

Based on karakfa's answer and Ed's comment:

awk 'BEGIN{cmd="sort -gk1"} /^[ht]/{close(cmd); print; next} {print | cmd}' data.txt
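Since the one-liner is dense, here is an annotated sketch of the same pipe-to-sort idea, spread over several lines. It assumes that header and tail lines start with `#` (as in the sample data), so it uses `/^#/` rather than `/^[ht]/`. The `fflush()` call and the `END` rule are additions of mine to keep the output order predictable when stdout is not a terminal, and `sample.txt` and `sort_blocks_pipe` are hypothetical names.

```shell
# Build a tiny two-block sample (hypothetical file name: sample.txt).
cat > sample.txt <<'EOF'
# header1
 30 a
 4 b
# tail1
# header1
 9 d
 1 e
# tail1
EOF

# sort_blocks_pipe FILE: let an external sort handle each block's data lines.
sort_blocks_pipe() {
    awk '
        BEGIN { cmd = "sort -k1,1n" }   # command string; sort is started on first use
        /^#/ {
            close(cmd)                  # end the running sort so it writes its sorted output now
            print                       # then print the header/tail line itself
            fflush()                    # push it out before the next sort can write anything
            next
        }
        { print | cmd }                 # data line: feed it to the sort process
        END { close(cmd) }              # flush data that is not followed by a tail line
    ' "$1"
}

sort_blocks_pipe sample.txt
```

The key point is that `close(cmd)` terminates the current `sort`, and the next `print | cmd` starts a fresh one, so every block is sorted on its own rather than the whole file at once.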

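@tink No, it does not sort the input data per block. It sorts the data across the whole file without taking the blocks into account, so the sort is performed on all of the data at once.

awk 'BEGIN{cmd="sort -gk1"} /^#/{close(cmd); print; next} {print | cmd}' data.txt works well, but could you write an answer that explains each part in detail? For example, what does [ht] do in your answer? Also, what if the headers and tails do not use # and are just arbitrary text? In that case, I guess I would need to use line numbers…

It works as expected on the snippet you provided. If your actual data does not look like that, I can't help you.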
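For the case raised in the comments, where header and tail lines are arbitrary text, one option is to go by line position instead of content. The following is only a sketch under the assumption that every block has exactly the same layout (4 header, 10 data, 2 tail lines in the question). The function name `sort_by_position`, its parameters, and `sample2.txt` are invented for illustration, and the demo uses a smaller 1/2/1 layout to stay short.

```shell
# Tiny sample with arbitrary-text header and tail lines (hypothetical file: sample2.txt).
cat > sample2.txt <<'EOF'
block A
 30 x
 4 y
end A
block B
 9 z
 1 w
end B
EOF

# sort_by_position FILE NHEAD NDATA NTAIL: sort only the data lines of each fixed-size block.
sort_by_position() {
    awk -v nhead="$2" -v ndata="$3" -v ntail="$4" '
        BEGIN { cmd = "sort -k1,1n"; size = nhead + ndata + ntail }
        {
            pos = (NR - 1) % size + 1       # 1-based position inside the current block
            if (pos > nhead && pos <= nhead + ndata) {
                print | cmd                 # data line: hand it to sort
            } else {
                close(cmd)                  # emit the sorted data (harmless if sort is not running)
                print                       # header or tail line, printed unchanged
                fflush()                    # keep it ahead of the next sort output
            }
        }
        END { close(cmd) }
    ' "$1"
}

sort_by_position sample2.txt 1 2 1
```

For the layout described in the question the call would be `sort_by_position data.txt 4 10 2`.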

awk 'BEGIN{cmd="sort -gk1"} /^[ht]/{close(cmd); print; next} {print | cmd}' data.txt

.. never mind, I was being a dummy.