Unix: delete columns from a space-delimited file by matching the header

unix sed awk

I have a space-delimited input text file. I want to use sed or awk to delete every column whose header is size.

Input file:

id quantity colour shape size colour shape size colour shape size
1 10 blue square 10 red triangle 8 pink circle 3
2 12 yellow pentagon 3 orange rectangle 9 purple oval 6

Expected output:

id quantity colour shape colour shape colour shape
1 10 blue square red triangle pink circle
2 12 yellow pentagon orange rectangle purple oval

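Given the fixed file format (cut splits on tabs by default, so the space delimiter has to be set with -d):

cut -d ' ' -f 1-4,6-7,9-10 infile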
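When the positions of the size columns are not known in advance, the field list for cut can be derived from the header line. The following is a minimal sketch, not part of the original answer. It assumes that fields are separated by exactly one space, and the names fields and trimmed.txt are only illustrative:

```shell
# Sample input from the question, written to a scratch file.
cat > infile <<'EOF'
id quantity colour shape size colour shape size colour shape size
1 10 blue square 10 red triangle 8 pink circle 3
2 12 yellow pentagon 3 orange rectangle 9 purple oval 6
EOF

# Put each header name on its own line, keep the line numbers of the names
# that are not exactly "size", and join those numbers with commas.
fields=$(head -n 1 infile | tr ' ' '\n' | grep -n -v -x -F 'size' | cut -d: -f1 | paste -sd, -)

# Keep only the surviving fields.
cut -d ' ' -f "$fields" infile > trimmed.txt
cat trimmed.txt
```

Because cut treats every single space as a delimiter, this approach does not suit input padded with runs of spaces.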

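A generic solution using awk. It needs GNU awk, because it relies on FIELDWIDTHS and the four-argument form of split.

The BEGIN block has a hard-coded variable (columns_to_delete) that holds the positions of the fields to delete. The script then computes the width of each field from the header and deletes the fields whose positions match that variable.

Assuming infile has the content from the question and script.awk has the following content: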

BEGIN {
    ## Hard-coded positions of fields to delete. Separate them with spaces.
    columns_to_delete = "5 8 11"

    ## Save positions in an array to handle it better.
    split( columns_to_delete, arr_columns )
}


## Process header.
FNR == 1 { 

    ## Split header with a space followed by any non-space character.
    split( $0, h, /([[:space:]])([^[:space:]])/, seps )

    ## Use FIELDWIDTHS to handle fixed format of data. Set that variable with
    ## length of each field, taking into account spaces.
    for ( i = 1; i <= length( h ); i++ ) { 
        len = length( h[i] seps[i] )
        FIELDWIDTHS = FIELDWIDTHS " " (i == 1 ? --len : i == length( h ) ? ++len : len)
    }   

    ## Re-calculate fields with new FIELDWIDTHS variable.
    $0 = $0
}

## Process header too, and every line with data.
{
    ## Flag to know if 'p'rint to output a field.
    p = 1 

    ## Go through all fields; if one is found in the array of columns to
    ## delete, reset the 'print' flag.
    for ( i = 1; i <= NF; i++ ) { 
        for ( j = 1; j <= length( arr_columns ); j++ ) { 
            if ( i == arr_columns[j] ) { 
                p = 0 
                break
            }   
        }   

        ## Check 'print' flag and print if set.
        if ( p ) { 
            printf "%s", $i
        }
        else {
            printf " " 
        }
        p = 1 
    }   
    printf "\n"
}

It yields the following output:

id  quantity colour shape    colour shape      colour  shape    
1   10       blue   square   red    triangle   pink    circle   
2   12       yellow pentagon orange rectangle  purple   oval

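EDIT: Oh, I just realized that the output is not right, because two fields get joined together. Fixing it would be too much work, since it would require checking the maximum column width across all lines before processing anything. Still, I hope the script gets the idea across. I have no time right now; maybe I can fix it later, but I am not sure.

EDIT 2: Fixed by printing an extra space for each deleted field. It was easier than expected :-)

EDIT 3: See the comments.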

I modified the BEGIN block to check that an extra variable has been provided as an argument:

BEGIN {
    ## Check if a variable 'delete_col' has been provided as argument.
    if ( ! delete_col ) { 
        printf "%s\n", "Usage: awk -v delete_col=\"column_name\" -f script.awk " ARGV[1]
        exit 0
    }   

}

and moved the step that works out which column positions to delete into the FNR == 1 pattern:

## Process header.
FNR == 1 { 

    ## Find column position to delete given the name provided as argument.
    for ( i = 1; i <= NF; i++ ) { 
        if ( $i == delete_col ) { 
            columns_to_delete = columns_to_delete " " i
        }   
    }   

    ## Save positions in an array to handle it better.
    split( columns_to_delete, arr_columns )

    ## ...
    ## No modifications from here until the end. Same code as in the original script.
    ## ...
}

The result will be the same.
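If the column alignment does not need to be preserved, matching on the header name needs no width calculation at all. What follows is a minimal sketch rather than the script above: it assumes a single input file split on whitespace by awk's default rules, reuses the delete_col variable name from the usage message, and introduces skip and outfile purely for illustration:

```shell
# Sample input from the question, written to a scratch file.
cat > infile <<'EOF'
id quantity colour shape size colour shape size colour shape size
1 10 blue square 10 red triangle 8 pink circle 3
2 12 yellow pentagon 3 orange rectangle 9 purple oval 6
EOF

awk -v delete_col="size" '
# On the header line, remember the position of every matching column.
FNR == 1 {
    for (i = 1; i <= NF; i++)
        if ($i == delete_col) skip[i] = 1
}
# On every line (header included), rebuild the record without those columns.
{
    out = ""; sep = ""
    for (i = 1; i <= NF; i++) {
        if (i in skip) continue
        out = out sep $i
        sep = " "
    }
    print out
}' infile > outfile
cat outfile
```

The output fields are joined with single spaces, so any padding present in the input is not kept.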

Using awk