Bash: tidying up a messy-looking file

I have a file that looks messy:

contig_1  bin.0013 Rhizobium           flavum    (taxid 1335061)
contig_2           Alphaproteobacteria (taxid    28211)
contig_3  bin.009
contig_4  bin.008  unclassified        (taxid    0)
contig_5  bin.001  Fluviicoccus        keumensis (taxid 1435465)
contig_12 bin.003
I would like it to display properly as tab-separated columns, with a zero wherever a column is empty:

contig_1    bin.0013    Rhizobium flavum (taxid 1335061)
contig_2    0           Alphaproteobacteria (taxid 28211)
contig_3    bin.009     0
contig_4    bin.008     unclassified (taxid 0)
contig_5    bin.001     Fluviicoccus keumensis (taxid 1435465)
contig_12   bin.003     0
If I use something like sed 's/\s/,/g' filename, commas get inserted everywhere, not only between columns 1-2 and 2-3.
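To illustrate why splitting on whitespace cannot work for this file, here is a minimal sketch that counts the whitespace-separated fields on each line of the sample. The temporary file and the variable names are introduced only for this illustration.

```shell
# Recreate the sample input from the question in a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
contig_1  bin.0013 Rhizobium           flavum    (taxid 1335061)
contig_2           Alphaproteobacteria (taxid    28211)
contig_3  bin.009
contig_4  bin.008  unclassified        (taxid    0)
contig_5  bin.001  Fluviicoccus        keumensis (taxid 1435465)
contig_12 bin.003
EOF

# Print the first field and the number of whitespace-separated fields per line.
field_counts=$(awk '{ print $1, NF }' "$sample")
printf '%s\n' "$field_counts"
rm -f "$sample"
```

The counts range from 2 to 6, and on the contig_2 line the taxon name ends up in field 2 because the empty bin column leaves no token behind. The columns therefore have to be identified by character position rather than by field number.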

If awk is an option, please try the following:

awk -v OFS="\t" '
NR==FNR {
    # in the 1st pass, detect the starting positions of the 2nd field and the 3rd
    sub(" +$", "")      # it avoids misdetection due to extra trailing blanks
    if (match($0, "[^[:blank:]]+[[:blank:]]+")) {
        # the line starts with a non-blank, so RLENGTH is the end position of the 1st run of blanks
        if (col2 == 0 || RLENGTH + 1 < col2) col2 = RLENGTH + 1
        if (match($0, "[^[:blank:]]+[[:blank:]]+[^[:blank:]]+[[:blank:]]+")) {
            # here RLENGTH is the end position of the 2nd run of blanks
            if (col3 == 0 || RLENGTH + 1 < col3) col3 = RLENGTH + 1
        }
    }
    next
}
{
    # in the 2nd pass, extract the substrings in the fixed position and reformat them
    # by removing extra spaces and putting "0" if the field is empty
    c1 = substr($0, 1, col2 - 1); sub(" +$", "", c1); if (c1 == "") c1 = "0"
    c2 = substr($0, col2, col3 - col2); sub(" +$", "", c2); if (c2 == "") c2 = "0"
    c3 = substr($0, col3); sub(" +$", "", c3); gsub(" +", " ", c3); if (c3 == "") c3 = "0"
#   print c1, c2, c3            # use this for the tab-separated output
    printf("%-12s%-12s%-s\n", c1, c2, c3)
}' file file
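The script works in two passes. In the 1st pass, it detects the starting positions of the fields. In the 2nd pass, it cuts out the individual fields using the positions computed in the 1st pass. I chose printf to align the output visually. You can switch to tab-separated values if you prefer.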
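As a usage sketch of the tab-separated variant, the same two-pass idea can be wrapped in a shell function and run against the sample. The function name fixed_to_tsv and the temporary file are hypothetical, introduced only for illustration. The sketch assumes the columns are padded with spaces (not tabs) and that the first column is never empty.

```shell
# fixed_to_tsv FILE: hypothetical helper around the two-pass awk approach.
# Reads FILE twice and prints three tab-separated columns, "0" for empty ones.
fixed_to_tsv() {
    awk -v OFS='\t' '
    NR == FNR {
        # 1st pass: find the smallest start position of column 2 and column 3
        sub(/ +$/, "")
        if (match($0, /^[^ ]+ +/)) {
            if (col2 == 0 || RLENGTH + 1 < col2) col2 = RLENGTH + 1
            if (match($0, /^[^ ]+ +[^ ]+ +/))
                if (col3 == 0 || RLENGTH + 1 < col3) col3 = RLENGTH + 1
        }
        next
    }
    {
        # 2nd pass: cut by position, trim, squeeze inner blanks, fill in "0"
        c1 = substr($0, 1, col2 - 1);       sub(/ +$/, "", c1)
        c2 = substr($0, col2, col3 - col2); sub(/ +$/, "", c2)
        c3 = substr($0, col3);              sub(/ +$/, "", c3); gsub(/ +/, " ", c3)
        if (c1 == "") c1 = "0"
        if (c2 == "") c2 = "0"
        if (c3 == "") c3 = "0"
        print c1, c2, c3
    }' "$1" "$1"
}

# Recreate the sample input from the question and convert it.
sample=$(mktemp)
cat > "$sample" <<'EOF'
contig_1  bin.0013 Rhizobium           flavum    (taxid 1335061)
contig_2           Alphaproteobacteria (taxid    28211)
contig_3  bin.009
contig_4  bin.008  unclassified        (taxid    0)
contig_5  bin.001  Fluviicoccus        keumensis (taxid 1435465)
contig_12 bin.003
EOF
tsv=$(fixed_to_tsv "$sample")
printf '%s\n' "$tsv"
rm -f "$sample"
```

Passing the file name twice is what gives awk its two passes: NR == FNR is true only while the first copy is being read. If no line in the file has three columns, col3 is never set and the sketch does not handle that case.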
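Comments:
Try the column command.
I have tried it, but it did not solve the problem.
It is not as simple as you might think. Looking at your first file, I would assume there should be 5 fields (contig_1, bin.0013, Rhizobium, flavum, taxid ...), but the result has only 3.
Yes, that comes from the input file, where the name is "Rhizobium flavum (taxid 1335061)", with spaces. I hope there is a way, though.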
For the sample input, the printf variant of the awk script prints:
contig_1    bin.0013    Rhizobium flavum (taxid 1335061)
contig_2    0           Alphaproteobacteria (taxid 28211)
contig_3    bin.009     0
contig_4    bin.008     unclassified (taxid 0)
contig_5    bin.001     Fluviicoccus keumensis (taxid 1435465)
contig_12   bin.003     0