Awk: concatenating strings with multiple arrays

I'm trying to rearrange specific strings into their corresponding columns. Here is the input:

String 1:  47/13528 
String 2:  55(s) 
String 3:   
String 4:  114(n) 
String 5:  225(s), 26/10533-10541 
String 6:  103/13519 
String 7:  10(s), 162(n) 
String 8:  152/12345,12346
(d=dead, n=null, s=strike)

The letter in each value is a flag (d=dead, n=null, s=strike). The value (number) on "String 1" becomes 47c1, and so on.
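
Reading the expected output below, the per-item rule appears to be: on String K, a value N/SUFFIX becomes NcK@SPSUFFIX, a plain value N becomes NcK, and a value without a flag counts as null. A minimal sketch of just that rule, assuming the items have already been split on commas; `fmt_item` is a hypothetical helper name, not part of the question:

```shell
#!/bin/sh
# fmt_item ITEM K -> prints "<formatted item> <flag>" for one item
# taken from "String K". Hypothetical helper illustrating the rule.
fmt_item() {
    printf '%s\n' "$1" | awk -v k="$2" '{
        flag = "n"                          # assumption: no flag means null
        if (match($0, /\([dns]\)$/)) {      # trailing "(d)", "(n)" or "(s)"
            flag = substr($0, RSTART + 1, 1)
            $0 = substr($0, 1, RSTART - 1)  # drop the "(x)" part
        }
        if (index($0, "/")) {               # N/SUFFIX -> NcK@SPSUFFIX
            split($0, part, "/")
            print part[1] "c" k "@SP" part[2], flag
        } else {                            # plain N -> NcK
            print $0 "c" k, flag
        }
    }'
}

fmt_item "47/13528" 1          # 47c1@SP13528 n
fmt_item "55(s)" 2             # 55c2 s
fmt_item "26/10533-10541" 5    # 26c5@SP10533-10541 n
```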

I tried to parse it by modifying some earlier code, but with no luck:

{
    for (i=1; i<=NF; i++) {
        num  = $i+0
        abbr = $i
        gsub(/[^[:alpha:]]/,"",abbr)
        list[abbr] = list[abbr] num " c " val ORS
    }
}
END {
    n = split("dead null strike",types)
    for (i=1; i<=n; i++) {
        name = types[i]
        abbr = substr(name,1,1)
        printf "name,list[abbr]\n" 
    }
}

Expected breakdown, for cross-checking:

dead
none 

null
47c1@SP13528;114c4;103c6@SP13519;162c7;152c8@SP12345;152c8@SP12346;26c5@SP10533-10541;;162c7

strike
55c2;225c5;10c7

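My usual approach is:

First, preprocess the data so that there is one piece of information per line.

Then preprocess it further so that each line holds its pieces of information in separate columns.

Then it's easy: just accumulate the columns in some array in awk and print them.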

The following code:

cat <<EOF |
String 1:  47/13528 
String 2:  55(s) 
String 3:   
String 4:  114(n) 
String 5:  225(s), 26/10533-10541 
String 6:  103/13519 
String 7:  10(s), 162(n) 
String 8:  152/12345,12346
(d=dead, n=null, s=strike) 
EOF
sed '
    # filter only lines with String
    /^String \([0-9]*\): */!d;
    # Remove the String
    # Remove the : and spaces
    s//\1 /
    # remove trailing spaces
    s/ *$//
    # Remove lines with nothing
    /^[0-9]* *$/d
    # split each line on its commas, moving every item
    # to a separate line that keeps the string number;
    # repeat until no comma is left
    : a
    /\([0-9]*\) \(.*\), *\(.*\)/{
        s//\1 \2\n\1 \3/
        ba
    }
' | sed '
    # there should be exactly two fields here,
    # separated by a single space
    /^[^ ]* [^ ]*$/!{
        s/.*/ERROR: "&"/
        q1
    }
    # Move the flag letter in parentheses to a separate column
    /(\(.\))$/{
        s// \1/
        b not
    } ; {
        # default is n
        s/$/ n/
    } ; : not
    # shuffle first and second field
    # to that <num>c<num>(@SP<something>)? format
    # if second field has a "/"
    \~^\([0-9]*\) \([0-9]*\)/\([^ ]*\)~{
        # then add a SP
        s//\2c\1@SP\3/
        b not2
    } ; {
        # otherwise just do a "c" between
        s/\([0-9]*\) \([0-9]*\)/\2c\1/
    } ; : not2
' |
sort -n -k1 |
# now it's trivial
awk '
{ 
    out[$2] = out[$2] (!length(out[$2])?"":";") $1
}

function outputit(name, idx) {
    print name
    if (length(out[idx]) == 0) {
        print "none"
    } else {
        print out[idx]
    }
    printf "\n"
}

END{
    outputit("dead", "d")
    outputit("null", "n")
    outputit("strike", "s")
}
'

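I believe the output matches your sort order in the separated lists: you seem to sort by the first column first and then by the second; I just used sort -n -k1.

Here is an awk script for parsing the file, saved as tst.awk: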
BEGIN {
    types["d"]; types["n"]; types["s"]
    deft = "n"; OFS = ","; sep = ";"
}

$1=="String" {
    gsub(/[)(]/,""); gsub(",", " ")    # general line subs
    for (i=3;i<=NF;i++) {
        if (!gsub("/","c"$2+0"@SP", $i)) $i = $i"c"$2+0    # make all subs on items
        for (t in types) { if (gsub(t, "", $i)) { x=t; break }; x=deft } #find type
        items[x] = items[x]? items[x] sep $i: $i    # append for type found
    }
}

END {
    print "dead" OFS "null" OFS "strike"
    print items["d"] OFS items["n"] OFS items["s"]
}

Output:
> awk -f tst.awk file
dead,null,strike
,47c1@SP13528;114c4;26c5@SP10533-10541;103c6@SP13519;162c7;152c8@SP12345;12346c8,55c2;225c5;10c7


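Your description has changed in some important details, such as how we decide an item's type or how items are separated, and so far your input and output have not been consistent with it, but overall I think you can easily adapt this script. Keep in mind that gsub() makes the substitutions and also returns the number of substitutions made, so it is often handy to use it as a condition.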
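
A small illustration of that gsub() behaviour, using the same "/" to "c<K>@SP" replacement as the script above, with the string numbers hard-coded for the example (`gsub_demo` is just an illustrative name):

```shell
#!/bin/sh
# gsub(regex, repl, target) rewrites target in place and returns the
# number of substitutions it made, so the result works as a condition.
gsub_demo() {
    awk 'BEGIN {
        item = "47/13528"
        n = gsub("/", "c1@SP", item)
        print n, item                   # 1 47c1@SP13528

        item = "114"
        if (!gsub("/", "c4@SP", item))  # no "/" here, so gsub returns 0
            item = item "c4"            # ...and the c<K> part is appended
        print item                      # 114c4
    }'
}

gsub_demo
```
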
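
Please also post your own effort as code and let us know. Also, your example is unclear, so please add the logic of how you get the expected output. Thanks.

Debugging your script is not hard. First, you will see that it prints name,list[abbr] literally, so you have to remove the quotes to print the variables. Then you can see what it prints, and if that is not the desired output, you can add some print statements in the loop to see what is done to each line.

This line, String 7: 10(s), 162(n), creates one item for s and one for n. This line, String 5: 225(s), 26/10533-10541, should give one item for s and one for n, since, as you said, n is the default when no type is present. But in your sample output this line produces two items for s (based on the first s?), so from your description it is impossible to tell what output you really want.

Sorry, I missed that one; the expected output has been edited.

But I see two values classified as s in the output that don't match your description: 26c5@SP10533-10541 and 162c7.

Thank you very much, you have helped me a lot. What does the "finish part" do? Can you point me to a reference manual?

It's not a term (I'll change that comment as you suggested); I meant "finish them". After that line the items are ready as strings, plus an n or s if one exists. As for the first gsub(): if (!gsub()) means that no substitution was made, so this item is not like 26/1234 but like 25n, and the c<number> part is appended. With that line we make all the substitutions on the items at once. Also, the GNU awk reference on string functions is very useful.

We just realized that 152c8@SP12345;12346c8 should be 152c8@SP12345;152c8@SP12346; I made a mistake again because of the inconsistency. Anyway, thanks for your time.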
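
Following that debugging advice, here is a minimal sketch of the question's original attempt with its bugs fixed: the string number from field 2 replaces the undefined `val`, `print` outputs the variables instead of a quoted literal, and items are rewritten into the NcK / NcK@SPSUFFIX form. Assumptions beyond the question: commas and spaces both separate items, an unflagged item counts as null, and `parse_flags` is a hypothetical wrapper name. It keeps input order rather than sorting.

```shell
#!/bin/sh
# The question's attempt, repaired; reads the "String K: ..." lines on stdin.
parse_flags() {
    awk '
    $1 == "String" {
        k = $2 + 0                          # "1:" -> 1 (was the undefined val)
        gsub(/,/, " ")                      # commas separate items too
        for (i = 3; i <= NF; i++) {
            abbr = item = $i
            gsub(/[^[:alpha:]]/, "", abbr)  # "55(s)" -> "s"
            if (abbr == "") abbr = "n"      # assumption: default is null
            gsub(/[()[:alpha:]]/, "", item) # "55(s)" -> "55"
            if (!sub(/\//, "c" k "@SP", item))
                item = item "c" k           # no "/": just append cK
            if (abbr in list) list[abbr] = list[abbr] ";" item
            else              list[abbr] = item
        }
    }
    END {
        n = split("dead null strike", types, " ")
        for (i = 1; i <= n; i++) {
            name = types[i]
            abbr = substr(name, 1, 1)
            print name                      # no quotes: print the values
            print ((abbr in list) ? list[abbr] : "none")
            print ""
        }
    }'
}

parse_flags <<'EOF'
String 1:  47/13528
String 2:  55(s)
String 3:
String 4:  114(n)
String 5:  225(s), 26/10533-10541
String 6:  103/13519
String 7:  10(s), 162(n)
String 8:  152/12345,12346
(d=dead, n=null, s=strike)
EOF
```

Like the tst.awk script, this still yields 12346c8 for String 8.
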

Output of the sed | sed | sort | awk pipeline:
dead
none

null
26c5@SP10533-10541;47c1@SP13528;103c6@SP13519;114c4;152c8@SP12345;162c7;12346c8

strike
10c7;55c2;225c5
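
The last comment says 152c8@SP12345;12346c8 should really be 152c8@SP12345;152c8@SP12346. One hedged way to get that from the tst.awk approach, assuming (this rule is inferred, not stated in the question) that a bare, unflagged number following an N/SUFFIX item on the same String line is another suffix of that N; `parse_sp` is an illustrative name:

```shell
#!/bin/sh
# Variant of tst.awk: a bare number after "N/SUFFIX" on the same line
# is taken as another SUFFIX of N (assumed rule), e.g. 152c8@SP12346.
parse_sp() {
    awk '
    BEGIN { OFS = ","; sep = ";" }
    $1 == "String" {
        k = $2 + 0
        gsub(/[)(]/, ""); gsub(",", " ")   # same line subs as tst.awk
        base = ""                          # N of the last N/SUFFIX item
        for (i = 3; i <= NF; i++) {
            item = $i; type = "n"; flagged = 0
            if (item ~ /[dns]$/) {         # trailing type letter
                type = substr(item, length(item))
                item = substr(item, 1, length(item) - 1)
                flagged = 1
            }
            if (split(item, part, "/") == 2) {
                base = part[1]
                item = part[1] "c" k "@SP" part[2]
            } else if (!flagged && base != "") {
                item = base "c" k "@SP" item   # reuse the previous N
            } else {
                item = item "c" k
            }
            if (type in items) items[type] = items[type] sep item
            else               items[type] = item
        }
    }
    END {
        print "dead" OFS "null" OFS "strike"
        print items["d"] OFS items["n"] OFS items["s"]
    }'
}

parse_sp <<'EOF'
String 1:  47/13528
String 2:  55(s)
String 3:
String 4:  114(n)
String 5:  225(s), 26/10533-10541
String 6:  103/13519
String 7:  10(s), 162(n)
String 8:  152/12345,12346
EOF
```

Because `base` is reset on every String line, a bare number only attaches to an N/SUFFIX item from the same line, so 162c7 on String 7 stays a separate value.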
