Bash: splitting a column into two separate fields
Summary
I have a CSV file that is converted to .DAT, and an AWK file that is supposed to do the mapping on the DAT file. The code in the AWK file is shown below. The contents of the DAT file are tab-separated and are also shown below.
What I want to do:
1. The COMMENT column must be split into two different fields: STRING_COM and STRING_STATUS (done)
2. The VALUE column should be renamed to "NUMB" (done)
3. Keep the headers together with the columns, in order (done)
Not done yet:
4. if the VALUE is ":" then NUMB is null
if the VALUE is ":" and COMMENT "c" then NUMB is null and STRING_COM is "c"
if the VALUE is ":" and COMMENT "u" then NUMB is null and STRING_STATUS is "u"
if the VALUE is "14,385" and COMMENT "d" then NUMB is "14385" and STRING(both) is null
if the VALUE is "14,385" and COMMENT "du" then NUMB is "14385" and STRING_STATUS is "u"
if the VALUE is ":" and COMMENT "cd" then NUMB is null and STRING_COM is "c"
if the VALUE is ":" and COMMENT "bc" then NUMB is null and STRING_COM is "c" and STRING_STATUS is "b"
if the VALUE is ":" and COMMENT "z" then NUMB is 0 and STRING_STATUS is "z"
BEGIN {
    FS=","; OFS="\t";
    a["ODT"]=1;a["AGE"]=1;a["CDT"]=1;a["CO"]=1;
    a["SEX"]=1;a["TIME"]=1;a["VALUE"]=1;a["COMMENT"]=1;
}
NR==1 {
    { $a["VALUE"] = "NUMB" ; $a["COMMENT"] = "STRING_COM" ; $9 = "STRING_STATUS" ; print ; next }
    $a["VALUE"]=="14,385" && $a["COMMENT"] == "d" { $a["VALUE"] = "14385" ; $a["COMMENT"] = $9 = "" }
    $a["VALUE"]=="14,385" && $a["COMMENT"] == "du" { $a["VALUE"] = "14385" ; $a["COMMENT"] = "" ; $9 = "u" }
    $a["VALUE"] != ":" { print ; next }
    $a["COMMENT"] == "z" { $a["VALUE"] = "0" ; $a["COMMENT"] = "" ; $9 = "z" }
    $a["COMMENT"] != "z" { $a["VALUE"] = "" }
    $NF=substr($NF,1,length($NF)-1);
    for(i=1;i<=NF;i++) if($i in a) a[$i]=i;
}
{ print $a["ODT"],$a["AGE"],$a["CDT"],$a["CO"],$a["SEX"],$a["TIME"],NR==1?"NUMB":$a["VALUE"],
    NR==1?"STRING_COM"OFS"STRING_STATUS":($a["COMMENT"]?""OFS$a["COMMENT"]:$a["COMMENT"]);
}
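Thanks in advance.
I have updated the code as you suggested, but it does not work. It only produces errors.
Is this what you mean?
The approach I would usually take is to add more conditional blocks, including for the rules that are already implemented: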
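BEGIN {
    FS=","; OFS="\t";
}
# Header: rename VALUE and COMMENT, append the new STRING_STATUS column.
NR==1 { $7 = "NUMB" ; $8 = "STRING_COM" ; $9 = "STRING_STATUS" ; print ; next }
# Numeric value flagged "d": remove the comma, clear both string fields.
$7=="14,385" && $8 == "d" { $7 = "14385" ; $8 = $9 = "" }
# Numeric value flagged "du": remove the comma, keep "u" as the status.
$7=="14,385" && $8 == "du" { $7 = "14385" ; $8 = "" ; $9 = "u" }
# Any value other than ":" needs no further handling.
$7 != ":" { print ; next }
# From here on the value is ":".
$8 == "z" { $7 = "0" ; $8 = "" ; $9 = "z" }
# Test $9, not $8: the rule above has already cleared $8, so testing $8 would wipe the "0" again.
$9 != "z" { $7 = "" }
...
{ print }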
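It probably misses some cases that your code already handles and that I have not fully understood, but this is the spirit in which I would structure the script.
Assuming the array a is there to cope with input whose fields arrive in a different order, you could
- use array-indexed fields instead of numbered fields, or
- run a preprocessor in the pipeline that puts the fields in the right order.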
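The first option, looking fields up by header name, could look roughly like the sketch below. It is only an illustration: the function name print_by_name and the array name col are made up here, and it assumes a tab-separated file whose first line holds the column names. Note that $col["ODT"] is parsed as $(col["ODT"]), that is, the field whose number is stored in the array.

```shell
#!/bin/sh
# Sketch only: address fields by header name instead of by fixed position.
# print_by_name and col are hypothetical names chosen for this example.
print_by_name() {
  awk '
    BEGIN { FS = OFS = "\t" }
    NR == 1 {
      # Remember which column number each header name occupies.
      for (i = 1; i <= NF; i++) col[$i] = i
      print "ODT", "VALUE", "COMMENT"
      next
    }
    # $col["ODT"] means: the field whose index is stored in col["ODT"].
    { print $col["ODT"], $col["VALUE"], $col["COMMENT"] }
  '
}
```

If a requested header is missing, col[...] is empty and awk treats the reference as $0, so a real script should probably check for that before printing.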
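$9 is your newly populated STRING_STATUS field. The real purpose of this proposal is to implement your rules literally and to cut out some detours. It needs tinkering, yes.
For one thing, $a on its own can never work. The array is just a, without the $ prefix. Also, I am not sure how STRING_COM and STRING_STATUS are meant to work in your model. Either way, you will have to replace my $9 with the appropriate array expression. I suspect you want to initialize one entry of a from the comment field and a["STRING_STATUS"] to empty, although I am not clear on your actual target semantics.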
Input DAT file (tab-separated):
ODT  AGE   CDT  CO  SEX  TIME  NUMB  COMMENT
P3   Y6-8  AWT  EE  F    2011  1297
P4   Y3-4  ESP  RR  M    2011  6940  cd
P1   Y7-9  UDK  FF  F    2011  :     du
PL   Y3-9  EUP  SS  F    2011  :     d
P9   Y_5   ACT  DD  F    2011  :     cd
P6   Y5-9  UAK  DF  M    2011  :     z
Desired output:
ODT  AGE   CDT  CO  SEX  TIME  NUMB  STRING_COM  STRING_STATUS
P3   Y6-8  AWT  EE  F    2011  1297
P4   Y3-4  ESP  RR  M    2011  6940  c
P1   Y7-9  UDK  FF  F    2011                    u
PL   Y3-9  EUP  SS  F    2011
P9   Y_5   ACT  DD  F    2011        c
P6   Y5-9  UAK  DF  M    2011  0                 z
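Read against the sample rows above, the rules seem to reduce to a per-letter treatment of COMMENT. The sketch below encodes that reading. It is an assumption inferred from the examples, not something the asker confirmed: "c" goes to STRING_COM, "d" is dropped, every other letter (u, b, z) goes to STRING_STATUS, commas are removed from the value, and ":" becomes empty, or 0 when the comment contains "z". The function name split_comment is hypothetical, and the input is assumed to be the tab-separated DAT file with the value and comment in columns 7 and 8.

```shell
#!/bin/sh
# Sketch only: split the comment flags into STRING_COM and STRING_STATUS.
# Assumed rules (inferred from the examples, not confirmed by the asker):
#   "c" -> STRING_COM, "d" is dropped, any other letter -> STRING_STATUS.
split_comment() {
  awk '
    BEGIN { FS = OFS = "\t" }
    NR == 1 {
      # Rename the header cells and append the new column.
      $7 = "NUMB"; $8 = "STRING_COM"; $9 = "STRING_STATUS"
      print; next
    }
    {
      value = $7; comment = $8
      com = ""; status = ""
      # Walk the flag letters one by one.
      for (i = 1; i <= length(comment); i++) {
        ch = substr(comment, i, 1)
        if (ch == "c") com = com ch
        else if (ch != "d") status = status ch
      }
      gsub(/,/, "", value)   # "14,385" becomes "14385"
      # ":" means no value; the "z" flag turns it into 0 instead.
      if (value == ":") value = (index(comment, "z") ? "0" : "")
      $7 = value; $8 = com; $9 = status
      print
    }
  '
}
```

It would be called as split_comment < input.dat > output.dat, where both file names are placeholders.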