Transform text using 'sed' or 'awk'
I have a very large input set that looks like this:
Label: foo, Other text: text description...
<insert label> Item: item description...
<insert label> Item: item description...
Label: bar, Other text:...
<insert label> Item:...
Label: baz, Other text:...
<insert label> Item:...
<insert label> Item:...
<insert label> Item:...
...
Can this be done with sed, awk, or another unix tool? If so, how could I do it?
You can use awk like this:
awk '$1=="Label:" {label=$2; sub(/,$/, "", label);}
$1=="<insert" && $2=="label>" {$1=" "; $2=label;}
{print $0;}' file
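To see what the one-liner does, here is a minimal sketch that runs it on a shortened sample. The scratch directory, the file name infile, and the variable names are assumptions made for illustration. Assigning to $1 and $2 makes awk rebuild the record with single-space separators, so the replaced lines come out with leading whitespace; the second command is a hedged variant that substitutes the marker in place instead.

```shell
# Hypothetical scratch directory and sample file, for illustration only.
tmp=$(mktemp -d)
cat > "$tmp/infile" <<'EOF'
Label: foo, Other text: text description...
<insert label> Item: item description...
Label: bar, Other text:...
<insert label> Item:...
EOF

# The one-liner from the answer, unchanged.
out=$(awk '$1=="Label:" {label=$2; sub(/,$/, "", label);}
$1=="<insert" && $2=="label>" {$1=" "; $2=label;}
{print $0;}' "$tmp/infile")

# Variant: replace the marker text directly, leaving the rest of the line as is.
# Note that "&" and "\" in the label would be special in the sub() replacement.
out_sub=$(awk '$1=="Label:" {label=$2; sub(/,$/, "", label)}
{sub(/^<insert label>/, label); print}' "$tmp/infile")

printf '%s\n' "$out" "---" "$out_sub"
rm -rf "$tmp"
```

The variant avoids touching individual fields, so awk never rebuilds the record and the original spacing of each line is preserved.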
One solution using sed.
Contents of script.sed:
## When line beginning with the 'label' string.
/^Label/ {
## Save content to 'hold space'.
h
## Get the string after the label (removing all other characters)
s/^[^ ]* \([^,]*\).*$/\1/
## Save it in 'hold space' and get the original content
## of the line (exchange contents).
x
## Print and read next line.
b
}
###--- Commented this wrong behaviour ---###
#--- G
#--- s/<[^>]*>\(.*\)\n\(.*\)$/\2\1/
###--- And fixed with this ---###
## When line begins with '<insert label>'
/<insert label>/ {
## Append the label name to the line.
G
## And substitute the '<insert label>' string with it.
s/<insert label>\(.*\)\n\(.*\)$/\2\1/
}
Run it like this:
sed -f script.sed infile
Result:
Label: foo, Other text: text description...
foo Item: item description...
foo Item: item description...
Label: bar, Other text:...
bar Item:...
Label: baz, Other text:...
baz Item:...
baz Item:...
baz Item:...
Here is my label.awk file:
/^Label:/ {
label = $2
sub(/,$/, "", label)
}
/<insert label>/ {
sub(/<insert label>/, label)
}
1
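A minimal sketch of how this file might be run, assuming the script is saved as label.awk and the input as infile (both written to a hypothetical scratch directory here so the example is self-contained):

```shell
# Hypothetical scratch directory; file names are assumptions for illustration.
tmp=$(mktemp -d)

# The awk program from the answer, written out verbatim.
cat > "$tmp/label.awk" <<'EOF'
/^Label:/ {
    label = $2
    sub(/,$/, "", label)
}
/<insert label>/ {
    sub(/<insert label>/, label)
}
1
EOF

# Sample input taken from the question.
cat > "$tmp/infile" <<'EOF'
Label: foo, Other text: text description...
<insert label> Item: item description...
<insert label> Item: item description...
Label: bar, Other text:...
<insert label> Item:...
Label: baz, Other text:...
<insert label> Item:...
<insert label> Item:...
<insert label> Item:...
EOF

# -f reads the program from a file instead of the command line.
result=$(awk -f "$tmp/label.awk" "$tmp/infile")
printf '%s\n' "$result"
rm -rf "$tmp"
```

The trailing 1 in label.awk is a pattern that is always true with no action, which makes awk print every line, modified or not.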
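I'm getting an error: sed: 2: script.sed: invalid command code I. Am I using a different version of sed?
@Manish: Yes. It is a GNU extension that ignores the case of the string being matched. I have modified the program to match the exact word, case included.
It works now, but not if the file contains lines without "<insert label>". I changed your last line to /!s/\n.*/;s/\(.*)\n\(.*)$/\2\1/ to handle that. (Also, let's match "<insert label>" specifically; there may be other similar "tags" in the file.)
Rather than changing the last line, change the last two lines to: /{G;s/\(.*)\n\(.*)$/\2\1/}
Awesome! Thanks, everyone. All the answers worked. Unfortunately I can only accept one, and this is the one I chose.
If you are anchoring the pattern, you might as well use sub instead of gsub. You also don't need line continuation characters inside single quotes.
@glennjackman: Thanks a lot for your suggestions and edits.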