Transform text using 'sed' or 'awk'


I have a very large input set that looks like this:

Label: foo, Other text: text description...
   <insert label> Item: item description...
   <insert label> Item: item description...
Label: bar, Other text:...
   <insert label> Item:...
Label: baz, Other text:...
   <insert label> Item:...
   <insert label> Item:...
   <insert label> Item:...
...

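Each <insert label> should be replaced with the label from the preceding "Label:" line. Can this be done with sed, awk, or another Unix tool? If so, how might I do it?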

You can use awk like this:

awk '$1=="Label:" {label=$2; sub(/,$/, "", label);} 
     $1=="<insert" && $2=="label>" {$1=" "; $2=label;}
     {print $0;}' file
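
To see what the command does, here is a small self-contained sketch that feeds a few sample lines through the same awk program. The function name fill_labels and the inline sample lines are only for illustration; they are not part of the original answer.

```shell
# Hypothetical wrapper around the awk program above; reads from stdin.
fill_labels() {
    awk '$1=="Label:" {label=$2; sub(/,$/, "", label)}     # remember the label, minus the trailing comma
         $1=="<insert" && $2=="label>" {$1=" "; $2=label}  # overwrite the two placeholder fields
         {print $0}'
}

result=$(printf '%s\n' \
    'Label: foo, Other text: text description...' \
    '   <insert label> Item: item description...' \
    'Label: bar, Other text:...' \
    '   <insert label> Item:...' | fill_labels)
printf '%s\n' "$result"
```

Note that assigning to a field makes awk rebuild the line with single spaces between fields, so the original indentation and any runs of whitespace on the item lines are not preserved exactly.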

One solution using sed:

Contents of script.sed:

## When line beginning with the 'label' string.
/^Label/ {
    ## Save content to 'hold space'.
    h   

    ## Get the string after the label (removing all other characters)
    s/^[^ ]*\([^,]*\).*$/\1/

    ## Save it in 'hold space' and get the original content
    ## of the line (exchange contents).
    x   

    ## Print and read next line.
    b   
}
###--- Commented this wrong behaviour ---###    
#--- G
#--- s/<[^>]*>\(.*\)\n\(.*\)$/\2\1/

###--- And fixed with this ---###
## When the line contains '<insert label>'
/<insert label>/ {
    ## Append the label name to the line.
    G   

    ## And substitute the '<insert label>' string with it.
    s/<insert label>\(.*\)\n\(.*\)$/\2\1/
}
Run it like this:

sed -f script.sed infile
Result:

Label: foo, Other text: text description...
    foo Item: item description...
    foo Item: item description...
Label: bar, Other text:...
    bar Item:...
Label: baz, Other text:...
    baz Item:...
    baz Item:...
    baz Item:...
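
The script depends on sed's hold space, so a stripped-down sketch of just that mechanism may help. The two-line input below is made up for the demonstration and has nothing to do with the label data.

```shell
# h copies the pattern space (the current line) into the hold space.
# G appends a newline plus the hold space to the pattern space.
demo=$(printf '%s\n' 'first' 'second' | sed -n '1h; 2{G;p;}')
printf '%s\n' "$demo"
```

On the second line the pattern space becomes "second", a newline, then "first", which is the same shape the script's final s command takes apart to put the saved label in front of the item text.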

Here is my label.awk file:

/^Label:/ {
    label = $2
    sub(/,$/, "", label)
}

/<insert label>/ {
    sub(/<insert label>/, label)
}

1
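
A minimal sketch of how such a file could be run, assuming a temporary directory and a small made-up input file named infile (both are illustrative, not from the answer):

```shell
# Work in a throwaway directory so nothing in the current directory is touched.
workdir=$(mktemp -d)

# Recreate label.awk as shown above.
cat > "$workdir/label.awk" <<'EOF'
/^Label:/ {
    label = $2
    sub(/,$/, "", label)
}

/<insert label>/ {
    sub(/<insert label>/, label)
}

1
EOF

# A tiny sample input (hypothetical file name).
cat > "$workdir/infile" <<'EOF'
Label: foo, Other text: text description...
   <insert label> Item: item description...
Label: bar, Other text:...
   <insert label> Item:...
EOF

output=$(awk -f "$workdir/label.awk" "$workdir/infile")
printf '%s\n' "$output"
rm -r "$workdir"
```

Because sub edits the whole line in place rather than assigning to fields, the indentation of the item lines is kept as it was. One caveat: a label containing & would be treated specially in the replacement string.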

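I get this error:
sed: 2: script.sed: invalid command code I
Am I using a different version of sed?

@Manish: Yes. It is a GNU extension that ignores case in the string to match. I have modified the program to match the exact word (case included).

It works now, but not if the file has non-"<insert label>" lines. I changed your last line to
/<insert label>/!s/\n.*//;s/<insert label>\(.*\)\n\(.*\)$/\2\1/
to handle this. (Also, let's match "<insert label>" specifically; the file may contain other similar "tags".)

Instead of changing the last line, change the last two lines to:
/<insert label>/{G;s/<insert label>\(.*\)\n\(.*\)$/\2\1/}

Awesome! Thanks, everyone. All the answers worked. Unfortunately I can accept only one, and this is the one I chose.

If you are going to anchor the pattern, you might as well use sub instead of gsub. You also don't need line continuations inside single quotes.

@glennjackman: Thank you very much for the suggestions and the edit.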
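
The I modifier discussed in these comments is a GNU sed extension, so a sed that does not implement it may reject the script, as the error message shows. As a rough sketch of a portable alternative, assuming case-insensitive matching were actually wanted, each letter can be written as a bracket expression. The sample lines and the pattern are illustrative only.

```shell
# Match "label:" at the start of a line in any letter case,
# without relying on the GNU-only I modifier.
matches=$(printf '%s\n' 'Label: foo' 'LABEL: bar' 'Item: baz' |
    sed -n '/^[Ll][Aa][Bb][Ee][Ll]:/p')
printf '%s\n' "$matches"
```

For this particular input the exact-case match used in the final script is simpler and sufficient.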