Transform text using 'sed' or 'awk'

sed awk

I have a very large input set that looks like this:

Label: foo, Other text: text description...
   <insert label> Item: item description...
   <insert label> Item: item description...
Label: bar, Other text:...
   <insert label> Item:...
Label: baz, Other text:...
   <insert label> Item:...
   <insert label> Item:...
   <insert label> Item:...
...

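I want each <insert label> replaced with the label from the nearest Label: line above it. Can this be done with sed, awk, or another Unix tool? If so, how?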

You can use awk like this:

awk '$1=="Label:" {label=$2; sub(/,$/, "", label);} 
     $1=="<insert" && $2=="label>" {$1=" "; $2=label;}
     {print $0;}' file
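
One side effect worth knowing: assigning to $1 or $2 makes awk rebuild the whole record, joining the fields with OFS (a single space by default), so runs of spaces elsewhere on the line collapse. A minimal sketch of that behaviour, using a made-up two-line sample (the extra spaces in the item text are my own addition for illustration):

```shell
# Hypothetical sample: note the three spaces between "item" and "description".
sample='Label: foo, Other text: text description...
   <insert label> Item: item   description...'

# Same program as above; assigning to $1/$2 rebuilds $0 using OFS.
rebuilt=$(printf '%s\n' "$sample" | awk '
    $1 == "Label:" { label = $2; sub(/,$/, "", label) }
    $1 == "<insert" && $2 == "label>" { $1 = " "; $2 = label }
    { print }')

printf '%s\n' "$rebuilt"
```

If the original spacing matters, editing $0 with sub() instead of assigning to fields leaves the rest of the line untouched.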

One solution using sed.

Contents of script.sed:

## When line beginning with the 'label' string.
/^Label/ {
    ## Save content to 'hold space'.
    h   

    ## Get the string after the label (removing all other characters)
    s/^[^ ]*\([^,]*\).*$/\1/

    ## Save it in 'hold space' and get the original content
    ## of the line (exchange contents).
    x   

    ## Print and read next line.
    b   
}
###--- Commented this wrong behaviour ---###    
#--- G
#--- s/<[^>]*>\(.*\)\n\(.*\)$/\2\1/

###--- And fixed with this ---###
## When line contains '<insert label>'
/<insert label>/ {
    ## Append the label name to the line.
    G   

    ## And substitute the '<insert label>' string with it.
    s/<insert label>\(.*\)\n\(.*\)$/\2\1/
}

Run it like this:

sed -f script.sed infile

Result:

Label: foo, Other text: text description...
    foo Item: item description...
    foo Item: item description...
Label: bar, Other text:...
    bar Item:...
Label: baz, Other text:...
    baz Item:...
    baz Item:...
    baz Item:...
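
To check the script against a sample without touching real data, something like the following should reproduce the output above. It is only a sketch: the temporary directory and the shortened input are placeholders of mine, and the script is the one shown earlier with its comments stripped.

```shell
# Work in a throwaway directory (names here are illustrative only).
workdir=$(mktemp -d)

cat > "$workdir/infile" <<'EOF'
Label: foo, Other text: text description...
   <insert label> Item: item description...
Label: bar, Other text:...
   <insert label> Item:...
EOF

cat > "$workdir/script.sed" <<'EOF'
/^Label/ {
    h
    s/^[^ ]*\([^,]*\).*$/\1/
    x
    b
}
/<insert label>/ {
    G
    s/<insert label>\(.*\)\n\(.*\)$/\2\1/
}
EOF

# Run the script and keep the output for inspection.
sed_out=$(sed -f "$workdir/script.sed" "$workdir/infile")
printf '%s\n' "$sed_out"
rm -r "$workdir"
```

The label is stored in the hold space with its leading space, which is why the item lines gain one extra space of indentation.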

Here is my label.awk file:

/^Label:/ {
    label = $2
    sub(/,$/, "", label)
}

/<insert label>/ {
    sub(/<insert label>/, label)
}

1
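
The file can be run with awk -f label.awk file. One caveat: sub() gives & in the replacement a special meaning (the matched text), so a label containing & would come out wrong. If that can happen in your data (an assumption on my part; the sample labels are plain words), here is a sketch that splices the label in literally with index() and substr():

```shell
# Hypothetical input whose label contains "&".
amp_out=$(printf '%s\n' 'Label: a&b, Other text:...' '   <insert label> Item:...' | awk '
    /^Label:/ {
        label = $2
        sub(/,$/, "", label)          # strip the trailing comma from the label
    }
    {
        tag = "<insert label>"
        pos = index($0, tag)          # 0 when the tag is absent
        if (pos)
            $0 = substr($0, 1, pos - 1) label substr($0, pos + length(tag))
        print
    }')

printf '%s\n' "$amp_out"
```

Because no field is assigned, the rest of each line keeps its original spacing.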

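I get an error:

sed: 2: script.sed: invalid command code I

Am I using a different version of sed?

@Manish: Yes. It is a GNU extension that makes the match case-insensitive. I've modified the program to match the exact word, case included.

It works now, but not if the file contains lines without <insert label>. I changed your last line to

/<insert label>/!s/\n.*//;s/<insert label>\(.*\)\n\(.*\)$/\2\1/

to handle that. (Also, let's match <insert label> specifically; the file may contain other similar "tags".)

Instead of changing the last line, change the last two lines to:

/<insert label>/{G;s/<insert label>\(.*\)\n\(.*\)$/\2\1/}

Awesome! Thanks, everyone. All of the answers worked. Unfortunately I can only accept one, and this is the one I chose.

If you're anchoring the pattern, you might as well use sub rather than gsub. You don't need line continuations inside single quotes.

@glennjackman: Thanks a lot for the suggestions and the edit.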
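
On the I flag discussed in the comments: it is a GNU extension, so other sed implementations may reject it, as the error quoted above shows. If case-insensitive matching is actually wanted, one portable sketch is to spell out both cases in bracket expressions (the sample lines here are made up):

```shell
# Match "label:" at the start of a line in any letter case, without GNU's I flag.
ci_out=$(printf '%s\n' 'Label: foo' 'label: bar' 'LABEL: baz' 'Other: qux' |
    sed -n '/^[Ll][Aa][Bb][Ee][Ll]:/p')

printf '%s\n' "$ci_out"
```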