Unix: print pattern between strings

unix awk sed pattern-matching

I have a file with the contents shown below. START and STOP mark the boundaries of a block.

START
X | 123
Y | abc
Z | +=-
STOP
START
X | 456
Z | +%$
STOP
START
X | 789
Y | ghi
Z | !@#
STOP

I would like to print the X and Y values of each block in the following format:

123 ~~ abc
456 ~~ 
789 ~~ ghi

If START/STOP occurred only once, sed -n '/START/,/STOP/p' would have been enough. Since the blocks repeat, I need your help.

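Sed is always the wrong choice for any problem that involves processing multiple lines. All of sed's arcane constructs for handling multiple lines became obsolete in the mid-1970s, when awk was invented.

Whenever your input contains name-value pairs, I find it useful to create an array that maps each name to its value and then access the array by name. In this case, using GNU awk for its multi-character RS and for deleting a whole array: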

$ cat tst.awk
BEGIN {
    RS = "\nSTOP\n"
    OFS=" ~~ "
}
{
    delete n2v
    for (i=2;i<=NF;i+=3) {
        n2v[$i] = $(i+2)
    }
    print n2v["X"], n2v["Y"]
}

$ gawk -f tst.awk file
123 ~~ abc
456 ~~ 
789 ~~ ghi
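For awk implementations that lack a multi-character RS, a line-oriented variant may be easier to port. The following is only a sketch, not part of the answer above: the function name print_xy is hypothetical, and it assumes every name and value are separated by exactly " | " as in the sample data.

```shell
#!/bin/sh
# print_xy FILE  (hypothetical helper name)
# Prints "X ~~ Y" for every START/STOP block in FILE.
# Assumes name and value are separated by exactly " | ".
print_xy() {
    awk -F' [|] ' '
        $1 == "START" { x = y = ""; next }   # new block: forget the old values
        $1 == "X"     { x = $2; next }       # remember the X value
        $1 == "Y"     { y = $2; next }       # remember the Y value, if present
        $1 == "STOP"  { print x " ~~ " y }   # block closed: emit one line
    ' "$1"
}
```

Calling print_xy file on the sample input should give the same three lines as the gawk script. A block without a Y line prints an empty value after the separator.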

Based on my own solution:
Test
Because I like brain teasers (and not because this sort of thing is practical in sed), one possible sed solution is
sed -n '/START/,/STOP/ { //!H; // { g; /^$/! { s/.*\nX | \([^\n]*\).*/\1 ~~/; ta; s/.*/~~/; :a G; s/\n.*Y | \([^\n]*\).*/ \1/; s/\n.*//; p; s/.*//; h } } }'

This works as follows:
/START/,/STOP/ {                        # between two start and stop lines
  //! H                                 # assemble the lines in the hold buffer
                                        # note that // repeats the previously
                                        # matched pattern, so // matches the
                                        # start and end lines, //! all others.

  // {                                  # At the end
    g                                   # That is: When it is one of the
    /^$/! {                             # boundary lines and the hold buffer
                                        # is not empty

      s/.*\nX | \([^\n]*\).*/\1 ~~/     # isolate the X value, append ~~

      ta                                # if there is no X value, just use ~~
      s/.*/~~/
      :a 

      G                                 # append the hold buffer to that
      s/\n.*Y | \([^\n]*\).*/ \1/       # and isolate the Y value so that
                                        # the pattern space contains X ~~ Y

      s/\n.*//                          # Cutting off everything after a newline
                                        # is important if there is no Y value
                                        # and the previous substitution did
                                        # nothing

      p                                 # print the result

      s/.*//                            # and make sure the hold buffer is
      h                                 # empty for the next block.
    }
  }
}


I like the idea of storing the values in an array, +1, and a moral +1 for also adding an explanation :)!

Haha, funny that you end up apologising for explaining your answer ;) Yes, it is indeed useful to read.

What can I say? I got quite a few answers. Thank you all. With a sample of data, Wintermute's solution takes 0m0.151s, Ed Morton's takes 0m0.160s, and fedorqui's takes 0m0.163s. Thanks again, everyone.

Execution speed is never going to be the issue between sed and awk solutions. Just try to modify one of them to, for example, print a debug statement for every line read, or print a count at the end of the number of times a "Y" was found, or ...

There is the l command for that. :P Seriously, though, you will want to use one of the awk solutions. I don't agree that awk is always better (mainly because it has no backreferences), but here it is no contest. I mean, look at this one and then look at @fedorqui's solution. One of them is human-readable, and the other is mine. You don't want to introduce unmaintainable code for 7% of runtime. I wrote this for fun.
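To make the maintainability point from the comments concrete, here is a sketch of how the two suggested modifications (a debug line per input line and a final count of "Y" entries) might look in a line-oriented awk script. It is not any of the posted answers: the function name count_y and the DEBUG environment variable are assumptions made for illustration, and writing to /dev/stderr assumes an awk or system that provides that path.

```shell
#!/bin/sh
# count_y FILE  (hypothetical helper name)
# Same "X ~~ Y" output per block, plus a final count of Y lines.
# Set DEBUG=1 (an assumed convention) to trace each input line on stderr.
count_y() {
    awk -F' [|] ' -v debug="${DEBUG:-0}" '
        debug         { print "read line " NR ": " $0 > "/dev/stderr" }
        $1 == "START" { x = y = ""; next }        # new block: reset values
        $1 == "X"     { x = $2; next }
        $1 == "Y"     { y = $2; ycount++; next }  # count every Y line seen
        $1 == "STOP"  { print x " ~~ " y }
        END           { print "Y found " (ycount + 0) " times" }
    ' "$1"
}
```

Each change is one extra rule or one extra statement, which is the kind of edit that is hard to make in the sed one-liner.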