Awk: I have a working script, but how can I make it more "elegant"?


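Some background: I have two files, A and B, that contain the data I need to extract.

For file A, I only need the last two lines, which look like this: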

RMM:  17    -0.221674395053E+01    0.59892E-04    0.00000E+00    31   0.259E-03
    1 F= -.22167440E+01 E0= -.22167440E+01  d E =-.398708E-10  mag=     2.0000
I need to extract the following numbers:

-1st Line, 2nd field (17)
-1st Line, 4th field (0.59892E-04)
-2nd Line, 1st field (1)
-2nd Line, 3rd field (-.22167440E+01)
-2nd Line, 5th field (-.22167440E+01)
-2nd Line, 8th field (-.398708E-10)
-2nd Line, 10th field (2.0000)
For file B, I only need the last 11 lines, which look like this:

                  Total CPU time used (sec):        0.364
                        User time (sec):        0.355
                      System time (sec):        0.009
                     Elapsed time (sec):        1.423

               Maximum memory used (kb):        9896.
               Average memory used (kb):           0.

                      Minor page faults:         2761
                      Major page faults:            4
             Voluntary context switches:           24
I need to extract the following numbers:

 -1st line, 6th field (0.364)
 -2nd line, 4th field (0.355)
 -3rd line, 4th field (0.009)
 -4th line, 4th field (1.423)
 -6th line, 5th field (9896.)
 -7th line, 5th field (0.)
My output should look like this:

mainfolder1[tab/space]subfolder1[tab/space][all the extracted info separated by tab]
mainfolder2[tab/space]subfolder2[tab/space][all the extracted info separated by tab]
mainfolder3[tab/space]subfolder3[tab/space][all the extracted info separated by tab]
...
mainfoldern[tab/space]subfoldern[tab/space][all the extracted info separated by tab]
Here is my script:

for m in ./*/; do
    main=$(basename "$m")
    for s in "$m"*/; do
        sub=$(basename "$s")
        vdata=$(tail -n2 ./$main/$sub/A | awk -F'[ =]+' NR==1'{a=$2;b=$4;next}{print s,a,$2,$4,$6,$9, $11}')
        ctime=$(tail -n11 ./$main/$sub/B |head -n1|awk '{print $6}')
        utime=$(tail -n10 ./$main/$sub/B |head -n1|awk '{print $4}')
        stime=$(tail -n9 ./$main/$sub/B |head -n1|awk '{print $4}')
        etime=$(tail -n8 ./$main/$sub/B |head -n1|awk '{print $4}')
        maxmem=$(tail -n6 ./$main/$sub/B |head -n1|awk '{print $5}')
        avemem=$(tail -n5 ./$main/$sub/B |head -n1|awk '{print $5}')
        c=$(echo $sub| cut -c 2-)
        echo "$m $c $vdata $ctime $utime $stime $etime $maxmem $avemem"
    done
done > output

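Now, the fourth line, the vdata part, is actually recycled from a previous forum question, and I don't fully understand it. I would like my code for file B to be as elegant as the awk code for file A. How can I do that? Thanks.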

For file B, try the following:

tail -n11 B | awk -F':' '{ print $2 }'
If you need to keep the values and then echo them, you can do the following:

array=($(tail -n11 B | awk -F':' '{ print $2 }'))
for value in "${array[@]}"
do
    echo "$value"
done
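You should look into find and xargs, because any time you write a loop in shell just to manipulate text you have the wrong approach. To keep things simple and retain your original structure, though, it sounds like you could use something like this: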

for m in ./*/; do
  main=$(basename "$m")
  for s in "$m"*/; do
    sub=$(basename "$s")
    fileA="${main}/${sub}/A"
    fileB="${main}/${sub}/B"
    awk -v sizeA=$(wc -l < "$fileA") -v sizeB=$(wc -l < "$fileB") '
        NR==FNR {
            if ( FNR == (sizeA-1) ) { split($0,p) }
            if ( FNR == sizeA )     { split($0,a) }
            next
        }
        { b[sizeB + 1 - FNR] = $NF }
        END {
            split(FILENAME,f,"/")
            print f[1], f[2], p[2], p[4], a[1], a[3], a[5], a[8], a[10], b[11], b[10], b[9], b[8], b[6], b[5]
        }
    ' "$fileA" "$fileB"
  done
done > output
Note that the above opens each B file only once instead of 6 times.
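The indexing in `b[sizeB + 1 - FNR] = $NF` can be checked on a toy file. This is only a sketch: the temporary file and its three lines are made up for illustration, `size` plays the role of `sizeB`, and `idx` plays the role of `b`.

```shell
# Toy input (hypothetical): three "label: value" lines.
toy=$(mktemp)
printf '%s\n' 'first: 1' 'second: 2' 'third: 3' > "$toy"

# Store the last field of each line under its distance from the end of the
# file, so idx[1] is the last line, idx[2] the one before it, and so on.
from_end=$(awk -v size="$(wc -l < "$toy")" '
    { idx[size + 1 - FNR] = $NF }
    END { print idx[1], idx[2], idx[3] }
' "$toy")

echo "$from_end"
rm -f "$toy"
```

Counting from the end this way is what lets the script refer to `b[11]` for the "Total CPU time" line without piping through tail first.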

awk 'NR==1{print $6} NR==2{print $4} NR==3{print $4} ...'
You can simplify that with:

NR==2 || NR==3 || NR==4
But that seems hard to maintain. Or you could use an array:

awk 'BEGIN{a[1]=6;a[2]=4...} NR in a{ print $a[NR]}'
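Spelled out for the fields listed in the question, the array idea might look like the sketch below. The function name `extract_b`, the array name `want`, and the sample file are assumptions made for illustration; the line-to-field map is taken from the question's list for file B.

```shell
# Hypothetical helper: print the wanted fields from the last 11 lines of a file.
extract_b() {
    tail -n 11 "$1" | awk '
        BEGIN {
            # want[line number] = field number to print from that line
            want[1] = 6; want[2] = 4; want[3] = 4; want[4] = 4
            want[6] = 5; want[7] = 5
        }
        # Only lines listed in want[] are printed, joined by tabs.
        NR in want { printf "%s%s", sep, $(want[NR]); sep = "\t" }
        END { print "" }
    '
}

# Sample data copied from the question, plus one made-up earlier line.
sample=$(mktemp)
cat > "$sample" <<'EOF'
some earlier output
                  Total CPU time used (sec):        0.364
                        User time (sec):        0.355
                      System time (sec):        0.009
                     Elapsed time (sec):        1.423

               Maximum memory used (kb):        9896.
               Average memory used (kb):           0.

                      Minor page faults:         2761
                      Major page faults:            4
             Voluntary context switches:           24
EOF

result=$(extract_b "$sample")
printf '%s\n' "$result"
rm -f "$sample"
```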
But I think you really just want:

awk '{print $NF}' ORS=\\t
You don't want the 6th field of line 1. You want the last field.


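Rather than trying to collect the output into variables in order to echo it, add ORS=\\t to get tab-separated output and simply print it to the script's standard output.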
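A minimal sketch of that approach on the sample data from the question follows. Two things here are assumptions rather than part of the answer: `-v ORS='\t'` is used in place of the trailing `ORS=\\t` operand, and the `NF && NR <= 7` filter is added to skip the blank lines and the page-fault lines that the question does not ask for.

```shell
# Sample data copied from the question (last 11 lines of file B).
sample_b=$(mktemp)
cat > "$sample_b" <<'EOF'
                  Total CPU time used (sec):        0.364
                        User time (sec):        0.355
                      System time (sec):        0.009
                     Elapsed time (sec):        1.423

               Maximum memory used (kb):        9896.
               Average memory used (kb):           0.

                      Minor page faults:         2761
                      Major page faults:            4
             Voluntary context switches:           24
EOF

# NF is zero on blank lines, so they are skipped; NR <= 7 stops after the
# "Average memory" line. Each printed value is followed by a tab (ORS).
times_and_mem=$(tail -n 11 "$sample_b" |
    awk -v ORS='\t' 'NF && NR <= 7 { print $NF }')

printf '%s\n' "$times_and_mem"
rm -f "$sample_b"
```

Every value, including the last one, is followed by a tab, so a script that writes one row per folder still has to end the row itself, for example with a plain `echo`.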

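The -F'[ =]+' on the vdata line adds = to awk's FS field-splitting value, so that the d E value (field $8 under default splitting) does not include the leading =. The NR==1{…} block stores the values from the first line so that they can be used in the print on the second line.
Thanks! I will look into find and xargs.
Thanks, your answer is concise. I think I will use yours.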
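The effect of -F'[ =]+' described in the comment above can be seen by splitting the second line of file A both ways. This is a small sketch using the sample line from the question; the variable names are made up. Because the line starts with spaces and the separator is a regular expression, the first field is empty and every value moves up by one position, which is why the script prints $2, $4, $6, $9 and $11.

```shell
line='    1 F= -.22167440E+01 E0= -.22167440E+01  d E =-.398708E-10  mag=     2.0000'

# Default whitespace splitting: the 8th field keeps its leading "=".
default_split=$(printf '%s\n' "$line" | awk '{ print $8 }')

# Runs of spaces and "=" as the separator: the same value is now field 9,
# without the "=".
custom_split=$(printf '%s\n' "$line" | awk -F'[ =]+' '{ print $9 }')

echo "default FS, field 8: $default_split"
echo "FS [ =]+,   field 9: $custom_split"
```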