Processing a text file with awk, sed and grep

My input file:

20110512075615 Constanta 1.0041 1013.41 9999.0 0 0.0 0
20110512075630 Constanta 1.0021 1013.45 9999.0 0 0.0 0
20110512075645 Constanta 1.0031 1013.47 9999.0 0 0.0 0
20110512075700 Constanta 1.0018 1013.47 9999.0 0 0.0 0
20110512075730 Constanta 1.0038 1013.48 9999.0 0 0.0 0
20110512075745 Constanta 1.0023 1013.48 9999.0 0 0.0 0
20110512075800 Constanta 9999.0000 1013.46 13.2 0 0.0 0
20110512075815 Constanta 1.0038 1013.45 13.2 0 0.0 0
20110512075830 Constanta 1.0040 1013.50 13.2 0 0.0 0
20110512075845 Constanta 1.0034 1013.50 13.2 0 0.0 0
20110512075900 Constanta 1.0050 1013.45 13.2 0 0.0 0
20110512075915 Constanta 1.0060 1013.48 13.2 0 0.0 0
20110512075930 Constanta 1.0056 1013.45 13.2 0 0.0 0
20110512080000 Constanta 1.0066 1013.50 13.2 0 0.0 0
20110512080015 Constanta 1.0067 1013.49 13.2 0 0.0 0
20110512080100 Constanta 1.0065 1013.48 13.2 0 0.0 0
20110512080115 Constanta 9999.0000 1013.51 13.2 0 0.0 0
20110512080130 Constanta 1.0065 1013.51 13.2 0 0.0 0
20110512080145 Constanta 1.0079 1013.49 13.2 0 0.0 0
20110512080200 Constanta 1.0072 1013.51 13.2 0 0.0 0
20110512080215 Constanta 1.0084 1013.51 13.2 0 0.0 0
My output file:

   YY/MM/DD HH -Level- Atm.Prs -Tw-
   201105120757        1.0018    1013.47    9999.0     0    0.0     0
   201105120759        1.0050    1013.45    13.2     0    0.0     0
   201105120800  9999.0000       1.0066    1013.50    13.2     0    0.0     0
   201105120801        1.0065    1013.48    13.2     0    0.0     0
   201105120802  9999.0000       1.0072    1013.51    13.2     0    0.0     0
My code:

   #! /bin/bash
   FILE="Constanta20110513.txt"
   # 1) remove column two(='Constanta')
   awk '{$2="";print}' $FILE | column -t > tmpfile
   # 2) remove lines with '9999.0000'  
   cat tmpfile | sed -e '/9999.[0-9]/d'  >> final.tmp
   # 3) remove first three lines
   awk 'NR>3' final.tmp >> myfile.tmp
   # 4) count lines between '....00' and '....00':
   #if >= 3, keep only the line with '...00' and delete the other lines
   #if < 3, do the same, and put '9999' on column two

   output=$(grep -n '00\s*$' myfile.tmp | sed 's/\s*$/ /')
   array=($output $(cat myfile.tmp | wc -l))

   for (( i=0; i<${#array[@]}-1; i++ )); do
     index1=$(echo "${array[$i]}" | grep -o '^[0-9]*') 
     index2=$(echo "${array[$i+1]}" | grep -o '^[0-9]*')

     if [ $(( index2 - index1 )) -ge 3 ]; then
        echo $(echo "${array[$i]}" | grep -o '[0-9]*$') >> temp.tmp
     else
        echo $(echo "${array[$i]}" | grep -o '[0-9]*$') 9999.0000 >> temp.tmp
     fi

  done

   # 5) delete last two characters from first column(=00)
   awk '{sub(/..$/,"",$1)} 1' temp.tmp >> output.tmp
  # 6) insert header
  echo 'YY/MM/DD HH -Level- Atm.Prs -Tw-' | cat - output.tmp >> output2.tmp
  #save
  mv output2.tmp $FILE
My problem is at step 4: it does not work, and the temporary file temp.tmp is never created. I think the problem is here:
grep -n '00\s*$' myfile.tmp | sed 's/\s*$/ /'

Thanks a lot in advance.
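
One way to test that suspicion, as a sketch only: the pattern '00\s*$' looks for 00 at the end of the whole line, but every data line ends in a lone 0, and the 00 sits at the end of the first column. The file name sample.tmp and its three rows are made up for this demonstration (they mimic myfile.tmp after step 1), and the anchored pattern is just one possible alternative, not necessarily what was intended.

```shell
# Hypothetical three-line sample in the shape of myfile.tmp (column two already removed).
printf '%s\n' \
  '20110512075845  1.0034  1013.50  13.2  0  0.0  0' \
  '20110512075900  1.0050  1013.45  13.2  0  0.0  0' \
  '20110512075915  1.0060  1013.48  13.2  0  0.0  0' > sample.tmp

# Original idea: "00" at the end of the line. No row ends that way.
# grep -c still prints the count when nothing matches; "|| true" hides its exit status 1.
end_matches=$(grep -c '00[[:space:]]*$' sample.tmp || true)

# Alternative: a 14-digit first column whose last two digits are "00".
col_matches=$(grep -cE '^[0-9]{12}00[[:space:]]' sample.tmp || true)

echo "end-of-line matches: $end_matches, first-column matches: $col_matches"
rm -f sample.tmp
```

When the first grep finds nothing, the array in step 4 holds only the line count, the for loop body never runs, and temp.tmp is therefore never written.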

Here is a way to do steps 1 to 3 in one go:

awk '{$2="";sub(/  /," ")} !/9999.[0-9]/ && t++>2' $FILE
I'm not sure what you want to count in step 4; could you explain it more clearly?
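
One caveat when matching 9999.[0-9] against the whole line: it also matches the 9999.0 value in the fifth column, so the first six sample rows would be dropped, although the desired output keeps the 201105120757 row with 9999.0 in it. Below is a minimal sketch that tests only the level column, assuming this is the only column whose 9999.0000 value should remove a row. The function name drop_bad_level is invented for the example.

```shell
# drop_bad_level: hypothetical helper. Reads rows on stdin, skips rows whose
# level (third input column) is exactly 9999.0000, and removes column two.
drop_bad_level() {
  awk '$3 != "9999.0000" { $2 = ""; sub(/  /, " "); print }'
}

# Three rows taken from the input file: only the middle one should disappear.
printf '%s\n' \
  '20110512075745 Constanta 1.0023 1013.48 9999.0 0 0.0 0' \
  '20110512075800 Constanta 9999.0000 1013.46 13.2 0 0.0 0' \
  '20110512075815 Constanta 1.0038 1013.45 13.2 0 0.0 0' | drop_bad_level
```

Comparing a single field keeps the filter independent of what the other columns happen to contain.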

Building on Jotne's work for steps 1-3, I added a function to handle step 4. Put the following into an executable file (I called it awko) and run it as awko Constanta20110513.txt:

#!/usr/bin/awk -f

BEGIN { print "YY/MM/DD HH -Level- Atm.Prs -Tw-" }

# absorb jotne's work for #1-3 more or less
{$2="";sub(/  /," ")}
/9999.0000/ || NR<=3 { next }

/^[0-9]{12}00/ { output_line() } # deal with the "00" lines

END { output_line() } # output the final "00" stored in last

function output_line() {
    if( last_nr != 0 ) {
        if( NR-last_nr < 3 ) {
            temp = $0          # save off the current line
            $0 = last          # reset it to the last "00" line
            $2 = "9999.0000"   # make $2 what you want
            print $0
            $0 = temp          # restore $0 from temp
        }
        if( NR-last_nr >= 3 ) { print last }
    }
    $1 = substr( $1, 1, 12 )   # drop the "00" from $1
    last = $0; last_nr = NR;   # store some variables
    }
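
The /^[0-9]{12}00/ rule relies on interval expressions ({12}), which some older awk implementations do not support or only enable with an extra option. A portable alternative, offered as a sketch rather than a required change, tests the last two characters of the first field with substr. It assumes the timestamp is always exactly 14 characters long, and the function name minute_rows is invented for the example.

```shell
# minute_rows: hypothetical helper. Prints the first field of every row whose
# 14-character timestamp ends in "00". substr($1, 13, 2) takes characters
# 13 and 14, so no {n} regular expression is needed.
minute_rows() {
  awk 'length($1) == 14 && substr($1, 13, 2) == "00" { print $1 }'
}

# Three rows from the input file with column two already removed.
printf '%s\n' \
  '20110512075930 1.0056 1013.45 13.2 0 0.0 0' \
  '20110512080000 1.0066 1013.50 13.2 0 0.0 0' \
  '20110512080015 1.0067 1013.49 13.2 0 0.0 0' | minute_rows
```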

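Can you show us the output you want?
At step 4, I want to count the number of lines between the lines whose first column ends in '00'; if the result is >= 3: keep only the line ending in '00' and delete the others; if the result is < 3: do the same, and put '9999' in column two.
@N0741337, it works perfectly! Thanks very, very much!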
YY/MM/DD HH -Level- Atm.Prs -Tw-
201105120757 1.0018 1013.47 9999.0 0 0.0 0
201105120759 1.0050 1013.45 13.2 0 0.0 0
201105120800 9999.0000 1013.50 13.2 0 0.0 0
201105120801 1.0065 1013.48 13.2 0 0.0 0
201105120802 9999.0000 1013.51 13.2 0 0.0 0