从awk中的三个以上连续行计算值_Awk

从awk中的三个以上连续行计算值

awk

从awk中的三个以上连续行计算值,awk,Awk,我正在处理一个文件，如下所示： site Date time value1 value2 0023 2014-01-01 00:00 32.0 23.7 0023 2014-01-01 01:00 38.0 29.9 0023 2014-01-01 02:00 85.0 26.6 0023 2014-01-01 03:00 34.0 25.3 0023 2014-01-01 04:00 37.0 23.8 0023 2014-01-01 05:00 80.0 20.3 0023 2014-01-

我正在处理一个文件，如下所示：

site Date time value1 value2
0023 2014-01-01 00:00 32.0 23.7
0023 2014-01-01 01:00 38.0 29.9
0023 2014-01-01 02:00 85.0 26.6
0023 2014-01-01 03:00 34.0 25.3
0023 2014-01-01 04:00 37.0 23.8
0023 2014-01-01 05:00 80.0 20.3
0023 2014-01-01 06:00 90.0 20.0
0023 2014-01-01 07:00 180.0 20.0
0023 2014-01-01 08:00 30.0 20.0

第一列为现场，第二列为日期（2014年全年），第三列为时间（每天从00:00到23:00），第四列和第五列为数值。我需要根据以下条件比较第4列和第5列：

对于每个站点（第1列），如果第4列是第5列的3倍以上，并且此模式持续超过或等于3小时，加上最大值必须大于100，则打印所有符合标准的行，并计算每个站点存在的案例数。总共有大约150个站点，每个站点每天都有小时数据。这是我想要的输出：

0023 2014-01-01 05:00 80.0 20.3 1
0023 2014-01-01 06:00 90.0 20.0 1
0023 2014-01-01 07:00 180.0 20.0 1
0023 2014-06-30 23:00 200.0 30.3 2
0023 2014-07-01 00:00 303.0 30.3 2
0023 2014-07-01 01:00 134.0 30.3 2
0025 2014-07-01 01:00 136.0 25.3 1           
0025 2014-07-01 02:00 116.0 25.3 1
0025 2014-07-01 03:00 106.0 25.3 1

非常感谢您的帮助

我试着猜测你的想法，首先是我的测试输入：

#cat in2

还有我的awk程序：#cat msr

#/bin/bash
（（$#！=1））&&{echo“Usage$0 inp_file”；退出1；}
awk'
BEGIN{ix=0；stn=-1；}#ix:filo的索引，stn:sitnum不存在
$1~“[^0-9]”| | NF=5 | |$4$5~“[^0-9.]”{下一步；}|不跳过任何数据行
$1 != stn{chck_prnt（）；stn=$1；stc=1；}#新建站点，将计数器设置为1
$4<3*5{chck_prnt（）；next；}#打破了col4>3*col5
{filo[ix++]=$0；如果（$4>mxvl）mxvl=$4；}将候选数据放入filo并刷新mxvl
结束{chck_prnt（）；}#不再有数据行
#prnt的func（如果需要）&clr filo、mxvl
函数chck_prnt（i）{（i是局部变量）
如果（ix>=3&&mxvl>100）{prnt条件
对于（i=0；i@László），代码有一个问题：我刚刚发现输入不是连续的，如下所示：
0023 2014-01-01 21:00 90.0 20
 0023 2014-01-01 22:00 80.0 20
 0023 2014-01-01 23:00 130.0 20
 0023 2014-01-02 16:00 130.0 20
 0023 2014-01-02 17:00 200.0 30.3
 0023 2014-01-02 18:00 303.0 30.3

以下是代码的输出：
0023 2014-01-01 21:00 90.0 20 1
0023 2014-01-01 22:00 80.0 20 1
0023 2014-01-01 23:00 130.0 20 1
0023 2014-01-02 16:00 130.0 20 1
0023 2014-01-02 17:00 200.0 30.3 1
0023 2014-01-02 18:00 303.0 30.3 1

但期望的输出应为：
0023 2014-01-01 21:00 90.0 20   1  
0023 2014-01-01 22:00 80.0 20    1
0023 2014-01-01 23:00 130.0 20   1
0023 2014-01-02 16:00 130.0 20   2
0023 2014-01-02 17:00 200.0 30.3 2 
0023 2014-01-02 18:00 303.0 30.3 2

很抱歉将其作为答案发布，因为它太长，无法作为评论发布。@Kelly，你的评论不适合回答我的问题（在我看来）。
但我试着再次猜测真正的规格是什么
我希望关键的想法是：
当两条连续线之间的时间差大于
1小时我们还需要打印候选人（来自filo），如果“标准”适用于他们
我需要创建一个加号函数来计算时间差。
当filo不是空的时候，在主体部分有一个加号来调用它。
我还放了一个过滤行来检查输入中的日期和时间格式
请注意：
我的chk1h（）函数就足够了，
但还有其他可能计算时间戳之间的时间差：
1/maketime（）函数在gawk中
2/bash shell日期命令，例如：日期“+%s”-d“2014-03-28 11:48:30”
我用你的输入和其他人检查了程序，一切正常
如果还存在一些问题，您需要给出一个较长且完整的代表性输入序列。
不要给出两个列表（输入和输出）。
输入列表就足够了，在要显示的行的末尾写上相应的站点计数器。
不打印的行没有第6列
cat msr2
#!/bin/bash
(($#!=1))&& { echo "Usage $0 inp_file"; exit 1; }
awk '
 BEGIN {ix=0; stn=-1;}                             # ix: index of filo, stn: non-exist
 $1~"[^0-9]" || NF!=5 || $4 $5 ~ "[^0-9.]" {next;} # skip no data lines
 $2 " " $3 !~ "^[1-2][0-9][0-9][0-9]-[0-1][0-9]-[0-3][0-9] [0-2][0-9].00$" {  # dt&tm format filtering
     printf("Unexpected dt,tm format:\nInput ln:%d\nContent: %s\n",NR,$0); exit(1);}

 $1 != stn  {chck_prnt(); stn=$1; stc=1;}          # new site, set counter to 1
 $4 < 3*$5  {chck_prnt(); next;}                   # broken the col4>3*col5
 ix         {if(!chk1h(ld, lt, $2, $3))chck_prnt();} # filo not empty-->need to chck 1h diff
            {filo[ix++]=$0; ld=$2; lt=$3;          # put into filo & set last dt,tm,mxvl
                            if($4>mxvl)mxvl=$4;}
 END        {chck_prnt();}                         # no more data line

 function chck_prnt(  i){                          # (i is a local var) 
    if(ix>=3 && mxvl>100){                         # prnt condition 
        for(i=0; i<ix; i++)printf("%s %d\n", filo[i],stc); # prnt all filo
        stc++;                                     # increase counter at site
    }
    ix=0; mxvl=0;                                  # clr filo & maxvl
 }

 function chk1h(d1,t1,d2,t2,  h1,h2,dy,dm,dd){     # ret 1 if dt of current ln - last dt in filo == 1h othrwise 0
   h1=substr(t1,1,2); h2=substr(t2,1,2);
   if(h2-h1==1 && d1==d2)return(1);                # most of case in same day 1h
   if(h1!=23||h2!="00")return(0);                  # not 1h
   split(d1,v1,"-"); split(d2,v2,"-");             # v1[1-3]=ymd last in filo, v2[1-3] current
   dy=v2[1]-v1[1];                                 # diff of year
   dm=v2[2]-v1[2];                                 # diff of month
   dd=v2[3]-v1[3];                                 # diff of day
   if(dd==1 && !dy && !dm)return(1);               # 23h-->00h 1h in same month
   if(v2[3]!="01")return(0);                       # not 1h
   if(v1[3]==31)                                   # chng of month, three type of prev month
       if(!dy && dm==1 || dy==1 && dm==-11)return(1); # 1h
       else return(0);                             # not 1h
   if(v1[3]==30)
       if("04 06 09 11" ~ v1[2] && !dy && dm==1)return(1); # 1h
       else return(0);                             # not 1h
   if("28 29" ~ v1[3] && v1[2]=="02" && !dy && dm==1)return(1); # 1h
   return(0);                                      # not 1h
 }
' $1

但如果需要格式化输出，请按以下方式运行：
./msr2 in3|awk 'NF==6&&$1!~"[^0-9]"{printf("%s %s %s %6.1f %6.1f %3u\n",$1,$2,$3,$4,$5,$6);}'


目前，这似乎更像是一个规范，而不是一个问题。你已经尝试过什么了吗？如果是的话，请告诉我们。另外，你能提供所需的输出吗？汤姆，非常感谢你的快速响应和帮助。我对awk非常陌生，虽然我试图搜索信息，但没有任何线索，但我没有得到任何提示。请回答换句话说，我希望您的输入不符合您测试的条件。注释只允许非常小的格式。相反，您的问题应包括预期的输出。@jas，谢谢您告诉我这一点。只需添加输出，但不确定如何删除每行之间的空格，但仍保持每行分开。Tom帮助了我编辑原始部分。拉斯洛·斯齐拉吉，非常感谢你的帮助！你是我的救世主。代码有问题，我将其作为答案发布在下面，因为它太长了。谢谢你。拉斯洛，也感谢你的时间和建议。是的，我想统计每个网站符合所有标准的案例数量。如果我只算所有的案例（不考虑网站名称，只计算有多少事件）？这是两个连续行之间的1个小时的时间，但是在第二个样本中缺少了一段时间。我把上面的问题添加到我的第一个答案。在这种情况下，请给我一个绿色的蜱，因为你接受了我的答案。
0023 2014-01-01 21:00 90.0 20   1  
0023 2014-01-01 22:00 80.0 20    1
0023 2014-01-01 23:00 130.0 20   1
0023 2014-01-02 16:00 130.0 20   2
0023 2014-01-02 17:00 200.0 30.3 2 
0023 2014-01-02 18:00 303.0 30.3 2

#!/bin/bash
(($#!=1))&& { echo "Usage $0 inp_file"; exit 1; }
awk '
 BEGIN {ix=0; stn=-1;}                             # ix: index of filo, stn: non-exist
 $1~"[^0-9]" || NF!=5 || $4 $5 ~ "[^0-9.]" {next;} # skip no data lines
 $2 " " $3 !~ "^[1-2][0-9][0-9][0-9]-[0-1][0-9]-[0-3][0-9] [0-2][0-9].00$" {  # dt&tm format filtering
     printf("Unexpected dt,tm format:\nInput ln:%d\nContent: %s\n",NR,$0); exit(1);}

 $1 != stn  {chck_prnt(); stn=$1; stc=1;}          # new site, set counter to 1
 $4 < 3*$5  {chck_prnt(); next;}                   # broken the col4>3*col5
 ix         {if(!chk1h(ld, lt, $2, $3))chck_prnt();} # filo not empty-->need to chck 1h diff
            {filo[ix++]=$0; ld=$2; lt=$3;          # put into filo & set last dt,tm,mxvl
                            if($4>mxvl)mxvl=$4;}
 END        {chck_prnt();}                         # no more data line

 function chck_prnt(  i){                          # (i is a local var) 
    if(ix>=3 && mxvl>100){                         # prnt condition 
        for(i=0; i<ix; i++)printf("%s %d\n", filo[i],stc); # prnt all filo
        stc++;                                     # increase counter at site
    }
    ix=0; mxvl=0;                                  # clr filo & maxvl
 }

 function chk1h(d1,t1,d2,t2,  h1,h2,dy,dm,dd){     # ret 1 if dt of current ln - last dt in filo == 1h othrwise 0
   h1=substr(t1,1,2); h2=substr(t2,1,2);
   if(h2-h1==1 && d1==d2)return(1);                # most of case in same day 1h
   if(h1!=23||h2!="00")return(0);                  # not 1h
   split(d1,v1,"-"); split(d2,v2,"-");             # v1[1-3]=ymd last in filo, v2[1-3] current
   dy=v2[1]-v1[1];                                 # diff of year
   dm=v2[2]-v1[2];                                 # diff of month
   dd=v2[3]-v1[3];                                 # diff of day
   if(dd==1 && !dy && !dm)return(1);               # 23h-->00h 1h in same month
   if(v2[3]!="01")return(0);                       # not 1h
   if(v1[3]==31)                                   # chng of month, three type of prev month
       if(!dy && dm==1 || dy==1 && dm==-11)return(1); # 1h
       else return(0);                             # not 1h
   if(v1[3]==30)
       if("04 06 09 11" ~ v1[2] && !dy && dm==1)return(1); # 1h
       else return(0);                             # not 1h
   if("28 29" ~ v1[3] && v1[2]=="02" && !dy && dm==1)return(1); # 1h
   return(0);                                      # not 1h
 }
' $1

0023 2014-01-01 21:00 90.0 20 1
0023 2014-01-01 22:00 80.0 20 1
0023 2014-01-01 23:00 130.0 20 1
0023 2014-01-02 16:00 130.0 20 2
0023 2014-01-02 17:00 200.0 30.3 2
0023 2014-01-02 18:00 303.0 30.3 2

./msr2 in3|awk 'NF==6&&$1!~"[^0-9]"{printf("%s %s %s %6.1f %6.1f %3u\n",$1,$2,$3,$4,$5,$6);}'

0023 2014-01-01 21:00   90.0   20.0   1
0023 2014-01-01 22:00   80.0   20.0   1
0023 2014-01-01 23:00  130.0   20.0   1
0023 2014-01-02 16:00  130.0   20.0   2
0023 2014-01-02 17:00  200.0   30.3   2
0023 2014-01-02 18:00  303.0   30.3   2