How to group and count each subgroup in a log with bash


bash awk grep


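I want to analyze a log file. It contains several operations, and each operation consists of a set of sub-operations. I want to extract the number of sub-operations grouped by operation. This would be easy in SQL, but I'm stuck in bash.

Here is a simplified version of the file: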

    [21:30:21.538Z #a9a.012 DEBUG -            -   ] c.h.c.w.j.JobTrackingWorkerReporter: Reporting bulk completion: Partition: tenant-xla; Job: ingestion-4759-9-13-41; Tasks: [ingestion-4759-9-13-41.1.43, ingestion-4759-9-13-41.1.44, ingestion-4759-9-13-41.1.41]

    otherlogs stuff ...

    [21:31:21.538Z #a9a.012 DEBUG -            -   ] c.h.c.w.j.JobTrackingWorkerReporter: Reporting bulk completion: Partition: tenant-xla; Job: ingestion-4757-10-17-4; Tasks: [ingestion-4757-10-17-4.1.2, ingestion-4757-10-17-4.1.1, ingestion-4757-10-17-4.1.3, ingestion-4757-10-17-4.1.4]

    otherlogs stuff ...

    [21:31:21.690Z #a9a.012 DEBUG -            -   ] c.h.c.w.j.JobTrackingWorkerReporter: Reporting bulk completion: Partition: tenant-xla; Job: ingestion-4757-10-18-3; Tasks: [ingestion-4757-10-18-3.1.137, ingestion-4757-10-18-3.1.139, ingestion-4757-10-18-3.1.138, ingestion-4757-10-18-3.1.140, ingestion-4757-10-18-3.1.136, ingestion-4757-10-18-3.1.141]

The operation is the part of each task ID before the first dot; everything after it identifies the sub-operation.
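
The split described above can be sketched with plain shell parameter expansion. The variable names below are illustrative, and the task ID is simply one of the IDs from the sample log.

```shell
# One task ID taken from the sample log (variable names are hypothetical).
task_id='ingestion-4759-9-13-41.1.43'

# %%.* removes the longest suffix starting at a dot, i.e. everything from the first dot on.
operation="${task_id%%.*}"
# #*. removes the shortest prefix ending in a dot, i.e. everything up to the first dot.
suboperation="${task_id#*.}"

printf '%s %s\n' "$operation" "$suboperation"
```

Here `operation` holds the grouping key and `suboperation` the part that should only be counted.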

I'm looking for a result like the following, which I could, for example, store in a file:

operationName            suboperationCount
ingestion-4759-9-13-41         3
ingestion-4757-10-17-4         4
ingestion-4757-10-18-3         6

I've been trying a few combinations, such as

cat xlogs.txt | grep "ingestion" | uniq | wc -w > fileresult.txt

but this only returns the overall total.

Thanks

You can use this grep + uniq command:

grep -Eo '\bingestion-[0-9-]+' file.log | uniq -c

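Edit: After learning from the OP's comments that only the IDs inside Tasks should be counted, you could try the following, which strictly assumes that each line of your input file contains only one Tasks string: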

awk '
{
  sub(/.*Tasks/,"Tasks")
  while(match($0,/ingestion-[0-9-]+/)){
    arr[substr($0,RSTART,RLENGTH)]++
    $0=substr($0,RSTART+RLENGTH)
  }
}
END{
  for(i in arr){
    print i,arr[i]
  }
}'  Input_file
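
As a quick check of the command above, the following sketch runs it on a shortened sample in which one job is reported on two non-adjacent lines. The file name `sample.log`, the variable `counts`, and the abbreviated log lines and job IDs are assumptions made for illustration. `sort` is added only because the order of `for(i in arr)` is unspecified in awk.

```shell
# Hypothetical sample: job 4759-1 is reported twice, with job 4757-2 in between.
cat > sample.log <<'EOF'
[21:30:21.538Z DEBUG] Reporting bulk completion: Job: ingestion-4759-1; Tasks: [ingestion-4759-1.1.1, ingestion-4759-1.1.2]
otherlogs stuff ...
[21:31:21.538Z DEBUG] Reporting bulk completion: Job: ingestion-4757-2; Tasks: [ingestion-4757-2.1.1]
[21:32:21.538Z DEBUG] Reporting bulk completion: Job: ingestion-4759-1; Tasks: [ingestion-4759-1.1.3]
EOF

counts=$(awk '
{
  sub(/.*Tasks/,"Tasks")                      # drop everything before Tasks, including the Job ID
  while(match($0,/ingestion-[0-9-]+/)){
    arr[substr($0,RSTART,RLENGTH)]++          # one array slot per operation, shared by all lines
    $0=substr($0,RSTART+RLENGTH)
  }
}
END{
  for(i in arr){
    print i,arr[i]
  }
}' sample.log | sort)

printf '%s\n' "$counts"
```

Because the counts live in a single array until the END block, the two lines for `ingestion-4759-1` end up in one row rather than two.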

With awk, could you please try the following, written and tested with the shown samples.

awk '
{
  while(match($0,/ingestion-[0-9-]+/)){
    arr[substr($0,RSTART,RLENGTH)]++
    $0=substr($0,RSTART+RLENGTH)
  }
}
END{
  for(i in arr){
    print i,arr[i]
  }
}' Input_file

Explanation: a detailed explanation of the above.

awk '                                       ##Start of the awk program.
{
  while(match($0,/ingestion-[0-9-]+/)){     ##Loop for as long as match() finds the regex in the current line.
    arr[substr($0,RSTART,RLENGTH)]++        ##Use the matched substring as the index of array arr and increment its count by 1.
    $0=substr($0,RSTART+RLENGTH)            ##Keep only the rest of the line (after the match) as the current line.
  }
}
END{                                        ##Start of the END block of this awk program.
  for(i in arr){                            ##Traverse all elements of arr.
    print i,arr[i]                          ##Print each index of the array along with its count.
  }
}' Input_file                               ##Name of the input file.
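
If the result should resemble the table requested in the question, the END block can print a header and aligned columns. The sketch below is one possible way to do that, assuming that only the IDs inside `Tasks: [...]` should be counted. The file names `sample.log` and `fileresult.txt`, the shortened log prefix, the array names `count` and `order`, and the column width of 26 are all illustrative choices.

```shell
# Shortened version of the sample log from the question (the prefix is abbreviated).
cat > sample.log <<'EOF'
[21:30:21.538Z DEBUG] Job: ingestion-4759-9-13-41; Tasks: [ingestion-4759-9-13-41.1.43, ingestion-4759-9-13-41.1.44, ingestion-4759-9-13-41.1.41]
otherlogs stuff ...
[21:31:21.538Z DEBUG] Job: ingestion-4757-10-17-4; Tasks: [ingestion-4757-10-17-4.1.2, ingestion-4757-10-17-4.1.1, ingestion-4757-10-17-4.1.3, ingestion-4757-10-17-4.1.4]
EOF

awk '
/Tasks:/ {
  sub(/.*Tasks:/, "")                       # keep only the task list, dropping the Job ID
  while (match($0, /ingestion-[0-9-]+/)) {
    id = substr($0, RSTART, RLENGTH)
    if (!(id in count)) order[++n] = id     # remember the order in which operations first appear
    count[id]++
    $0 = substr($0, RSTART + RLENGTH)
  }
}
END {
  printf "%-26s %s\n", "operationName", "suboperationCount"
  for (k = 1; k <= n; k++)
    printf "%-26s %d\n", order[k], count[order[k]]
}' sample.log > fileresult.txt

cat fileresult.txt
```

Keeping a separate `order` array avoids relying on the unspecified iteration order of `for (i in arr)`, so the rows appear in the order the operations were first logged.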

I added awk to your code because it is readable:

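cat xlogs.txt | grep -o -E 'ingestion[0-9-]+' | uniq -c | awk '
     NR == 1 {
        # write the header once; ">" truncates the file only when it is first opened
        print "operationName suboperationCount" > "fileresult.txt"
     }
     {
        # uniq -c prints "count name"; swap the two columns
        print $2 " " $1 > "fileresult.txt"
     }'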

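Hi, thanks a lot. You added a dot to each result, but I understand this is to avoid counting the extra "Job: ingestion" occurrence, which makes the count correct.

Hi, thanks, the aggregated solution is great, but it also picks up the extra "Job: ingestion" occurrence, so every count has to be reduced by 1.

Sorry, I actually tested your solution on a larger document and it failed. The reason is that your code checks each line and then aggregates. So if, for some reason (this does happen in my logs), there is job1[task1, task2, task3], then job2[task1, task2, task3], then job1[task3, task4, task5], we get two different rows for job1, which is not what we expect. The loop described in the other solution is probably the best approach. My bad, I probably didn't provide a good sample from the start. Thanks for your reply!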


$ grep -o 'ingestion[\.0-9-]*\.' file | uniq -c
      3 ingestion-4759-9-13-41.1.
      4 ingestion-4757-10-17-4.1.
      6 ingestion-4757-10-18-3.1.
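
Because `uniq -c` only merges adjacent identical lines, a job that is reported on several non-adjacent lines would appear more than once in output like the above. A possible variant, sketched here, sorts before counting and trims the trailing dot so that only the operation name remains. The function name `count_tasks`, the file name `sample.log`, the variable `result`, and the shortened log lines are assumptions for illustration.

```shell
# Hypothetical sample: job 4759-1 is reported on two non-adjacent lines.
cat > sample.log <<'EOF'
[21:30:21.538Z DEBUG] Job: ingestion-4759-1; Tasks: [ingestion-4759-1.1.1, ingestion-4759-1.1.2]
[21:31:21.538Z DEBUG] Job: ingestion-4757-2; Tasks: [ingestion-4757-2.1.1]
[21:32:21.538Z DEBUG] Job: ingestion-4759-1; Tasks: [ingestion-4759-1.1.3]
EOF

count_tasks() {
  # 1. grep -o prints each task ID up to and including its first dot; the ID after
  #    "Job:" is followed by ";" rather than ".", so it is not matched.
  # 2. sed trims the trailing dot, leaving only the operation name.
  # 3. sort brings equal names together so that uniq -c counts each name once.
  # 4. awk swaps the columns to "name count".
  grep -o 'ingestion-[0-9-]*\.' "$1" | sed 's/\.$//' | sort | uniq -c | awk '{ print $2, $1 }'
}

result=$(count_tasks sample.log)
printf '%s\n' "$result"
```

Sorting changes the row order to alphabetical, which may or may not matter for the report.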
