How to filter fields by pattern using awk

I have a file in this format:

Topic:test_replication  PartitionCount:1    ReplicationFactor:3 Configs:retention.ms=604800000,delete.retention.ms=86400000,cleanup.policy=delete,max.message.bytes=1000012,min.insync.replicas=2,retention.bytes=-1
Topic:teste2e_funcional PartitionCount:12   ReplicationFactor:3 Configs:min.cleanable.dirty.ratio=0.00001,delete.retention.ms=86400000,cleanup.policy=delete,min.insync.replicas=2,segment.ms=604800000,retention.bytes=-1
Topic:ticket_dl.replica_cloudera    PartitionCount:3    ReplicationFactor:3 Configs:message.downconversion.enable=true,file.delete.delay.ms=60000,segment.ms=604800000,min.compaction.lag.ms=0,retention.bytes=-1,segment.index.bytes=10485760,cleanup.policy=delete,message.timestamp.difference.max.ms=9223372036854775807,segment.jitter.ms=0,preallocate=false,message.timestamp.type=CreateTime,message.format.version=2.2-IV1,segment.bytes=1073741824,max.message.bytes=1000000,unclean.leader.election.enable=false,retention.ms=604800000,flush.ms=9223372036854775807,delete.retention.ms=31536000000,min.insync.replicas=2,flush.messages=9223372036854775807,compression.type=producer,index.interval.bytes=4096,min.cleanable.dirty.ratio=0.5

I only want the value of Topic (for example, test_replication) and the value of min.insync.replicas (for example, 2).

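I know this is possible with a regular expression, but I don't know how to do it. The problem for me is that min.insync.replicas is not always in the same position, so if I use awk's -F option, for example, the field I pick does not always hold the min.insync.replicas value.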

Please try the following.

awk '
match($0,/Topic:[^ ]*/){
  topic=substr($0,RSTART+6,RLENGTH-6)
  match($0,/min\.insync\.replicas[^,]*/)
  print topic,substr($0,RSTART+20,RLENGTH-20)
  topic=""
}
'  Input_file

Explanation: a detailed, line-by-line explanation of the above.

awk '                                                  ##Start the awk program.
match($0,/Topic:[^ ]*/){                               ##Match "Topic:" followed by everything up to the next space.
  topic=substr($0,RSTART+6,RLENGTH-6)                  ##Store the matched text in topic, skipping the 6 characters of "Topic:".
  match($0,/min\.insync\.replicas[^,]*/)               ##Match again, from min.insync.replicas up to the next comma.
  print topic,substr($0,RSTART+20,RLENGTH-20)          ##Print topic and the matched value, skipping the 20 characters of "min.insync.replicas=".
  topic=""                                             ##Reset the topic variable.
}
' Input_file                                           ##Name of the input file.
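To make the RSTART and RLENGTH arithmetic above concrete, here is a minimal sketch that prints both values for the first match. The sample line is a shortened, made-up line in the question's format, and show_match is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical, shortened sample line in the question's format.
line='Topic:demo PartitionCount:1 ReplicationFactor:3 Configs:cleanup.policy=delete,min.insync.replicas=2,retention.bytes=-1'

show_match() {
  awk '
  match($0, /Topic:[^ ]*/) {
    # RSTART is the 1-based start of the match, RLENGTH its length.
    printf "RSTART=%d RLENGTH=%d\n", RSTART, RLENGTH
    # Skip the 6 characters of "Topic:" to keep only the value.
    print substr($0, RSTART + 6, RLENGTH - 6)
  }'
}

printf '%s\n' "$line" | show_match
# RSTART=1 RLENGTH=10
# demo
```

If the real file separates columns with tabs rather than spaces, the pattern would probably need [^[:space:]]* instead of [^ ]*.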

Second solution: adding a sed solution here.

sed 's/Topic:\([^ ]*\).*min\.insync\.replicas=\([^,]*\).*/\1 \2/' Input_file
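One thing to be aware of: sed prints lines unchanged when the substitution does not match. If the file may contain lines without min.insync.replicas, a variant with -n and the p flag prints only the lines that matched. This is a sketch on made-up sample lines; topic_and_isr is a hypothetical helper name.

```shell
#!/bin/sh
# topic_and_isr is a hypothetical helper name; it reads lines on stdin.
topic_and_isr() {
  # -n suppresses automatic printing; the trailing p prints only lines
  # where the substitution actually happened.
  sed -n 's/Topic:\([^ ]*\).*min\.insync\.replicas=\([^,]*\).*/\1 \2/p'
}

printf '%s\n' \
  'Topic:a PartitionCount:1 Configs:min.insync.replicas=2,retention.bytes=-1' \
  'Topic:b PartitionCount:1 Configs:cleanup.policy=delete' |
  topic_and_isr
# a 2
```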

Sorry for my earlier question. It turned out to be quite simple:

awk '
match($0,/Topic:[^ ]*/){
  topic=substr($0,RSTART+6,RLENGTH-6)
  match($0,/min\.insync\.replicas[^,]*/)
  mininsync=substr($0,RSTART+20,RLENGTH-20)
  ## Anchor on the preceding ":" or "," so that delete.retention.ms is not matched by mistake.
  match($0,/[:,]retention\.ms=[^,]*/)
  retention=substr($0,RSTART+14,RLENGTH-14)
  print topic "," mininsync "," retention
  topic=""
}
' Input_file

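This is useful, but what if I want an additional field such as retention.ms (three fields in total)? Can't I just add a new line such as

match($0,/retention\.ms[^,]*/)

before the print? How should I do it?

@Skiel, could you try the following? It appends a new field to each line. Let me know if this helps.

awk 'match($0,/Topic:[^ ]*/){topic=substr($0,RSTART+6,RLENGTH-6);match($0,/min\.insync\.replicas[^,]*/);$(NF+1)=topic substr($0,RSTART+20,RLENGTH-20);print;topic=""}' Input_file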
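
When more keys are needed, another option is to split the Configs column into key=value pairs and look each key up by its exact name, rather than adding one match() call per key. The sketch below is not from the answers above. It assumes whitespace-separated columns and values without spaces or commas; extract_fields and the sample lines are hypothetical.

```shell
#!/bin/sh
# extract_fields is a hypothetical helper name; it reads lines on stdin.
extract_fields() {
  awk -v OFS=',' '
  {
    topic = ""
    split("", cfg)                                   # portable way to empty the array
    for (i = 1; i <= NF; i++) {
      if ($i ~ /^Topic:/) topic = substr($i, 7)      # drop "Topic:"
      if ($i ~ /^Configs:/) {
        n = split(substr($i, 9), pairs, ",")         # drop "Configs:"
        for (j = 1; j <= n; j++) {
          eq = index(pairs[j], "=")
          if (eq) cfg[substr(pairs[j], 1, eq - 1)] = substr(pairs[j], eq + 1)
        }
      }
    }
    # Keys are compared exactly, so delete.retention.ms never stands in for retention.ms.
    if (topic != "") print topic, cfg["min.insync.replicas"], cfg["retention.ms"]
  }'
}

printf '%s\n' \
  'Topic:a PartitionCount:1 ReplicationFactor:3 Configs:retention.ms=100,delete.retention.ms=200,min.insync.replicas=2' \
  'Topic:b PartitionCount:12 ReplicationFactor:3 Configs:delete.retention.ms=200,min.insync.replicas=3' |
  extract_fields
# a,2,100
# b,3,
```

A key that is missing from a line simply yields an empty column, as in the second output line.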