How to filter fields by pattern with awk

I have a file in this format:

Topic:test_replication  PartitionCount:1    ReplicationFactor:3 Configs:retention.ms=604800000,delete.retention.ms=86400000,cleanup.policy=delete,max.message.bytes=1000012,min.insync.replicas=2,retention.bytes=-1
Topic:teste2e_funcional PartitionCount:12   ReplicationFactor:3 Configs:min.cleanable.dirty.ratio=0.00001,delete.retention.ms=86400000,cleanup.policy=delete,min.insync.replicas=2,segment.ms=604800000,retention.bytes=-1
Topic:ticket_dl.replica_cloudera    PartitionCount:3    ReplicationFactor:3 Configs:message.downconversion.enable=true,file.delete.delay.ms=60000,segment.ms=604800000,min.compaction.lag.ms=0,retention.bytes=-1,segment.index.bytes=10485760,cleanup.policy=delete,message.timestamp.difference.max.ms=9223372036854775807,segment.jitter.ms=0,preallocate=false,message.timestamp.type=CreateTime,message.format.version=2.2-IV1,segment.bytes=1073741824,max.message.bytes=1000000,unclean.leader.election.enable=false,retention.ms=604800000,flush.ms=9223372036854775807,delete.retention.ms=31536000000,min.insync.replicas=2,flush.messages=9223372036854775807,compression.type=producer,index.interval.bytes=4096,min.cleanable.dirty.ratio=0.5
I only want the value of Topic (e.g. test_replication) and the value of min.insync.replicas (e.g. 2).


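I know this is possible with a regular expression, but I don't know how to write it. My problem is that min.insync.replicas is not always in the same position, so if I split the line with awk's -F option, for example, the field I pick holds a different setting from line to line instead of min.insync.replicas.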

Please try the following:

awk '
match($0,/Topic:[^ ]*/){
  topic=substr($0,RSTART+6,RLENGTH-6)
  match($0,/min\.insync\.replicas[^,]*/)
  print topic,substr($0,RSTART+20,RLENGTH-20)
  topic=""
}
'  Input_file
Explanation: a detailed, line-by-line explanation of the above.

awk '                                                  ## Start of the awk program.
match($0,/Topic:[^ ]*/){                               ## Match "Topic:" and everything up to the next space; run the block only when it matches.
  topic=substr($0,RSTART+6,RLENGTH-6)                  ## Skip the 6 characters of "Topic:" and store the rest of the match in topic.
  match($0,/min\.insync\.replicas[^,]*/)               ## Match again, from min.insync.replicas up to the next comma.
  print topic,substr($0,RSTART+20,RLENGTH-20)          ## Print topic and the value; 20 is the length of "min.insync.replicas=".
  topic=""                                             ## Reset the topic variable.
}
' Input_file                                           ## Name of the input file.
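
To check the offset arithmetic, the same idea can be run on a single line. This is only a sketch: the sample line is a shortened, hypothetical version of the question's input, the variable names line, result and value are mine, and the "=" is included in the pattern so that the 20-character offset is visible in the regex itself.

```shell
# Hypothetical sample line, shortened from the input in the question.
line='Topic:test_replication PartitionCount:1 ReplicationFactor:3 Configs:retention.ms=604800000,min.insync.replicas=2,retention.bytes=-1'

result=$(printf '%s\n' "$line" | awk '
match($0, /Topic:[^ ]*/) {
  # "Topic:" is 6 characters long, so the value starts at RSTART+6.
  topic = substr($0, RSTART + 6, RLENGTH - 6)
  # "min.insync.replicas=" is 20 characters long; checking the return
  # value of match() handles a missing key explicitly.
  value = ""
  if (match($0, /min\.insync\.replicas=[^,]*/))
    value = substr($0, RSTART + 20, RLENGTH - 20)
  print topic, value
}')
echo "$result"
```

If the real file separates the columns with tabs rather than spaces, [^ ]* would run past the topic name; [^[:space:]]* is a likely safer choice in that case.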


Second solution: adding a sed solution here.

sed 's/Topic:\([^ ]*\).*min\.insync\.replicas=\([^,]*\).*/\1 \2/' Input_file
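
One thing to be aware of with the sed command: a line that has no min.insync.replicas entry does not match the pattern, so sed prints it unchanged. A possible variant, sketched here on a hypothetical two-line sample, uses -n together with the p flag so that only lines where the substitution succeeded are printed.

```shell
# Hypothetical two-line sample: the second line lacks min.insync.replicas.
sample='Topic:a PartitionCount:1 Configs:cleanup.policy=delete,min.insync.replicas=2
Topic:b PartitionCount:1 Configs:cleanup.policy=delete'

# -n suppresses the default output; the trailing p prints only the
# lines on which the substitution was actually made.
sed_out=$(printf '%s\n' "$sample" |
  sed -n 's/Topic:\([^ ]*\).*min\.insync\.replicas=\([^,]*\).*/\1 \2/p')
echo "$sed_out"
```

Whether silently dropping such lines is preferable to passing them through depends on how the output is consumed.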

Sorry about the earlier question. It is very simple:

awk '
match($0,/Topic:[^ ]*/){
  topic=substr($0,RSTART+6,RLENGTH-6)
  match($0,/min\.insync\.replicas[^,]*/)
  mininsync=substr($0,RSTART+20,RLENGTH-20)
  match($0,/retention\.ms[^,]*/)
  retention=substr($0,RSTART+13,RLENGTH-13)
  print topic "," mininsync "," retention
  topic=""
}
' Input_file
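
A caveat about the pattern /retention\.ms[^,]*/: it also matches inside delete.retention.ms, so on a line that has delete.retention.ms but no retention.ms of its own (like the second sample line in the question) it reports the wrong setting. One way around this, sketched below under the assumption that the columns are whitespace-separated and that config values contain no spaces or commas, is to split the Configs column into key=value pairs and look keys up by exact name. The sample data and the names sample, map_out, conf and pairs are illustrative.

```shell
# Hypothetical sample: t2 has delete.retention.ms but no retention.ms.
sample='Topic:t1 PartitionCount:1 ReplicationFactor:3 Configs:retention.ms=604800000,delete.retention.ms=86400000,min.insync.replicas=2
Topic:t2 PartitionCount:12 ReplicationFactor:3 Configs:delete.retention.ms=86400000,min.insync.replicas=2'

map_out=$(printf '%s\n' "$sample" | awk '
{
  topic = ""
  split("", conf)                          # portable way to clear the array
  for (i = 1; i <= NF; i++) {
    if ($i ~ /^Topic:/)
      topic = substr($i, 7)                # drop the 6-character "Topic:" prefix
    else if ($i ~ /^Configs:/) {
      n = split(substr($i, 9), pairs, ",") # drop "Configs:", split on commas
      for (j = 1; j <= n; j++) {
        eq = index(pairs[j], "=")          # position of the first "="
        if (eq > 0)
          conf[substr(pairs[j], 1, eq - 1)] = substr(pairs[j], eq + 1)
      }
    }
  }
  if (topic != "")
    print topic "," conf["min.insync.replicas"] "," conf["retention.ms"]
}')
echo "$map_out"
```

A key that is absent from a line comes out as an empty column, and adding a further column only requires one more conf["..."] lookup in the print statement.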

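This information is useful, but if I want an additional field such as retention.ms (3 fields in total), I can't get it to work by adding a new line with match($0,/retention\.ms[^,]*/) before the print. How can I do that?

@Skiel, could you try the following:
awk 'match($0,/Topic:[^ ]*/){topic=substr($0,RSTART+6,RLENGTH-6);match($0,/min\.insync\.replicas[^,]*/);$(NF+1)=topic substr($0,RSTART+20,RLENGTH-20);print;topic=""}' Input_file
It creates a new field on each line. Let me know if this helps you.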