How to validate the data format in Logstash after parsing with the KV filter

I have log messages like the following:

2017-01-06 19:27:53,893 INFO [[ACTIVE] ExecuteThread: '10' for queue: 'weblogic.kernel.Default (self-tuning)'] com.symantec.olp.sas.log.SummaryLogAspect - {country=JP, svid=182, typ=sassum, vid=1120, xffmode=0, initallocdate=2014-12-15T01:08:24Z, xffip=222.230.107.165, seatcnt=1, plgid=2, api=941753644, oslocale=JPN, fng=CJ6FRE1208VMNRQG, req=/228228131/28RXAAPB1DqJj/RSLHL940/EMBXtu+/f+/Zeb/KV1Q/DTXZBFC94ZE5AOmz/mDCqB7zJOARDQO/166180202502162557303662649078783407201612&D09DEEFB7E78065D?NAM=SFlBUy1WQTE2&MFN=VkFJTyBDb3Jwb3JhdGlvbg==&MFM=VkpQMTEx&OLA=JPN&OLO=JPN, llmv=470, oslang=JPN, ctok=166180202502162557303662649078783407201612, resptime=119, epid=70D3B811A994477F957A90985109BE9D, campnid=0, remip=222.230.107.165, lictype=SOS, dbepid=70D3B811A994477F957A90985109BE9D, cid=nav1sasapppex02.msp.symantec.com1481215212435, status=10002, siid=240, skum=21356539, skup=01001230, psn=O749UPCN8KSY, cip=84.100.138.144, mname=VAIO Corporation, puid=1199, skuf=01100470, st=1481765738387, prbid=5967, mmodel=VJP111, clang=EN, pnfi=1120, cprbid=745, cpmv=7428, euip=222.230.107.165, prcdline=2, dvnm=HYAS-VA16, remdays=0, seatid=ah00s8CIdqUQyW2V, sasvid=106, xlsid=3730, baseactkey=186635290403122706518307794, coupon=651218, translogid=75033f05-9cf2-48e2-b924-fc2441d11d33}  
2017-01-06 19:28:03,894 INFO [[ACTIVE] ExecuteThread: '10' for queue: 'weblogic.kernel.Default (self-tuning)'] com.symantec.olp.sas.log.SummaryLogAspect - {country=JP, svid=182, typ=sassum, vid=1120, xffmode=0, initallocdate=2014-12-15T01:08:24Z, xffip=222.230.107.165, seatcnt=1, plgid=2, api=228228131, oslocale=JPN, fng=1TA6U8RVL0JQXA0N, req=/228228131/28RXAAPB1DqJj/RSLHL940/EMBXtu+/f+/Zeb/KV1Q/DTXZBFC94ZE5AOmz/mDCqB7zJOARDQO/166180202502162557303662649078783407201612&D09DEEFB7E78065D?NAM=SFlBUy1WQTE2&MFN=VkFJTyBDb3Jwb3JhdGlvbg==&MFM=VkpQMTEx&OLA=JPN&OLO=JPN, lpmv=470, oslang=JPN, ctok=166180202502162557303662649078783407201612, resptime=119, epid=70D3B811A994477F957A90985109BE9D, campnid=0, remip=222.230.107.165, lictype=SOS, dbepid=70D3B811A994477F957A90985109BE9D, cid=nav1sasapppex02.msp.symantec.com1481215212435, status=0000, siid=240, skum=21356539, skup=01001230, psn=28MHHH2VPR4T, cip=222.230.107.165, mname=VAIO Corporation, puid=1199, skuf=01100470, st=1481765738387, prbid=5967, mmodel=VJP111, clang=EN, pnfi=1120, cprbid=745, cpmv=1027, euip=222.230.107.165, prcdline=2, dvnm=HYAS-VA16, remdays=0, seatid=StrlisGXA4yAt1ad, sasvid=130, xlsid=2820, baseactkey=028200017462383754273799438, coupon=123456, translogid=72df4536-6038-4d1c-b213-d0ff5c3c20fb}
I use the following grok pattern to match them:

(?m)%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:severity} \[%{GREEDYDATA:thread}\] %{JAVACLASS:className} - \{%{GREEDYDATA:logmsg}\}
After that, I use a KV filter to split up the fields inside the logmsg field and include only the fields I am interested in. My question is: how do I validate the format of those fields? One thing I should mention is that the logs contain a varying number of fields inside logmsg, which is why I use GREEDYDATA.

My logstash.conf is as follows:

input {   
  kafka {
    bootstrap_servers => "brokers_list"
    topics => ["transaction-log"]
    codec => "json"   
  } 
}

filter {
        grok {
            match => [ "message", "(?m)%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:severity} \[%{GREEDYDATA:thread}\] %{JAVACLASS:className} - \{%{GREEDYDATA:logmsg}\}" ]
            #overwrite => [ "message" ]
        }

        if "_grokparsefailure" not in [tags] {
           kv {
              field_split => ", "
              source => "logmsg"
              include_keys => ["api", "fng", "status", "cip", "cpmv", "translogid", "coupon", "baseactkey", "xlsid", "sasvid", "seatid", "srcHostname", "serverId" ]
              allow_duplicate_values => false
              remove_field => [ "message", "kafka.*", "logmsg"]
           }
        }

        if [api] != "228228131" {
           mutate { add_tag => "_grokparsefailure" }
        }

        date { # use the timestamp from the log
          match => [ "timestamp", "YYYY-MM-dd HH:mm:ss,SSS" ]
          target => "@timestamp"
        }

        mutate {
          remove_field => [ "timestamp" ]  # remove unused stuff
        } 
  }

output {
  if "_grokparsefailure" not in [tags] {
    kafka {
        topic_id => "valid-topic"
        bootstrap_servers => "brokers_list"
        codec => json {}
    }
  } else {
    kafka {
        topic_id => "invalid-topic"
        bootstrap_servers => "brokers_list"
        codec => json {}
    }
  }
}
After parsing with the KV filter, I check the value of the api field; if it is not equal to 228228131, I add the _grokparsefailure tag to the event and do not process it any further.

I want to be able to validate the format of the fields listed in include_keys, for example cip, which is the client IP. How can I validate the data format of these fields? Because my logs contain a varying number of fields, I cannot validate at the grok level; only after the KV parsing do I actually have these fields and can validate them. By validating I mean checking that they conform to the types defined in the ES index, because if they do not, I want to send them to the invalid topic in Kafka.

Should I use a ruby filter for the validation? If so, could you give me a sample? Or should I re-emit events after the KV parsing and run grok again on the newly created events?

Any examples illustrating this would be much appreciated.

A concrete example of what you want to check would help, but you can check many things with conditionals and regular expressions:

if [myField] =~ /^[0-9]+$/ {
    # the field contains only digits
}
Or like this:

if [myField] =~ /^[a-z]+$/ {
    # the field contains only lowercase letters
}
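
For the specific fields in include_keys, the same idea extends naturally. A minimal sketch, assuming cip should be an IPv4 address and api a purely numeric ID (the patterns are guesses based on the sample logs, and _validationfailure is a made-up tag name to keep validation errors separate from real grok failures):

filter {
  # Tag the event when a field is present but does not match the expected shape.
  if [cip] and [cip] !~ /^(\d{1,3}\.){3}\d{1,3}$/ {
    mutate { add_tag => "_validationfailure" }  # cip is not an IPv4 address
  }
  if [api] and [api] !~ /^[0-9]+$/ {
    mutate { add_tag => "_validationfailure" }  # api is not purely numeric
  }
}

The output section can then route on that tag (or keep reusing _grokparsefailure, as the question does) to choose between the valid and invalid Kafka topics.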

Thanks a lot. It turns out I can use a ruby filter to loop over the K,V pairs and then perform the validations you suggested. But I have heard that this is bad for performance, and this logging library was never meant for validation in the first place. Ideally this kind of work should be done with grok.
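
Since a ruby filter sample was requested: a rough sketch of that loop, using the event API from Logstash 5.x (event.get / event.tag). The field-to-pattern table is hypothetical and would need to mirror the types in the ES mapping:

filter {
  ruby {
    code => '
      # Hypothetical validation table: field name => expected format.
      patterns = {
        "cip" => /\A(\d{1,3}\.){3}\d{1,3}\z/,
        "api" => /\A[0-9]+\z/
      }
      patterns.each do |field, pattern|
        value = event.get(field)
        event.tag("_validationfailure") if value && value.to_s !~ pattern
      end
    '
  }
}

That said, plain conditionals like the ones above are usually cheaper than dropping into Ruby for every event.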