How to validate data formats in Logstash after parsing with the KV filter
I have log messages like the following:
2017-01-06 19:27:53,893 INFO [[ACTIVE] ExecuteThread: '10' for queue: 'weblogic.kernel.Default (self-tuning)'] com.symantec.olp.sas.log.SummaryLogAspect - {country=JP, svid=182, typ=sassum, vid=1120, xffmode=0, initallocdate=2014-12-15T01:08:24Z, xffip=222.230.107.165, seatcnt=1, plgid=2, api=941753644, oslocale=JPN, fng=CJ6FRE1208VMNRQG, req=/228228131/28RXAAPB1DqJj/RSLHL940/EMBXtu+/f+/Zeb/KV1Q/DTXZBFC94ZE5AOmz/mDCqB7zJOARDQO/166180202502162557303662649078783407201612&D09DEEFB7E78065D?NAM=SFlBUy1WQTE2&MFN=VkFJTyBDb3Jwb3JhdGlvbg==&MFM=VkpQMTEx&OLA=JPN&OLO=JPN, llmv=470, oslang=JPN, ctok=166180202502162557303662649078783407201612, resptime=119, epid=70D3B811A994477F957A90985109BE9D, campnid=0, remip=222.230.107.165, lictype=SOS, dbepid=70D3B811A994477F957A90985109BE9D, cid=nav1sasapppex02.msp.symantec.com1481215212435, status=10002, siid=240, skum=21356539, skup=01001230, psn=O749UPCN8KSY, cip=84.100.138.144, mname=VAIO Corporation, puid=1199, skuf=01100470, st=1481765738387, prbid=5967, mmodel=VJP111, clang=EN, pnfi=1120, cprbid=745, cpmv=7428, euip=222.230.107.165, prcdline=2, dvnm=HYAS-VA16, remdays=0, seatid=ah00s8CIdqUQyW2V, sasvid=106, xlsid=3730, baseactkey=186635290403122706518307794, coupon=651218, translogid=75033f05-9cf2-48e2-b924-fc2441d11d33}
2017-01-06 19:28:03,894 INFO [[ACTIVE] ExecuteThread: '10' for queue: 'weblogic.kernel.Default (self-tuning)'] com.symantec.olp.sas.log.SummaryLogAspect - {country=JP, svid=182, typ=sassum, vid=1120, xffmode=0, initallocdate=2014-12-15T01:08:24Z, xffip=222.230.107.165, seatcnt=1, plgid=2, api=228228131, oslocale=JPN, fng=1TA6U8RVL0JQXA0N, req=/228228131/28RXAAPB1DqJj/RSLHL940/EMBXtu+/f+/Zeb/KV1Q/DTXZBFC94ZE5AOmz/mDCqB7zJOARDQO/166180202502162557303662649078783407201612&D09DEEFB7E78065D?NAM=SFlBUy1WQTE2&MFN=VkFJTyBDb3Jwb3JhdGlvbg==&MFM=VkpQMTEx&OLA=JPN&OLO=JPN, lpmv=470, oslang=JPN, ctok=166180202502162557303662649078783407201612, resptime=119, epid=70D3B811A994477F957A90985109BE9D, campnid=0, remip=222.230.107.165, lictype=SOS, dbepid=70D3B811A994477F957A90985109BE9D, cid=nav1sasapppex02.msp.symantec.com1481215212435, status=0000, siid=240, skum=21356539, skup=01001230, psn=28MHHH2VPR4T, cip=222.230.107.165, mname=VAIO Corporation, puid=1199, skuf=01100470, st=1481765738387, prbid=5967, mmodel=VJP111, clang=EN, pnfi=1120, cprbid=745, cpmv=1027, euip=222.230.107.165, prcdline=2, dvnm=HYAS-VA16, remdays=0, seatid=StrlisGXA4yAt1ad, sasvid=130, xlsid=2820, baseactkey=028200017462383754273799438, coupon=123456, translogid=72df4536-6038-4d1c-b213-d0ff5c3c20fb}
I use the following grok pattern to match them:
(?m)%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:severity} \[%{GREEDYDATA:thread}\] %{JAVACLASS:className} - \{%{GREEDYDATA:logmsg}\}
After that, I use the KV filter to split out the fields inside the logmsg field, keeping only the fields I am interested in. My question is: how do I validate the format of those fields? One thing I need to mention is that the logs contain a varying number of fields inside logmsg, which is why I use GREEDYDATA.
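To illustrate what the KV step produces, the comma-separated pairs in logmsg can be split like this (a plain-Ruby sketch of the behaviour, not the filter itself; the sample payload is shortened):

```ruby
# Mimic the kv filter on a fragment of logmsg: split on ", " into pairs,
# then split each pair on the first "=" into key and value.
logmsg = "country=JP, svid=182, typ=sassum, status=0000"
fields = logmsg.split(", ").map { |pair| pair.split("=", 2) }.to_h
# fields => {"country"=>"JP", "svid"=>"182", "typ"=>"sassum", "status"=>"0000"}
```

The kv filter then copies each key/value pair listed in include_keys onto the event as a field.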
My logstash.conf is as follows:
input {
kafka {
bootstrap_servers => "brokers_list"
topics => ["transaction-log"]
codec => "json"
}
}
filter {
grok {
match => [ "message", "(?m)%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:severity} \[%{GREEDYDATA:thread}\] %{JAVACLASS:className} - \{%{GREEDYDATA:logmsg}\}" ]
#overwrite => [ "message" ]
}
if "_grokparsefailure" not in [tags] {
kv {
# Note: field_split is a set of single characters, so ", " splits on either a
# comma or a space and breaks values that contain spaces (e.g. mname=VAIO Corporation).
# field_split_pattern treats the whole ", " sequence as one delimiter instead.
field_split_pattern => ", "
source => "logmsg"
include_keys => ["api", "fng", "status", "cip", "cpmv", "translogid", "coupon", "baseactkey", "xlsid", "sasvid", "seatid", "srcHostname", "serverId" ]
allow_duplicate_values => false
remove_field => [ "message", "kafka.*", "logmsg"]
}
}
if [api] != "228228131" {
mutate { add_tag => "_grokparsefailure" }
}
date { # use timestamp from the log
match => [ "timestamp", "YYYY-MM-dd HH:mm:ss,SSS" ]
target => "@timestamp"
}
mutate {
remove_field => [ "timestamp" ] # remove unused stuff
}
}
output {
if "_grokparsefailure" not in [tags] {
kafka {
topic_id => "valid-topic"
bootstrap_servers => "brokers_list"
codec => json {}
}
} else {
kafka {
topic_id => "invalid-topic"
bootstrap_servers => "brokers_list"
codec => json {}
}
}
}
After parsing with the KV filter, I check the value of the api field; if it is not equal to 228228131, I add the _grokparsefailure tag to the event and do no further processing on it.
I want to be able to validate the format of the fields listed in include_keys, e.g. cip, which is the client IP. How do I validate the data format of these fields? Because my logs contain a varying number of fields, I cannot validate at the grok level; only after KV parsing do I have these fields available. By validation I mean checking that they conform to the types defined in the ES index, because if they do not, I want to send them to an invalid topic in Kafka.
Should I use a ruby filter for the validation? If so, can you give me a sample? Or should I re-emit the event after KV parsing and run grok again on the newly created event?
It would help to have a concrete example of what you want to validate, but you can check many things with regular expressions:
if [myField] =~ /^[0-9]+$/ {
# it contains only digits
}
Or like this:
if [myField] =~ /^[a-z]+$/ {
# it contains only lowercase letters
}
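Applied to the fields in this question, such checks can reuse the tag-based routing the config already has; for example (a sketch — treating cip as an IPv4 address and coupon as digits are assumptions about the expected formats):

```
if [cip] !~ /^\d{1,3}(\.\d{1,3}){3}$/ {
mutate { add_tag => "_grokparsefailure" }
}
if [coupon] !~ /^[0-9]+$/ {
mutate { add_tag => "_grokparsefailure" }
}
```

Events tagged this way would then flow to the invalid topic through the existing output conditional.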
Thanks a lot. It turns out I can use a ruby filter to loop over the k,v pairs and then perform the validations you suggested. But I have heard that is bad for performance, and Logstash was not really designed for validation in the first place. Ideally this kind of work should be done with grok.
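That loop can be sketched in plain Ruby like this (the rules hash, field names, and expected formats are assumptions, not the poster's actual rules; inside a Logstash ruby filter you would read values with event.get and then tag the event):

```ruby
# Hypothetical per-field format rules, mirroring the types in the ES index.
RULES = {
  "api"    => /\A[0-9]+\z/,                # numeric API id
  "cip"    => /\A\d{1,3}(\.\d{1,3}){3}\z/, # client IP, IPv4 dotted quad
  "coupon" => /\A[0-9]+\z/                 # numeric coupon code
}.freeze

# Return the names of the fields whose values fail their format rule.
# Fields that are absent from the event are skipped, since the logs
# contain a varying number of fields.
def invalid_fields(fields)
  RULES.keys.select do |name|
    value = fields[name]
    value && value !~ RULES[name]
  end
end
```

For example, `invalid_fields({"api" => "228228131", "cip" => "not-an-ip"})` returns `["cip"]`; in the filter you would then add a tag so the output block routes the event to the invalid topic. The performance concern stands: per-event Ruby is slower than grok or kv, so it is best kept to the few fields that really need type checks.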