
Elasticsearch: Logstash grok problem when filtering data

I have data that records deletions performed with the rm command. It looks like this:

ttmv516,19/05/21,03:59,00-mins,dvcm,dvcm 166820 4.1 0.0 4212 736 ? DN 03:59 0:01 rm -rf /dv/project/agile/mce_dev_folic/test/install.asan/install,/dv/svgwwt/commander/workspace4/dvfcronrun_IL-SFV-RHEL6.5-K4_kinite_agile_invoke_dvfcronrun_at_given_site_50322
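
I'm using the Logstash grok pattern below. It worked fine until recently, but now I see two odd problems: 1) _grokparsefailure tags, and 2) the Hostname field is not displayed correctly: its leading characters are missing, so ttmv516 shows up as mv516.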

%{HOSTNAME:Hostname},%{DATE:Date},%{HOUR:dt_h}:%{MINUTE:dt_m},%{NUMBER:duration}-%{WORD:hm},%{USER:User},%{USER:User_1} %{NUMBER:Pid} %{NUMBER:float} %{NUMBER:float} %{NUMBER:Num_1} %{NUMBER:Num_2} %{DATA} (?:%{HOUR:dt_h1}:|)(?:%{MINUTE:dt_m1}|) (?:%{HOUR:dt_h2}:|)(?:%{MINUTE:dt_m2}|)%{GREEDYDATA:CMD},%{GREEDYDATA:PWD_PATH}
However, testing the same pattern and data in Kibana's Grok Debugger gives the correct result.

My log entry and Logstash configuration are as follows:

ttmv516,19/05/21,03:59,00-mins,dvcm,dvcm 166820 4.1 0.0 4212 736 ? DN 03:59 0:01 rm -rf /dv/project/agile/mce_dev_folic/test/install.asan/install,/dv/svgwwt/commander/workspace4/dvfcronrun_IL-SFV-RHEL6.5-K4_kinite_agile_invoke_dvfcronrun_at_given_site_50322
cat /etc/logstash/conf.d/rmlog.conf
input {
  file {
    path => [ "/data/rm_logs/*.txt" ]
    start_position => beginning
    sincedb_path => "/data/registry-1"
    max_open_files => 64000
    type => "rmlog"
  }
}

filter {
  if [type] == "rmlog" {
    grok {
     match => { "message" => "%{HOSTNAME:Hostname},%{DATE:Date},%{HOUR:dt_h}:%{MINUTE:dt_m},%{NUMBER:duration}-%{WORD:hm},%{USER:User},%{USER:User_1} %{NUMBER:Pid} %{NUMBER:float} %{NUMBER:float} %{NUMBER:Num_1} %{NUMBER:Num_2} %{DATA} (?:%{HOUR:dt_h1}:|)(?:%{MINUTE:dt_m1}|) (?:%{HOUR:dt_h2}:|)(?:%{MINUTE:dt_m2}|)%{GREEDYDATA:CMD},%{GREEDYDATA:PWD_PATH}" }
      add_field => [ "received_at", "%{@timestamp}" ]
      remove_field => [ "@version", "host", "message", "_type", "_index", "_score" ]
   }
  }
 }
output {
        if [type] == "rmlog" {
        elasticsearch {
                hosts => ["myhost.xyz.com:9200"]
                manage_template => false
                index => "pt-rmlog-%{+YYYY.MM.dd}"
  }
 }
}
Any help or suggestions would be much appreciated.

Edit: From what I can see, these are the messages it fails on:

ttmv540,19/05/21,03:59,00-hrs,USER,USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND,/local/ntr/ttmv540.373
ttmv541,19/05/21,03:43,-mins,USER,USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND,/local/ntr/ttmv541.373
I tried editing the grok with the condition below, but it still drops a few fields:

input {
  file {
    path => [ "/data/rm_logs/*.txt" ]
    start_position => beginning
    max_open_files => 64000
    sincedb_path => "/data/registry-1"
    type => "rmlog"
  }
}
filter {
  if [type] == "rmlog" {
    grok {
     match => { "message" => "%{HOSTNAME:hostname},%{DATE:date},%{HOUR:time_h}:%{MINUTE:time_m},%{NUMBER:duration}-%{WORD:hm},%{USER:user},%{USER:group} %{NUMBER:pid} %{NUMBER:float} %{NUMBER:float} %{NUMBER:num_1} %{NUMBER:num_2} %{DATA} (?:%{HOUR:time_h1}:|)(?:%{MINUTE:time_m1}|) (?:%{HOUR:time_h2}:|)(?:%{MINUTE:time_m2}|)%{GREEDYDATA:cmd},%{GREEDYDATA:pwd}" }
      add_field => [ "received_at", "%{@timestamp}" ]
      remove_field => [ "@version", "host", "message", "_type", "_index", "_score" ]
   }
  }
  if "_grokparsefailure" in [tags] {
    grok {
      match => { "message" => "%{HOSTNAME:hostname},%{DATE:date},%{HOUR:time_h}:%{MINUTE:time_m},-%{WORD:duration},%{USER:user},%{USER:group}%{GREEDYDATA:cmd}" }
      add_field => [ "received_at", "%{@timestamp}" ]
      remove_field => [ "@version", "host", "message", "_type", "_index", "_score" ]
  }
 }
}
output {
        if [type] == "rmlog" {
        elasticsearch {
                hosts => ["myhost.xyz.com:9200"]
                manage_template => false
                index => "pt-rmlog-%{+YYYY.MM.dd}"
  }
 }
}
Note: the _grokparsefailure tag check seems to work for one of the messages below, but the other one still fails.

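1) This one works:

ttmv541,19/05/21,03:43,-mins,USER,USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND,/local/ntr/ttmv541.373

2) The second line of the text log fails because it has a number attached to the unit (00-hrs), so it now satisfies neither of the two conditions:

ttmv540,19/05/21,03:59,00-hrs,USER,USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND,/local/ntr/ttmv540.373

For reference, the fallback pattern is: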

%{HOSTNAME:hostname},%{DATE:date},%{HOUR:time_h}:%{MINUTE:time_m},-%{WORD:duration},%{USER:user},%{USER:group}%{GREEDYDATA:cmd}

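I split the processing into two parts: one handles the hostname and timestamp, the other handles the rest of the line. I find this makes maintenance easier.

That leaves these two inputs: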

ttmv541,19/05/21,03:43,-mins
ttmv540,19/05/21,03:59,00-hrs
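
As a rough sketch of that two-part split (not taken from the original post), the first grok could capture only the hostname and timestamp and hand everything else to a second grok. The field names rest and duration_raw are hypothetical, and the second pattern is only a placeholder for whatever parsing you settle on:

```conf
filter {
  # Stage 1: hostname and timestamp only. Everything after the time
  # is stored in "rest" (a hypothetical field name).
  grok {
    match => { "message" => "%{HOSTNAME:hostname},%{DATE:date},%{HOUR:time_h}:%{MINUTE:time_m},%{GREEDYDATA:rest}" }
  }

  # Stage 2: parse the remainder separately, only if stage 1 produced it.
  # "duration_raw" (hypothetical) holds the raw "-mins" / "00-hrs" token.
  if [rest] {
    grok {
      match => { "rest" => "%{DATA:duration_raw},%{USER:user},%{USER:group}%{GREEDYDATA:cmd}" }
    }
  }
}
```

With this layout, a change to the tail of the line only touches the second grok, and a failure in either stage is easier to attribute.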
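Both of your patterns match the first part fine, so the question is how you want to parse what comes after it. In your original pattern, you use duration for the number and hm for the unit. In the second pattern, you appear to put the unit into duration, which is probably not right.

Without more information, it looks like the duration is optional but the unit is always present. You can reflect that in your pattern, for example: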

(%{NUMBER:duration})?-%{WORD:hm}
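
To illustrate, here is a sketch of that optional group dropped into the shorter fallback pattern from the question. It has only been reasoned through against the two sample lines above (-mins and 00-hrs), not run against the full data set, so treat it as a starting point:

```conf
filter {
  if [type] == "rmlog" {
    grok {
      # (%{NUMBER:duration})? makes the number optional, so both
      # "-mins" and "00-hrs" should match; the unit always goes to "hm".
      match => { "message" => "%{HOSTNAME:hostname},%{DATE:date},%{HOUR:time_h}:%{MINUTE:time_m},(%{NUMBER:duration})?-%{WORD:hm},%{USER:user},%{USER:group}%{GREEDYDATA:cmd}" }
    }
  }
}
```

When the number is absent, the duration field is simply not set on the event, while hm still receives the unit.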

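Also note that if you end up needing multiple patterns, you don't have to rely on _grokparsefailure to use them: the message entry in match can take an array of patterns. See the example in the documentation.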
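
A minimal sketch of the array form, reusing the two patterns from the question unchanged. By default grok tries the patterns in order and stops at the first one that matches (break_on_match defaults to true), so the separate conditional on _grokparsefailure is no longer needed:

```conf
filter {
  if [type] == "rmlog" {
    grok {
      match => {
        "message" => [
          # Tried first: the full pattern for real rm entries.
          "%{HOSTNAME:hostname},%{DATE:date},%{HOUR:time_h}:%{MINUTE:time_m},%{NUMBER:duration}-%{WORD:hm},%{USER:user},%{USER:group} %{NUMBER:pid} %{NUMBER:float} %{NUMBER:float} %{NUMBER:num_1} %{NUMBER:num_2} %{DATA} (?:%{HOUR:time_h1}:|)(?:%{MINUTE:time_m1}|) (?:%{HOUR:time_h2}:|)(?:%{MINUTE:time_m2}|)%{GREEDYDATA:cmd},%{GREEDYDATA:pwd}",
          # Tried only if the first pattern does not match.
          "%{HOSTNAME:hostname},%{DATE:date},%{HOUR:time_h}:%{MINUTE:time_m},-%{WORD:duration},%{USER:user},%{USER:group}%{GREEDYDATA:cmd}"
        ]
      }
      add_field => [ "received_at", "%{@timestamp}" ]
    }
  }
}
```

The event is tagged with _grokparsefailure only if none of the listed patterns match.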

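Does the Logstash grok filter fail on every input, or does it work for some and not for others? If it only works for some entries, can you give more examples? Try modifying the config so that you don't remove the "message" field, and see what that field looks like on the entries grok fails to parse.

It basically fails in some cases. I have some failing cases; let me update the post with them.

@mihomir, I've just updated the post with the new things I applied.

@Alan, thanks a lot for your suggestion, +1. I'm testing this now. Alan, if you don't mind, could you explain this part: "Also note that if you end up needing multiple patterns, you don't have to rely on grokparsefailure to use them: match->message can take an array."

I added a link to the documentation. You can give match multiple patterns to use.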