Ruby: Logstash filter and parse output

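Environment

  • Ubuntu 16.04
  • Logstash 5.2.1
  • Elasticsearch 5.1
I have configured our Deis platform to send logs to our Logstash node without any problems. However, I am still new to Ruby, and regular expressions are not my strong suit.

Log sample:

2017-02-15T14:55:24UTC deis-logspout[1]: 2017/02/15 14:55:24 routing all to udp://x.x.x.x:xxxx\n
Logstash config: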

input {
    tcp {
        port => 5000
        type => syslog
        codec => plain
    }
    udp {
        port => 5000
        type => syslog
        codec => plain
    }
}

filter {
    json {
        source => "syslog_message"
    }
}

output {
    elasticsearch { hosts => ["foo.somehost"] }
}
Elasticsearch output:

"@timestamp" => 2017-02-15T14:55:24.408Z,
"@version" => "1",
"host" => "x.x.x.x",
"message" => "2017-02-15T14:55:24UTC deis-logspout[1]: 2017/02/15 14:55:24 routing all to udp://x.x.x.x:xxxx\n",
"type" => "json"
Desired result:

"@timestamp" => 2017-02-15T14:55:24.408Z,
"@version" => "1",
"host" => "x.x.x.x",
"type" => "json"
"container" => "deis-logspout"
"severity level" => "Info"
"message" => "routing all to udp://x.x.x.x:xxxx\n"

How do I extract the information from the message into individual fields?

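Unfortunately, your assumptions about what you are trying to do are slightly off, but we can fix that.

You created a regex for JSON, but you are not parsing JSON. You are simply parsing a bastardized syslog log (see syslogStreamer), which is not actually in syslog format (RFC 5424 or 3164). Logstash then provides the JSON output.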
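
To see why a JSON parser has nothing to work with here, a quick check in plain Ruby (run outside Logstash; the `json?` helper and variable names are illustrative, not part of any Logstash API) shows that the sample line is rejected as JSON:

```ruby
require 'json'

# The sample log line from the question: a plain string, not a JSON document.
line = '2017-02-15T14:55:24UTC deis-logspout[1]: 2017/02/15 14:55:24 routing all to udp://x.x.x.x:xxxx'

# Hypothetical helper: returns true only when the text parses as JSON.
def json?(text)
  JSON.parse(text)
  true
rescue JSON::ParserError
  false
end

puts json?(line)                # the log line is not JSON
puts json?('{"msg": "hello"}')  # a real JSON document parses
```

This is only a sanity check on the input format; it says nothing about how the json filter is implemented internally.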

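Let's break down the message, which becomes the source you parse. The key is that you have to parse the message from front to back.

Message: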

2017-02-15T14:55:24UTC deis-logspout[1]: 2017/02/15 14:55:24 routing all to udp://x.x.x.x:xxxx\n
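  • 2017-02-15T14:55:24UTC: The timestamp is a common grok pattern. This mostly follows TIMESTAMP_ISO8601, but not quite.
  • deis-logspout[1]: This would be your log source, which you can name container. You can use the grok pattern URIHOST.
  • routing all to udp://x.x.x.x:xxxx\n: Since the message for most logs sits at the end of the line, you can use the grok pattern GREEDYDATA, which is the equivalent of .* in a regular expression.
  • 2017/02/15 14:55:24: Another timestamp (why?)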
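With the grok filter, you map a syntax (an abstraction over a regex) to a semantic (the name of the value you extract). For example, %{URIHOST:container}.

You will see that I made some modifications to the grok filter to make the formatting work. You have to match parts of the text even if you do not intend to capture the result. If you cannot change the timestamp format to match a standard, create a custom pattern.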
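
As a rough illustration of what the grok expression does, the same match can be sketched as a plain Ruby regex with named capture groups. This is an approximation: the character classes are simplified stand-ins chosen for this example, not the actual definitions of the grok patterns, and `DEIS_LINE` is a hypothetical name.

```ruby
# Simplified stand-in for the grok expression. Each named group plays the role
# of a syntax:semantic pair; unnamed parts are matched but not captured.
DEIS_LINE = /
  \A
  (?<timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})  # ISO 8601-like timestamp
  (?:UTC|CST|EST|PST)\s                              # zone suffix, matched but dropped
  (?<container>[\w.-]+)                              # log source
  \[\d+\]:\s                                         # instance number in brackets
  \d{4}\/\d{2}\/\d{2}\s\d{2}:\d{2}:\d{2}\s           # second timestamp, not captured
  (?<msg>.*)                                         # rest of the line, like GREEDYDATA
  \z
/x

line = '2017-02-15T14:55:24UTC deis-logspout[1]: 2017/02/15 14:55:24 routing all to udp://x.x.x.x:xxxx'

match = DEIS_LINE.match(line)
fields = match ? match.named_captures : {}
puts fields.inspect
```

Note how the zone suffix, the bracketed number, and the second timestamp all have to appear in the expression for the match to succeed, even though none of them ends up in a field.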

Config:

input {
    tcp {
        port => 5000
        type => deis
    }
    udp {
        port => 5000
        type => deis
    }
}

filter {
    grok {
        match => { "message" => "%{TIMESTAMP_ISO8601:timestamp}(UTC|CST|EST|PST) %{URIHOST:container}\[%{NUMBER}\]: %{YEAR}/%{MONTHNUM}/%{MONTHDAY} %{TIME} %{GREEDYDATA:msg}" }
    }
}

output {
    elasticsearch { hosts => ["foo.somehost"] }
}
Output:

{
    "container" => "deis-logspout",
    "msg" => "routing all to udp://x.x.x.x:xxxx",
    "@timestamp" => 2017-02-22T23:55:28.319Z,
    "port" => 62886,
    "@version" => "1",
    "host" => "10.0.2.2",
    "message" => "2017-02-15T14:55:24UTC deis-logspout[1]: 2017/02/15 14:55:24 routing all to udp://x.x.x.x:xxxx",
    "timestamp" => "2017-02-15T14:55:24",
    "type" => "deis"
}
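You can also mutate the event further to drop @timestamp, @host, and so on, since Logstash provides them by default. Another suggestion is to convert any timestamps you find into a usable format (better for searching).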
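
To make the timestamp conversion concrete, here is a minimal plain-Ruby sketch of the idea, separate from any Logstash plugin. It assumes the captured timestamp value is in UTC (as the UTC suffix in the sample suggests); the `to_utc_time` helper is hypothetical.

```ruby
require 'time'

# Hypothetical helper: treat a zone-less ISO 8601 string as UTC and parse it.
def to_utc_time(stamp)
  Time.iso8601("#{stamp}Z")
end

parsed = to_utc_time('2017-02-15T14:55:24')

puts parsed.iso8601  # normalized ISO 8601 string with an explicit zone
puts parsed.to_i     # seconds since the epoch, handy for range queries
```

If the logs can arrive with other zone suffixes (the pattern allows CST, EST, and PST), the zone would need to be captured and applied rather than assumed.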

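Depending on your log formats, you may need to alter the pattern slightly; I only had one example to go on. This also keeps the original full message, because any field operations done in Logstash are destructive (they overwrite values with fields of the same name).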
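
The overwrite behavior described above can be pictured with a plain Ruby hash standing in for an event. This is an analogy only, not how Logstash is implemented, and the variable names are illustrative: writing a parsed value under an existing key replaces it, while writing under a new key such as msg leaves the original line intact.

```ruby
raw  = '2017-02-15T14:55:24UTC deis-logspout[1]: 2017/02/15 14:55:24 routing all to udp://x.x.x.x:xxxx'
text = 'routing all to udp://x.x.x.x:xxxx'

# Storing the parsed text under a new key keeps the original line alongside it.
kept = { 'message' => raw }.merge('msg' => text)

# Storing it under the existing key replaces the original line.
lost = { 'message' => raw }.merge('message' => text)

puts kept['message'] == raw  # original still present
puts lost['message'] == raw  # original has been replaced
```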


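Do you mean extracting the information into four different fields (the parts you described)? From the ES output above, it looks like you have already done that. What is your question? Is it about the approach?
Sorry if that was not concise; I have updated the question. Thanks for the suggestion. How should I configure the grok filter to parse the message?
Thanks again, Signus. I have updated the question above to give better context.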