Parsing text out of a string with a Logstash filter


I have an Apache access log, and I want to parse some text out of the request field:

GET /foo/bar?contentId=ABC&_=1212121212 HTTP/1.1"
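What I want to do is extract the 1212121212 and assign it to a field, but which field depends on the prefix ABC&_ (so I think I need an if statement or something similar). The prefix can take other forms (for example DDD&_).

So basically I want to say: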

if (prefix == ABC&_)
   ABCID = 1212121212
elseif (prefix == DDD&_)
   DDDID = <whatever value>
else
   do nothing
I have been struggling to build the right filter in Logstash to extract the id based on the prefix. Any help would be great.

Thanks

For this you would use the grok filter.

For example:

artur@pandaadb:~/dev/logstash$ ./logstash-2.3.2/bin/logstash -f conf2
Settings: Default pipeline workers: 8
Pipeline main started
GET /foo/bar?contentId=ABC&_=1212121212 HTTP/1.1"
{
       "message" => "GET /foo/bar?contentId=ABC&_=1212121212 HTTP/1.1\"",
      "@version" => "1",
    "@timestamp" => "2016-07-28T15:59:12.787Z",
          "host" => "pandaadb",
        "prefix" => "ABC&_",
            "id" => "1212121212"
}
That is your sample input, with the prefix and id parsed out.

No if is needed here, because the regex in the grok filter takes care of it.
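The console run above does not show the config file it was started with. A minimal sketch of a grok-only pipeline that should produce that kind of result is given below. The grok pattern is the one from the filter later in this answer; the stdin input and the rubydebug stdout output are assumptions inferred from the console session, not the actual contents of conf2.

```logstash
# Assumption: events are typed on stdin, as in the console session above.
input {
    stdin {}
}

filter {
    grok {
        # GREEDYDATA captures everything between "contentId=" and the last "="
        # that is followed by a number; NUMBER captures the id itself.
        match => { "message" => ".*contentId=%{GREEDYDATA:prefix}=%{NUMBER:id}" }
    }
}

# Assumption: events are pretty-printed to the console.
output {
    stdout { codec => rubydebug }
}
```

Lines that do not match the pattern are passed through without the prefix and id fields and, with grok's default settings, are tagged _grokparsefailure.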

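However, if you need the id in different fields, you can test the prefix field and add the id to a separate field.

The output would then look like this: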

GET /foo/bar?contentId=ABC&_=1212121212 HTTP/1.1"
{
       "message" => "GET /foo/bar?contentId=ABC&_=1212121212 HTTP/1.1\"",
      "@version" => "1",
    "@timestamp" => "2016-07-28T16:05:07.442Z",
          "host" => "pandaadb",
        "prefix" => "ABC&_",
            "id" => "1212121212",
         "ABCID" => "1212121212"
}
GET /foo/bar?contentId=DDD&_=1212121212 HTTP/1.1"
{
       "message" => "GET /foo/bar?contentId=DDD&_=1212121212 HTTP/1.1\"",
      "@version" => "1",
    "@timestamp" => "2016-07-28T16:05:20.026Z",
          "host" => "pandaadb",
        "prefix" => "DDD&_",
            "id" => "1212121212",
         "DDDID" => "1212121212"
}
The filter I used for this looks like this:

filter {
    grok {
        match => {"message" => ".*contentId=%{GREEDYDATA:prefix}=%{NUMBER:id}"}

    }

    if [prefix] =~ "ABC" {
         mutate {
            add_field => {"ABCID" => "%{id}"}
         }
    }

    if [prefix] =~ "DDD" {
         mutate {
            add_field => {"DDDID" => "%{id}"}
         }
    }

}
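Note that =~ is a regex match, so [prefix] =~ "ABC" is true for any prefix that merely contains ABC. If that is too loose, a stricter variant that mirrors the if / elseif / do nothing pseudocode from the question could look like the sketch below. It assumes the prefixes are exactly ABC&_ and DDD&_, as in the sample lines; anything else falls through untouched.

```logstash
filter {
    grok {
        # Same pattern as above: fills the "prefix" and "id" fields.
        match => { "message" => ".*contentId=%{GREEDYDATA:prefix}=%{NUMBER:id}" }
    }

    # Exact string comparison instead of a regex match.
    if [prefix] == "ABC&_" {
        mutate {
            add_field => { "ABCID" => "%{id}" }
        }
    } else if [prefix] == "DDD&_" {
        mutate {
            add_field => { "DDDID" => "%{id}" }
        }
    }
    # No else branch: events with any other prefix are left as they are.
}
```

Because the branches are chained with else if, at most one of ABCID and DDDID is added to any event.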
I hope this illustrates how to do it. A grok debugger is useful for testing your grok regex.

Have fun,


Artur
