Regex: log format with multiple IPs

I have a problem with the Fluentd log parser. When there are 2 IPs, the following configuration works fine:

expression  /^(?<client_ip>[^ ]*)(?:, (?<lb_ip>[^ ]*))? (?<ident>[^ ]*) (?<user>[^ ]*) \[(?<time>[^ ]* [^ ]*)\] "(?<method>\S+)(?: +(?<path>[^ ]*) (?<protocol>[A-Z]{1,}[^ ]*)+\S*)?" (?<code>[^ ]*) (?<size>[^ ]*)/
When there are 3 IPs, I get a "pattern not matched" warning.

This line does not match:

176.30.235.70, 165.225.70.200, 10.25.1.120 - - [09/Dec/2019:13:30:57 +0000] \"GET /comet_request/71142769981/F1551018730440IY5YNF?F1551018721447ZVKYZ4=1551018733078&_=1575898029473 HTTP/1.1\" 200 0 0 0
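To reproduce the problem outside Fluentd, a minimal Ruby sketch can run the working two-IP expression against both kinds of line. It assumes the `\"` sequences above are only escaping added when the warning is printed, so the raw line contains plain double quotes; `two_ip_line` is a hypothetical two-IP line made up for illustration.

```ruby
# The two-IP expression from the question, unchanged (Fluentd evaluates it as a Ruby regex).
TWO_IP = /^(?<client_ip>[^ ]*)(?:, (?<lb_ip>[^ ]*))? (?<ident>[^ ]*) (?<user>[^ ]*) \[(?<time>[^ ]* [^ ]*)\] "(?<method>\S+)(?: +(?<path>[^ ]*) (?<protocol>[A-Z]{1,}[^ ]*)+\S*)?" (?<code>[^ ]*) (?<size>[^ ]*)/

# Hypothetical two-IP line, made up for illustration.
two_ip_line = '148.165.41.129, 10.25.1.120 - - [09/Dec/2019:16:22:23 +0000] "GET /index.html HTTP/1.1" 200 0 0 0'

# The three-IP line from the question, assuming plain (unescaped) quotes.
three_ip_line = '176.30.235.70, 165.225.70.200, 10.25.1.120 - - [09/Dec/2019:13:30:57 +0000] "GET /comet_request/71142769981/F1551018730440IY5YNF?F1551018721447ZVKYZ4=1551018733078&_=1575898029473 HTTP/1.1" 200 0 0 0'

m = TWO_IP.match(two_ip_line)
puts m[:client_ip]             # 148.165.41.129
puts m[:lb_ip]                 # 10.25.1.120
p TWO_IP.match(three_ip_line)  # nil: the pattern has no slot for a third IP
```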
I tried the regex below, but it does not work. Can anyone help?

expression /^(?<client_ip>[^ ]*)(?:, (?<proxy_ip>[^ ]*))? (?:, (?<lb_ip>[^ ]*))? (?<ident>[^ ]*) (?<user>[^ ]*) \[(?<time>[^ ]* [^ ]*)\] "(?<method>\S+)(?: +(?<path>[^ ]*) (?<protocol>[A-Z]{1,}[^ ]*)+\S*)?" (?<code>[^ ]*) (?<size>[^ ]*)$/
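One way to see why this attempt fails is to test it next to a lightly edited copy in Ruby, again assuming the line contains plain quotes. Two things block the match: the space before the second `(?:, ` group means that group can only be filled if the line contains a space followed by a comma, and the trailing `$` forbids the two extra fields (`0 0`) after `size`. The edited copy only removes that space and the `$`; it is an illustration, not a recommended final pattern.

```ruby
line = '176.30.235.70, 165.225.70.200, 10.25.1.120 - - [09/Dec/2019:13:30:57 +0000] "GET /comet_request/71142769981/F1551018730440IY5YNF?F1551018721447ZVKYZ4=1551018733078&_=1575898029473 HTTP/1.1" 200 0 0 0'

# The attempt from the question, verbatim.
attempt = /^(?<client_ip>[^ ]*)(?:, (?<proxy_ip>[^ ]*))? (?:, (?<lb_ip>[^ ]*))? (?<ident>[^ ]*) (?<user>[^ ]*) \[(?<time>[^ ]* [^ ]*)\] "(?<method>\S+)(?: +(?<path>[^ ]*) (?<protocol>[A-Z]{1,}[^ ]*)+\S*)?" (?<code>[^ ]*) (?<size>[^ ]*)$/

# Same pattern with the stray space before the lb_ip group and the final $ removed.
edited = /^(?<client_ip>[^ ]*)(?:, (?<proxy_ip>[^ ]*))?(?:, (?<lb_ip>[^ ]*))? (?<ident>[^ ]*) (?<user>[^ ]*) \[(?<time>[^ ]* [^ ]*)\] "(?<method>\S+)(?: +(?<path>[^ ]*) (?<protocol>[A-Z]{1,}[^ ]*)+\S*)?" (?<code>[^ ]*) (?<size>[^ ]*)/

p attempt.match(line)  # nil
m = edited.match(line)
p [m[:client_ip], m[:proxy_ip], m[:lb_ip]]
# ["176.30.235.70", "165.225.70.200", "10.25.1.120"]
```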

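You need to match the IPs with a more specific pattern, such as `[\d.]+` or `[^ ,]+`, and make sure you also match the last two fields (you are not matching them, and `$` requires the end of the line/string).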

Use a pattern like this:

^(?<client_ip>[^ ,]+)(?:, +(?<proxy_ip>[^ ,]+))?(?:, +(?<lb_ip>[^ ,]+))? (?<ident>[^ ]+) (?<user>[^ ]+) \[(?<time>[^\]\[ ]* [^\]\[ ]*)\] "(?<method>\S+)(?: +(?<path>\S+) (?<protocol>[A-Z][^" ]*)[^"]*)?" (?<code>\S+) (?<size>\S+) \S+ \S+$
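The pattern can be checked in plain Ruby with a short sketch like the one below (assuming the log line contains plain, unescaped quotes; the two-IP line is hypothetical). Because the optional groups are tried left to right, a line with only two IPs puts the second address in `proxy_ip` and leaves `lb_ip` as `nil`, which differs from the original two-IP configuration.

```ruby
# The pattern from the answer, unchanged.
LOG_RE = /^(?<client_ip>[^ ,]+)(?:, +(?<proxy_ip>[^ ,]+))?(?:, +(?<lb_ip>[^ ,]+))? (?<ident>[^ ]+) (?<user>[^ ]+) \[(?<time>[^\]\[ ]* [^\]\[ ]*)\] "(?<method>\S+)(?: +(?<path>\S+) (?<protocol>[A-Z][^" ]*)[^"]*)?" (?<code>\S+) (?<size>\S+) \S+ \S+$/

three_ip = '176.30.235.70, 165.225.70.200, 10.25.1.120 - - [09/Dec/2019:13:30:57 +0000] "GET /comet_request/71142769981/F1551018730440IY5YNF?F1551018721447ZVKYZ4=1551018733078&_=1575898029473 HTTP/1.1" 200 0 0 0'
# Hypothetical two-IP line in the same format.
two_ip = '148.165.41.129, 10.25.1.120 - - [09/Dec/2019:16:22:23 +0000] "GET /index.html HTTP/1.1" 200 0 0 0'

[three_ip, two_ip].each do |line|
  m = LOG_RE.match(line)
  # Print a few of the named captures (or nil if the line did not match).
  p m && m.named_captures.values_at("client_ip", "proxy_ip", "lb_ip", "time", "code")
end
```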


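The IP-matching part is `^(?<client_ip>[^ ,]+)(?:, +(?<proxy_ip>[^ ,]+))?(?:, +(?<lb_ip>[^ ,]+))?`. Note that `[^ ,]+` matches one or more characters other than spaces and commas, and that ` \S+ \S+` is added at the end of the pattern to consume the last two fields (if these are always numbers, you can use ` \d+ \d+` instead, and capture them as needed).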
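To see how the IP prefix alone behaves, the sketch below applies it to IP lists of different lengths (the one- and two-IP inputs are hypothetical). Since `[^ ,]+` cannot consume a comma or a space, each address stops at the next separator, and the optional groups are filled from left to right.

```ruby
# IP-matching prefix from the answer, unchanged.
IP_PREFIX = /^(?<client_ip>[^ ,]+)(?:, +(?<proxy_ip>[^ ,]+))?(?:, +(?<lb_ip>[^ ,]+))?/

samples = [
  "10.25.1.120 - -",                                 # hypothetical: one IP
  "148.165.41.129, 10.25.1.120 - -",                 # hypothetical: two IPs
  "176.30.235.70, 165.225.70.200, 10.25.1.120 - -"   # three IPs, as in the question
]

samples.each do |s|
  # Unmatched optional groups show up as nil.
  p IP_PREFIX.match(s).named_captures
end
```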

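Sample strings

Let us consider an abbreviated version of your question, focusing on the first four named capture groups (handling the remaining ones is straightforward):

str1 = "148.165.41.129, 10.25.1.120 - - [09/Dec/2019:16:22:23 +0000]"
str2 = "176.30.235.70, 165.225.70.200, 10.25.1.120 - - [09/Dec/2019:13:30:57 +0000]"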

Regex written in free-spacing mode

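The following regex can be used to extract the contents of the named capture groups, provided the string has a valid structure. Note that it requires the IPv4 addresses and the date-time string to have the expected form (four dot-separated groups of 1-3 digits, and `dd/Mon/yyyy:hh:mm:ss +hhmm`) rather than merely matching `[^ ]+` and `[^ ]+ [^ ]+`. I have written the regex in free-spacing mode to make it self-documenting: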

r = /
    \A              # match the beginning of the string 
    (?<client_ip>   # begin a capture group named client_ip
      \g<user_ip>   # evaluate the subexpression (capture group) named user_ip
    )               # end capture group client_ip
    (?:             # begin a non-capture group
      ,[ ]          # match the string ', '
      (?<lb_ip>     # begin a capture group named lb_ip
        \g<user_ip> # evaluate the subexpression (capture group) named user_ip
      )             # end capture group lb_ip
    )?              # end non-capture group and optionally execute it
    (?:             # begin a non-capture group
      ,[ ]          # match the string ', '
      (?<user_ip>   # begin a capture group named user_ip
        \d{1,3}     # match 1-3 digits 
        (?:         # begin a non-capture group
          \.\d{1,3} # match a period followed by 1-3 digits
        ){3}        # end the non-capture group and execute 3 times
      )             # end capture group user_ip
    )               # end non-capture group
    [ ]-[ ]-[ ]\[   # match the string ' - - ['
    (?<time>        # begin a capture group named time 
      \d{2}\/\p{L}{3}\/\d{4}:\d{2}:\d{2}:\d{2}[ ]\+\d{4}
                    # match a time string
    )               # end capture group time                    
    \]              # match string ']'
    \z              # match end of string
    /x              # free-spacing regex definition mode

Subexpression calls

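Rather than repeating the pattern of the capture group `user_ip` inside each of the first two named capture groups, I simply used `\g<user_ip>`, which tells the regex engine to evaluate the contents of the capture group (subexpression) `user_ip` at the location where `\g<user_ip>` appears. Search for "Subexpression Calls" in the documentation for Ruby's `Regexp`.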
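As a minimal illustration of `\g<name>` in Ruby, the pattern below defines a group once and calls it three more times; the group name `octet` is made up for this example. After the match, the group holds the text from its most recent invocation rather than from its original definition.

```ruby
# `octet` is defined once and re-evaluated by each \g<octet> call.
IPV4_CALL = /\A(?<octet>\d{1,3})(?:\.\g<octet>){3}\z/

m = IPV4_CALL.match("10.25.1.120")
puts m ? "matched" : "no match"  # matched
# The capture reflects the last call, not the first definition.
puts m[:octet]                   # 120
p IPV4_CALL.match("10.25.1")     # nil: only three octets
```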

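Note that a subexpression call does more than reuse a pattern: it re-evaluates the called group and also updates that group's capture. Suppose we write: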

r = /
    \A 
    (?<client_ip>\d{1,3}(?:\.\d{1,3}){3})
    (?:,[ ](?<lb_ip>\g<client_ip>))?
    (?:,[ ](?<user_ip>\g<client_ip>))
    [ ]-[ ]-[ ]\[
    (?<time>\d{2}\/\p{L}{3}\/\d{4}:\d{2}:\d{2}:\d{2}[ ]\+\d{4}) 
    \]
    \z
    /x

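With this version, the contents of the capture group `client_ip` end up equal to those of `user_ip` (see the second set of match results below): each `\g<client_ip>` call overwrites the capture of `client_ip`, so it keeps the value from the last call. The reason for this behavior is explained in an external reference (look for the passage beginning "In PCRE but not Perl, one interesting twist is…" and the other quoted parts of that document).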
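If you want every IP group to keep its own value without relying on subexpression calls, one alternative (a suggestion, not part of the original answer) is to build the IP sub-pattern once as a Ruby `Regexp` and interpolate it into each named group. Interpolation copies the pattern text, so no group is re-evaluated from elsewhere. The constant names below are hypothetical.

```ruby
# Hypothetical building blocks, interpolated below.
IPV4 = /\d{1,3}(?:\.\d{1,3}){3}/
TIME = /\d{2}\/\p{L}{3}\/\d{4}:\d{2}:\d{2}:\d{2}[ ]\+\d{4}/

# Each #{IPV4} inserts a separate copy of the sub-pattern, so the three
# named groups are independent of one another.
LOG_HEAD = /
  \A
  (?<client_ip>#{IPV4})
  (?:,[ ](?<lb_ip>#{IPV4}))?
  ,[ ](?<user_ip>#{IPV4})
  [ ]-[ ]-[ ]\[
  (?<time>#{TIME})
  \]
  \z
/x

str1 = "148.165.41.129, 10.25.1.120 - - [09/Dec/2019:16:22:23 +0000]"
str2 = "176.30.235.70, 165.225.70.200, 10.25.1.120 - - [09/Dec/2019:13:30:57 +0000]"

p str1.match(LOG_HEAD).named_captures
p str2.match(LOG_HEAD).named_captures
```

Unlike the second regex above, `client_ip` here keeps the first address in both strings.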

Regex written conventionally

Written conventionally, the regex is as follows:

/\A(?<client_ip>\g<user_ip>)(?:, (?<lb_ip>\g<user_ip>))?(?:, (?<user_ip>\d{1,3}(?:\.\d{1,3}){3})) - - \[(?<time>\d{2}\/\p{L}{3}\/\d{4}:\d{2}:\d{2}:\d{2} \+\d{4})\]\z/

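Note that wherever the conventionally written regex has a space, the free-spacing version has a character class containing a single space (`[ ]`). This is necessary because in free-spacing mode unprotected spaces are removed before the expression is parsed. Another way to protect a space is to escape it (`\ `). If you want to match any whitespace character rather than just a space, you can use `\s`.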
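A tiny Ruby check of these free-spacing rules, using throwaway patterns rather than the log regex:

```ruby
# In free-spacing mode (/x) unescaped spaces are ignored...
p("a b".match?(/a b/x))    # false: the pattern is effectively /ab/
p("ab".match?(/a b/x))     # true
# ...so a literal space must be protected, in a character class or escaped.
p("a b".match?(/a[ ]b/x))  # true
p("a b".match?(/a\ b/x))   # true
# \s matches any whitespace character, e.g. a tab.
p("a\tb".match?(/a\sb/x))  # true
```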

Applying the free-spacing regex `r` defined above to the sample strings gives:

    m1 = str1.match(r)
    m1.named_captures
      #=> {"client_ip"=>"148.165.41.129",
      #    "lb_ip"=>nil,
      #    "user_ip"=>"10.25.1.120",
      #    "time"=>"09/Dec/2019:16:22:23 +0000"}
    m2 = str2.match(r)
    m2.named_captures
      #=> {"client_ip"=>"176.30.235.70",
      #    "lb_ip"=>"165.225.70.200",
      #    "user_ip"=>"10.25.1.120",
      #    "time"=>"09/Dec/2019:13:30:57 +0000"}

With the second regex, which defines `client_ip` explicitly and calls it from `lb_ip` and `user_ip`, the results are:

    m1 = str1.match(r)
    m1.named_captures
      #=> {"client_ip"=>"10.25.1.120",
      #    "lb_ip"=>nil,
      #    "user_ip"=>"10.25.1.120",
      #    "time"=>"09/Dec/2019:16:22:23 +0000"}
    m2 = str2.match(r)
    m2.named_captures
      #=> {"client_ip"=>"10.25.1.120",
      #    "lb_ip"=>"165.225.70.200",
      #    "user_ip"=>"10.25.1.120",
      #    "time"=>"09/Dec/2019:13:30:57 +0000"}

Thanks. It works.