Python: using a regular expression to find a second, optional search group

I am writing a regular expression in Python. My current attempt:

^[\d\W]{8}((?=.*Start execution of )|(?=.*Finish execution of))
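Right now, if a line contains one of the required substrings, it finds the time at the start of that line. However, I cannot figure out how to add a second group to the search so that the status (shown in square brackets) is also found when it is present on the line. For example, after applying the regex to the following lines: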

01:01:01 - Start executing steps 1-3
01:01:03 - Start execution of steps group
01:01:04 - Start execution of step [1]
01:02:12 - Finish execution of step [1] with status [ok]
01:02:13 - Start execution of step [2]
01:02:48 - Finish execution of step [2] with status [ok]
01:02:48 - Start execution of step [3]
01:13:21 - Finish execution of step [3] with status [ok]
01:13:21 - Finish execution of steps group with status [success]
01:13:22 - Finish executing steps 1-3
I would like it to return:

['01:01:03', 
 '01:01:04', 
('01:02:12', 'ok'), 
 '01:02:13', 
('01:02:48', 'ok'), 
 '01:02:48', 
('01:13:21', 'ok'), 
('01:13:21', 'success')]

You can append the following to your regex:

.*?status \[(.*?)\]
so that it becomes:

^([\d\W]{8})(?=(?=.*Start execution of )|(?=.*Finish execution of))(?=.*?status \[(.*?)\])?
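As a rough illustration, the same idea can be tried in two steps: first match the time with the lookaheads from the question, then search the same line for the appended `status \[(.*?)\]` fragment. This is only a sketch that avoids relying on the optional lookahead; the names `LOG`, `TIME`, `STATUS` and `parse` are made up for the example and do not come from the thread.

```python
import re

# Sample log lines copied from the question.
LOG = """\
01:01:01 - Start executing steps 1-3
01:01:03 - Start execution of steps group
01:01:04 - Start execution of step [1]
01:02:12 - Finish execution of step [1] with status [ok]
01:02:13 - Start execution of step [2]
01:02:48 - Finish execution of step [2] with status [ok]
01:02:48 - Start execution of step [3]
01:13:21 - Finish execution of step [3] with status [ok]
01:13:21 - Finish execution of steps group with status [success]
01:13:22 - Finish executing steps 1-3
"""

# Time at the start of the line, accepted only if the line contains one of
# the two phrases (the question's lookaheads, with the time captured).
TIME = re.compile(
    r"^([\d\W]{8})(?=(?=.*Start execution of )|(?=.*Finish execution of))"
)
# The fragment appended in the answer, used here as a separate search.
STATUS = re.compile(r"status \[(.*?)\]")


def parse(log):
    """Return a bare time, or a (time, status) tuple when a status exists."""
    results = []
    for line in log.splitlines():
        time_match = TIME.match(line)
        if time_match is None:
            continue  # line lacks "Start/Finish execution of"
        # Look for the status only after the time, on the same line.
        status_match = STATUS.search(line, time_match.end())
        if status_match:
            results.append((time_match.group(1), status_match.group(1)))
        else:
            results.append(time_match.group(1))
    return results


print(parse(LOG))
```

Working line by line keeps each pattern simple, at the cost of two regex passes over every matching line.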

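Output: based on your desired output, the regexes listed at the end of this answer do exactly what you need. As you can see, the result has all of the required timings plus the optional status, and nothing extra: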

01:01:03
01:01:04
01:02:12, ok
01:02:13
01:02:48, ok
01:02:48
01:13:21, ok
01:13:21, success
02:01:02
02:01:02
02:03:10, ok
02:03:12
02:03:16, fail
02:03:16, failed
Differences: you will see that it differs from yours in a few key areas:

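  • You want the timings, so they must be wrapped in a capturing group
    (^[\d:]{8})

  • You only need the digits and colons of the timings, so the regex can say so explicitly: [\d:] vs
    [\d\W]

    Note: for (2) above, this also works
    (^[\d:]+)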

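  • It removes the group around the lookaheads. You do not need to capture the lookahead, because you do not want that text returned to your Python code. So the extra parentheses were removed

  • It combines the 2 lookaheads into one: (?:Start|Finish) execution of

  • It adds to the lookahead the status requirement you were missing
    (?:.*\[([a-zA-Z]+))?
    This part should be captured, so you need parentheses inside the square brackets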

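  • Other workable regexes: something like the following may also help.
    (^.{8})
    (^[\S]+)
    These capture the timings just as well and are terser than the patterns above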
    
    # Implicit status label, explicit letters for status
    (^[\d:]+)(?=.*(?:Start|Finish) execution of (?:.*\[([a-zA-Z]+))?)
    (^[\d:]{8})(?=.*(?:Start|Finish) execution of (?:.*\[([a-zA-Z]+))?)
    (^[\d\W]{8})(?=.*(?:Start|Finish) execution of (?:.*\[([a-zA-Z]+))?)
    
    # Explicit status label, explicit letters for status
    (^[\d:]+)(?=.*(?:Start|Finish) execution of (?:.*status \[([a-zA-Z]+))?)
    (^[\d:]{8})(?=.*(?:Start|Finish) execution of (?:.*status \[([a-zA-Z]+))?)
    (^[\d\W]{8})(?=.*(?:Start|Finish) execution of (?:.*status \[([a-zA-Z]+))?)
    
    # Explicit status label, implicit letters for status
    (^[\d:]+)(?=.*(?:Start|Finish) execution of (?:.*status \[(.*?)\])?)
    (^[\d:]{8})(?=.*(?:Start|Finish) execution of (?:.*status \[(.*?)\])?)
    (^[\d\W]{8})(?=.*(?:Start|Finish) execution of (?:.*status \[(.*?)\])?)
    
    # NOTE: FAILS - Implicit status label and implicit letter for status 
    # (^[\d:]+)(?=.*(?:Start|Finish) execution of (?:.*\[(.*?)\])?)
    # (^[\d:]{8})(?=.*(?:Start|Finish) execution of (?:.*\[(.*?)\])?)
    # (^[\d\W]{8})(?=.*(?:Start|Finish) execution of (?:.*\[(.*?)\])?)
    
    
    # Answers from other posters
    
    ^([\d\W]{8})(?=(?=.*Start execution of )|(?=.*Finish execution of))(?=.*?status \[(.*?)\])?
    
    
    # Customize
    
    # If you prefer the split lookahead, then you can customize any of the above with the middle section
    # For example...
    (^[\d:]+)(?=(?=.*Start execution of )|(?=.*Finish execution of)(?:.*\[([a-zA-Z]+))?)
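
To make the list above easier to try out, here is a minimal Python sketch that runs one of the "explicit status label" patterns against the sample log from the question, and also shows why the commented-out "implicit label, implicit letters" variant is marked as failing. The names `LOG`, `EXPLICIT_STATUS`, `ANY_BRACKET` and `extract` are invented for this example, and `re.MULTILINE` is an assumption about how the log would be searched as one string.

```python
import re

# Sample log lines copied from the question.
LOG = """\
01:01:01 - Start executing steps 1-3
01:01:03 - Start execution of steps group
01:01:04 - Start execution of step [1]
01:02:12 - Finish execution of step [1] with status [ok]
01:02:13 - Start execution of step [2]
01:02:48 - Finish execution of step [2] with status [ok]
01:02:48 - Start execution of step [3]
01:13:21 - Finish execution of step [3] with status [ok]
01:13:21 - Finish execution of steps group with status [success]
01:13:22 - Finish executing steps 1-3
"""

# Explicit "status" label, any text inside the brackets.
EXPLICIT_STATUS = re.compile(
    r"(^[\d:]{8})(?=.*(?:Start|Finish) execution of (?:.*status \[(.*?)\])?)",
    re.MULTILINE,  # let ^ match at the start of every line
)
# No "status" label: any bracketed text qualifies, including step numbers.
ANY_BRACKET = re.compile(
    r"(^[\d:]{8})(?=.*(?:Start|Finish) execution of (?:.*\[(.*?)\])?)",
    re.MULTILINE,
)


def extract(pattern, text):
    """Return a bare time, or (time, status) when group 2 took part."""
    results = []
    for match in pattern.finditer(text):
        time, status = match.group(1), match.group(2)
        results.append(time if status is None else (time, status))
    return results


print(extract(EXPLICIT_STATUS, LOG))
# The looser pattern picks up the step number as if it were a status:
print(extract(ANY_BRACKET, LOG)[1])  # ('01:01:04', '1')
```

`finditer` is used instead of `re.findall` because `findall` reports a group that did not take part in the match as an empty string, so every result would come back as a 2-tuple such as `('01:01:03', '')` and would need an extra clean-up step to reach the mixed list asked for in the question.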