Python: single-line vs. multi-line regular expressions

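Given the following text pattern:

# Goal: extract the process report timestamp, e.g. 2011-09-21 15:45:00, and the first two numbers on the succ. statistics line, e.g. 1438 and 1439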

input_text = '''
# Process_Name     ( 23387) Report at 2011-09-21 15:45:00.001    Type:  Periodic    #\n
some line 1\n
some line 2\n
some other lines\n
succ. statistics |     1438     1439  99 |   3782245    3797376  99 |\n
some lines\n
# Process_Name     ( 23387) Report at 2011-09-21 15:50:00.001    Type:  Periodic    #\n
some line 1\n
some line 2\n
some other lines\n
succ. statistics |     1436     1440  99 |   3782459    3797523  99 |\n
repeat the pattern several hundred times...
'''
I got it working when iterating line by line:

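import re

def parse_file(file_handler, patterns):
    results = []
    for line in file_handler:
        # Try every compiled pattern against the current line.
        for regex in patterns.values():
            result = regex.match(line)
            if result:
                # Store the captured groups, not the match object.
                results.append(result.groups())
    return results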

patterns = {
    'report_date_time': re.compile(r'^# Process_Name\s*\(\s*\d+\) Report at (.*)\.[0-9]{3}\s+Type:\s*Periodic\s*#\s*.*$'),
    'serv_term_stats': re.compile(r'^succ\. statistics \|\s+(\d+)\s+(\d+)\s+\d+\s+\|\s+\d+\s+\d+\s+\d+\s+\|\s*$'),
    }
results = parse_file(fh, patterns)
which returns:

[('2011-09-21 15:40:00',),
('1425', '1428'),
('2011-09-21 15:45:00',),
('1438', '1439')]
But my goal is to output a list of tuples, each combining a timestamp with its two statistics:

[('2011-09-21 15:40:00','1425', '1428'),
('2011-09-21 15:45:00', '1438', '1439')]
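I tried several combinations of the initial patterns with a lazy quantifier between them, but I can't work out how to capture the pattern with a multi-line regular expression.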

# .+?   Lazy quantifier "match as few characters as possible (all characters allowed) until reaching the next expression"
pattern = r'# Process_Name\s*\(\s*\d+\) Report at (.*)\.[0-9]{3}\s+Type:\s*Periodic.*?succ\. statistics \|\s+(\d+)\s+(\d+)\s+\d+\s+\|\s+\d+\s+\d+\s+\d+\s+\|\s'
regex = re.compile(pattern, flags=re.MULTILINE)

data = file_handler.read()
for match in regex.finditer(data):
    results = match.groups()

How can I achieve this?

Use re.DOTALL so that . matches any character, including newlines:

import re

data = '''
# Process_Name     ( 23387) Report at 2011-09-21 15:45:00.001    Type:  Periodic    #\n
some line 1\n
some line 2\n
some other lines\n
succ. statistics |     1438     1439  99 |   3782245    3797376  99 |\n
some lines\n
repeat the pattern several hundred times...
'''

pattern = r'(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}).*?succ. statistics\s+\|\s+(\d+)\s+(\d+)'
regex = re.compile(pattern, flags=re.MULTILINE|re.DOTALL)

for match in regex.finditer(data):
    results = match.groups()
    print(results)

    # ('2011-09-21 15:45:00', '1438', '1439')
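
To get the list of tuples the question asks for, one option is re.findall, which returns the captured groups of every match as a list. The sketch below assumes two abbreviated report blocks modelled on the sample (the `sample` string and the names `PATTERN`, `with_dotall` and `without_dotall` are illustrative, not from the original post). It also shows what happens without re.DOTALL:

```python
import re

# Two abbreviated report blocks, modelled on the sample above (assumed data).
sample = (
    "# Process_Name     ( 23387) Report at 2011-09-21 15:45:00.001    Type:  Periodic    #\n"
    "some line 1\n"
    "succ. statistics |     1438     1439  99 |   3782245    3797376  99 |\n"
    "some lines\n"
    "# Process_Name     ( 23387) Report at 2011-09-21 15:50:00.001    Type:  Periodic    #\n"
    "some line 1\n"
    "succ. statistics |     1436     1440  99 |   3782459    3797523  99 |\n"
)

PATTERN = r'(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}).*?succ\. statistics\s+\|\s+(\d+)\s+(\d+)'

# With DOTALL, "." also matches "\n", so ".*?" can cross line boundaries.
with_dotall = re.findall(PATTERN, sample, flags=re.DOTALL)

# Without DOTALL, "." stops at each newline, so the timestamp and the
# statistics line can never be part of the same match.
without_dotall = re.findall(PATTERN, sample)

print(with_dotall)
# [('2011-09-21 15:45:00', '1438', '1439'), ('2011-09-21 15:50:00', '1436', '1440')]
print(without_dotall)
# []
```

re.MULTILINE only changes how ^ and $ behave, so it is not what lets the match span several lines here; re.DOTALL is.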

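I don't have an answer, but why are you embedding \n in a multi-line string like that? The actual line breaks in the string are already newlines.

Right, Wooble, that is the newline on Linux, so I just added them to represent the line breaks (trying to avoid the usual \n or \r or \r\n?).

Wow, you are fast. Thanks for the answer and the improvement, and thanks to gurus like you! Edit: one small snag. I do need to guarantee a non-greedy quantifier, otherwise the regex captures only the first timestamp and the last statistics line, ignoring the thousand-plus lines in between. So: pattern = r'(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}).*?succ. statistics\s+\|\s+(\d+)\s+(\d+)'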
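
The difference between the greedy and the non-greedy quantifier described in the last comment can be reproduced with a small sketch. The two-report `sample` string and the names `LAZY` and `GREEDY` are assumptions made for illustration only:

```python
import re

# Two abbreviated report blocks, modelled on the question's sample (assumed data).
sample = (
    "# Process_Name     ( 23387) Report at 2011-09-21 15:45:00.001    Type:  Periodic    #\n"
    "some line 1\n"
    "succ. statistics |     1438     1439  99 |   3782245    3797376  99 |\n"
    "some lines\n"
    "# Process_Name     ( 23387) Report at 2011-09-21 15:50:00.001    Type:  Periodic    #\n"
    "some line 1\n"
    "succ. statistics |     1436     1440  99 |   3782459    3797523  99 |\n"
)

TIMESTAMP = r'(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})'
STATS = r'succ\. statistics\s+\|\s+(\d+)\s+(\d+)'

LAZY = TIMESTAMP + r'.*?' + STATS    # stop at the nearest statistics line
GREEDY = TIMESTAMP + r'.*' + STATS   # run on to the last statistics line

lazy_result = re.findall(LAZY, sample, flags=re.DOTALL)
greedy_result = re.findall(GREEDY, sample, flags=re.DOTALL)

print(lazy_result)
# [('2011-09-21 15:45:00', '1438', '1439'), ('2011-09-21 15:50:00', '1436', '1440')]
print(greedy_result)
# [('2011-09-21 15:45:00', '1436', '1440')]
```

With the greedy form, the single match pairs the first timestamp with the last statistics line and swallows everything in between, which matches the behaviour reported in the comment.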