Are Python multiline regexps compatible with iterators?


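Iterators and generators have become the standard tools for memory-efficient code, and these days I use them whenever I need to process long lists. Is there a way to apply a multiline regexp while iterating over a large file (>500 MB) with an iterator?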

Classic way:

import re

my_regex = re.compile(r'some text', re.MULTILINE)

with open('my_large_file.txt', 'r') as f:
    text = f.read()  # Reads the whole file into a single string,
                     # which is memory-consuming for large files
result = my_regex.findall(text)

Iterator way:

import re

my_regex = re.compile(r'some text', re.MULTILINE)

with open('my_large_file.txt', 'r') as f:
    for line in f:  # Use the file as an iterator and loop over the lines
        pass        # Placeholder: what could I do here?
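To illustrate why calling the regex on each line separately is not enough, here is a small sketch with a hypothetical hyphen-delimited pattern and sample text (the question's real regex and file are not shown). The re.MULTILINE flag only makes ^ and $ match at the start and end of each line; what lets a pattern span lines is the \n inside it, and a single line from the file iterator contains at most one newline, at its end.

```python
import re

# Hypothetical sample text and pattern (not the question's actual ones)
text = "---\nfoo\n---\n"
pattern = re.compile(r'^-+\n(.*)\n-+$', re.MULTILINE)

# On the whole string, the pattern can see the newlines between lines
whole = pattern.findall(text)

# Line by line, each piece holds at most one (trailing) newline,
# so a pattern that spans several lines can never match
per_line = [m for line in text.splitlines(keepends=True) for m in pattern.findall(line)]

print(whole)     # prints ['foo']
print(per_line)  # prints []
```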
Minimal working example:

Large file:

My regex:


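What you can do is iterate over the file's lines, append each one to a running text, and test that text with the regexp. Once a match is found, you can empty the running text.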

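import re

my_regex = re.compile(r'some text', re.MULTILINE)

text = ''     # running text: lines read since the last match
results = []
with open('my_large_file.txt', 'r') as f:
    for line in f:
        text += line
        # Re-scan the running text after every new line
        result = my_regex.findall(text)
        if result:
            results.extend(result)
            # Start over; note this also drops any text that follows
            # the last match in the running text
            text = ''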
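A possible refinement of the running-text idea, sketched as a generator so the file is still consumed lazily as an iterator. The name iter_multiline_matches is hypothetical. Instead of emptying the buffer after a match, it keeps whatever follows the last match, so a match that starts later on the same line is not lost. It assumes that a match, once found, would not change if more lines were read (true, for example, for patterns that end with a newline or have a fixed length). Like the loop above, it re-scans the buffer after every line, and the buffer can still grow without bound if nothing ever matches.

```python
import io
import re
from typing import Iterable, Iterator


def iter_multiline_matches(lines: Iterable[str], pattern: re.Pattern) -> Iterator[str]:
    """Yield the text of each match of `pattern` in a running buffer of lines.

    Hypothetical helper. Assumes a match cannot change once the line that
    completes it has been read (e.g. the pattern ends with a newline).
    """
    buffer = ''
    for line in lines:
        buffer += line
        last_end = 0
        # Re-scan the buffer after each new line
        for match in pattern.finditer(buffer):
            yield match.group(0)
            last_end = match.end()
        # Keep the text after the last match instead of discarding it;
        # if nothing matched, last_end is 0 and the buffer is unchanged
        buffer = buffer[last_end:]


if __name__ == "__main__":
    # This pattern spans two lines and does not end at a line boundary
    demo = io.StringIO("a\nb a\nb\n")
    print(list(iter_multiline_matches(demo, re.compile(r'a\nb'))))  # prints ['a\nb', 'a\nb']
    # With a real file: iter_multiline_matches(open('my_large_file.txt'), my_regex)
```

With that input the loop above would report only one match, because emptying the running text also throws away the " a\n" that begins the second one.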

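You don't need a regex to match this pattern: just check whether a line consists entirely of hyphens and, if so, set a flag and save the next line; if the line after that also consists entirely of hyphens, append the saved line to the results list. If your pattern were arbitrary, you would probably use the iterator approach.

What difference do you think the MULTILINE option makes here? Is it because you want to iterate over the file's lines rather than read the whole content in, while a multiline regexp needs several lines at once?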
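The flag-based check from the first comment can be sketched as follows, assuming the goal is the single line enclosed by two lines made only of hyphens (the question's actual regex and sample file are not shown, so this interpretation is an assumption). The names is_rule and lines_between_rules are hypothetical. As in a non-overlapping regex search, a closing hyphen line is not reused as the opening line of the next block.

```python
import io
from typing import Iterable, List, Optional


def is_rule(line: str) -> bool:
    """Return True if the line consists only of hyphens (ignoring the line ending)."""
    return set(line.rstrip('\r\n')) == {'-'}


def lines_between_rules(lines: Iterable[str]) -> List[str]:
    """Collect each line that sits directly between two hyphen-only lines."""
    results: List[str] = []
    after_rule = False            # flag: the previous line was all hyphens
    saved: Optional[str] = None   # line that directly followed a hyphen line
    for line in lines:
        rule = is_rule(line)
        if saved is not None and rule:
            # Hyphen line right after the saved line: the block is complete
            results.append(saved.rstrip('\r\n'))
            saved = None
            after_rule = False    # do not reuse the closing line as an opener
            continue
        # Save this line only if it directly follows a hyphen line
        saved = line if after_rule and not rule else None
        after_rule = rule
    return results


if __name__ == "__main__":
    sample = io.StringIO("x\n---\nfoo\n---\ny\n---\nbar\n---\n")
    print(lines_between_rules(sample))  # prints ['foo', 'bar']
    # With a real file, open('my_large_file.txt') is itself an iterable of lines
```

Only the flag and one saved line are kept in memory, so the file is processed in a single streaming pass.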