
Python: are multiline regexps compatible with iterators?

python regex python-3.x string


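Iterators and generators are now the standard for memory-efficient code, and I use them wherever I can when processing long lists. Is there a way to apply a multiline regexp while iterating over a large file (>500 MB) with an iterator, instead of reading the whole file first?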

The classic way:

import re
my_regex = re.compile(r'some text', re.MULTILINE)

with open('my_large_file.txt', 'r') as f:
    text = f.read()  # Reads the whole file into a single string
                     # This is memory consuming
result = my_regex.findall(text)

The iterator way:

import re
my_regex = re.compile(r'some text', re.MULTILINE)

with open('my_large_file.txt', 'r') as f:
    for line in f:  # Use the file as an iterator and
                    # loop over the lines
        pass        # What could I do here?

Minimal working example:


Large file:

My regex:

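What you can do is iterate over the file's lines, append each one to a running text, and test that running text against the regexp. Once a match is found, you can empty the running text.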

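import re
my_regex = re.compile(r'some text', re.MULTILINE)

text = ''      # running buffer of the lines read since the last match
results = []
with open('my_large_file.txt', 'r') as f:
    for line in f:
        text += line
        result = my_regex.findall(text)
        if result:
            results.extend(result)
            text = ''  # a match was found, so drop the buffer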
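The same running-text idea can be wrapped in a generator so that matches are produced lazily, which is closer to the iterator style the question asks about. This is only a sketch: the name `iter_matches`, the sample lines, and the hyphen-framed pattern are assumptions made for illustration, since the original sample file and regex are not shown here.

```python
import re


def iter_matches(lines, pattern):
    """Yield findall() results, buffering lines until the pattern matches.

    `lines` is any iterable of strings, for example an open text file.
    """
    buffer = ""
    for line in lines:
        buffer += line
        found = pattern.findall(buffer)
        if found:
            yield from found
            buffer = ""  # same policy as above: clear the buffer after a match


# Hypothetical pattern: a line framed by two all-hyphen lines.
framed = re.compile(r"^-+\n(.+)\n-+$", re.MULTILINE)
sample_lines = ["a\n", "---\n", "title\n", "---\n", "b\n"]
matches = list(iter_matches(sample_lines, framed))
```

With a real file, the call would look like `iter_matches(f, my_regex)` inside the `with open(...)` block. Two caveats apply to this approach: the buffer keeps growing, and is rescanned on every line, for as long as nothing matches; and a match is reported as soon as it first appears, so a pattern that could have extended over later lines is cut short.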

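You don't need a regular expression to match this kind of pattern. Just check whether a line consists only of hyphens, set a flag, save the next line, and if the line after that is also all hyphens, append the saved line to the result list. If your pattern is arbitrary, you would probably use the iterator approach.

What difference do you think the multiline option makes here? Is the problem that you want to iterate over the file's lines rather than read the whole thing in, but a multiline regexp needs several lines at once?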
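The flag-based approach described in the first comment could look roughly like the sketch below. The helper names `is_rule` and `iter_framed_lines` are hypothetical. The sketch also assumes that a framed line is never itself made only of hyphens, and that a closing hyphen line does not double as the opening line of the next frame.

```python
def is_rule(line):
    """Return True if the line is non-empty and consists only of hyphens."""
    return bool(line) and set(line) == {"-"}


def iter_framed_lines(lines):
    """Yield each line that sits directly between two all-hyphen lines."""
    after_rule = False  # flag: the previous line was an opening hyphen line
    candidate = None    # the line saved right after a hyphen line
    for raw in lines:
        line = raw.rstrip("\n")
        if is_rule(line):
            if candidate is not None:
                yield candidate      # hyphens, saved line, hyphens
                candidate = None
                after_rule = False   # assumption: closing line is consumed
            else:
                after_rule = True
        elif after_rule:
            candidate = line         # save the line following the hyphens
            after_rule = False
        else:
            candidate = None         # a second plain line breaks the frame


sample = ["intro\n", "-----\n", "Title\n", "-----\n", "body\n",
          "---\n", "x\n", "y\n", "---\n"]
framed_lines = list(iter_framed_lines(sample))
```

Because it only ever holds one saved line, this version uses constant memory regardless of file size, and an open file object can be passed directly as `lines`.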