Are Python multiline regexps compatible with iterators?
Iterators and generators are now the standard for memory-efficient code. These days, whenever I need to process long lists, I apply them as much as I can.

When iterating over a large file (>500 MB) through an iterator, is there a way to use a multiline regexp?

Classic way:
import re

my_regex = re.compile(r'some text', re.MULTILINE)
with open('my_large_file.txt', 'r') as f:
    text = f.read()  # Reads the whole file into a single string
                     # This is memory consuming
result = my_regex.findall(text)
Iterator way:
import re

my_regex = re.compile(r'some text', re.MULTILINE)
with open('my_large_file.txt', 'r') as f:
    for line in f:  # Use the file as an iterator and
                    # loop over the lines
        pass        # What could I do?
Minimal working example:
Large file:
My regex:
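What you can do is iterate over the lines of the file, append each one to a running text, and test that text with the regexp. Once a match is found, you can empty the running text.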
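import re

my_regex = re.compile(r'some text', re.MULTILINE)
text = ''
results = []
with open('my_large_file.txt', 'r') as f:
    for line in f:
        text += line  # Append the line to the running text
        result = my_regex.findall(text)
        if result:
            results += result
            text = ''  # Match found: empty the running text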
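The running-text idea above can also be wrapped in a generator, so that matches are produced lazily instead of being collected in a list. This is only a sketch: the helper name `iter_matches`, the sample pattern (a title line framed by two lines of hyphens), and the `io.StringIO` object standing in for the large file are all assumptions made for illustration. It keeps the limitations of the approach: the buffer grows until a match is found, and a match that straddles a buffer reset would be missed.

```python
import io
import re


def iter_matches(lines, pattern):
    """Yield matches of `pattern` found in a running buffer built from `lines`.

    Hypothetical helper: the buffer is emptied after every successful search.
    """
    buffer = ''
    for line in lines:
        buffer += line
        found = pattern.findall(buffer)
        if found:
            yield from found
            buffer = ''  # Match found: start a fresh buffer


# Assumed sample pattern: one line of text between two all-hyphen lines.
sample_regex = re.compile(r'^-+\n(.+)\n-+$', re.MULTILINE)

# Assumed sample data standing in for 'my_large_file.txt'.
sample_file = io.StringIO(
    'intro\n-----\nTitle A\n-----\nbody\n---\nTitle B\n---\n'
)

titles = list(iter_matches(sample_file, sample_regex))
print(titles)
```

Because `iter_matches` accepts any iterable of lines, an open file object can be passed in directly in place of the `io.StringIO` stand-in.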
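You don't need a regular expression to match this kind of pattern. Just check whether a line consists only of hyphens, set a flag, save the next line, and if the line after that also consists only of hyphens, append the saved line to the result list.

If your pattern is arbitrary, you would probably use the iterator way.

What difference do you think the re.MULTILINE option makes here? Is the problem that you want to iterate over the lines of the file rather than read the whole thing in, but the multiline regexp needs several lines at once?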
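The flag-based suggestion in the first comment could look roughly like the sketch below. The original file and regex are not shown, so the sample data, the helper names `is_rule` and `framed_lines`, and the rule that a closing hyphen line is not reused as the opening line of the next block are assumptions, not the commenter's exact method.

```python
import io


def is_rule(line):
    """Return True if the line is non-empty and made only of hyphens."""
    return bool(line) and set(line) == {'-'}


def framed_lines(lines):
    """Yield each line that sits directly between two all-hyphen lines."""
    after_rule = False  # Flag: the previous line was an opening hyphen line
    saved = None        # Line saved right after an opening hyphen line
    for raw in lines:
        line = raw.rstrip('\n')
        if is_rule(line):
            if saved is not None:
                yield saved        # Closing hyphen line: keep the saved line
                saved = None
                after_rule = False
            else:
                after_rule = True  # Opening hyphen line
        elif after_rule:
            saved = line           # Candidate line, waiting for a closing rule
            after_rule = False
        else:
            saved = None           # More than one line since the rule: discard


# Assumed sample data standing in for 'my_large_file.txt'.
sample_file = io.StringIO(
    'intro\n-----\nTitle A\n-----\nbody\n---\nTitle B\n---\n'
)

framed = list(framed_lines(sample_file))
print(framed)
```

Only one saved line and one flag are kept in memory at any time, so this works on a file of any size as long as the pattern really is this simple.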