Python: decrementing the counter of an iterable obtained with enumerate after calling seek
Tags: python, python-3.x, seek, enumerate, tell
I am using Python to read a file that contains sections marked by the '#' character:
#HEADER1, SOME EXTRA INFO
data first section
1 2
1 233
...
// THIS IS A COMMENT
#HEADER2, SECOND SECTION
452
134
// ANOTHER COMMENT
...
#HEADER3, THIRD SECTION
Now, I wrote the following code to read the file:
with open(filename) as fh:
    enumerated = enumerate(iter(fh.readline, ''), start=1)
    for lino, line in enumerated:
        # handle special section
        if line.startswith('#'):
            print("="*40)
            print(line)
            while True:
                start = fh.tell()
                lino, line = next(enumerated)
                if line.startswith('#'):
                    fh.seek(start)
                    break
                print("[{}] {}".format(lino,line))
The output is:
========================================
#HEADER1, SOME EXTRA INFO
[2] data first section
[3] 1 2
[4] 1 233
[5] ...
[6] // THIS IS A COMMENT
========================================
#HEADER2, SECOND SECTION
[9] 452
[10] 134
[11] // ANOTHER COMMENT
[12] ...
========================================
#HEADER3, THIRD SECTION
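As you can see, the line counter lino is no longer valid because I am using seek. Decrementing it before breaking out of the loop does not help either, because the counter is incremented on every call to next. So is there an elegant way to solve this in Python 3.x? Also, is there a better way to handle the StopIteration without putting a pass statement in an except block?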
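On the StopIteration part of the question, one possible approach is to pass a default as the second argument to the built-in next(), which then returns that default instead of raising once the iterator is exhausted. The following is only a minimal sketch; the names lines and collected and the sample data are made up for illustration.

```python
# Sample data standing in for the file; hypothetical, for illustration only.
lines = ["#HEADER1\n", "1 2\n", "1 233\n"]
enumerated = enumerate(lines, start=1)

collected = []
while True:
    # next() returns the default (None here) once the iterator is exhausted,
    # so no try/except StopIteration block is needed.
    item = next(enumerated, None)
    if item is None:
        break
    collected.append(item)

print(collected)
```

This assumes None can never be a legitimate item; with enumerate that holds, because every item is a (number, line) tuple.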
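UPDATE
So far I have adopted an implementation based on the suggestion from @Dunes. I had to change it a little so that I can look ahead and see whether a new section is starting. I don't know whether there is a better way to do this, so please comment: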
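class EnumeratedFile:
    def __init__(self, fh, lineno_start=1):
        self.fh = fh
        self.lineno = lineno_start

    def __iter__(self):
        return self

    def __next__(self):
        # readline(), unlike next(fh), does not disable fh.tell() on text files
        result = self.lineno, self.fh.readline()
        if result[1] == '':
            raise StopIteration
        self.lineno += 1
        return result

    def mark(self):
        self.marked_lineno = self.lineno
        self.marked_file_position = self.fh.tell()

    def recall(self):
        self.lineno = self.marked_lineno
        self.fh.seek(self.marked_file_position)

    def section(self):
        # peek at the next character without consuming it
        pos = self.fh.tell()
        char = self.fh.read(1)
        self.fh.seek(pos)
        # False at a new section header and also at end of file, so that
        # a "while e.section()" loop does not run past the last line
        return char not in ('', '#')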
Then the file is read and each section is processed as follows:
with open(filename) as fh:
    # create enumerated object
    e = EnumeratedFile(fh)
    header = ""
    for lineno, line in e:
        print("[{}] {}".format(lineno, line))
        header = line.rstrip()
        # HEADER1
        if header.startswith("#HEADER1"):
            # process header 1 lines
            while e.section():
                # get node line
                lineno, line = next(e)
                # do whatever needs to be done with the line
        elif header.startswith("#HEADER2"):
            # etc.
            pass
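No, you cannot alter the counter of the enumerate() iterable.
You don't need to do that here at all, and you don't need to seek either. Instead, use a nested loop and buffer the section header: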
with open(filename) as fh:
    enumerated = enumerate(fh, start=1)
    header = None
    for lineno, line in enumerated:
        # seek to first section
        if header is None:
            if not line.startswith('#'):
                continue
            header = line

        print("=" * 40)
        print(header.rstrip())
        for lineno, line in enumerated:
            if line.startswith('#'):
                # new section
                header = line
                break

            # section line, handle as such
            print("[{}] {}".format(lineno, line.rstrip()))
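This only buffers the header line; every time we encounter a new header, it is stored and the current section loop ends.
With the sample file above, the third section remains unprocessed because it contains no lines; had there been lines, the header variable would already have been set in anticipation.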
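One detail worth checking in the nested loop above: as far as I can tell, once the inner loop breaks on a header, the outer for statement fetches the following line before the inner loop resumes, so the first line of the second and later sections is not printed, and a final header with no lines is never reported. The sketch below is a variation that avoids both effects by using a single loop inside a generator instead of the nested loop. The function name read_sections and the shortened sample data are assumptions made for illustration, not part of the answer.

```python
from io import StringIO


def read_sections(fh):
    """Yield (header_lineno, header, body) for each '#' section.

    body is a list of (lineno, line) tuples with real file line numbers.
    """
    header = None
    header_lineno = None
    body = []
    for lineno, line in enumerate(fh, start=1):
        line = line.rstrip("\n")
        if line.startswith("#"):
            # A new header closes the previous section, if there was one.
            if header is not None:
                yield header_lineno, header, body
            header, header_lineno, body = line, lineno, []
        elif header is not None:
            body.append((lineno, line))
        # Lines before the first header are skipped.
    # Emit the last section even when it has no lines.
    if header is not None:
        yield header_lineno, header, body


# Shortened, hypothetical sample in the same layout as the question's file.
sample = StringIO(
    "#HEADER1, SOME EXTRA INFO\n"
    "data first section\n"
    "1 2\n"
    "#HEADER2, SECOND SECTION\n"
    "452\n"
    "134\n"
    "#HEADER3, THIRD SECTION\n"
)
sections = list(read_sections(sample))
for header_lineno, header, body in sections:
    print("=" * 40)
    print(header)
    for lineno, line in body:
        print("[{}] {}".format(lineno, line))
```

Because no seek is involved, the line numbers always match the position in the file, at the cost of holding one section's lines in memory at a time.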
You can copy an iterator and later restore the iterator from that copy. However, you cannot copy a file object. You could take a shallow copy of the enumerator and then seek to the corresponding part of the file when you start using the copied enumerator.
However, the best approach is to write a generator class with a __next__ method:
class EnumeratedFile:
    def __init__(self, fh, lineno_start=1):
        self.fh = fh
        self.lineno = lineno_start

    def __iter__(self):
        return self

    def __next__(self):
        result = self.lineno, next(self.fh)
        self.lineno += 1
        return result

    def mark(self):
        self.marked_lineno = self.lineno
        self.marked_file_position = self.fh.tell()

    def recall(self):
        self.lineno = self.marked_lineno
        self.fh.seek(self.marked_file_position)
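One caveat when this class wraps a real file opened with open() rather than a StringIO: in CPython 3, calling tell() on a text file after next() has been used on it raises OSError, whereas readline() leaves tell() usable. The sketch below only demonstrates that behaviour; the temporary file and its contents are made up for illustration.

```python
import os
import tempfile

# Write a small sample file; the contents are hypothetical.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("#HEADER1\n1 2\n#HEADER2\n")
    path = tmp.name

try:
    with open(path) as fh:
        next(fh)  # iterate over the file object directly
        try:
            fh.tell()
            tell_after_next = "ok"
        except OSError:
            # CPython reports "telling position disabled by next() call"
            tell_after_next = "OSError"

    with open(path) as fh:
        fh.readline()  # readline() keeps tell() usable
        position_after_readline = fh.tell()
finally:
    os.remove(path)

print(tell_after_next, position_after_readline)
```

Replacing next(self.fh) with self.fh.readline() in __next__, and raising StopIteration when it returns an empty string, is one way to keep mark() working on such files.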
You can use it like this:
from io import StringIO

demo = StringIO('''\
#HEADER1, SOME EXTRA INFO
data first section
1 2
1 233
...
// THIS IS A COMMENT
#HEADER2, SECOND SECTION
452
134
// ANOTHER COMMENT
...
#HEADER3, THIRD SECTION
''')

e = EnumeratedFile(demo)
seen_header2 = False
for lineno, line in e:
    if seen_header2:
        print(lineno, line)
        assert (lineno, line) == (2, "data first section\n")
        break
    elif line.startswith("#HEADER1"):
        e.mark()
    elif line.startswith("#HEADER2"):
        e.recall()
        seen_header2 = True
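No, you cannot reset the enumerate() count. Mixing seeking and iteration is not a good idea in any case. What is the goal here? To number the lines in each section, with each new section starting at 1?
The goal is to tell the user which line number of the input file is at fault if something goes wrong while reading. I could replace enumerate with a counter, incrementing it on every call to next and decrementing it on every call to seek when a new section is found.
I don't see why seeking is needed. Why not store the lines you have read in a buffer?
I don't want to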