Python: iterator using itertools is skipping a line

python csv for-loop

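I feel that my question is related to an existing one.

However, I have not found a satisfying answer there.

The example below uses the following modules: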

import csv
from itertools import takewhile

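Here is my problem. I have a CSV file that I want to parse with itertools.

For example, I want to separate the header from the content. The boundary can be detected by a keyword appearing in the first column.

Here is an example file.csv: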

a, content
b, content
KEYWORD, something else
c, let's continue

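The first two lines make up the header of the file. The KEYWORD line separates the header from the content, which here is the last line.

I also want to parse the separator line, even though it is not part of the content.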

with open('file.csv', 'rb') as f:
    reader = csv.reader(f)
    header = takewhile(lambda x: x[0] != 'KEYWORD', reader)
    for row in header:
        print(row)
    print('End of header')
    for row in reader:
        print(row)

I did not expect this, but the KEYWORD line is skipped, as you can see in the following output:

['a', ' content']
['b', ' content']
End of header
['c', " let's continue"]

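I tried simulating the csv reader to see whether the problem came from there, but apparently it does not. The code below produces the same behavior: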

l = [['a', 'content'],
    ['b','content'],
    ['KEYWORD', 'something else'],
    ['c', "let's continue"]]

i = iter(l)
header = takewhile(lambda x: x[0] != 'KEYWORD', i)
for row in header:
    print(row)
print('End of header')
for row in i:
    print(row)

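How can I use takewhile while preventing it from skipping the unparsed line?

As far as I understand, the first for loop calls next on the iterator to test its content. The second one calls next again to collect the value. As a result, the separator line is skipped.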
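I think you have to restructure; takewhile is not suitable for what you are doing. The problem is that takewhile has to read the line starting with 'KEYWORD' to determine that it has reached a line it should not yield, and once that line has been read, the file's "read head" is at the start of the next line. Similarly, in the iter version, takewhile has already consumed (and discarded) the line starting with 'KEYWORD' by the time you start iterating over the rest of i.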
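To make the consumption visible, here is a minimal sketch with a shortened version of the in-memory rows from the question (the variable names are only illustrative). It collects what takewhile yields and then what is still left in the underlying iterator.

```python
from itertools import takewhile

rows = iter([['a'], ['b'], ['KEYWORD'], ['c']])

# takewhile must pull ['KEYWORD'] out of `rows` in order to test it,
# and it has no way to push that item back afterwards.
header = list(takewhile(lambda row: row[0] != 'KEYWORD', rows))
leftover = list(rows)

print(header)    # [['a'], ['b']]
print(leftover)  # [['c']]
```

The ['KEYWORD'] row appears in neither list: it was read from the iterator, failed the predicate, and was dropped.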
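An alternative would be:
header = []
content = []
target = header  # rows go to the header list until the keyword row appears
for row in reader:
    # each row is a list of fields, so compare the first column
    if row[0] == 'KEYWORD':
        target = content  # the KEYWORD row and everything after it go to content
    target.append(row)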

Thanks to @jornsharpe, I started to question some of my coding habits.
Here is what I ended up with:
class RewindableFile(file):
    def __init__(self, *args, **kwargs):
        nb_backup = kwargs.pop('nb_backup', 1)
        super(RewindableFile, self).__init__(*args, **kwargs)
        self._nb_backup = nb_backup
        self._backups = []
        self._time_anchor = 0

    def next(self):
        if self._time_anchor >= 0:
            item = super(RewindableFile, self).next()
            self._backup(item)
            return item
        else:
            item = self._forward()
            return item

    def rewind(self):
        self._time_anchor = self._time_anchor - 1
        time_bound = min(self._nb_backup, len(self._backups))
        if self._time_anchor < -time_bound:
            raise Exception('You have gone too far in history...')

    def __iter__(self):
        return self

    def _backup(self, row):
        self._backups.append(row)
        extra_items = len(self._backups) - self._nb_backup
        if extra_items > 0:
            del self._backups[0:extra_items]

    def _forward(self):
        item = self._backups[self._time_anchor]
        self._time_anchor = self._time_anchor + 1
        return item

I could also override the read and readline methods so that they are backed up as well, but I do not need them here.
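The class above subclasses the built-in file type, which exists only in Python 2. As a rough Python 3 sketch of the same idea, the wrapper below decorates any iterator instead of subclassing and keeps a single item of history. The name Rewindable and its rewind method are hypothetical helpers written for this illustration; they are not part of itertools or the standard library.

```python
from itertools import takewhile


class Rewindable:
    """Iterator wrapper that can replay its most recently returned item.

    Hypothetical helper for illustration only (one item of history).
    """

    _EMPTY = object()  # marker meaning "nothing stored"

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._last = self._EMPTY     # most recently returned item
        self._pending = self._EMPTY  # item queued for replay by rewind()

    def __iter__(self):
        return self

    def __next__(self):
        if self._pending is not self._EMPTY:
            # Replay the queued item instead of reading a new one.
            item, self._pending = self._pending, self._EMPTY
        else:
            item = next(self._it)
        self._last = item
        return item

    def rewind(self):
        """Queue the most recently returned item to be returned again."""
        if self._last is self._EMPTY:
            raise ValueError('nothing to rewind to')
        self._pending = self._last


rows = Rewindable([
    ['a', 'content'],
    ['b', 'content'],
    ['KEYWORD', 'something else'],
    ['c', "let's continue"],
])
header = list(takewhile(lambda row: row[0] != 'KEYWORD', rows))
rows.rewind()           # put the consumed KEYWORD row back
remaining = list(rows)  # starts with the KEYWORD row again
```

One caveat of this sketch: if the keyword never appears, takewhile exhausts the input and rewind() would replay the last header row, so the caller should check for that case.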
You can write your own takewhile like this:
def takewhile(predicate, iterable):
    for x in iterable:
        yield x
        if not predicate(x):
            break

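Test: calling this takewhile with lambda x: x != 3 over range(10) returns [0, 1, 2, 3], so the first element that fails the predicate is included in the output.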
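Applied to the rows from the question, this variant might be used as sketched below. The function is renamed takewhile_inclusive here (a name chosen only for this example) so that it does not shadow itertools.takewhile.

```python
def takewhile_inclusive(predicate, iterable):
    """Like itertools.takewhile, but also yields the first failing item."""
    for x in iterable:
        yield x
        if not predicate(x):
            break


rows = iter([
    ['a', 'content'],
    ['b', 'content'],
    ['KEYWORD', 'something else'],
    ['c', "let's continue"],
])

header = list(takewhile_inclusive(lambda row: row[0] != 'KEYWORD', rows))
separator = header.pop()  # the KEYWORD row arrives as the last header item
content = list(rows)      # the underlying iterator resumes right after it
```

This sketch assumes the keyword is always present; if it is not, header.pop() would remove the last real header row instead.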
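jornsharpe is right: this is not a good job for takewhile. itertools also has a groupby function that handles the split more easily. The LastHeader class below keeps a record of the last header line passed through its check method, and returns a reference to it every time check is called.
This way you can run through the file once, without having to backtrack at all.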
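from itertools import groupby

class LastHeader():
    """Checks for new header strings. For use with groupby"""
    def __init__(self, sentinel='#'):
        self.sentinel = sentinel
        self.lastheader = ''

    def check(self, line):
        # Remember the most recent header line and use it as the group key
        if line.startswith(self.sentinel):
            self.lastheader = line
        return self.lastheader

# fname, sentinel, foo and bar are placeholders supplied by the caller
with open(fname, 'r') as fobj:
    lastheader = LastHeader(sentinel)
    for headerline, readlines in groupby(fobj, lastheader.check):
        foo(headerline)
        # each group starts with its own header line
        # (lines before the first header are grouped under '')
        for line in readlines:
            bar(line)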
where foo and bar are whatever processing you need to do on the header and the data.
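For the layout in the question, where a single KEYWORD row splits the input into two sections, the same groupby idea could look roughly like the sketch below. SectionTracker is a hypothetical helper written for this example; it relies on groupby calling the key function once per item, in order.

```python
from itertools import groupby


class SectionTracker:
    """Stateful groupby key: 'header' until the keyword row, then 'content'."""

    def __init__(self, keyword='KEYWORD'):
        self.keyword = keyword
        self.section = 'header'

    def __call__(self, row):
        # Switch sections when the keyword shows up in the first column.
        if row[0] == self.keyword:
            self.section = 'content'
        return self.section


rows = [
    ['a', 'content'],
    ['b', 'content'],
    ['KEYWORD', 'something else'],
    ['c', "let's continue"],
]

# Materialize each group while it is current; groupby groups are lazy.
sections = {name: list(group) for name, group in groupby(rows, SectionTracker())}
```

Here the KEYWORD row is kept as the first row of the 'content' section, so nothing is skipped and the input is read only once.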
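Thanks for your answer, but my process is more complicated than appending content to lists, so your answer does not fully solve my problem. It did make me rethink some ways of approaching it. Thanks (see my answer).

This would require takewhile to be able to access the iterable's next item without consuming it. I do not see any possible way to do that.

That was my first intention before creating the rewindable file. But thanks for your answer.

Taking inspiration from another answer, and in order to keep the buffering that file performs naturally, I moved from seeking to backups and modified the code accordingly. That said, the solution I referred to is better: it decorates the file instead of redefining it, so I recommend that approach. I also added some features of my own, such as an adjustable backup capacity.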
Interpreter session for the custom takewhile test above:
>>> list(takewhile(lambda x: x != 3, range(10)))
[0, 1, 2, 3]
