Python: How do I extract a target row, the previous row and the next row from a CSV file?

python python-2.7 csv enumerate


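I have been trying to work out how to do this with the for loop and the enumerate object that Python provides. I have a time in the format HH:MM, and a CSV file whose first column is a timestamp in the same format. I search the file for the matching time and extract that row, to be converted to an XML file later. However, I also need to extract the row before and the row after the target row. I tried the following code: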

def findRow(timeID, filename):
    rows = []
    csvFile = csv.reader(open(filename, "rb"), delimiter=",")
    for i, row in enumerate(csvFile):
        if timeID == timeInRow:
            rows.append(i-1)
            rows.append(i)
            rows.append(i+1)
            return rows

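However, I quickly realized this isn't the right approach, because I am extracting indexes rather than values. What I need is something like row[i-1], row[i], row[i+1]. In other words, I need the row that matched at index i, not just i itself.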

Is there a simple way to do this? I have thought about using range(csvFile), but I really don't know what that would do.

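I would use a different approach:

- store the previous row in the loop
- if there is a match, use `next` to get the next row, and return the 3 rows

(`timeInRow` has to be extracted from `row`, but your code doesn't show how that is done.)

`next` is given an empty list as its default value, which is only used when the match is on the last row (this avoids a `StopIteration` exception).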
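The steps above could be sketched roughly as follows. This is only an illustration, not the answer's original code: the function name `find_row` and the `time_col` parameter are made up here, the time is assumed to sit in the first column as the question describes, and the file is opened the Python 3 way (on Python 2.7 you would use `open(filename, "rb")` as in the question).

```python
import csv


def find_row(time_id, filename, time_col=0):
    """Return [previous_row, matching_row, next_row] for the first match, or []."""
    with open(filename, newline="") as f:
        reader = csv.reader(f)
        prev_row = []  # stays empty if the match is on the very first row
        for row in reader:
            # Skip rows too short to hold a time (e.g. blank lines)
            if len(row) > time_col and row[time_col] == time_id:
                # next() must be called on the reader (an iterator), not on a list;
                # the [] default avoids StopIteration when the match is the last row
                return [prev_row, row, next(reader, [])]
            prev_row = row
    return []  # no row matched
```

A match on the first row gives an empty list in the first position, and a match on the last row gives an empty list in the last position.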
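This linear approach works, but if the rows are sorted by time and you need to perform several searches, a better (faster) approach is probably to build a list of rows and a list of times, then use the `bisect` module to compute the insertion point in the list of times, check that the time at that point matches, and use the index to return a slice of the list of rows.

Something like:

import bisect

list_of_rows = list(csvFile)
list_of_times = [x[3] for x in list_of_rows]  # assume that the time is the 4th column here
# bisect_left gives the index of the first time >= timeID (bisect.bisect would land after a match)
i = bisect.bisect_left(list_of_times, timeID)
if i < len(list_of_times) and list_of_times[i] == timeID:
    return list_of_rows[max(i-1, 0):min(i+2, len(list_of_rows))]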


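If you only need to perform one search, this is slower, because you have to build the lists first, so it costs O(n) + O(log(n)). But if you want to perform several searches in the same list, each search only costs O(log(n)).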
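To make the repeated-search case concrete, here is a small self-contained sketch. The function name `rows_around_time` and the sample data are invented for illustration. It assumes the rows are already sorted by time and that the times are zero-padded HH:MM strings, so that plain string comparison agrees with chronological order; the time is taken from the first column here.

```python
import bisect


def rows_around_time(rows, times, time_id):
    """Binary-search the sorted `times` list; return up to 3 rows around the match."""
    i = bisect.bisect_left(times, time_id)  # index of the first time >= time_id
    if i < len(times) and times[i] == time_id:
        # Slicing clamps at the end of the list; max() guards the start
        return rows[max(i - 1, 0):i + 2]
    return []  # time_id is not present


# Hypothetical data: build both lists once, then search as often as needed
rows = [["09:00", "a"], ["09:05", "b"], ["09:10", "c"], ["09:15", "d"]]
times = [row[0] for row in rows]

print(rows_around_time(rows, times, "09:05"))
print(rows_around_time(rows, times, "09:15"))
```

With this version a match at either end of the file simply returns two rows instead of three.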
You can use a […] for this.

Given:
$ cat /tmp/file.csv
firstName,lastName,email,phoneNumber
John,Doe,john@doe.com,0123456789
Jane,Doe,jane@doe.com,9876543210
James,Bond,james.bond@mi6.co.uk,0612345678

Suppose you want the row containing Jane, together with the rows before and after it.
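The code for this answer is not shown here, so the following is only one possible way to get those three rows: a sliding window built on `collections.deque`. Choosing `deque` is an assumption, not necessarily the tool the answer had in mind, and `rows_around` is a made-up name. The sample data is inlined with `io.StringIO` so the sketch runs without `/tmp/file.csv`.

```python
import csv
import io
from collections import deque

data = """firstName,lastName,email,phoneNumber
John,Doe,john@doe.com,0123456789
Jane,Doe,jane@doe.com,9876543210
James,Bond,james.bond@mi6.co.uk,0612345678
"""


def rows_around(lines, target, col=0):
    """Return the row whose column `col` equals `target`, plus its neighbours."""
    window = deque(maxlen=3)  # holds at most: previous, current, next

    def is_match(row):
        return len(row) > col and row[col] == target

    for row in csv.reader(lines):
        window.append(row)
        # Test the row before the newest one, so its successor is already in the window
        if len(window) >= 2 and is_match(window[-2]):
            return list(window)
    # The match may be the very last row, which has no successor
    if window and is_match(window[-1]):
        return list(window)[-2:]
    return []


for row in rows_around(io.StringIO(data), "Jane"):
    print(row)
```

Because the window never holds more than three rows, this works on files that are too large to load into a list.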
An alternative (functional) approach to the above is to use `zip` or one of its variants. Something like:

rows = list(csv.reader(f))
for x, y, z in zip(rows, rows[1:], rows[2:]):
    # y is the middle row, x is above it, and z below it
    pass

If you also want the first two and the last two tuples of the iteration to be:
(None, None, rows[0])
(None, rows[0], rows[1])
(rows[-2], rows[-1], None)
(rows[-1], None, None)

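then you would have to pad the `rows` list with two `None` values at each end.

I'm not saying this is definitely better than the other answers, but it is another approach I would consider writing.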
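The padding mentioned above might look like this. It is a minimal sketch: `padded_triples` is a made-up name, and the rows are short placeholder strings rather than real CSV rows.

```python
def padded_triples(rows):
    """Zip rows into (x, y, z) windows, padded with two None values at each end."""
    padded = [None, None] + list(rows) + [None, None]
    return list(zip(padded, padded[1:], padded[2:]))


rows = ["r0", "r1", "r2"]  # stand-ins for real CSV rows
for x, y, z in padded_triples(rows):
    print(x, y, z)
```

For n rows this yields n + 2 tuples, starting with `(None, None, rows[0])` and ending with `(rows[-1], None, None)`.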
[Edit]
Using itertools.islice, as suggested by Jean-François:
rows = list(csv.reader(f))
from itertools import islice
for x, y, z in zip(rows, islice(rows, 1, None), islice(rows, 2, None)):
    # y is the middle row, x is above it, and z below it
    pass


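timeInRow is extracted by another function I wrote. I tried this approach and got the error message TypeError: list object is not an iterator.

Oops, fixed my next part: it has to be applied to the csvFile object, not to row :)

Sorry for getting back to you so late. How does this method handle going out of bounds in the CSV file? If the match turns out to be the first or last entry in the file, will it break?

No, because the slicing stops at the bounds first. Hmm, there may be a corner case, editing.

No: in the first method, if the match is found at the start it returns an empty list for the previous row, and if it is found at the end it returns an empty list for the next row.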