Python: read a file from line 2, or skip the header row


python file-io


How do I skip the header row and start reading a file from line 2?

Call next(f) once to consume the header line, then iterate over the rest of the file:

with open(fname) as f:
    next(f)  # skip the header line
    for line in f:
        pass  # do something with each remaining line
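A minimal, runnable sketch of the next(f) approach. It assumes io.StringIO as a stand-in for open(fname), and data_lines is a hypothetical helper name. Passing a default to next() keeps an empty file from raising StopIteration.

```python
import io

def data_lines(f):
    """Yield every line after the first one; tolerate an empty file."""
    # next(f, None) consumes the header line; the default stops
    # StopIteration from escaping when the file has no lines at all.
    next(f, None)
    for line in f:
        yield line.rstrip("\n")

# io.StringIO stands in for open(fname) so the example runs without a file.
sample = io.StringIO("name,age\nalice,30\nbob,25\n")
print(list(data_lines(sample)))  # ['alice,30', 'bob,25']
```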

If you want slicing to work on an iterator, use itertools.islice:

from itertools import islice
with open(fname) as f:
    for line in islice(f, 1, None):  # start at index 1, i.e. line 2
        pass
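The same idea extends to several header lines by changing islice's start argument. A sketch, with skip_lines as a hypothetical name and io.StringIO standing in for a real file:

```python
import io
from itertools import islice

def skip_lines(f, n=1):
    """Return an iterator over f that starts after the first n lines."""
    # islice(iterable, start, stop): stop=None means "run to the end".
    return islice(f, n, None)

sample = io.StringIO("# comment\nname,age\nalice,30\nbob,25\n")
for line in skip_lines(sample, n=2):
    print(line, end="")
# alice,30
# bob,25
```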

If you need the first line and then want to do something with the rest of the file, this code is useful:

with open(filename, 'r') as f:
    first_line = f.readline()  # keep the header line
    for line in f:
        pass  # perform some operations on the remaining lines
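A sketch of keeping the first line as parsed header fields while iterating over the rest. read_with_header is a hypothetical name, and the plain split(',') assumes the fields contain no quoted commas (the csv module handles those):

```python
import io

def read_with_header(f):
    """Return (header_fields, data_rows) for a simple comma-separated file."""
    # readline() returns the first line (with its '\n'), or '' if the file is empty.
    header = f.readline().rstrip("\n").split(",")
    # The file iterator continues from the second line.
    rows = [line.rstrip("\n").split(",") for line in f]
    return header, rows

header, rows = read_with_header(io.StringIO("name,age\nalice,30\nbob,25\n"))
print(header)  # ['name', 'age']
print(rows)    # [['alice', '30'], ['bob', '25']]
```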

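To generalize the task of reading several header lines, and to improve readability, I would use method extraction. Suppose you want to tokenize the first two lines of coordinates.txt to use as header information.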

Example

coordinates.txt
---------------
Name,Longitude,Latitude,Elevation, Comments
String, Decimal Deg., Decimal Deg., Meters, String
Euler's Town,7.58857,47.559537,0, "Blah"
Faneuil Hall,-71.054773,42.360217,0
Yellowstone National Park,-110.588455,44.427963,0

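Method extraction then lets you specify what you want to do with the header information (in this example we simply tokenize the header lines on commas and return each one as a list, but there is room to do much more).

Output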

['Name', 'Longitude', 'Latitude', 'Elevation', 'Comments']
['String', 'Decimal Deg.', 'Decimal Deg.', 'Meters', 'String']

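If coordinates.txt contains another header line, simply change numberheaderlines. Most importantly, it is obvious what __readheader(rh, numberheaderlines=2) is doing, and we avoid the ambiguity of having to work out, or comment on, why the author of the accepted answer uses next() in their code.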
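A Python 3 sketch of the same method-extraction idea. It swaps in csv.reader (with skipinitialspace=True to drop the space before " Comments") instead of manual splitting, so the quoted "Blah" field is also unquoted. read_headers and number_header_lines are hypothetical names, and the file contents are embedded as a string so the example runs on its own:

```python
import csv
import io

# Same contents as coordinates.txt above (embedded so the sketch runs standalone).
COORDINATES = """Name,Longitude,Latitude,Elevation, Comments
String, Decimal Deg., Decimal Deg., Meters, String
Euler's Town,7.58857,47.559537,0, "Blah"
Faneuil Hall,-71.054773,42.360217,0
Yellowstone National Park,-110.588455,44.427963,0
"""

def read_headers(reader, number_header_lines=1):
    """Consume number_header_lines rows from a csv.reader and return them as lists."""
    # next() on the reader pulls one parsed row; the list comprehension is eager.
    return [next(reader) for _ in range(number_header_lines)]

reader = csv.reader(io.StringIO(COORDINATES), skipinitialspace=True)
names, units = read_headers(reader, number_header_lines=2)
print(names)  # ['Name', 'Longitude', 'Latitude', 'Elevation', 'Comments']
print(units)  # ['String', 'Decimal Deg.', 'Decimal Deg.', 'Meters', 'String']
for row in reader:  # the reader now continues with the data rows
    print(row)
```

Adding a third header line only means passing number_header_lines=3; the rest of the loop is unchanged.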

If you want to read multiple CSV files starting from line 2, this works like a charm:

import csv

for files in csv_file_list:
    with open(files, 'r', newline='') as r:
        next(r)  # skip headers
        rr = csv.reader(r)
        for row in rr:
            pass  # do something with row
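Two alternatives, sketched with io.StringIO in place of the files in csv_file_list: calling next() on the csv reader rather than on the file skips one parsed record, and csv.DictReader consumes the header row as field names. The function names are hypothetical.

```python
import csv
import io

def rows_without_header(f):
    """Skip the header via the csv reader itself and return the data rows."""
    reader = csv.reader(f)
    # next() on the reader skips one *parsed* record, which stays correct even
    # if a quoted header field contains an embedded newline.
    next(reader, None)
    return list(reader)

def rows_as_dicts(f):
    """Alternative: csv.DictReader consumes the header row as field names."""
    return list(csv.DictReader(f))

text = "name,age\nalice,30\nbob,25\n"
print(rows_without_header(io.StringIO(text)))  # [['alice', '30'], ['bob', '25']]
print(rows_as_dicts(io.StringIO(text)))
```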

(This is part of an answer to another question.)

If you need the header later, don't discard it with a bare next(f). Use f.readline() and store the result in a variable, or use header_line = next(f).
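A sketch of storing the header with header_line = next(f) so it can be reused later, here to write it back out ahead of a filtered set of rows. filter_keep_header and keep are hypothetical names; both streams are io.StringIO objects for illustration:

```python
import io

def filter_keep_header(src, dst, keep):
    """Copy src to dst, keeping the header line and only the data lines where keep(line) is true."""
    header_line = next(src)  # store the header instead of discarding it
    dst.write(header_line)
    for line in src:
        if keep(line):
            dst.write(line)

src = io.StringIO("name,age\nalice,30\nbob,25\n")
dst = io.StringIO()
filter_keep_header(src, dst, keep=lambda line: line.startswith("a"))
print(dst.getvalue(), end="")
# name,age
# alice,30
```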

Mine complained that the 'file' object is not an iterator when I used the next(f) method; f.readline() worked instead.

This skips one line: ['a', 'b', 'c'][1:] evaluates to ['b', 'c'].
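The answer's own code is missing from this page; as an assumption, the slicing it describes can be sketched with readlines(). The helper name lines_after_first is hypothetical. As the comments below note, this loads the whole file into memory and then copies it, so it suits small files only:

```python
import io

def lines_after_first(f):
    """Read all lines into a list, then drop the first with a slice."""
    # readlines() loads the whole file into memory; [1:] also builds a copy.
    return f.readlines()[1:]

print(lines_after_first(io.StringIO("header\na\nb\n")))  # ['a\n', 'b\n']
```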

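@LjubisaLivac is right: this answer generalizes to any line, so it is a more robust solution.

That is fine until the file is too big to read into memory. It works well for small files.

Slicing also builds a copy of the contents, which is needlessly inefficient. As mentioned in …, how about using consume() from more-itertools?

I hear this is a very nice way to solve the problem, and it extends to any number of header lines.

This is a very nice implementation! Very nice solution; this deserves many more upvotes. This solution is really nice.

This even works for files uploaded into memory, when iterating over the file object.

This reads the entire file into memory at once, so it is only practical for reasonably small files.

If you don't need the line, there is no need to assign readline() to a variable. Still, I like this solution best.

Mixing direct reads with using the file as an iterator is not recommended (although in this particular case it does no harm).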
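The consume() mentioned above is also a recipe in the itertools documentation; this sketch mirrors that recipe using only the standard library (the more-itertools package exports a function of the same name). Here it skips three header lines of an in-memory file:

```python
import io
from collections import deque
from itertools import islice

def consume(iterator, n=None):
    """Advance iterator n steps ahead; if n is None, consume it entirely."""
    if n is None:
        # A zero-length deque pulls every item and discards it.
        deque(iterator, maxlen=0)
    else:
        # islice(iterator, n, n) yields nothing but advances past n items.
        next(islice(iterator, n, n), None)

f = io.StringIO("h1\nh2\nh3\ndata1\ndata2\n")
consume(f, 3)    # skip three header lines
print(list(f))   # ['data1\n', 'data2\n']
```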


# __readheader helper from the method-extraction answer above (Python 3)
def __readheader(filehandle, numberheaderlines=1):
    """Reads the specified number of lines and returns the comma-delimited
    strings on each line as a list"""
    for _ in range(numberheaderlines):
        yield [token.strip() for token in filehandle.readline().strip().split(',')]

with open('coordinates.txt', 'r') as rh:
    # Single header line
    # print(next(__readheader(rh)))

    # Multiple header lines
    for headerline in __readheader(rh, numberheaderlines=2):
        print(headerline)  # Or do other stuff with headerline tokens
