Python: reading an array of structs from a file


python struct


I have the following task: I need to read an array of structs from a file. Reading a single struct is no problem:

import struct

structFmt = "=64s 2L 3d"    # char[ 64 ] long[ 2 ] double [ 3 ]
structLen = struct.calcsize( structFmt )
f = open( "path/to/file", "rb" )
structBytes = f.read( structLen )
s = struct.unpack( structFmt, structBytes )
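
To make the layout of that format string concrete, here is a minimal sketch that packs one record in memory and decodes it again. The sample values and the variable names are made up for illustration; only the format string comes from the question.

```python
import struct

struct_fmt = "=64s 2L 3d"             # char[64], 2 x 4-byte unsigned, 3 x double
struct_len = struct.calcsize(struct_fmt)

# Build one record in memory (made-up sample values) and decode it again.
record = struct.pack(struct_fmt, b"name", 1, 2, 0.5, 1.5, 2.5)
fields = struct.unpack(struct_fmt, record)

name = fields[0].rstrip(b"\x00")      # '64s' pads short values with NUL bytes
longs = fields[1:3]                   # the two 'L' values
doubles = fields[3:6]                 # the three 'd' values
```

With the "=" prefix the sizes are the standard ones and no padding is inserted, so each record occupies 64 + 2*4 + 3*8 = 96 bytes.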

Reading an array of "simple" types is no problem either.

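But reading 1024 structs of the structFmt layout from a file is a problem (for me, at least). I think that reading a struct 1024 times and appending each one to a list is overhead.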

I don't want to use an external dependency such as numpy and, alas, there is no analogous facility for arrays of complex structs.

The usual approach is to call struct.unpack repeatedly and append each result to a list:

import struct

structFmt = "=64s 2L 3d"    # char[ 64 ] long[ 2 ] double [ 3 ]
structLen = struct.calcsize( structFmt )
results = []
with open( "path/to/file", "rb" ) as f:
    while True:
        structBytes = f.read( structLen )
        if len( structBytes ) < structLen:   # end of file (or a truncated record)
            break
        results.append( struct.unpack( structFmt, structBytes ) )

If you are worried about efficiency, know that struct.unpack caches the parsed format between successive calls.
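
If the per-call overhead still matters, one possible variation is to compile the format once with struct.Struct and decode the whole file in a single pass with iter_unpack (available since Python 3.4). This is only a sketch: the helper name read_structs is hypothetical, and it assumes the file is small enough to read into memory in one go.

```python
import struct

record = struct.Struct("=64s 2L 3d")   # compiled once, reused for every record

def read_structs(path):
    """Read the whole file and unpack every complete record in it."""
    with open(path, "rb") as f:
        data = f.read()
    # iter_unpack needs a buffer whose length is a multiple of record.size,
    # so drop a trailing partial record if there is one.
    usable = len(data) - len(data) % record.size
    return list(record.iter_unpack(data[:usable]))
```

Each element of the returned list is the same kind of tuple that struct.unpack produces for one record.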

I would look into mmapping the file and then using the ctypes class method from_buffer(). This maps an array of ctypes-defined structs onto the data.

The structs are then mapped directly onto the mmapped file, without having to explicitly read, convert and copy the contents.

I don't know whether the end result will be suitable, though.

For fun, here is a quick example using mmap. (I created the test file with dd: dd if=/dev/zero of=./test.dat bs=96 count=10240)

from ctypes import Structure
from ctypes import c_char, c_long, c_double
import mmap
from timeit import Timer


class StructFMT(Structure):
    # Note: c_long is 8 bytes on most 64-bit Unix systems, whereas "L" in the
    # "=64s 2L 3d" format is always 4 bytes; use c_uint32 for an exact match.
    _fields_ = [('ch', c_char * 64), ('lo', c_long * 2), ('db', c_double * 3)]

d_array = StructFMT * 1024

def doit():
    f = open('test.dat', 'r+b')
    m = mmap.mmap(f.fileno(), 0)
    data = d_array.from_buffer(m)

    for i in data:
        i.ch, i.lo[0]*10, i.db[2]*1.0   # just access each row and each field of the struct and do something with the data

    del i, data   # release the ctypes views; Python 3 refuses to close an mmap that still has exported buffers
    m.close()
    f.close()

if __name__ == '__main__':
    t = Timer("doit()", "from __main__ import doit")
    print(t.timeit(number=10))

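What's the point of using mmap? Couldn't the whole array just be read into an in-memory buffer, with from_buffer() applied to that?

You could do it that way. With mmap, however, the file is paged into memory directly, as needed. So if, for example, you access the last part of the array, everything before it doesn't have to be read into memory first before using from_buffer(). If you have a large file to mmap and you only sparsely access parts of the structure, this could be a huge win. If you read or copy every part of the file, there may be no gain, although I/O performance may still give mmap an edge. It certainly needs to be benchmarked. People often forget about mmap; it is especially useful if you are dealing with files much larger than memory and you don't need to copy data around (i.e. read, do something, and forget).

Ahh, you're on Windows. mmap has different semantics there, and if you are modifying values in the mmapped file you need to set ACCESS_WRITE for the access parameter in the mmap constructor. That might remove the need to copy. Again, worth a try.

Thanks @RaymondHettinger. I'll compare my decision (in the comments, against the previous answer). By the way, how do I measure the execution time of a function?

I tested the mmap variant against struct.unpack. The first is about 100 times (113, to be exact) faster.

@borisbn: You have to be very careful when comparing this approach with the mmap-based one, because the latter doesn't read the structs from the file unless you access them. A good comparison would access all of them.

@martineau: Hmmm. You're absolutely right. In the first test I accessed the first 8192 elements of the array. Now I tried accessing all of the elements. The results: 63 seconds for mmap, 86 seconds for a single read and unpack (plus accessing the elements in both tests). This is the whole test -. Could you comment on it, please? Thanks.

@Raymond Hettinger: Good point, but being able to unpack a large number of structs at once means you can read just as many at once beforehand, which cuts some disk I/O overhead. To know what the bottleneck really is, you need to profile your code and see what it is doing.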
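
The in-memory alternative raised in the first comment can be sketched roughly as follows. The names Record and load_records are hypothetical, and c_uint32 is swapped in for c_long so that the ctypes layout matches the 96-byte "=64s 2L 3d" record on every platform; treat it as an illustration rather than the answerer's code.

```python
import ctypes

class Record(ctypes.Structure):
    # Fixed-width fields so the layout matches "=64s 2L 3d" (96 bytes, no padding).
    _fields_ = [("ch", ctypes.c_char * 64),
                ("lo", ctypes.c_uint32 * 2),
                ("db", ctypes.c_double * 3)]

def load_records(path, count):
    """Read `count` records into memory and view them as a ctypes array."""
    array_type = Record * count
    with open(path, "rb") as f:
        # from_buffer() needs a writable buffer, hence the bytearray.
        buf = bytearray(f.read(ctypes.sizeof(array_type)))
    return array_type.from_buffer(buf)   # raises ValueError if the file is too short
```

Unlike the mmap version, this reads all requested records up front, which is the trade-off the comments describe.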
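
On the questions of measuring execution time and comparing the two approaches fairly, a rough sketch with timeit might look like the following. Both functions touch every record, as the comments recommend. All names here (Record, sum_with_struct, sum_with_mmap, compare) are hypothetical, c_uint32 is an assumption made so the ctypes layout matches the 96-byte format, and the numbers it produces depend heavily on the machine and on file caching.

```python
import ctypes
import mmap
import struct
import timeit

FMT = struct.Struct("=64s 2L 3d")

class Record(ctypes.Structure):
    # Assumed fixed-width layout matching FMT (96 bytes per record).
    _fields_ = [("ch", ctypes.c_char * 64),
                ("lo", ctypes.c_uint32 * 2),
                ("db", ctypes.c_double * 3)]

def sum_with_struct(path):
    """Read the whole file, unpack every record, and sum the last double."""
    with open(path, "rb") as f:
        data = f.read()
    usable = len(data) - len(data) % FMT.size   # ignore a trailing partial record
    return sum(fields[5] for fields in FMT.iter_unpack(data[:usable]))

def sum_with_mmap(path):
    """Map the file, view it as a ctypes array, and sum the last double."""
    with open(path, "r+b") as f:
        m = mmap.mmap(f.fileno(), 0)
        count = len(m) // ctypes.sizeof(Record)
        records = (Record * count).from_buffer(m)
        total = sum(rec.db[2] for rec in records)
        del records      # release the buffer export before closing the map
        m.close()
    return total

def compare(path, repeats=10):
    """Time both approaches over the same file; returns seconds per approach."""
    return {
        "struct": timeit.timeit(lambda: sum_with_struct(path), number=repeats),
        "mmap": timeit.timeit(lambda: sum_with_mmap(path), number=repeats),
    }
```

Calling compare("test.dat") on the file created with dd above would give one timing per approach; profiling, as suggested in the last comment, is still the way to find the real bottleneck.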