
Unexpected behavior of the read method with multiprocessing in Python


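I am reading the same file from multiple processes in binary mode. The file is opened first in the parent process, and the child processes are created afterwards. The actual code that reads a given part of the file is: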

def _line(self, n: int) -> str:

    offset = self._lineOffsets[n]
    lineSize = self._lineSizes[n]

    self._criticalSectionLock.acquire()

    line = b""
    
    try:
        self._datasetFile.seek(offset)
        while len(line) < lineSize:
            # read() may return fewer bytes than requested, so keep reading in a loop
            block = self._datasetFile.read(lineSize-len(line))
            line += block

            if len(block) == 0:
                raise IOError(f"Failed to read whole sample on line {n} (indexed from 0).")
    finally:
        self._criticalSectionLock.release()

    return line.decode("utf-8")



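The problem is that the read method does not always return the whole block. This is not random: it always happens on the same blocks. When I edit the code in the following way: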
def _line(self, n: int) -> str:

    offset = self._lineOffsets[n]
    lineSize = self._lineSizes[n]

    self._criticalSectionLock.acquire()

    line = b""
    
    try:
        while len(line) < lineSize:
            # read() may return fewer bytes than requested, so keep reading in a loop
            self._datasetFile.seek(offset+len(line))
            block = self._datasetFile.read(lineSize-len(line))
            line += block

            if len(block) == 0:
                raise IOError(f"Failed to read whole sample on line {n} (indexed from 0).")
    finally:
        self._criticalSectionLock.release()

    return line.decode("utf-8")


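With this change the problem disappears. What seems to happen is that the file offset advances by the size I asked to read, even though I actually got a smaller block back.
So my question is: how can this happen?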
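
One property that may be relevant here, on POSIX systems, is that a file opened before the children are created shares a single OS-level offset between parent and children. The sketch below illustrates only that property. It uses raw `os.open`/`os.fork` rather than the buffered file object and `multiprocessing` setup from the question, and the function name `shared_offset_demo` is made up for illustration. It is not a diagnosis of the short reads.

```python
import os
import tempfile


def shared_offset_demo() -> int:
    """Return the parent's file offset after a forked child has read 5 bytes."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"0123456789")
        path = tmp.name

    fd = os.open(path, os.O_RDONLY)  # opened BEFORE the child exists
    try:
        pid = os.fork()  # POSIX only
        if pid == 0:
            # Child: read 5 bytes through the inherited descriptor, then exit
            # immediately without running the parent's cleanup code.
            try:
                os.read(fd, 5)
            finally:
                os._exit(0)
        os.waitpid(pid, 0)
        # The parent itself never read anything; query the current offset.
        return os.lseek(fd, 0, os.SEEK_CUR)
    finally:
        os.close(fd)
        os.unlink(path)


if __name__ == "__main__":
    print(shared_offset_demo())
```

The parent sees the offset moved by the child's read because both descriptors refer to the same open file description. Any user-space buffering layered on top (as with a file object from `open(..., "rb")`) is kept per process and is not shown here.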
To head off some questions in advance, here are a few points explaining why the file is not read in the standard line-by-line way:

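The file is large
Storing the whole file in memory is not an ideal approach
Lines need to be accessed non-sequentially
Reading in binary mode is faster
The sizes of the lines and their offsets are known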
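
Given the constraints listed above (known offsets and sizes, non-sequential access), one possible alternative is positional reads with `os.pread`, which take the offset as an argument and leave the file offset untouched. This is a minimal sketch assuming a POSIX platform; `build_line_index` and `read_line_at` are hypothetical helpers, not the code from the question, and the index is assumed to exclude the trailing newline from each size.

```python
import os
import tempfile
from typing import List, Tuple


def build_line_index(path: str) -> Tuple[List[int], List[int]]:
    """Return (offsets, sizes) per line; sizes exclude the trailing newline."""
    offsets: List[int] = []
    sizes: List[int] = []
    position = 0
    with open(path, "rb") as f:
        for raw in f:
            offsets.append(position)
            sizes.append(len(raw.rstrip(b"\n")))
            position += len(raw)
    return offsets, sizes


def read_line_at(fd: int, offset: int, size: int) -> str:
    """Read exactly `size` bytes starting at `offset` using positional reads."""
    chunks = []
    done = 0
    while done < size:
        # pread may also return fewer bytes than requested, so loop
        block = os.pread(fd, size - done, offset + done)
        if not block:
            raise IOError(f"Unexpected end of file at offset {offset + done}.")
        chunks.append(block)
        done += len(block)
    return b"".join(chunks).decode("utf-8")


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write("alpha\nβeta\ngamma\n".encode("utf-8"))
    offsets, sizes = build_line_index(tmp.name)
    fd = os.open(tmp.name, os.O_RDONLY)
    try:
        print(read_line_at(fd, offsets[2], sizes[2]))
        print(read_line_at(fd, offsets[0], sizes[0]))
    finally:
        os.close(fd)
        os.unlink(tmp.name)
```

Because each call carries its own offset, the reads do not depend on a seek performed earlier, which is the step that the shared position could interfere with. Whether a lock is still wanted depends on the rest of the program.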