Unexpected behavior of the read method with multiprocessing in Python (tags: python, file, multiprocessing, critical-section)


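I am reading the same file in binary mode from multiple processes. The file is opened in the parent process first, and the child processes are created afterwards. The actual code that reads a given part of the file is: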

def _line(self, n: int) -> str:

    offset = self._lineOffsets[n]
    lineSize = self._lineSizes[n]

    self._criticalSectionLock.acquire()

    line = b""
    
    try:
        self._datasetFile.seek(offset)
        while len(line) < lineSize:
            # because read may return less than required number of bytes we must read in while loop
            block = self._datasetFile.read(lineSize-len(line))
            line += block

            if len(block) == 0:
                raise IOError(f"Failed to read whole sample on line {n} (indexed from 0).")
    finally:
        self._criticalSectionLock.release()

    return line.decode("utf-8")
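The problem is that the read method does not always return the whole block. This is not random: it always happens on the same block. When I edit the code in the following way: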

def _line(self, n: int) -> str:

    offset = self._lineOffsets[n]
    lineSize = self._lineSizes[n]

    self._criticalSectionLock.acquire()

    line = b""
    
    try:
        while len(line) < lineSize:
            # because read may return less than required number of bytes we must read in while loop
            self._datasetFile.seek(offset+len(line))
            block = self._datasetFile.read(lineSize-len(line))
            line += block

            if len(block) == 0:
                raise IOError(f"Failed to read whole sample on line {n} (indexed from 0).")
    finally:
        self._criticalSectionLock.release()

    return line.decode("utf-8")
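With this change, the problem disappears. What seems to happen is that the file offset is advanced by the size I asked to read, even though I actually received a smaller block.

So my question is: how can this happen?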
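
The mismatch between the bytes returned and the file offset can be explored in a single process. The following is a minimal sketch, not a confirmed diagnosis: it assumes the file is opened with Python's default buffered binary reader, and the temporary file and its size are made up for illustration. It shows that the position reported by the file object and the offset held by the operating system can differ, because the buffered reader reads ahead.

```python
import os
import tempfile

# Hypothetical test file, created only for this demonstration.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * 100_000)
    path = tmp.name

with open(path, "rb") as f:          # buffered binary reader (the default)
    data = f.read(10)                # ask for 10 bytes
    logical_pos = f.tell()           # position as seen by the Python file object
    # Offset of the underlying descriptor, as seen by the operating system.
    os_pos = os.lseek(f.fileno(), 0, os.SEEK_CUR)

os.remove(path)

print(len(data), logical_pos, os_pos)
```

On POSIX systems, processes created by fork share the operating-system offset of an inherited descriptor, while each process keeps its own copy of the buffer, so the two views may drift apart when several processes read through the same inherited file object.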

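To head off some questions in advance, here are the reasons why the file is not read in the standard line-by-line way:

  • The file is large
  • Storing the whole file in memory is not a desirable approach
  • Non-sequential access to lines is needed
  • Reading in binary mode is faster
  • The sizes and offsets of the lines are known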
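
Given known line offsets and sizes, one possible way to avoid depending on a shared file position is a positional read. The sketch below assumes a platform where `os.pread` is available (Unix); the function name `read_line_at` and the sample data are hypothetical and not part of the original code. `os.pread` reads at an explicit offset and does not change the file offset of the descriptor.

```python
import os
import tempfile


def read_line_at(fd: int, offset: int, size: int) -> str:
    """Read exactly `size` bytes starting at `offset` and decode them as UTF-8."""
    chunks = []
    remaining = size
    while remaining > 0:
        # pread may return fewer bytes than requested, so loop until done.
        block = os.pread(fd, remaining, offset)
        if not block:
            raise IOError(f"Unexpected end of file at offset {offset}.")
        chunks.append(block)
        offset += len(block)
        remaining -= len(block)
    return b"".join(chunks).decode("utf-8")


# Hypothetical sample data: three lines with precomputed offsets and sizes.
lines = [b"first\n", b"second line\n", b"third\n"]
offsets, sizes, pos = [], [], 0
for raw in lines:
    offsets.append(pos)
    sizes.append(len(raw))
    pos += len(raw)

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"".join(lines))
    path = tmp.name

fd = os.open(path, os.O_RDONLY)
try:
    # Access the lines out of order to mimic non-sequential reads.
    result = [read_line_at(fd, offsets[n], sizes[n]) for n in (2, 0, 1)]
finally:
    os.close(fd)
    os.remove(path)

print(result)
```

Because no seek is involved, this variant should not need a lock around the read. Whether it fits depends on the platform, since `os.pread` is not available on Windows; opening the file separately in each child process would be another option to consider.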