Python multiprocessing Lock doesn't synchronize multiprocessing.Process subclasses
I have about 1000 images that each need to be processed individually, with the results saved to disk. The processing itself is CPU-intensive, so I decided to subclass multiprocessing.Process and use a global buffer to collect the results. The buffer has a fixed capacity; once that capacity is reached, it is flushed to disk.
class ImageBuffer:
    def __init__(self):
        self.buffer = []
        self.size = 0
        self.bulk_index = 0

    def add(self, data):
        if self.size == 2000:
            self.persist()
        self.buffer.append(data)
        self.size += 1
        print(f"Adding new image. Current size {self.size}")

    def persist(self):
        self.size = 0
        pass
The synchronization is supposed to happen inside each process:
from multiprocessing import Process
import multiprocessing


class ImageWorker(Process):
    def __init__(self, buffer: ImageBuffer, lock: multiprocessing.Lock):
        super().__init__()
        self.tasks = []
        self.buffer = buffer
        self.lock = lock

    def add_task(self, task):
        self.tasks.append(task)

    def run(self):
        assert len(self.tasks) != 0
        for _ in range(len(self.tasks)):
            task = self.tasks.pop(0)
            result = process_task(task)
            self.lock.acquire()
            self.buffer.add(result)
            self.lock.release()
This is how I start the processes:
from .image_worker import ImageWorker
from .image_buffer import ImageBuffer
import multiprocessing


class ImageWorkerPool:
    def __init__(self, num_threads=multiprocessing.cpu_count()):
        self.workers = []
        self.work_index = 0
        self.buffer = ImageBuffer()
        lock = multiprocessing.Lock()
        for _ in range(num_threads):
            self.workers.append(ImageWorker(self.buffer, lock))

    def add_task(self, _image_mask):
        self.workers[self.work_index].add_task(_image_mask)
        self.work_index += 1
        self.work_index = self.work_index % len(self.workers)
        assert self.work_index < len(self.workers)

    def start(self):
        for worker in self.workers:
            worker.start()

    def complete(self):
        for worker in self.workers:
            worker.join()
        self.buffer.persist()
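This happens because each worker has its own size counter. The workers do not necessarily run in any particular order, which is why you see several workers each writing their own count to standard output.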
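The effect of each worker having its own counter can be reproduced in a few lines. The following is a minimal sketch, not the asker's code: Counter, bump, run_demo and the ctx parameter are hypothetical names introduced for illustration, with Counter standing in for ImageBuffer. Every child increments the object it was given while holding the shared lock, then reports the size it ended up with.

```python
import multiprocessing


class Counter:
    """Plain Python object standing in for ImageBuffer (hypothetical name)."""

    def __init__(self):
        self.size = 0

    def add(self):
        self.size += 1


def bump(counter, lock, times, sizes):
    """Child-process body: increments the copy of `counter` it received."""
    for _ in range(times):
        with lock:  # the lock itself is shared, the counter is not
            counter.add()
    sizes.put(counter.size)  # report what this child ended up with


def run_demo(num_workers=3, times=5, ctx=None):
    """Return (size seen by the parent, sorted sizes seen by the children)."""
    # ctx is an optional start-method context; default is the platform's own.
    ctx = ctx or multiprocessing.get_context()
    counter = Counter()
    lock = ctx.Lock()
    sizes = ctx.Queue()
    procs = [
        ctx.Process(target=bump, args=(counter, lock, times, sizes))
        for _ in range(num_workers)
    ]
    for p in procs:
        p.start()
    child_sizes = sorted(sizes.get() for _ in procs)
    for p in procs:
        p.join()
    return counter.size, child_sizes


if __name__ == "__main__":
    parent_size, child_sizes = run_demo()
    print(parent_size, child_sizes)  # 0 [5, 5, 5]
```

The lock does its job here: it serializes the calls to add(). There is simply nothing shared for it to protect, so every child counts from zero and the parent's counter never moves.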
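ImageBuffer is a regular object. It is not shared between processes; it is copied. There is no global buffer.
Note that if your task is IO-bound, processes will not help. At worst they add latency, because communication between processes is itself a form of IO. If your task is not CPU-bound, use threads for IO-bound tasks.
@MisterMiyagi That was my mistake. Also, "IO-intensive" was a typo. I meant CPU.
Output from the code in the question: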
Adding new image. Current size 1
Adding new image. Current size 2
Adding new image. Current size 3
Adding new image. Current size 4
Adding new image. Current size 5
Adding new image. Current size 1
Adding new image. Current size 6
Adding new image. Current size 7
Adding new image. Current size 2
Adding new image. Current size 3
Adding new image. Current size 4
Adding new image. Current size 8
Adding new image. Current size 5
Adding new image. Current size 6
Adding new image. Current size 1
Adding new image. Current size 9
Adding new image. Current size 7
Adding new image. Current size 10
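The interleaved counters above are what you would expect if every worker fills its own copy of the buffer. One possible way around this, sketched below under stated assumptions, is to keep the buffer in the parent process only and let the workers return their results. This swaps the Process subclass for multiprocessing.Pool, which is a different technique from the one in the question. ResultBuffer, run_all, flush, capacity and ctx are hypothetical names, and process_task here is a trivial stand-in for the real image processing.

```python
import multiprocessing


def process_task(task):
    """Hypothetical stand-in for the CPU-heavy image processing step."""
    return task * task


class ResultBuffer:
    """Buffer that lives only in the parent process, so it needs no lock."""

    def __init__(self, capacity, flush):
        self.capacity = capacity
        self.flush = flush  # callable that persists one list of results
        self.items = []

    def add(self, item):
        self.items.append(item)
        if len(self.items) >= self.capacity:
            self.persist()

    def persist(self):
        if self.items:
            self.flush(self.items)
            self.items = []  # start a fresh list; the flushed one is untouched


def run_all(tasks, capacity, flush, processes=None, ctx=None):
    """Process tasks in worker processes and buffer results in the parent."""
    # ctx is an optional start-method context; default is the platform's own.
    ctx = ctx or multiprocessing.get_context()
    buffer = ResultBuffer(capacity, flush)
    with ctx.Pool(processes) as pool:
        # Results arrive in completion order, not submission order.
        for result in pool.imap_unordered(process_task, tasks):
            buffer.add(result)
    buffer.persist()  # flush whatever is left over


if __name__ == "__main__":
    batches = []
    run_all(range(10), capacity=4, flush=batches.append)
    print([len(b) for b in batches])  # [4, 4, 2]
```

Because only the parent touches the buffer, the size check and the flush cannot race. In real use, flush would write a batch to disk instead of appending it to a list.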