
Python 3.x: Parallel processing in Python on the GPU (MXNet) and the CPU


I have a data-processing pipeline that I would like to optimize by running some processing threads on the CPU while an MXNet prediction model (Python 3.6) runs on the GPU.

The idea I want to apply is as follows (assume my machine has N GPUs):

  • A GPU job dispatcher reads a sequence of N frames from a video and sends each frame to one GPU
  • Each GPU processes its frame and uses MXNet to predict its content
  • Once all N GPUs have finished predicting, I want to do the following simultaneously:
  • Send the prediction outputs to a queue
  • Read and process the next N frames on the GPUs
  • The queue is consumed by a multithreaded process running on the CPU
Here is a visual depiction of the workflow:

The idea is to use the otherwise idle CPU while the GPUs are busy processing frames.
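The GPU-to-CPU hand-off described above is a standard producer/consumer pattern. A minimal sketch with queue.Queue, using a -1 sentinel to signal the end of the stream (the integer batches and the doubling step are placeholders, not the real MXNet predictions):

```python
import queue
import threading

def producer(q, batches):
    # Stand-in for the GPU dispatcher: push each batch's "predictions"
    for batch in batches:
        q.put(batch)
    q.put(-1)  # sentinel: no more work

def consumer(q, results):
    # Stand-in for the CPU-side workers: drain the queue until the sentinel
    while True:
        item = q.get()
        if item == -1:
            break
        results.append(item * 2)  # placeholder "processing"

q = queue.Queue()
results = []
t = threading.Thread(target=consumer, args=(q, results))
t.start()
producer(q, [1, 2, 3])
t.join()
print(results)  # [2, 4, 6]
```

The queue decouples the two sides: the dispatcher can start on the next batch as soon as it has enqueued the previous outputs.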

Using the threading library, I managed to read and process the first N frames, but the GPUs never process the next batch.

Note that the source code below has been simplified to clarify the workflow.

Here is the code of the function that reads frames, dispatches them to the GPUs, and then sends the outputs to the CPU queue:

import logging
import time
from threading import Thread

def dispatch_jobs(video_capture, detection_workers, number_of_gpu, cpu_queue):
    # detection_workers is a list of N similar MXNet models, each one works on a different GPU
    is_last_frame = False
    while not is_last_frame:
        frames_batch = []
        for i in range(0, number_of_gpu):
            success, frame = read_frame_from_video(video_capture)
            if not success:
                logging.warning("Can't receive frame. Exiting.")
                is_last_frame = True
                break
            frames_batch.append(frame)

        workers = []
        for detection_worker_id in range(0, len(frames_batch)):
            frame_image = frames_batch[detection_worker_id]
            thread = Thread(target=detection_workers[detection_worker_id].predict, kwargs={'image': frame_image})
            workers.append(thread)

        for w in workers: w.start()
        for w in workers: w.join()

        # sending to the CPU queue
        for detection_worker_id in range(0, len(frames_batch)):
            detector_output = detection_workers[detection_worker_id].output
            cpu_queue.put(detector_output)

    logging.info("While loop is broken... putting -1 in the queue")
    cpu_queue.put(-1)

    return
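As a side note, the per-batch Thread creation inside the loop above can also be written with concurrent.futures.ThreadPoolExecutor, which reuses a fixed pool of threads across batches. A minimal sketch, where predict_stub is a placeholder for detection_workers[worker_id].predict:

```python
from concurrent.futures import ThreadPoolExecutor

def predict_stub(worker_id, frame):
    # Placeholder for detection_workers[worker_id].predict(image=frame)
    return f"gpu{worker_id}:{frame}"

frames_batch = ["frame0", "frame1"]
with ThreadPoolExecutor(max_workers=len(frames_batch)) as pool:
    # submit() returns immediately; result() waits for completion,
    # so this collects outputs in batch order
    futures = [pool.submit(predict_stub, i, f) for i, f in enumerate(frames_batch)]
    outputs = [fut.result() for fut in futures]
print(outputs)  # ['gpu0:frame0', 'gpu1:frame1']
```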
As mentioned above, a consumer thread reads the outputs from cpu_queue and sends them to a multithreaded function (on the CPU). Here is the code of the consumer functions:

def consume_cpu_queue(cpu_queue, number_of_process=2):
    # number_of_process defaults to 2 here only for illustration
    while cpu_queue.empty():
        logging.info("Sleeping 1 second")
        time.sleep(1)

    prediction_output = cpu_queue.get()
    if prediction_output == -1:
        return

    process_output_multithread(prediction_output, number_of_process)
    consume_cpu_queue(cpu_queue, number_of_process)

def process_output_multithread(pred_output, number_of_process):
    workers = []
    for i in range(0, number_of_process):
        thread = Thread(target=process, kwargs={'pred_output': pred_output})
        workers.append(thread)

    for w in workers: w.start()
    for w in workers: w.join()
    return
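Incidentally, the sleep-poll loop and the recursion in consume_cpu_queue can be avoided entirely: queue.Queue.get() blocks until an item is available, and a plain while loop sidesteps Python's recursion limit on long videos. A sketch under the same -1 sentinel convention (handle_output stands in for process_output_multithread):

```python
import queue

def consume_cpu_queue_iterative(cpu_queue, handle_output):
    # Blocking get() removes the 1-second polling, and iteration
    # avoids hitting Python's recursion limit on long videos
    while True:
        prediction_output = cpu_queue.get()  # blocks until an item arrives
        if prediction_output == -1:          # sentinel: dispatcher is done
            return
        handle_output(prediction_output)

# usage sketch: handle_output would be process_output_multithread above
q = queue.Queue()
for item in ("pred_a", "pred_b", -1):
    q.put(item)
seen = []
consume_cpu_queue_iterative(q, seen.append)
print(seen)  # ['pred_a', 'pred_b']
```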

# Here is how the consumer thread is initiated
cpu_consumer_thread = Thread(target=consume_cpu_queue, args=(cpu_queue,))

# Here is how I run the application
cpu_consumer_thread.start()
dispatch_jobs(video_capture, detection_workers, number_of_gpu, cpu_queue)
cpu_consumer_thread.join()
I have looked into it, but I am not sure whether Numba can solve my problem.


Any suggestions or pointers would be very helpful.

This might help: have you tried / looked at async/await? @RyabchenkoAlexander I have no hands-on experience with it; any example would be worth a look.
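For the async/await suggestion in the comment above, blocking calls such as a model's predict can be wrapped with asyncio's run_in_executor so that several run concurrently in a thread pool. A minimal sketch; predict_stub is a placeholder, not the MXNet API:

```python
import asyncio
import time

def predict_stub(frame):
    # Placeholder for a blocking GPU predict call
    time.sleep(0.1)
    return f"prediction:{frame}"

async def main():
    loop = asyncio.get_running_loop()
    frames = ["f0", "f1", "f2"]
    # Run the blocking calls concurrently in the default thread pool;
    # gather() preserves the order of the submitted tasks
    tasks = [loop.run_in_executor(None, predict_stub, f) for f in frames]
    return await asyncio.gather(*tasks)

print(asyncio.run(main()))  # ['prediction:f0', 'prediction:f1', 'prediction:f2']
```

Note this still uses threads under the hood; asyncio mainly gives a cleaner way to express the "wait for all N predictions, then fan out" step.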