Sharing a task queue between concurrent agents that share disk space in Python

Tags: python, concurrency, multiprocessing, queue, locking

Is it possible to write a task queue to disk so that agents on different systems (separate CPUs, caches and memory) that share hard-disk space can access the queue without causing race conditions between the agents?
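For context on what "without race conditions" requires: each agent needs an atomic claim on a task in the shared store. persistqueue's SQLiteAckQueue is built on SQLite, whose own file locking can provide this. A minimal stdlib sketch of the claim pattern (the `tasks` table, `status` column, and `tasks_demo.db` file are made up for illustration):

```python
import os
import sqlite3

DB = "tasks_demo.db"  # hypothetical demo database file

def claim_task(db_path):
    """Atomically claim one pending task id, or return None.

    BEGIN IMMEDIATE takes SQLite's write lock before reading, so two
    processes sharing the same database file can never claim the same
    row -- which is the property a shared-disk queue needs.
    """
    conn = sqlite3.connect(db_path, isolation_level=None)
    try:
        conn.execute("BEGIN IMMEDIATE")
        row = conn.execute(
            "SELECT id FROM tasks WHERE status = 'pending' "
            "ORDER BY id LIMIT 1"
        ).fetchone()
        if row is not None:
            conn.execute(
                "UPDATE tasks SET status = 'claimed' WHERE id = ?",
                (row[0],),
            )
        conn.execute("COMMIT")
        return row[0] if row is not None else None
    finally:
        conn.close()

# demo: three pending tasks, four claim attempts
if os.path.exists(DB):
    os.remove(DB)
conn = sqlite3.connect(DB)
conn.execute("CREATE TABLE tasks (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany("INSERT INTO tasks VALUES (?, 'pending')",
                 [(i,) for i in (1, 2, 3)])
conn.commit()
conn.close()

claimed = [claim_task(DB) for _ in range(4)]
print(claimed)  # [1, 2, 3, None] -- each task claimed exactly once
```

Note that SQLite's locking is known to be unreliable on many network filesystems, which matters if the "shared disk" is actually a network mount.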

I have been using persistqueue to generate the queue on disk, and the multiprocessing module to distribute tasks to the CPUs within each agent. I would like to be able to run multiple instances of handle_queue.py without concurrency problems in accessing the task queue.

An example script that generates a mock task queue, and a script that executes the tasks, are shown below.

This works if only one handle_queue.py is run at a time, but running two at once causes some tasks to be executed more than once, which can be seen afterwards by looking at the completed tasks listed in sleep_out.txt.

I have tried using locking to limit simultaneous access to /sleep_queue, but this locking does not seem to apply across the separate agents.
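For agents that are separate processes on the same machine, an OS-level advisory lock on a file next to the queue can serialize access. A minimal sketch assuming a POSIX system (the lock-file name `sleep_queue.lock` is made up for illustration); note that `flock()` is not reliable over network filesystems such as NFS:

```python
import fcntl
import os
from contextlib import contextmanager

@contextmanager
def queue_lock(path="sleep_queue.lock"):
    """Hold an exclusive advisory lock for the duration of the block."""
    fd = os.open(path, os.O_CREAT | os.O_RDWR)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # blocks until the lock is free
        yield
        fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)

# usage: wrap each queue operation, e.g.
#     with queue_lock():
#         task = q.get()
```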

Another problem with the code below is that the program hangs on completion when two concurrent instances of handle_queue.py are run. I believe this is because
q.get()
is applied to an empty queue (an odd feature of persistqueue). I also tried putting the line

q = persistqueue.SQLiteAckQueue("./sleep_queue", multithreading=True)

inside the
while not q.empty():
loop, to make sure
q
is updated to reflect the other agent's changes, but without success.
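On the hang itself: a non-blocking get that stops on an "empty" exception lets workers exit cleanly instead of blocking forever. The standard-library queue.Queue shows the pattern below; persistqueue's get() also accepts block=False and raises its own Empty exception in recent versions, though that is worth checking against the installed version:

```python
import queue

def drain(q):
    """Consume items until the queue is empty, without blocking."""
    done = []
    while True:
        try:
            item = q.get(block=False)  # raises queue.Empty if nothing left
        except queue.Empty:
            return done  # queue exhausted: exit cleanly instead of hanging
        done.append(item)

q = queue.Queue()
for i in range(3):
    q.put(i)
print(drain(q))  # [0, 1, 2]
```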

The script that generates the list of pseudo-tasks is shown below:

generate_queue.py:

import time
import random
import persistqueue
import shutil
import os

class sleeper:
    """
    class of pseudo-tasks. Calling an instance of this class 
    simulates doing a task by sleeping for a random amount of
    time and the tasks fail with a probability of 1/5.
    """
    def __init__(self, i):
        self.i = i

    def __call__(self):
        t = random.uniform(0, 5)
        code = "{:03d}".format(self.i)
        print("executing {}...".format(code))
        time.sleep(t)
        if t<1:
            raise Exception("Crashed")
        message = "finished {} in time {:.2f}s".format(code, t)
        print(message)
        with open('sleep_out.txt', 'a') as f:
            print(message, file=f)

if __name__ == "__main__":
    import sys

    # set the number of tasks to queue from argument passed to script
    # or using a default value of 10.  
    try:
        n_tasks = int(sys.argv[1])
    except (IndexError, ValueError):
        n_tasks = 10
    print("adding ", n_tasks, " tasks.")

    # clears any existing task log
    try:
        os.remove("./sleep_out.txt")
        print("Existing log deleted")
    except FileNotFoundError:
        print("No log to delete")
    
    # create list of tasks
    sleeper_list = [sleeper(i) for i in range(n_tasks)]
    
    # clears any existing queue
    try:
        shutil.rmtree("./sleep_queue")
        print("Existing queue deleted")
    except FileNotFoundError:
        print("No queue to delete.")

    # generate queue on disk in sleep_queue dir, and put tasks into queue
    q = persistqueue.SQLiteAckQueue("./sleep_queue", multithreading=True)
    for sl in sleeper_list:
        q.put(sl)
    
    print(q.size, "tasks in queue")
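A related detail: the queue stores each task with pickle, and pickle records only a module-qualified name for the object's class, not its code, so whoever consumes the queue must be able to import that class (which is why the consumer script below imports sleeper from generate_queue). A quick stdlib illustration, using collections.Counter as a stand-in:

```python
import pickle
from collections import Counter

blob = pickle.dumps(Counter({"a": 1}))
# the byte stream names the module and class rather than embedding code:
assert b"collections" in blob and b"Counter" in blob
print(pickle.loads(blob))  # Counter({'a': 1})
```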

handle_queue.py:

import persistqueue
from generate_queue import sleeper
from multiprocessing import cpu_count
from multiprocessing import Process

q = persistqueue.SQLiteAckQueue("./sleep_queue", multithreading=True)

def worker():
    """
    Worker function which takes a task from the queue,
    executes the task, and puts it back in the queue if
    the task fails. 
    """
    while not q.empty():
        task = q.get()
        try:
            task()
            print("task completed!")
            q.ack(task)
        except Exception:
            print("task failed!")
            q.nack(task)

# find how many cores are available
n_cpus = cpu_count()
print(n_cpus, "CPUs detected.")

processes = []
for cpu_n in range(n_cpus):
    p = Process(target=worker)
    p.start()
    processes.append(p)

# wait for every worker, not just the last one started
for p in processes:
    p.join()

Below are the contents of sleep_out.txt after running two concurrent instances of handle_queue.py; you can see that it contains duplicated results, demonstrating duplicate task execution.

sleep_out.txt:

finished 003 in time 1.53s
finished 002 in time 2.99s
finished 004 in time 1.62s
finished 000 in time 3.44s
finished 001 in time 4.67s <- duplicate task 001
finished 001 in time 1.07s <- duplicate task 001
finished 005 in time 3.60s <- duplicate task 005
finished 006 in time 3.60s <- duplicate task 006
finished 008 in time 2.22s
finished 007 in time 2.84s <- duplicate task 007
finished 005 in time 3.57s <- duplicate task 005
finished 009 in time 3.04s
finished 006 in time 4.85s <- duplicate task 006
finished 007 in time 4.44s <- duplicate task 007
finished 015 in time 1.23s
finished 011 in time 2.82s
finished 017 in time 1.53s
finished 012 in time 4.07s
finished 010 in time 4.69s
finished 013 in time 4.51s
finished 014 in time 3.99s
finished 018 in time 1.96s
finished 016 in time 2.68s
finished 022 in time 1.04s
finished 019 in time 2.42s
finished 025 in time 2.44s
finished 021 in time 3.48s
finished 020 in time 4.85s
finished 024 in time 3.67s
finished 023 in time 4.41s
finished 026 in time 3.96s