Missing lines when writing to a file from Python multiprocessing with a Lock


Here is my code:

from multiprocessing import Pool, Lock
from datetime import datetime as dt

console_out = "/STDOUT/Console.out"
chunksize = 50
lock = Lock()

def writer(message):
    lock.acquire()
    with open(console_out, 'a') as out:
        out.write(message)
        out.flush()
    lock.release()

def conf_wrapper(state):
    import ProcessingModule as procs
    import sqlalchemy as sal

    stcd, nrows = state
    engine = sal.create_engine('postgresql://foo:bar@localhost:5432/schema')

    writer("State {s} started  at: {n}"
           "\n".format(s=str(stcd).zfill(2), n=dt.now()))

    with engine.connect() as conn, conn.begin():
        procs.processor(conn, stcd, nrows, chunksize)

    writer("\tState {s} finished  at: {n}"
           "\n".format(s=str(stcd).zfill(2), n=dt.now()))

def main():
    nprocesses = 12
    maxproc = 1
    state_list = [(2, 113), (10, 119), (15, 84), (50, 112), (44, 110), (11, 37), (33, 197)]

    with open(console_out, 'w') as out:
        out.write("Starting at {n}\n".format(n=dt.now()))
        out.write("Using {p} processes..."
                  "\n".format(p=nprocesses))

    with Pool(processes=int(nprocesses), maxtasksperchild=maxproc) as pool:
        pool.map(func=conf_wrapper, iterable=state_list, chunksize=1)

    with open(console_out, 'a') as out:
        out.write("\nAll done at {n}".format(n=dt.now()))

The file console_out never contains all 7 states. It always misses one or more of them. Here is the output from the latest run:

Starting at 2016-07-27 21:46:58.638587
Using 12 processes...
State 44 started  at: 2016-07-27 21:47:01.482322
State 02 started  at: 2016-07-27 21:47:01.497947
State 11 started  at: 2016-07-27 21:47:01.529198
State 10 started  at: 2016-07-27 21:47:01.497947
    State 11 finished  at: 2016-07-27 21:47:15.701207
    State 15 finished  at: 2016-07-27 21:47:24.123164
    State 44 finished  at: 2016-07-27 21:47:32.029489
    State 50 finished  at: 2016-07-27 21:47:51.203107
    State 10 finished  at: 2016-07-27 21:47:53.046876
    State 33 finished  at: 2016-07-27 21:47:58.156301
    State 02 finished  at: 2016-07-27 21:48:18.856979

All done at 2016-07-27 21:48:18.992277

Why?

Note: the operating system is Windows Server 2012 R2.

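Because you are running on Windows, worker processes inherit nothing. Each process runs the entire main program "from scratch".

In particular, with the code as written, every process has its own lock instance, and these instances have nothing to do with each other. In short, lock provides no inter-process mutual exclusion at all.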
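To make the point concrete, here is a minimal single-process sketch (not part of the original code). It only illustrates that two independently created Lock objects never block each other, which is the situation each Windows worker is in after re-creating the module-level lock on import. The function and variable names are hypothetical.

```python
from multiprocessing import Lock


def independent_locks_exclude_each_other():
    """Return True only if holding one Lock prevents acquiring another."""
    lock_a = Lock()  # stands in for the lock created in one worker
    lock_b = Lock()  # stands in for the lock re-created in another worker
    lock_a.acquire()
    try:
        # Non-blocking attempt: it succeeds because lock_b is unrelated to lock_a.
        got_b = lock_b.acquire(False)
        if got_b:
            lock_b.release()
        return not got_b
    finally:
        lock_a.release()


def same_lock_excludes():
    """Return True if a second acquire on the same Lock is refused."""
    lock = Lock()
    lock.acquire()
    try:
        # multiprocessing.Lock is not recursive, so this attempt fails.
        return not lock.acquire(False)
    finally:
        lock.release()
```

Only a single Lock object that every worker refers to can serialize the writes, which is what the fix arranges.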

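To fix this, change the Pool constructor so that it calls an initialization function once in each process, and pass that function an instance of Lock(). For example, like this: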

def init(L):
    global lock
    lock = L

Then add these arguments to the Pool() constructor:

initializer=init, initargs=(Lock(),),

You no longer need the module-level line:

lock = Lock()

Inter-process mutual exclusion will then work as intended.
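Putting those pieces together, a self-contained sketch might look like the following. It is an adaptation rather than the OP's program: the database work is left out, writer() takes an explicit path argument, and work(), run() and the temporary log path are hypothetical names introduced only for illustration.

```python
import os
import tempfile
from multiprocessing import Pool, Lock

lock = None  # bound in each worker by init(); deliberately not Lock() here


def init(shared_lock):
    """Pool initializer: runs once in every worker process."""
    global lock
    lock = shared_lock


def writer(path, message):
    """Append one message to path while holding the shared lock."""
    with lock:  # the lock is released even if the write raises
        with open(path, 'a') as out:
            out.write(message)


def work(args):
    """Stand-in for conf_wrapper(): log a start line and a finish line."""
    path, state = args
    writer(path, "State {s} started\n".format(s=str(state).zfill(2)))
    writer(path, "\tState {s} finished\n".format(s=str(state).zfill(2)))
    return state


def run(path, states, nprocesses=4):
    """Map work() over states with one Lock shared by all workers."""
    with Pool(processes=nprocesses, initializer=init, initargs=(Lock(),)) as pool:
        return pool.map(work, [(path, s) for s in states], chunksize=1)


if __name__ == '__main__':  # needed on Windows, where workers re-import this module
    log_path = os.path.join(tempfile.mkdtemp(), "Console.out")  # hypothetical path
    run(log_path, [2, 10, 15, 50, 44, 11, 33])
    with open(log_path) as f:
        print(f.read())
```

Using the lock as a context manager instead of paired acquire()/release() calls is a small robustness choice: an exception during the write cannot leave the lock held.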

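Without a lock

If you would rather delegate all output to a single writer process, you can skip the lock and use a queue to feed that process instead [see a different version later]: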

def writer_process(q):
    with open(console_out, 'w') as out:
        while True:
            message = q.get()
            if message is None:
                break
            out.write(message)
            out.flush() # can't guess whether you really want this

Then change writer() to:

def writer(message):
    q.put(message)

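You also need to use initializer= and initargs= in the Pool constructor so that all processes use the same queue.

Only one process should run writer_process(), and it can be started on its own as an instance of multiprocessing.Process.

Finally, to let writer_process() know that it is time to drain the queue and return, run

q.put(None)

in the main process.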
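Assembled into one file, the queue-based approach could be sketched as below. This is an illustration under assumptions, not the OP's program: work(), run(), msg_queue and the temporary log path are hypothetical, and the pool is shut down with close() and join() rather than terminated, so that each worker exits normally and has the chance to flush messages it has already queued.

```python
import os
import tempfile
from multiprocessing import Pool, Process, Queue

q = None  # bound in each pool worker by init()


def init(shared_q):
    """Pool initializer: give every worker the same queue."""
    global q
    q = shared_q


def writer(message):
    q.put(message)


def writer_process(msg_queue, path):
    """Sole owner of the output file: write messages until None arrives."""
    with open(path, 'w') as out:
        while True:
            message = msg_queue.get()  # blocks while the queue is empty
            if message is None:
                break
            out.write(message)
            out.flush()


def work(state):
    """Stand-in for conf_wrapper(): log a start line and a finish line."""
    writer("State {s} started\n".format(s=str(state).zfill(2)))
    writer("\tState {s} finished\n".format(s=str(state).zfill(2)))
    return state


def run(path, states, nprocesses=4):
    msg_queue = Queue()
    wp = Process(target=writer_process, args=(msg_queue, path))
    wp.start()
    pool = Pool(processes=nprocesses, initializer=init, initargs=(msg_queue,))
    try:
        pool.map(work, states, chunksize=1)
    finally:
        pool.close()
        pool.join()           # workers exit normally before the sentinel is sent
        msg_queue.put(None)   # tell the writer to stop
        wp.join()


if __name__ == '__main__':  # needed on Windows, where workers re-import this module
    log_path = os.path.join(tempfile.mkdtemp(), "Console.out")  # hypothetical path
    run(log_path, [2, 10, 15, 50, 44, 11, 33])
    with open(log_path) as f:
        print(f.read())
```

Because only writer_process() ever opens the file, no lock is needed; the queue serializes the messages.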
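Later

The OP settled on this version instead, because they needed to be able to open the output file from other code at the same time: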
def writer_process(q):
    while True:
        message = q.get()
        if message == 'done':
            break
        else:
            with open(console_out, 'a') as out:
                out.write(message)

I don't know why the terminating sentinel was changed to 'done'. Any unique value works for this; None is traditional.
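As a small illustration of the sentinel idea, the sketch below uses the standard-library queue.Queue so that it runs in a single process; drain() is a hypothetical helper, not part of the answer. It relies on the two-argument form of iter(), which keeps calling q.get() until the returned value equals the sentinel.

```python
import queue


def drain(q, sentinel=None):
    """Collect items from q until the sentinel is seen; the sentinel is not returned."""
    # iter(callable, sentinel) calls q.get() repeatedly and stops on the sentinel.
    return list(iter(q.get, sentinel))


q = queue.Queue()
for item in ("a\n", "", "done", None, "never read"):
    q.put(item)

messages = drain(q)  # stops at None; "never read" stays in the queue
```

With None as the sentinel, the string "done" passes through as an ordinary message. With 'done' as the sentinel, a log line consisting of exactly that text would stop the writer, which is one reason to prefer a value that can never be a real message.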

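Which OS? If you are running on Windows, you will need to change the code to make lock work as intended.

Windows, unfortunately.

Pool creates lock in the global namespace, which is shared by all worker processes, right?

No, nothing is shared between processes on Windows by magic. The init() function is called in each process, and it binds the process-global name lock (because of the global lock statement) in each process to the single instance of Lock() that was created in, and passed from, the main process.

Oh! I see. If it's not too much to ask, could you also write an answer that uses mp.Manager and a Queue instead of primitives like Lock? That would be my preferred approach, but when I tried it, it didn't work at all. So I switched to Lock and ended up asking this question.

Personally, I would use an mp.Queue along the lines shown, but run the writer in a thread of the main process (there is no need to create a new process for it):

t = threading.Thread(target=writer_process, args=(q,)); t.start()

Then at the end of the program:

writer(None); t.join()

The mp.Manager facilities can be convenient, but using them carries a high inter-process communication overhead, so I have rarely found them "worth it" in the end.

No, q.get() blocks while the queue is empty, until something is added to it. The blocking logic is blind to what gets added to the queue; it only waits for something to be added. None, an empty string, a list with a billion elements ... whatever.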
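The thread-based variant suggested in the comments could be sketched roughly as follows. The helper names writer_thread(), start_writer() and stop_writer() are hypothetical, and the pool workers are assumed to receive q through the Pool initializer as described in the answer. The sketch also shows the point of the last comment: q.get() simply waits for the next item, and an empty string is written like any other message; only None ends the loop.

```python
import threading
from multiprocessing import Queue


def writer_thread(q, path):
    """Run in a thread of the main process; append messages until None arrives."""
    with open(path, 'a') as out:
        # q.get() blocks while the queue is empty, whatever is added next.
        for message in iter(q.get, None):
            out.write(message)
            out.flush()


def start_writer(q, path):
    """Start the writer thread and return it so the caller can join it later."""
    t = threading.Thread(target=writer_thread, args=(q, path))
    t.start()
    return t


def stop_writer(q, t):
    """Send the sentinel and wait for the writer thread to finish."""
    q.put(None)
    t.join()
```

A typical use would be to call start_writer(q, console_out) before creating the pool and stop_writer(q, t) once all workers have finished.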