Trouble using a lock with multiprocessing.Pool: pickling error

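I'm building a Python module that extracts tags from a large corpus of text. The results are high quality, but it runs very slowly. I'm trying to speed it up with multiprocessing, and that worked until I tried to introduce a lock so that only one process connects to our database at a time. I can't for the life of me figure out how to make this work. Despite a lot of searching and tweaking, I still get

PicklingError: Can't pickle <type 'thread.lock'>: attribute lookup thread.lock failed

Here is the offending code. It worked fine until I tried to pass a lock object as an argument to `f`: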

def make_network(initial_tag, max_tags = 2, max_iter = 3):
    manager = Manager()
    lock = manager.Lock()
    pool = manager.Pool(8)

    # this is a very expensive function that I would like to parallelize 
    # over a list of tags. It involves a (relatively cheap) call to an external
    # database, which needs a lock to avoid simultaneous queries. It takes a list
    # of strings (tags) as its sole argument, and returns a list of sets with entries
    # corresponding to the input list.
    f = partial(get_more_tags, max_tags = max_tags, lock = lock) 

    def _recursively_find_more_tags(tags, level):
        if level >= max_iter:
            raise StopIteration
        new_tags = pool.map(f, tags)
        to_search = []
        for i, s in zip(tags, new_tags):
            for t in s:
                joined = ' '.join(t)
                print i + "|" + joined
                to_search.append(joined)
        try:
            return _recursively_find_more_tags(to_search, level+1)
        except StopIteration:
            return None

    _recursively_find_more_tags([initial_tag], 0)

Your problem is that lock objects are not picklable. I can see two possible solutions in this case.
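To see the failure in isolation, a small sketch can try to pickle a few objects directly. `try_pickle` is a hypothetical helper written for this illustration. The sketch shows the general class of error on Python 3, where the messages differ from the Python 2 traceback in the question, and it does not claim to identify the exact object that triggered that traceback.

```python
import multiprocessing
import pickle
import threading


def try_pickle(obj):
    """Return None if obj pickles cleanly, otherwise the exception raised."""
    try:
        pickle.dumps(obj)
    except Exception as exc:  # TypeError, RuntimeError, PicklingError, ...
        return exc
    return None


if __name__ == "__main__":
    candidates = (threading.Lock(), multiprocessing.Lock(), "a plain string")
    for candidate in candidates:
        err = try_pickle(candidate)
        status = "ok" if err is None else repr(err)
        print(type(candidate).__name__, "->", status)
```

Arguments given to `pool.map` are pickled before they are sent to the workers, so anything that fails here will also fail when bound into `f` with `partial`.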

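To avoid the error, make the lock a global variable. The function run by the pool can then refer to it directly as a global, and you no longer need to pass it as an argument. This works when the pool processes are created with the OS fork mechanism (historically the default on Linux), because fork copies the entire contents of the parent process into each pool process. A lock from the multiprocessing package has to reach the workers by inheritance like this; it cannot be sent to them as a task argument. Incidentally, you do not need the `Manager` class just for this lock. With this change, your code would look like this: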

import multiprocessing
from functools import partial

lock = None  # Global lock; forked pool workers inherit it
pool = None  # Global pool; used only by the parent process


def get_more_tags(tag, max_tags=2):
    # Stand-in for the expensive function from the question. It reads the
    # module-level lock directly instead of receiving it as an argument.
    with lock:
        pass  # the database query goes here, one process at a time
    return []  # the tags found for `tag`


def make_network(initial_tag, max_tags=2, max_iter=3):
    global lock, pool
    # Create the lock before the pool: workers only inherit what already
    # exists when they are forked (this relies on the fork start method).
    lock = multiprocessing.Lock()
    pool = multiprocessing.Pool(8)

    # this is a very expensive function that I would like to parallelize
    # over a list of tags. It involves a (relatively cheap) call to an external
    # database, which needs a lock to avoid simultaneous queries. It takes a
    # list of strings (tags) as its sole argument, and returns a list of sets
    # with entries corresponding to the input list.
    f = partial(get_more_tags, max_tags=max_tags)

    def _recursively_find_more_tags(tags, level):
        if level >= max_iter:
            raise StopIteration
        new_tags = pool.map(f, tags)
        to_search = []
        for i, s in zip(tags, new_tags):
            for t in s:
                joined = ' '.join(t)
                print(i + "|" + joined)
                to_search.append(joined)
        try:
            return _recursively_find_more_tags(to_search, level + 1)
        except StopIteration:
            return None

    _recursively_find_more_tags([initial_tag], 0)

In your real code, the lock and pool variables might well be class instance variables.
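A related pattern, which is not part of the answer above, is to hand the lock to each worker through the pool's `initializer` and `initargs` parameters. The lock still travels by inheritance at process creation, so nothing unpicklable is sent as a task argument, and the pattern does not depend on fork. The following is a minimal sketch under stated assumptions: `_init_worker`, `run`, the `start_method` parameter, and the body of `get_more_tags` are hypothetical placeholders, not the asker's real code.

```python
import multiprocessing

_lock = None  # set once in every worker by _init_worker


def _init_worker(shared_lock):
    """Runs once in each worker process and stores the lock in a global."""
    global _lock
    _lock = shared_lock


def get_more_tags(tag, max_tags=2):
    """Hypothetical stand-in for the expensive function in the question."""
    with _lock:
        # Only one worker at a time executes this block; the real
        # database query would go here.
        found = [tag + str(i) for i in range(max_tags)]
    return found


def run(tags, processes=4, start_method=None):
    """Map get_more_tags over tags with a lock shared by all workers."""
    # start_method=None selects the platform's default start method.
    ctx = multiprocessing.get_context(start_method)
    lock = ctx.Lock()
    with ctx.Pool(processes, initializer=_init_worker,
                  initargs=(lock,)) as pool:
        return pool.map(get_more_tags, tags)


if __name__ == "__main__":
    print(run(["a", "b"]))
```

Keeping the pool creation inside a function and calling it under the `if __name__ == "__main__":` guard matters on platforms that start workers by importing the main module.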

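A second solution avoids the lock altogether, though it may have slightly higher overhead: create another process with `multiprocessing.Process` and connect it to each pool process through a `multiprocessing.Queue`. That process is responsible for running the database queries, and the pool processes use the queue to send it the query parameters. Because every pool process uses the same queue, access to the database is serialized automatically. The extra overhead comes from pickling and unpickling the query parameters and the query responses. Note that a plain `multiprocessing.Queue`, like a lock, has to be handed to the pool processes by inheritance (for example through the pool's `initializer`); a queue created by a `multiprocessing.Manager` can be passed as an ordinary argument. Note also that the solution based on a global `multiprocessing.Lock` does not work on Windows, because processes there are not created with fork semantics.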
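The queue-based idea might be sketched as follows. This is one possible arrangement, not the answerer's code: `db_server`, `fake_db_query`, `_init_worker`, `run`, and the per-worker reply queues are all assumptions made for the illustration. Each worker claims an id from a shared counter so that the database process knows where to send each reply.

```python
import multiprocessing

_requests = None   # shared request queue, set per worker by _init_worker
_replies = None    # this worker's private reply queue
_worker_id = None  # index of this worker's reply queue


def fake_db_query(tag):
    """Hypothetical stand-in for the real database call."""
    return tag.upper()


def db_server(requests, reply_queues):
    """Sole owner of the database; answers one request at a time."""
    while True:
        item = requests.get()
        if item is None:  # sentinel: shut down
            break
        worker_id, tag = item
        reply_queues[worker_id].put(fake_db_query(tag))


def _init_worker(requests, reply_queues, counter):
    """Runs once per worker: claim an id and remember the queues."""
    global _requests, _replies, _worker_id
    with counter.get_lock():
        _worker_id = counter.value
        counter.value += 1
    _requests = requests
    _replies = reply_queues[_worker_id]


def get_more_tags(tag):
    """Ask the database process for a result instead of taking a lock."""
    _requests.put((_worker_id, tag))
    return _replies.get()


def run(tags, processes=4, start_method=None):
    ctx = multiprocessing.get_context(start_method)
    requests = ctx.Queue()
    reply_queues = [ctx.Queue() for _ in range(processes)]
    counter = ctx.Value("i", 0)  # next free worker id

    server = ctx.Process(target=db_server, args=(requests, reply_queues))
    server.start()
    try:
        with ctx.Pool(processes, initializer=_init_worker,
                      initargs=(requests, reply_queues, counter)) as pool:
            return pool.map(get_more_tags, tags)
    finally:
        requests.put(None)  # tell the database process to stop
        server.join()


if __name__ == "__main__":
    print(run(["alpha", "beta", "gamma"]))
```

The sketch assumes the pool never replaces a worker during its lifetime, since a replacement would claim an id beyond the list of reply queues. Only the single `db_server` process ever touches the database, so no lock is needed.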

Are you running on Windows or on Linux?

I'm running on Linux, sorry I forgot to add that!