Python Cassandra multiprocessing: can't pickle _thread.lock objects


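I am trying to use Cassandra with multiprocessing to insert rows (dummy data) concurrently, based on the example in the DataStax blog post.

Here is my code: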

import itertools
import logging
import time
from multiprocessing import Pool

from cassandra.cluster import Cluster
from cassandra.concurrent import execute_concurrent_with_args

log = logging.getLogger(__name__)


class QueryManager(object):

    concurrency = 100  # chosen to match the default in execute_concurrent_with_args

    def __init__(self, session, process_count=None):
        self.pool = Pool(processes=process_count, initializer=self._setup, initargs=(session,))

    @classmethod
    def _setup(cls, session):
        cls.session = session
        cls.prepared = cls.session.prepare("""
            INSERT INTO test_table (key1, key2, key3, key4, key5) VALUES (?, ?, ?, ?, ?)
            """)

    def close_pool(self):
        self.pool.close()
        self.pool.join()

    def get_results(self, params):
        # split the parameter list into chunks of `concurrency` rows, one chunk per task
        results = self.pool.map(_multiprocess_write, (params[n:n+self.concurrency] for n in range(0, len(params), self.concurrency)))
        return list(itertools.chain(*results))

    @classmethod
    def _results_from_concurrent(cls, params):
        return [results[1] for results in execute_concurrent_with_args(cls.session, cls.prepared, params)]


def _multiprocess_write(params):
    return QueryManager._results_from_concurrent(params)


if __name__ == '__main__':

    processes = 2

    # connect cluster
    cluster = Cluster(contact_points=['127.0.0.1'], port=9042)
    session = cluster.connect()

    # database name is a concatenation of client_id and system_id
    keyspace_name = 'unit_test_0'

    # drop keyspace if it already exists in a cluster
    try:
        session.execute("DROP KEYSPACE IF EXISTS " + keyspace_name)
    except Exception:
        pass

    create_keyspace_query = "CREATE KEYSPACE " + keyspace_name \
                            + " WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'};"
    session.execute(create_keyspace_query)

    # use a session's keyspace
    session.set_keyspace(keyspace_name)

    # drop table if it already exists in the keyspace
    try:
        session.execute("DROP TABLE IF EXISTS " + "test_table")
    except Exception:
        pass

    # create the test table in the keyspace
    create_test_table = "CREATE TABLE test_table("

    keys = "key1 text,\n" \
           "key2 text,\n" \
           "key3 text,\n" \
           "key4 text,\n" \
           "key5 text,\n"

    create_test_table += keys
    create_test_table += "PRIMARY KEY (key1))"
    session.execute(create_test_table)

    qm = QueryManager(session, processes)

    params = list()
    for row in range(100000):
        key = 'test' + str(row)
        params.append([key, 'test', 'test', 'test', 'test'])

    start = time.time()
    rows = qm.get_results(params)
    delta = time.time() - start
    log.info('Cassandra inserts 100k dummy rows for %s secs', delta)
When I run the code, I get the following error:

TypeError: can't pickle _thread.lock objects
which refers to this line:

self.pool = Pool(processes=process_count, initializer=self._setup, initargs=(session,))

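This suggests that you are trying to serialize a lock across the IPC boundary. I think it is probably because you are passing a session object as an argument to the worker initializer function. Have the init function create a new session in each worker process instead (see the "session per process" section in the post you cited).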
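
The failure can be reproduced with the standard library alone. In the sketch below, `FakeSession` is a hypothetical stand-in for the driver's session object; it is assumed only to hold a `threading.Lock` internally, as the error message implies, and is not the cassandra-driver API. When worker processes are started with the spawn method, `initargs` have to be pickled to reach the child, which is where an object like this would fail.

```python
import pickle
import threading


class FakeSession:
    """Hypothetical stand-in for an object that owns a lock internally."""

    def __init__(self):
        self._lock = threading.Lock()


def is_picklable(obj):
    """Return True if obj survives pickle.dumps, False on a TypeError."""
    try:
        pickle.dumps(obj)
        return True
    except TypeError:
        return False


# The lock attribute prevents the whole object from being pickled.
print(is_picklable(FakeSession()))
# Plain data such as a parameter row pickles without trouble.
print(is_picklable(["test0", "test", "test", "test", "test"]))
```

Plain parameter rows cross the process boundary fine; it is only the lock-bearing object that needs to be created on the worker side.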

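I know this already has an answer, but I want to highlight some changes in the cassandra-driver package that mean this code still does not work correctly with Python 3.7 and version 3.18.0 of the cassandra-driver package.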

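If you look at the linked blog post, the __init__ function does not pass in a session; it passes a cluster object instead. Even the cluster can no longer be sent as an initarg, because it contains a lock. You need to create it inside the def _setup(cls): classmethod.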

Second, execute_concurrent_with_args now returns a ResultSet, which cannot be serialized either. Older versions of the cassandra-driver package simply returned a list of objects.

To fix the code above, change the following two parts.

First, the __init__ and _setup methods:

def __init__(self, process_count=None):
    self.pool = Pool(processes=process_count, initializer=self._setup)

@classmethod
def _setup(cls):
    cluster = Cluster()
    cls.session = cluster.connect()
    cls.prepared = cls.session.prepare("""
        INSERT INTO test_table (key1, key2, key3, key4, key5) VALUES (?, ?, ?, ?, ?)
        """)
Second, the _results_from_concurrent method:

@classmethod
def _results_from_concurrent(cls, params):
    return [list(results[1]) for results in execute_concurrent_with_args(cls.session, cls.prepared, params)]
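
To illustrate why materializing each result with list() helps, here is a small sketch using a hypothetical `FakeResultSet`. It is assumed to be an iterable that also holds a lock, which is enough to make it unpicklable; the real driver's ResultSet may fail to serialize for other reasons as well.

```python
import pickle
import threading


class FakeResultSet:
    """Hypothetical stand-in: iterable rows plus an internal lock."""

    def __init__(self, rows):
        self._rows = rows
        self._lock = threading.Lock()

    def __iter__(self):
        return iter(self._rows)


result = FakeResultSet([("test0", "test"), ("test1", "test")])

# Sending the wrapper object back to the parent process would fail here.
try:
    pickle.dumps(result)
    raw_ok = True
except TypeError:
    raw_ok = False

# Copying the rows into a plain list leaves only picklable data.
rows = list(result)
restored = pickle.loads(pickle.dumps(rows))
```

The worker returns plain rows, so the return value of the task function can be sent back to the parent through the pool.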

Finally, if you are interested in a gist of multi-process_execute.py from the original DataStax blog post that works with Python 3 and cassandra-driver 3.18.0, you can find it here:
