Python: multiprocessing.Pool.map with a function that takes two arguments


I am using multiprocessing.Pool(). Here is what I want to pool:

def insert_and_process(file_to_process,db):
    db = DAL("path_to_mysql" + db)
    # Table definitions
    db.table.insert(**parse_file(file_to_process))
    return True

if __name__=="__main__":
    file_list=os.listdir(".")
    P = Pool(processes=4)
    P.map(insert_and_process,file_list,db) # this is where I have the problem

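I want to pass two arguments. What I want is to initialize only 4 DB connections. As written, the code tries to create a connection on every function call, so there could be millions of connections and the I/O would freeze to death. If I could create 4 DB connections, one per process, that would be fine.

Is there a solution using Pool, or should I give up on it?

EDIT:

With help from both of you, I did this: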

args=zip(f,cycle(dbs))
Out[-]: 
[('f1', 'db1'),
 ('f2', 'db2'),
 ('f3', 'db3'),
 ('f4', 'db4'),
 ('f5', 'db1'),
 ('f6', 'db2'),
 ('f7', 'db3'),
 ('f8', 'db4'),
 ('f9', 'db1'),
 ('f10', 'db2'),
 ('f11', 'db3'),
 ('f12', 'db4')]

That is how it works. I will move the DB connection code to the main level and do the following:

from itertools import cycle

def process_and_insert(args):
    # args is a (file_name, db_connection) pair.
    # Table definitions
    args[1].table.insert(**parse_file(args[0]))
    return True

if __name__=="__main__":
    file_list=os.listdir(".")
    P = Pool(processes=4)

    # Create the four connections up front and hand them out round-robin.
    dbs = [DAL("path_to_mysql/database") for i in range(4)]
    args=zip(file_list,cycle(dbs))
    P.map(process_and_insert,args)

Yes, I will test it and let you all know.

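Your pool will spawn four processes, each run by its own instance of the Python interpreter. You can use a global variable to hold the database connection object, so that each process creates only one connection: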

global_db = None

def insert_and_process(file_to_process, db):
    global global_db
    if global_db is None:
        # If this is the first time this function is called within this
        # process, create a new connection.  Otherwise, the global variable
        # already holds a connection established by a former call.
        global_db = DAL("path_to_mysql" + db)
    global_db.table.insert(**parse_file(file_to_process))
    return True

Since Pool.map() and friends only support worker functions that take a single argument, you need to create a wrapper that forwards the work:

def insert_and_process_helper(args):
    return insert_and_process(*args)

if __name__ == "__main__":
    file_list=os.listdir(".")
    db = "wherever you get your db"
    # Create argument tuples for each function call:
    jobs = [(file, db) for file in file_list]
    P = Pool(processes=4)
    P.map(insert_and_process_helper, jobs)

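The Pool documentation does not mention a way to pass multiple arguments to the target function. I tried just passing a sequence, but it is not unpacked into one argument per sequence item.

However, you can write the target function so that its first (and only) argument is a tuple in which each element is one of the arguments you expect: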

from itertools import repeat

# Python 2 only: tuple parameters in a def were removed in Python 3 (PEP 3113).
def insert_and_process((file_to_process,db)):
    db = DAL("path_to_mysql" + db)
    # Table definitions
    db.table.insert(**parse_file(file_to_process))
    return True

if __name__=="__main__":
    file_list=os.listdir(".")
    P = Pool(processes=4)
    # repeat(db) pairs the same db value with every file name.
    P.map(insert_and_process,zip(file_list,repeat(db)))

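(Note the extra parentheses in the definition of insert_and_process: Python treats this as a single parameter that must be a 2-item sequence. The first element of the sequence is bound to the first variable and the second element to the other.)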
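For Python 3, where a def can no longer take a tuple parameter, the same idea can be sketched by unpacking inside the function body, or by letting Pool.starmap (available since Python 3.3) unpack each tuple. The worker describe below is a hypothetical placeholder for the real insert logic, and the file and database names are invented.

```python
from itertools import repeat
from multiprocessing import Pool


def describe(file_to_process, db):
    # Hypothetical worker: stands in for the real insert logic.
    return "%s -> %s" % (file_to_process, db)


def describe_packed(args):
    # Python 3 replacement for "def f((a, b)):": unpack inside the body.
    file_to_process, db = args
    return describe(file_to_process, db)


if __name__ == "__main__":
    file_list = ["f1", "f2", "f3"]  # made-up file names
    jobs = list(zip(file_list, repeat("db1")))
    with Pool(processes=2) as pool:
        via_map = pool.map(describe_packed, jobs)
        # starmap calls describe(*job) for every job tuple.
        via_starmap = pool.starmap(describe, jobs)
    print(via_map == via_starmap)
```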

There is no need to use zip. For example, suppose you have two parameters, x and y, and each of them can take several values, such as:

X=range(1,6)
Y=range(10)

The function should take only one parameter and unpack it inside:

def func(params):
    (x,y)=params
    ...

You call it like this:

params = [(x,y) for x in X for y in Y]
pool.map(func, params)
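Note that the list comprehension above builds every (x, y) combination, that is, the Cartesian product, rather than the element-wise pairs that zip produces. As a rough sketch, itertools.product yields the same tuples in the same order; the body of func here is an arbitrary placeholder.

```python
from itertools import product
from multiprocessing import Pool

X = range(1, 6)
Y = range(10)


def func(params):
    x, y = params
    return x * y  # placeholder work, chosen only for illustration


if __name__ == "__main__":
    with Pool(processes=4) as pool:
        # product(X, Y) yields (1, 0), (1, 1), ..., (5, 9).
        results = pool.map(func, product(X, Y))
    print(len(results))
```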

Using […] you create a full copy of […] and […], which may be […] than using:

from itertools import repeat
P.map(insert_and_process,zip(file_list,repeat(db)))

You can use partial from the functools library for this:

from functools import partial

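Note that in Python 3 the def f((arg1, arg2)): syntax no longer exists.

@FerdinandBeyer: I had forgotten about that. Well, unless the implementation of multiprocessing.Pool.map differs there, the approach is to accept a single argument and then unpack it inside the function.

Thanks, I got it working! I did it with zip(file_list, cycle(dbs)), but I don't use f((arg1, arg2)). Since I used more of your code, I am choosing your answer!

Thanks Ferdinand, this is close to what I want. What I want to do is create 4 DB connections: one connection per process, not one connection per function call. DAL("path to database") creates the database connection. A single connection at a time is slower than four connections.

I tried these examples and they work fine when the function does not have to return anything. Can't we do something like my_var = P.map(insert_and_process_helper, jobs)?

If I pass the second argument as a list or a set, will it work?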

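from functools import partial

def rdc(lat,lng,x):
    pass

# partial() fixes the leading arguments lat and lng; map() supplies x.
# lat, lng and pool are assumed to be defined elsewhere.
func = partial(rdc, lat, lng)
r = pool.map(func, range(8))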
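A runnable version of the same pattern might look like the sketch below. The body of rdc and the coordinate values are assumptions made purely for illustration. Because partial binds positional arguments from the left, the argument that varies per call has to come last in the signature (or the fixed ones have to be bound by keyword).

```python
from functools import partial
from multiprocessing import Pool


def rdc(lat, lng, x):
    # Hypothetical body: the original leaves rdc unimplemented.
    return (lat, lng, x)


if __name__ == "__main__":
    # Arbitrary example coordinates; only x changes between calls.
    func = partial(rdc, 48.1, 11.6)
    with Pool(processes=4) as pool:
        r = pool.map(func, range(8))
    print(r[0])
```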