A better parallel processing example in Python
I hope I don't get downvoted this time. I have been struggling with Python for a while now (2 days to be exact). I have checked these resources (a partial list is shown below): (a) (b) but I haven't had much luck. What I want to do is:

Master:
Break up the file into chunks (strings or numbers)
Broadcast a pattern to be searched to all the workers
Receive the offsets in the file where the pattern was found
Worker:
Receive pattern and chunk of text from the master
Compute()
Send back the offsets to the master.
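The master/worker flow described above can be sketched with concurrent.futures. This is a hypothetical outline rather than the poster's code; `find_offsets` and `master` are made-up names, and the sketch assumes the whole file fits in memory as one string:

```python
from concurrent.futures import ProcessPoolExecutor

def find_offsets(pat, chunk, base):
    # Worker: receive the pattern and a chunk of text,
    # send back the absolute offsets where the pattern was found
    hits, i = [], chunk.find(pat)
    while i != -1:
        hits.append(base + i)
        i = chunk.find(pat, i + 1)
    return hits

def master(text, pat, nworkers=4):
    # Master: break the text into chunks that overlap by len(pat) - 1
    # characters, so a match straddling a chunk boundary is not lost
    size = max(1, len(text) // nworkers)
    jobs = [(text[i:i + size + len(pat) - 1], i)
            for i in range(0, len(text), size)]
    with ProcessPoolExecutor(max_workers=nworkers) as ex:
        futures = [ex.submit(find_offsets, pat, chunk, base)
                   for chunk, base in jobs]
        return sorted({o for f in futures for o in f.result()})

if __name__ == "__main__":
    print(master("afow...afow", "afow"))  # prints [0, 7]
```

The overlap between chunks is the piece most naive chunking schemes miss: without it, a pattern that starts near the end of one chunk and finishes in the next is silently dropped.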
I tried to implement this using MPI / concurrent.futures / multiprocessing but failed.

My naive implementation using the multiprocessing module:
import multiprocessing

filename = "file1.txt"
pat = "afow"
N = 1000

""" This is the naive string search algorithm """
def search(pat, txt):
    patLen = len(pat)
    txtLen = len(txt)
    offsets = []
    # A loop to slide pattern[] one by one
    # Range generates numbers up to but not including that number
    for i in range((txtLen - patLen) + 1):
        # Can not use a for loop here
        # For loops in C with && statements must be
        # converted to while statements in python
        counter = 0
        while (counter < patLen) and pat[counter] == txt[counter + i]:
            counter += 1
        if counter >= patLen:
            offsets.append(i)
    return str(offsets).strip('[]')

"""
This is what I want:
if __name__ == "__main__":
    tasks = []
    pool_outputs = []
    pool = multiprocessing.Pool(processes=5)
    with open(filename, 'r') as infile:
        lines = []
        for line in infile:
            lines.append(line.rstrip())
            if len(lines) > N:
                pool_output = pool.map(search, tasks)
                pool_outputs.append(pool_output)
                lines = []
        if len(lines) > 0:
            pool_output = pool.map(search, tasks)
            pool_outputs.append(pool_output)
    pool.close()
    pool.join()
    print('Pool:', pool_outputs)
"""

with open(filename, 'r') as infile:
    for line in infile:
        print(search(pat, line))
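One detail worth noting about the code above: `search` returns the offsets as a comma-separated string, not a list. The condensed definition below is a slice-based equivalent of the loop above, shown only to illustrate the return format:

```python
def search(pat, txt):
    # slice-based equivalent of the question's naive scan
    offsets = [i for i in range(len(txt) - len(pat) + 1)
               if txt[i:i + len(pat)] == pat]
    # note: the result is a string such as '0, 3', not a list
    return str(offsets).strip('[]')

print(search("ab", "abcab"))   # prints: 0, 3
print(search("zz", "abcab"))   # prints an empty string
```

Returning a string makes the per-line results harder to combine later; keeping the list and formatting only at print time would be the more usual choice.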
I would really appreciate your guidance, especially with concurrent.futures. Thanks for your time. Valeriy has been helping me, and I thank him for that.

But if anyone could indulge me, this is the code I wrote for concurrent.futures (adapted from an example I saw somewhere):
from concurrent.futures import ProcessPoolExecutor, as_completed
import math

def search(pat, txt):
    patLen = len(pat)
    txtLen = len(txt)
    offsets = []
    # A loop to slide pattern[] one by one
    # Range generates numbers up to but not including that number
    for i in range((txtLen - patLen) + 1):
        # Can not use a for loop here
        # For loops in C with && statements must be
        # converted to while statements in python
        counter = 0
        while (counter < patLen) and pat[counter] == txt[counter + i]:
            counter += 1
        if counter >= patLen:
            offsets.append(i)
    return str(offsets).strip('[]')

# Check a list of strings
def chunked_worker(lines):
    return {line: search("fmo", line) for line in lines}

def pool_bruteforce(filename, nprocs):
    lines = []
    with open(filename) as f:
        lines = [line.rstrip('\n') for line in f]
    chunksize = int(math.ceil(len(lines) / float(nprocs)))
    futures = []
    with ProcessPoolExecutor() as executor:
        for i in range(nprocs):
            chunk = lines[(chunksize * i):(chunksize * (i + 1))]
            futures.append(executor.submit(chunked_worker, chunk))
    resultdict = {}
    for f in as_completed(futures):
        resultdict.update(f.result())
    return resultdict

filename = "file1.txt"
pool_bruteforce(filename, 5)
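For comparison, the same distribution can be written with `executor.map`, which batches the items across workers itself and keeps results in input order, so the manual chunk arithmetic and `as_completed` bookkeeping disappear. This is a sketch, not the poster's code; `search_line` is a toy stand-in for `search`:

```python
from concurrent.futures import ProcessPoolExecutor
from functools import partial

def search_line(pat, line):
    # toy stand-in for search(): first offset of pat in line, or -1
    return line.find(pat)

def pool_map_version(lines, pat, nprocs=5):
    with ProcessPoolExecutor(max_workers=nprocs) as ex:
        # chunksize batches several lines per task, cutting IPC overhead
        return list(ex.map(partial(search_line, pat), lines,
                           chunksize=max(1, len(lines) // nprocs)))

if __name__ == "__main__":
    print(pool_map_version(["afow here", "nothing", "xxafow"], "afow"))
    # prints [0, -1, 2]
```

Unlike the submit/as_completed version, results come back in the same order as the input lines, which makes it easy to map offsets back to line numbers.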
Thanks again to Valeriy and anyone else who tried to help me untangle this.

Answer: You used several arguments, so:
import multiprocessing
from functools import partial

filename = "file1.txt"
pat = "afow"
N = 1000

""" This is the naive string search algorithm """
def search(pat, txt):
    patLen = len(pat)
    txtLen = len(txt)
    offsets = []
    # A loop to slide pattern[] one by one
    # Range generates numbers up to but not including that number
    for i in range((txtLen - patLen) + 1):
        # Can not use a for loop here
        # For loops in C with && statements must be
        # converted to while statements in python
        counter = 0
        while (counter < patLen) and pat[counter] == txt[counter + i]:
            counter += 1
        if counter >= patLen:
            offsets.append(i)
    return str(offsets).strip('[]')

if __name__ == "__main__":
    tasks = []
    pool_outputs = []
    pool = multiprocessing.Pool(processes=5)
    lines = []
    with open(filename, 'r') as infile:
        for line in infile:
            lines.append(line.rstrip())
    tasks = lines
    func = partial(search, pat)
    if len(lines) > N:
        pool_output = pool.map(func, lines)
        pool_outputs.append(pool_output)
    elif len(lines) > 0:
        pool_output = pool.map(func, lines)
        pool_outputs.append(pool_output)
    pool.close()
    pool.join()
    print('Pool:', pool_outputs)
Valeriy: thank you. What does partial do? Do you know of any resource that covers parallel processing in Python thoroughly? Thanks again. Valeriy: I have read it, but I can't really make sense of it. Sorry, I meant a proper example used in a function. Thank you.
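To answer the partial question inline: `functools.partial(search, pat)` freezes the first argument of `search`, producing a one-argument callable, which is exactly the shape `pool.map` needs since it passes a single item to the function. A minimal illustration, using a simplified stand-in for `search`:

```python
from functools import partial

def search(pat, txt):
    # simplified stand-in: first offset of pat in txt, or -1
    return txt.find(pat)

find_afow = partial(search, "afow")   # pat is now fixed to "afow"

print(find_afow("xxafowyy"))          # prints 2
print(list(map(find_afow, ["afow", "no match"])))  # prints [0, -1]
```

`find_afow(txt)` behaves exactly like `search("afow", txt)`; `pool.map(partial(search, pat), lines)` in the answer above works the same way, calling the frozen function once per line.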