A better parallel processing example in Python
I hope I don't get downvoted this time. I have been struggling with Python for a while now (2 days to be exact). I have checked these resources (a partial list is shown below): (a) (b) but I haven't had much luck. What I want to do is:

Master:
Break up the file into chunks (strings or numbers)
Broadcast a pattern to be searched to all the workers
Receive the offsets in the file where the pattern was found
Worker:
Receive pattern and chunk of text from the master
Compute()
Send back the offsets to the master.
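The master/worker flow described above can be sketched with concurrent.futures. This is a hypothetical outline rather than the poster's code; `find_offsets` and `master` are made-up names, and the sketch assumes the whole file fits in memory as one string:

```python
from concurrent.futures import ProcessPoolExecutor

def find_offsets(pat, chunk, base):
    # Worker: receive the pattern and a chunk of text,
    # send back the absolute offsets where the pattern was found
    hits, i = [], chunk.find(pat)
    while i != -1:
        hits.append(base + i)
        i = chunk.find(pat, i + 1)
    return hits

def master(text, pat, nworkers=4):
    # Master: break the text into chunks that overlap by len(pat) - 1
    # characters, so a match straddling a chunk boundary is not lost
    size = max(1, len(text) // nworkers)
    jobs = [(text[i:i + size + len(pat) - 1], i)
            for i in range(0, len(text), size)]
    with ProcessPoolExecutor(max_workers=nworkers) as ex:
        futures = [ex.submit(find_offsets, pat, chunk, base)
                   for chunk, base in jobs]
        return sorted({o for f in futures for o in f.result()})

if __name__ == "__main__":
    print(master("afow...afow", "afow"))  # prints [0, 7]
```

The overlap between chunks is the piece most naive chunking schemes miss: without it, a pattern that starts near the end of one chunk and finishes in the next is silently dropped.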
I tried to implement this using MPI / concurrent.futures / multiprocessing but failed.

My naive implementation using the multiprocessing module:
import multiprocessing

filename = "file1.txt"
pat = "afow"
N = 1000

""" This is the naive string search algorithm """
def search(pat, txt):
    patLen = len(pat)
    txtLen = len(txt)
    offsets = []
    # A loop to slide pattern[] one by one
    # Range generates numbers up to but not including that number
    for i in range((txtLen - patLen) + 1):
        # Can not use a for loop here
        # For loops in C with && statements must be
        # converted to while statements in python
        counter = 0
        while (counter < patLen) and pat[counter] == txt[counter + i]:
            counter += 1
        if counter >= patLen:
            offsets.append(i)
    return str(offsets).strip('[]')

"""
This is what I want:
if __name__ == "__main__":
    tasks = []
    pool_outputs = []
    pool = multiprocessing.Pool(processes=5)
    with open(filename, 'r') as infile:
        lines = []
        for line in infile:
            lines.append(line.rstrip())
            if len(lines) > N:
                pool_output = pool.map(search, tasks)
                pool_outputs.append(pool_output)
                lines = []
        if len(lines) > 0:
            pool_output = pool.map(search, tasks)
            pool_outputs.append(pool_output)
    pool.close()
    pool.join()
    print('Pool:', pool_outputs)
"""

with open(filename, 'r') as infile:
    for line in infile:
        print(search(pat, line))
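One detail worth noting about the code above: `search` returns the offsets as a comma-separated string, not a list. The condensed definition below is a slice-based equivalent of the loop above, shown only to illustrate the return format:

```python
def search(pat, txt):
    # slice-based equivalent of the question's naive scan
    offsets = [i for i in range(len(txt) - len(pat) + 1)
               if txt[i:i + len(pat)] == pat]
    # note: the result is a string such as '0, 3', not a list
    return str(offsets).strip('[]')

print(search("ab", "abcab"))   # prints: 0, 3
print(search("zz", "abcab"))   # prints an empty string
```

Returning a string makes the per-line results harder to combine later; keeping the list and formatting only at print time would be the more usual choice.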
I would really appreciate your guidance, especially with concurrent.futures. Thanks for your time. Valeriy has been helping me, and I thank him for that.

But if anyone could indulge me, this is the code I wrote for concurrent.futures (adapted from an example I saw somewhere):
from concurrent.futures import ProcessPoolExecutor, as_completed
import math

def search(pat, txt):
    patLen = len(pat)
    txtLen = len(txt)
    offsets = []
    # A loop to slide pattern[] one by one
    # Range generates numbers up to but not including that number
    for i in range((txtLen - patLen) + 1):
        # Can not use a for loop here
        # For loops in C with && statements must be
        # converted to while statements in python
        counter = 0
        while (counter < patLen) and pat[counter] == txt[counter + i]:
            counter += 1
        if counter >= patLen:
            offsets.append(i)
    return str(offsets).strip('[]')

# Check a list of strings
def chunked_worker(lines):
    return {line: search("fmo", line) for line in lines}

def pool_bruteforce(filename, nprocs):
    lines = []
    with open(filename) as f:
        lines = [line.rstrip('\n') for line in f]
    chunksize = int(math.ceil(len(lines) / float(nprocs)))
    futures = []
    with ProcessPoolExecutor() as executor:
        for i in range(nprocs):
            chunk = lines[(chunksize * i):(chunksize * (i + 1))]
            futures.append(executor.submit(chunked_worker, chunk))
    resultdict = {}
    for f in as_completed(futures):
        resultdict.update(f.result())
    return resultdict

filename = "file1.txt"
pool_bruteforce(filename, 5)
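For comparison, the same distribution can be written with `executor.map`, which batches the items across workers itself and keeps results in input order, so the manual chunk arithmetic and `as_completed` bookkeeping disappear. This is a sketch, not the poster's code; `search_line` is a toy stand-in for `search`:

```python
from concurrent.futures import ProcessPoolExecutor
from functools import partial

def search_line(pat, line):
    # toy stand-in for search(): first offset of pat in line, or -1
    return line.find(pat)

def pool_map_version(lines, pat, nprocs=5):
    with ProcessPoolExecutor(max_workers=nprocs) as ex:
        # chunksize batches several lines per task, cutting IPC overhead
        return list(ex.map(partial(search_line, pat), lines,
                           chunksize=max(1, len(lines) // nprocs)))

if __name__ == "__main__":
    print(pool_map_version(["afow here", "nothing", "xxafow"], "afow"))
    # prints [0, -1, 2]
```

Unlike the submit/as_completed version, results come back in the same order as the input lines, which makes it easy to map offsets back to line numbers.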
Thanks again to Valeriy and anyone else who tried to help me untangle this.

Answer: You used several arguments, so:
import multiprocessing
from functools import partial

filename = "file1.txt"
pat = "afow"
N = 1000

""" This is the naive string search algorithm """
def search(pat, txt):
    patLen = len(pat)
    txtLen = len(txt)
    offsets = []
    # A loop to slide pattern[] one by one
    # Range generates numbers up to but not including that number
    for i in range((txtLen - patLen) + 1):
        # Can not use a for loop here
        # For loops in C with && statements must be
        # converted to while statements in python
        counter = 0
        while (counter < patLen) and pat[counter] == txt[counter + i]:
            counter += 1
        if counter >= patLen:
            offsets.append(i)
    return str(offsets).strip('[]')

if __name__ == "__main__":
    tasks = []
    pool_outputs = []
    pool = multiprocessing.Pool(processes=5)
    lines = []
    with open(filename, 'r') as infile:
        for line in infile:
            lines.append(line.rstrip())
    tasks = lines
    func = partial(search, pat)
    if len(lines) > N:
        pool_output = pool.map(func, lines)
        pool_outputs.append(pool_output)
    elif len(lines) > 0:
        pool_output = pool.map(func, lines)
        pool_outputs.append(pool_output)
    pool.close()
    pool.join()
    print('Pool:', pool_outputs)
Valeriy: thank you. What does partial do? Do you know of any resource that covers parallel processing in Python thoroughly? Thanks again. Valeriy: I have read it, but I can't really make sense of it. Sorry, I meant a proper example used in a function. Thank you.
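To answer the partial question inline: `functools.partial(search, pat)` freezes the first argument of `search`, producing a one-argument callable, which is exactly the shape `pool.map` needs since it passes a single item to the function. A minimal illustration, using a simplified stand-in for `search`:

```python
from functools import partial

def search(pat, txt):
    # simplified stand-in: first offset of pat in txt, or -1
    return txt.find(pat)

find_afow = partial(search, "afow")   # pat is now fixed to "afow"

print(find_afow("xxafowyy"))          # prints 2
print(list(map(find_afow, ["afow", "no match"])))  # prints [0, -1]
```

`find_afow(txt)` behaves exactly like `search("afow", txt)`; `pool.map(partial(search, pat), lines)` in the answer above works the same way, calling the frozen function once per line.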