Why does this Python 0MQ script for distributed computing hang at a fixed input size?
I recently started learning 0MQ. Earlier today I came across a blog post that covered something I had read about in the 0MQ Guide, so I decided to give it a try.

Instead of having the workers only compute products of numbers, as the original code does, I decided to have the ventilator send large arrays to the workers in 0MQ messages. The code I used for my "experiments" is below.

As noted in a comment in the code, whenever I try to increase the variable string_length to more than 3 MB, the code hangs.

Typical symptom: say we set string_length to 4 MB (i.e. 4194304). The result manager may then receive a result from one worker, after which the code simply stalls. htop shows the 2 cores doing very little, and the Etherape network traffic monitor shows no traffic on the lo interface either.

After several hours of looking around I still have not figured out what is causing this. I would appreciate a hint or two about the cause and about how to resolve it. Thanks!

I am running Ubuntu 11.04 64-bit on a Dell notebook with an Intel Core Duo CPU, 8 GB RAM, an 80 GB Intel X25MG2 SSD, Python 2.7.1+, libzmq1 2.1.10-1chl1~natty1, and python-pyzmq 2.1.10-1chl1~natty1.
import time
import zmq
from multiprocessing import Process, cpu_count

np = cpu_count()
pool_size = np
number_of_elements = 128
# Odd, why once the slen is bumped to 3MB or above, the code hangs?
string_length = 1024 * 1024 * 3

def create_inputs(nelem, slen, pb=True):
    '''
    Generates an array that contains nelem fixed-size (slen bytes each)
    random strings and an accompanying array of hexdigests of the
    former's elements. Both are returned in a tuple.

    :type nelem: int
    :param nelem: The desired number of elements in the array to be
                  generated.
    :type slen: int
    :param slen: The desired number of bytes of each array element.
    :type pb: bool
    :param pb: If True, displays a text progress bar during input array
               generation.
    '''
    from os import urandom
    import sys
    import hashlib

    if pb:
        if nelem <= 64:
            toolbar_width = nelem
            chunk_size = 1
        else:
            toolbar_width = 64
            chunk_size = nelem // toolbar_width
        description = '%d random strings of %d bytes. ' % (nelem, slen)
        s = ''.join(('Generating an array of ', description, '...\n'))
        sys.stdout.write(s)
        # create an ASCII progress bar
        sys.stdout.write("[%s]" % (" " * toolbar_width))
        sys.stdout.flush()
        sys.stdout.write("\b" * (toolbar_width+1))
    array = list()
    hash4a = list()
    try:
        for i in range(nelem):
            e = urandom(int(slen))
            array.append(e)
            h = hashlib.md5()
            h.update(e)
            he = h.hexdigest()
            hash4a.append(he)
            i += 1
            if pb and i and i % chunk_size == 0:
                sys.stdout.write("-")
                sys.stdout.flush()
        if pb:
            sys.stdout.write("\n")
    except MemoryError:
        print('Memory Error: discarding existing arrays')
        array = list()
        hash4a = list()
    finally:
        return array, hash4a

# The "ventilator" function generates an array of nelem fixed-size (slen
# bytes long) random strings, and sends the array down a zeromq "PUSH"
# connection to be processed by listening workers, in a round robin load
# balanced fashion.
def ventilator():
    # Initialize a zeromq context
    context = zmq.Context()

    # Set up a channel to send work
    ventilator_send = context.socket(zmq.PUSH)
    ventilator_send.bind("tcp://127.0.0.1:5557")

    # Give everything a second to spin up and connect
    time.sleep(1)

    # Create the input array
    nelem = number_of_elements
    slen = string_length
    payloads = create_inputs(nelem, slen)

    # Send an array to each worker
    for num in range(np):
        work_message = { 'num' : payloads }
        ventilator_send.send_pyobj(work_message)

    time.sleep(1)

# The "worker" functions listen on a zeromq PULL connection for "work"
# (array to be processed) from the ventilator, get the length of the array
# and send the results down another zeromq PUSH connection to the results
# manager.
def worker(wrk_num):
    # Initialize a zeromq context
    context = zmq.Context()

    # Set up a channel to receive work from the ventilator
    work_receiver = context.socket(zmq.PULL)
    work_receiver.connect("tcp://127.0.0.1:5557")

    # Set up a channel to send result of work to the results reporter
    results_sender = context.socket(zmq.PUSH)
    results_sender.connect("tcp://127.0.0.1:5558")

    # Set up a channel to receive control messages over
    control_receiver = context.socket(zmq.SUB)
    control_receiver.connect("tcp://127.0.0.1:5559")
    control_receiver.setsockopt(zmq.SUBSCRIBE, "")

    # Set up a poller to multiplex the work receiver and control receiver channels
    poller = zmq.Poller()
    poller.register(work_receiver, zmq.POLLIN)
    poller.register(control_receiver, zmq.POLLIN)

    # Loop and accept messages from both channels, acting accordingly
    while True:
        socks = dict(poller.poll())

        # If the message came from work_receiver channel, get the length
        # of the array and send the answer to the results reporter
        if socks.get(work_receiver) == zmq.POLLIN:
            #work_message = work_receiver.recv_json()
            work_message = work_receiver.recv_pyobj()
            length = len(work_message['num'][0])
            answer_message = { 'worker' : wrk_num, 'result' : length }
            results_sender.send_json(answer_message)

        # If the message came over the control channel, shut down the worker.
        if socks.get(control_receiver) == zmq.POLLIN:
            control_message = control_receiver.recv()
            if control_message == "FINISHED":
                print("Worker %i received FINISHED, quitting!" % wrk_num)
                break

# The "result_manager" function receives each result from multiple workers,
# and prints those results. When all results have been received, it signals
# the worker processes to shut down.
def result_manager():
    # Initialize a zeromq context
    context = zmq.Context()

    # Set up a channel to receive results
    results_receiver = context.socket(zmq.PULL)
    results_receiver.bind("tcp://127.0.0.1:5558")

    # Set up a channel to send control commands
    control_sender = context.socket(zmq.PUB)
    control_sender.bind("tcp://127.0.0.1:5559")

    for task_nbr in range(np):
        result_message = results_receiver.recv_json()
        print "Worker %i answered: %i" % (result_message['worker'], result_message['result'])

    # Signal to all workers that we are finished
    control_sender.send("FINISHED")
    time.sleep(5)

if __name__ == "__main__":

    # Create a pool of workers to distribute work to
    for wrk_num in range(pool_size):
        Process(target=worker, args=(wrk_num,)).start()

    # Fire up our result manager...
    result_manager = Process(target=result_manager, args=())
    result_manager.start()

    # Start the ventilator!
    ventilator = Process(target=ventilator, args=())
    ventilator.start()

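The problem is that your ventilator (PUSH) socket is closed before it has finished sending. You have a 1 s sleep at the end of the ventilator function, which is not enough time to send 384 MB messages. That is why you see a threshold at that size; if the sleep were shorter, the threshold would be lower.

That said, LINGER is supposed to prevent this sort of thing, so I would raise this with zeromq: PUSH does not appear to respect LINGER.

A fix for your particular example (without adding an indeterminately long sleep) is to terminate the ventilator with the same FINISHED signal that the workers use. That way, you guarantee that the ventilator stays alive for as long as it needs to.

Revised ventilator:
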
def ventilator():
    # Initialize a zeromq context
    context = zmq.Context()

    # Set up a channel to send work
    ventilator_send = context.socket(zmq.PUSH)
    ventilator_send.bind("tcp://127.0.0.1:5557")

    # Set up a channel to receive control messages
    control_receiver = context.socket(zmq.SUB)
    control_receiver.connect("tcp://127.0.0.1:5559")
    control_receiver.setsockopt(zmq.SUBSCRIBE, "")

    # Give everything a second to spin up and connect
    time.sleep(1)

    # Create the input array
    nelem = number_of_elements
    slen = string_length
    payloads = create_inputs(nelem, slen)

    # Send an array to each worker
    for num in range(np):
        work_message = { 'num' : payloads }
        ventilator_send.send_pyobj(work_message)

    # Poll for the FINISHED message, so we don't shut down too early
    poller = zmq.Poller()
    poller.register(control_receiver, zmq.POLLIN)

    while True:
        socks = dict(poller.poll())

        if socks.get(control_receiver) == zmq.POLLIN:
            control_message = control_receiver.recv()
            if control_message == "FINISHED":
                print("Ventilator received FINISHED, quitting!")
                break
            # else: unhandled message

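
The 384 MB figure in the answer can be checked with a little arithmetic. The following is a minimal sketch, not part of the original code: the helper name raw_payload_bytes and the constant MIB are hypothetical, and it counts only the random data in one work message, ignoring pickle framing and the list of hexdigests. It also assumes that the "6" in the follow-up experiment from the comments means 6 MB.

```python
MIB = 1024 * 1024


def raw_payload_bytes(number_of_elements, string_length):
    """Bytes of random data carried by one work message.

    Hypothetical helper for illustration only. It ignores pickle
    framing and the accompanying hexdigest list, so the message
    actually sent by send_pyobj is slightly larger.
    """
    return number_of_elements * string_length


# Configuration from the question: 128 strings of 3 MiB each.
question_case = raw_payload_bytes(128, 3 * MIB)

# Follow-up experiment from the comments: 64 strings,
# assuming the reported "6" means 6 MiB each.
comment_case = raw_payload_bytes(64, 6 * MIB)

print(question_case // MIB)  # 384
print(comment_case // MIB)   # 384
```

Both configurations come to the same total, which fits the explanation that the limit depends on how much data can leave the socket before the ventilator exits rather than on the size of any single string. Note that the ventilator sends one such message per worker, so the total volume is this figure multiplied by the number of workers.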
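I did some more experiments: I reduced number_of_elements to 64 and increased string_length to 6 MB. The code still ran fine. Above that, the same symptom appeared. This led me to believe that there might be an overall message size limit somewhere in the pyzmq binding. The 0MQ C API has a zmq_msg_init_size(3) function that I cannot find in pyzmq's documentation. Could this be the cause?

Can you trace where it hangs? That might give you a hint.

I tried your code on my Mac laptop with string_length = 1024*1024*4 and it ran fine, so I am guessing it has something to do with some kind of memory contention...

...then I ran it again and it froze. Looking at "top", free memory was bouncing around near 0, so it looks like 0mq is not optimized for handling messages of this size.

@Aaron Watters. I came to a conclusion similar to yours. But before I point my finger at 0MQ itself, I will find some time to do the above in C++. On a quick look through the source I noticed that even though pyzmq uses zmq_msg_init_size(), it does not expose it. I wonder whether the results would be different with that function?

minrk, many thanks for the insightful answer. Very helpful! I did not suspect the ZMQ_LINGER value set by zmq_setsockopt(3) because, as you said, the default is -1 (infinite). Great! I will definitely raise this with the pyzmq folks first and mention it on the zeromq mailing list. I tested your fix with string_length set to 1024*1024*10, which maxed out the notebook's physical memory, and still got the expected results. Thanks again!

It is probably not worth raising with the "pyzmq folks", since right now that is basically me. I have pinged libzmq and written a simpler test case in C: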