MPI timeout in MapReduce code
I am using a MapReduce code, but I have run into a problem: one hour after the map phase finishes, the code stops responding. I dug into the code and found that this is the function that does not respond:
def wait(self, running, tag):
    """Test if any worker has finished its job.
    If so, decrease its key and make it available
    """
    atimer = Timer('Wait')
    inittime = time()
    status = MPI.Status()
    while time() - inittime < self.config['jobwait']:
        if world.Iprobe(source=MPI.ANY_SOURCE, tag=tag, status=status):
            jobf = world.recv(source=status.source, tag=tag)
            idx = 0
            for ii, worker in enumerate(self.workers):
                if worker.id == status.source:
                    idx = ii
                    break
            if self.config['verbosity'] >= 8:
                print('Freeing worker ' + str(self.workers[idx].id))
            worker = self.workers[idx]
            # faulty worker's job has already been cleaned
            if not worker.isFaulty():
                del running[jobf]
            else:
                self.nActive += 1
            worker.setFree()
            heapq._siftup(self.workers, idx)
I would like to know whether Iprobe in mpi4py has a timeout, and if so, how to set it. Is there an alternative to Iprobe that could play the same role here?
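For context on the timeout question: Iprobe itself takes no timeout argument, because it returns immediately with True or False. Any time limit has to come from the loop around it, as in wait above. A minimal sketch of such a loop follows. The name probe_with_deadline and the poll_interval parameter are hypothetical, and comm is assumed to be any object exposing Iprobe the way an mpi4py communicator does.

```python
import time


def probe_with_deadline(comm, source, tag, status, timeout, poll_interval=0.01):
    """Poll comm.Iprobe until a message is pending or `timeout` seconds pass.

    Returns True if a matching message is waiting (Iprobe has then filled in
    `status`), or False if the deadline expired first.
    """
    deadline = time.time() + timeout
    while True:
        # Iprobe never blocks: it reports at once whether a message is pending.
        if comm.Iprobe(source=source, tag=tag, status=status):
            return True
        if time.time() >= deadline:
            return False
        # Sleep briefly so the master does not spin at full CPU while waiting.
        time.sleep(poll_interval)
```

With mpi4py this would presumably be called as probe_with_deadline(world, MPI.ANY_SOURCE, tag, MPI.Status(), self.config['jobwait']), followed by world.recv only when it returns True. If no worker ever sends the FINISH message, the function returns False each time, which points at the sender rather than at Iprobe.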
Below is the calling function, which sends the messages via .send:
def execTask(self, task):
    """Wrapper function calling mapping/reducing/finalizing phase tasks,
    dispatch tasks to workers until all finished and collect feedback.
    Faulty workers are removed from active duty work list.
    """
    atimer = Timer(task)
    print('Entering {0:s} phase...'.format(task))
    taskDict = {'Map': (self.mapIn, MAP_START, MAP_FINISH),
                'Init': (self.mapIn, INIT_START, MAP_FINISH),
                'Reduce': (self.reduceIn, REDUCE_START, REDUCE_FINISH)}
    # line up jobs and workers into priority queues
    jobs = taskDict[task][0][:]
    heapq.heapify(jobs); running = {}
    heapq.heapify(self.workers)
    while (jobs or running) and self.nActive > 0:
        # dispatch all jobs to all free workers
        while jobs and self.workers[0].isFree():
            job = heapq.heappop(jobs)
            worker = heapq.heappop(self.workers)
            world.send(job, dest=worker.id, tag=taskDict[task][1])
            print('hi')
            print(job)
            worker.setBusy(); heapq.heappush(self.workers, worker)
            running[job] = (time(), worker)
            if self.config['verbosity'] >= 6:
                print('Dispatching file ' + os.path.basename(job) + ' to worker ' + str(worker.id))
            # if no more free workers, break
            if not self.workers[0].isFree(): break
        # wait for finishing workers as well as do cleaning
        self.wait(running, taskDict[task][2])
        # print running
        self.clean(running, jobs)
    print('{0:s} phase completed'.format(task))
The full code can be seen here:
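Comments:

Does it really just hang there? It should not, because Iprobe is a non-blocking call and does not need to make progress on the network. Are you sure the program is not simply spending a lot of time in that loop, which makes your trace look bad? If so, your problem is that the corresponding message is never being sent.

Yes, it hangs there and gets stuck. I am sure that one hour after the map phase finishes it is stuck. Do you have any solution?

@WesleyBland If, as you say, the message is not being sent, what is the solution?

@WesleyBland, do you have any suggestions?

It probably means that you are not sending the message. The problem is rarely a misbehaving library. More likely there is a problem in your code. Unfortunately, the question does not include a short example that reproduces it. I encourage you to read the guide for advice on how to post a good code example that is not too long.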
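One way to check the suggestion that a message is never sent is to list the jobs that have been outstanding for longer than expected, and then inspect the workers they were given to. The sketch below assumes that running holds the (dispatch_time, worker) tuples stored by execTask. The name overdue_jobs and the limit parameter are hypothetical and not part of the original code.

```python
from time import time


def overdue_jobs(running, limit, now=None):
    """Return (job, worker, elapsed) for each job dispatched over `limit` seconds ago."""
    if now is None:
        now = time()
    late = []
    for job, (started, worker) in running.items():
        elapsed = now - started
        if elapsed > limit:
            late.append((job, worker, elapsed))
    # Longest-waiting job first, since it is the most likely culprit.
    late.sort(key=lambda item: item[2], reverse=True)
    return late
```

Calling this after self.wait(...) in the dispatch loop and printing the result would show which worker has not reported back, so that worker's own send of the FINISH tag can be examined.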