MPI timeout in MapReduce code


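I am using a MapReduce code, but I have run into a problem: about 1 hour after the map phase finishes, the code stops responding. I dug into the code and found that this function is the one that stops responding: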

def wait(self, running, tag):
  """Test if any worker has finished its job.
  If so, decrease its key and make it available
  """
  atimer = Timer('Wait')

  inittime = time()
  status = MPI.Status()
  while time() - inittime < self.config['jobwait']:
      if world.Iprobe(source=MPI.ANY_SOURCE,tag=tag,status=status):
          jobf = world.recv(source=status.source, tag=tag)
          idx = 0
          for ii, worker in enumerate(self.workers):
              if worker.id == status.source: idx = ii; break
          if self.config['verbosity'] >= 8:
              print('Freeing worker '+str(self.workers[idx].id))
          worker = self.workers[idx]

          # faulty worker's job has already been cleaned
          if not worker.isFaulty():
              del running[jobf]
          else:
              self.nActive += 1
          worker.setFree()
          heapq._siftup(self.workers, idx)
I would like to know whether Iprobe has a timeout in mpi4py, and if so, how to set it. Is there an alternative to Iprobe that plays the same role here?
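For context: Iprobe is non-blocking and returns immediately with True or False, so it takes no timeout argument of its own. In the wait function above, the timeout comes from the surrounding while loop and self.config['jobwait']. A minimal sketch of that pattern, pulled out into a standalone helper, might look like the following. The names wait_for_message, probe, and poll_interval are hypothetical and not part of mpi4py, and time.monotonic assumes Python 3.

```python
import time


def wait_for_message(probe, timeout, poll_interval=0.01):
    """Poll probe() until it returns True or `timeout` seconds elapse.

    `probe` is any zero-argument callable returning a bool, for example a
    lambda wrapping world.Iprobe(...). Returns True if a message was seen,
    False if the deadline passed first. (Hypothetical helper, not an
    mpi4py API.)
    """
    deadline = time.monotonic() + timeout
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        # Sleep briefly so the master does not spin at 100% CPU.
        time.sleep(poll_interval)
```

With mpi4py this could be called as wait_for_message(lambda: world.Iprobe(source=MPI.ANY_SOURCE, tag=tag, status=status), self.config['jobwait']), followed by world.recv(source=status.source, tag=tag) only when it returns True. If it keeps returning False, no matching message is arriving, which points at the sending side rather than at Iprobe.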

Below is the preceding function, which sends the messages via .send:

def execTask(self, task):
    """Wrapper function calling mapping/reducing/finalizing phase tasks,
    dispatch tasks to workers until all finished and collect feedback. 
    Faulty workers are removed from active duty work list.
    """
    atimer = Timer(task)
    print( 'Entering {0:s} phase...'.format(task) )

    taskDict = { 'Map':(self.mapIn, MAP_START, MAP_FINISH), \
            'Init':(self.mapIn, INIT_START, MAP_FINISH), \
            'Reduce':(self.reduceIn, REDUCE_START, REDUCE_FINISH) }

    # line up jobs and workers into priority queues
    jobs = taskDict[task][0][:]
    heapq.heapify(jobs); running = {}
    heapq.heapify(self.workers)

    while (jobs or running) and self.nActive > 0:
        # dispatch all jobs to all free workers
        while jobs and self.workers[0].isFree():
            job = heapq.heappop(jobs)
            worker = heapq.heappop(self.workers)
            world.send(job, dest=worker.id, tag=taskDict[task][1])
            print('hi')
            print(job)
            worker.setBusy(); heapq.heappush(self.workers, worker)
            running[job] = (time(), worker)
            if self.config['verbosity'] >= 6:
                print('Dispatching file '+os.path.basename(job)+' to worker '+str(worker.id))
            # if no more free workers, break
            if not self.workers[0].isFree(): break

        # wait for finishing workers as well as do cleaning
        self.wait(running, taskDict[task][2])
        # print running
        self.clean(running, jobs)

    print( '{0:s} phase completed'.format(task) )
The full code can be seen here:


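Does it really just hang there? It really shouldn't, because IPROBE is a non-blocking call that doesn't need to make progress on the network. Are you sure it isn't simply spending a lot of time there and making your trace look bad? If that is the case, your problem is that the corresponding message is not being sent.

Yes, it hangs there and gets stuck. I am sure that it gets stuck 1 hour after the map phase finishes. Do you have any solution?

@WesleyBland If, as you say, the message is not being sent, what is the solution?

@WesleyBland, do you have any suggestions?

It probably means you are not sending the message. It is rarely the library that is misbehaving; more likely there is a problem in your code. Unfortunately, the question does not include enough to reproduce the problem. I encourage you to read the guide for advice on how to post a good code example that is not too long.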
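One way to check whether a finish message is never being sent is to look at how long each dispatched job has been in the running dictionary. The sketch below assumes running maps each job to a (start_time, worker_id) pair; the code in the question stores a worker object rather than an id, and find_stalled_jobs and max_runtime are hypothetical names used only for illustration.

```python
def find_stalled_jobs(running, now, max_runtime):
    """Return (job, worker_id, elapsed) for jobs running longer than max_runtime.

    `running` maps job -> (start_time, worker_id); this layout is an
    assumption for the example. Results are sorted longest-running first.
    """
    stalled = []
    for job, (started, worker_id) in running.items():
        elapsed = now - started
        if elapsed > max_runtime:
            stalled.append((job, worker_id, elapsed))
    return sorted(stalled, key=lambda item: item[2], reverse=True)
```

Printing the result from the master each time wait returns would show which worker has not reported back, so that worker's own log can be checked for a crash or a missing send with the expected finish tag.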