Java: YARN AppMaster container requests not working


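I am running a local YARN cluster with 8 vCores and 8 GB of total memory.

The workflow is as follows:

  • YarnClient submits an application request, which starts the AppMaster in a container.

  • The AppMaster starts, creates amRMClient and nmClient, registers itself with the RM, and then creates 4 container requests for the workers via amRMClient.addContainerRequest.

  • Even though enough resources are available, no containers are ever allocated (the onContainersAllocated callback is never invoked). I checked the NodeManager and ResourceManager logs but found no lines related to the container requests. I am following the Apache documentation closely and cannot see what I am doing wrong.

    Here is the AppMaster code for reference: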

    @Override
    public void run() {
        Map<String, String> envs = System.getenv();
    
        String containerIdString = envs.get(ApplicationConstants.Environment.CONTAINER_ID.toString());
        if (containerIdString == null) {
            // container id should always be set in the env by the framework
            throw new IllegalArgumentException("ContainerId not set in the environment");
        }
        ContainerId containerId = ConverterUtils.toContainerId(containerIdString);
        ApplicationAttemptId appAttemptID = containerId.getApplicationAttemptId();
    
        LOG.info("Starting AppMaster Client...");
    
        YarnAMRMCallbackHandler amHandler = new YarnAMRMCallbackHandler(allocatedYarnContainers);
    
        // TODO: read the heartbeat interval from config instead of hard-coding 1000 ms
        amClient = AMRMClientAsync.createAMRMClientAsync(1000, this);
        amClient.init(config);
        amClient.start();
    
        LOG.info("Starting AppMaster Client OK");
    
        //YarnNMCallbackHandler nmHandler = new YarnNMCallbackHandler();
        containerManager = NMClient.createNMClient();
        containerManager.init(config);
        containerManager.start();
    
        // Get port and URL information. TODO: get tracking URL
        String appMasterHostname = NetUtils.getHostname();
    
        String appMasterTrackingUrl = "/progress";
    
        // Register self with ResourceManager. This will start heart-beating to the RM
        RegisterApplicationMasterResponse response = null;
    
        LOG.info("Register AppMaster on: " + appMasterHostname + "...");
    
        try {
            response = amClient.registerApplicationMaster(appMasterHostname, 0, appMasterTrackingUrl);
        } catch (YarnException | IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
            return;
        }
    
        LOG.info("Register AppMaster OK");
    
        // Dump out information about cluster capability as seen by the resource manager
        int maxMem = response.getMaximumResourceCapability().getMemory();
        LOG.info("Max mem capabililty of resources in this cluster " + maxMem);
    
        int maxVCores = response.getMaximumResourceCapability().getVirtualCores();
        LOG.info("Max vcores capabililty of resources in this cluster " + maxVCores);
    
        containerMemory = Integer.parseInt(config.get(YarnConfig.YARN_CONTAINER_MEMORY_MB));
        containerCores = Integer.parseInt(config.get(YarnConfig.YARN_CONTAINER_CPU_CORES));
    
        // A resource ask cannot exceed the max.
        if (containerMemory > maxMem) {
          LOG.info("Container memory specified above max threshold of cluster."
              + " Using max value." + ", specified=" + containerMemory + ", max="
              + maxMem);
          containerMemory = maxMem;
        }
    
        if (containerCores > maxVCores) {
          LOG.info("Container virtual cores specified above max threshold of  cluster."
            + " Using max value." + ", specified=" + containerCores + ", max=" + maxVCores);
          containerCores = maxVCores;
        }
        List<Container> previousAMRunningContainers = response.getContainersFromPreviousAttempts();
        LOG.info("Received " + previousAMRunningContainers.size()
                + " previous AM's running containers on AM registration.");
    
    
        for (int i = 0; i < 4; ++i) {
            ContainerRequest containerAsk = setupContainerAskForRM();
            amClient.addContainerRequest(containerAsk); // NOTHING HAPPENS HERE...
            LOG.info("Available resources: " + amClient.getAvailableResources().toString());
        }
    
        while(completedYarnContainers != 4) {
            try {
                Thread.sleep(1000);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
    
        LOG.info("Done with allocation!");
    
    }
    
    @Override
    public void onContainersAllocated(List<Container> containers) {
        LOG.info("Got response from RM for container ask, allocatedCnt=" + containers.size());
    
        for (Container container : containers) {
            LOG.info("Allocated yarn container with id: " + container.getId());
            allocatedYarnContainers.push(container);
    
            // TODO: Launch the container in a thread
        }
    }
    
    @Override
    public void onError(Throwable error) {
        LOG.error(error.getMessage());
    }
    
    @Override
    public float getProgress() {
        return (float) completedYarnContainers / allocatedYarnContainers.size();
    }
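
As a side note on the wait loop in run(): AMRMClientAsync delivers its callbacks on a separate handler thread, so the polled counter is shared between threads. A minimal sketch of one alternative, assuming the completion callback (for example onContainersCompleted) can signal a latch, is shown below. The class and method names are hypothetical and no Hadoop classes are used; the callbacks are simulated with plain threads.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of the original AppMaster.
class CompletionWaitSketch {
    private final CountDownLatch remaining;

    CompletionWaitSketch(int expectedContainers) {
        this.remaining = new CountDownLatch(expectedContainers);
    }

    // Intended to be called once per finished container from the completion callback.
    void containerCompleted() {
        remaining.countDown();
    }

    // Blocks until every expected container has completed or the timeout expires.
    // Returns true only if all completions arrived in time.
    boolean awaitAll(long timeout, TimeUnit unit) {
        try {
            return remaining.await(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return false;
        }
    }

    public static void main(String[] args) {
        CompletionWaitSketch sketch = new CompletionWaitSketch(4);
        for (int i = 0; i < 4; i++) {
            new Thread(sketch::containerCompleted).start(); // simulated callbacks
        }
        System.out.println("all containers completed: " + sketch.awaitAll(5, TimeUnit.SECONDS));
    }
}
```

The latch replaces both the sleep-based polling and the need to make the counter visible across threads by hand.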
    
    Below is the AppMaster log covering initialization and the 4 container requests:

    23:47:09 YarnAppMaster - Starting AppMaster Client OK
    23:47:09 YarnAppMaster - Register AppMaster on: andrei-mbp.local/192.168.1.4...
    23:47:09 YarnAppMaster - Register AppMaster OK
    23:47:09 YarnAppMaster - Max mem capabililty of resources in this cluster 2048
    23:47:09 YarnAppMaster - Max vcores capabililty of resources in this cluster 2
    23:47:09 YarnAppMaster - Received 0 previous AM's running containers on AM registration.
    23:47:11 YarnAppMaster - Requested container ask: Capability[<memory:512, vCores:1>]Priority[0]
    23:47:11 YarnAppMaster - Available resources: <memory:7680, vCores:0>
    23:47:11 YarnAppMaster - Requested container ask: Capability[<memory:512, vCores:1>]Priority[0]
    23:47:11 YarnAppMaster - Available resources: <memory:7680, vCores:0>
    23:47:11 YarnAppMaster - Requested container ask: Capability[<memory:512, vCores:1>]Priority[0]
    23:47:11 YarnAppMaster - Available resources: <memory:7680, vCores:0>
    23:47:11 YarnAppMaster - Requested container ask: Capability[<memory:512, vCores:1>]Priority[0]
    23:47:11 YarnAppMaster - Available resources: <memory:7680, vCores:0>
    23:47:11 YarnAppMaster - Progress indicator should not be negative
    
    Thanks in advance.

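    Thanks for pointing out that getProgress() returns NaN (from a division by zero) when it is called before the first allocation, which makes the ResourceManager quit immediately with an exception.

    You can read more about it here.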
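
To make the division-by-zero point concrete, here is a small plain-Java sketch. It assumes both counters are still zero at the first heartbeat, as in the question's code, and it uses a hypothetical isNonNegative check as a stand-in for whatever validation produces the "Progress indicator should not be negative" message. No Hadoop classes are involved.

```java
// Plain-Java illustration of why the original getProgress() misbehaves.
class ProgressNaNDemo {
    // Same shape as the original getProgress(): completed / allocated.
    // The cast applies to `completed`, so this is a float division.
    static float naiveProgress(int completed, int allocated) {
        return (float) completed / allocated;
    }

    // Hypothetical stand-in for a "progress >= 0" validity check.
    static boolean isNonNegative(float progress) {
        return progress >= 0;
    }

    public static void main(String[] args) {
        float p = naiveProgress(0, 0); // nothing allocated yet
        System.out.println("progress = " + p);
        System.out.println("isNaN = " + Float.isNaN(p));
        System.out.println("passes >= 0 check = " + isNonNegative(p));
    }
}
```

In floating-point arithmetic 0.0f / 0 is NaN rather than an ArithmeticException, and every comparison with NaN is false, so a NaN progress value fails a non-negativity check even though it is not actually negative.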

    I suspect the problem comes precisely from the negative progress:

    23:47:11 YarnAppMaster - Progress indicator should not be negative
    (Note: copied to StackOverflow for future reference.)

    I added the answer here myself in case someone runs into the same problem later and my site is unavailable or the link format has changed.
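    @Override
    public float getProgress() {
        // Divide by the fixed number of requested containers (4), so the
        // result is 0 rather than NaN before the first allocation.
        return (float) allocatedYarnContainers.size() / 4.0f;
    }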
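
A slightly more defensive variant of the same idea can be sketched as follows, assuming the progress value is meant to stay between 0 and 1. SafeProgress and its parameters are hypothetical names introduced for illustration; the helper guards against a zero target and clamps the ratio.

```java
// Hypothetical helper, not part of the original AppMaster.
class SafeProgress {
    // Returns done/target as a fraction clamped to [0, 1].
    // A non-positive target yields 0 instead of NaN or infinity.
    static float of(int done, int target) {
        if (target <= 0) {
            return 0.0f;
        }
        float ratio = (float) done / target;
        return Math.max(0.0f, Math.min(1.0f, ratio));
    }

    public static void main(String[] args) {
        System.out.println(SafeProgress.of(0, 0)); // guarded: no division happens
        System.out.println(SafeProgress.of(2, 4));
        System.out.println(SafeProgress.of(5, 4)); // clamped to the upper bound
    }
}
```

Inside getProgress() this could be called as `SafeProgress.of(allocatedYarnContainers.size(), 4)`, which keeps the hard-coded container count in one place.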