GridGain: unhandled GridComputeExecutionRejectedException when using GridJobStealingCollisionSpi

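I have been using GridGain successfully for more than three years, and apart from a few bumps it has worked very smoothly. At least I have always been able to figure out what went wrong (thanks in part to very solid documentation and examples). Well, until now.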

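In one of my projects I am trying to implement work stealing in a compute grid backed by GridGain 6.5.0. Configuration went smoothly, but from time to time I get a GridComputeExecutionRejectedException that bubbles all the way up to the client. The strange thing is that GridComputeExecutionRejectedException should be detected and routed by the failover policy provided in the result method of the standard GridComputeTaskAdapter (which I extend).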
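A minimal, self-contained sketch of the routing expected here. It is only a model: `ResultPolicySketch`, `Policy`, `ExecutionRejected` and `UserJobFailure` are hypothetical stand-ins, not the GridGain classes, and the actual rules in GridGain 6.5.0 may differ.

```java
/**
 * Stand-alone model of the failover decision expected from the task's result method.
 * Every type here is a hypothetical stand-in, NOT a GridGain class.
 */
final class ResultPolicySketch {
    /** Stand-in for the job result policy: keep waiting, or fail the job over. */
    enum Policy { WAIT, FAILOVER }

    /** Stand-in for GridComputeExecutionRejectedException. */
    static class ExecutionRejected extends Exception {
        ExecutionRejected(String msg) { super(msg); }
    }

    /** Stand-in for an exception raised by user job code. */
    static class UserJobFailure extends Exception {
        UserJobFailure(String msg) { super(msg); }
    }

    /**
     * Decides what to do with a single job result.
     *
     * @param jobError exception carried by the job result, or null if the job succeeded.
     * @return WAIT for a successful job, FAILOVER for a rejected one.
     * @throws IllegalStateException if the job failed with any other exception.
     */
    static Policy result(Exception jobError) {
        if (jobError == null)
            return Policy.WAIT; // Success: wait for the remaining job results.

        if (jobError instanceof ExecutionRejected)
            return Policy.FAILOVER; // Rejected before execution: reroute to another node.

        // Anything else is treated as a user error and fails the whole task.
        throw new IllegalStateException("Remote job threw user exception", jobError);
    }

    public static void main(String[] args) {
        System.out.println(result(null));
        System.out.println(result(new ExecutionRejected("Job was cancelled before execution")));
    }
}
```

Under this model a rejected job never reaches the client as an exception; it is handed to failover instead, which is the behaviour I expected from the grid.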

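I also found that the code in GridJobStealingCollisionSpi that is responsible for activating jobs carries a comment saying "we also need to make sure that the job is not being rejected by another thread". Does the scenario described in that comment actually occur? (I know there is a synchronized block in the code that should prevent it.)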

In any case, I would greatly appreciate any help.

My configuration file is as follows:

<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:util="http://www.springframework.org/schema/util"
       xsi:schemaLocation="
        http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-3.1.xsd
    http://www.springframework.org/schema/util http://www.springframework.org/schema/util/spring-util-3.1.xsd">

    <bean id="grid.cfg" class="org.gridgain.grid.GridConfiguration">

        <property name="marshaller">
            <bean class="org.gridgain.grid.marshaller.optimized.GridOptimizedMarshaller">
                <property name="requireSerializable" value="false"/>
            </bean>
        </property>

        <property name="includeEventTypes">
            <util:constant static-field="org.gridgain.grid.events.GridEventType.EVTS_TASK_EXECUTION"/>
        </property>

        <property name="discoverySpi">
            <bean class="org.gridgain.grid.spi.discovery.tcp.GridTcpDiscoverySpi">
                <property name="ipFinder">
                    <bean class="org.gridgain.grid.spi.discovery.tcp.ipfinder.sharedfs.GridTcpDiscoverySharedFsIpFinder"/>
                </property>
            </bean>
        </property>

        <property name="loadBalancingSpi">
            <bean class="org.gridgain.grid.spi.loadbalancing.adaptive.GridAdaptiveLoadBalancingSpi">
                <property name="loadProbe">
                    <bean class="org.gridgain.grid.spi.loadbalancing.adaptive.GridAdaptiveProcessingTimeLoadProbe"/>
                </property>
            </bean>
        </property>

        <property name="collisionSpi">
            <bean class="org.gridgain.grid.spi.collision.jobstealing.GridJobStealingCollisionSpi">
                <property name="activeJobsThreshold" value="28"/>
                <property name="waitJobsThreshold" value="0"/>
                <property name="messageExpireTime" value="3000"/>
                <property name="maximumStealingAttempts" value="5"/>
                <property name="stealingEnabled" value="true"/>
            </bean>
        </property>

        <property name="failoverSpi">
            <bean class="org.gridgain.grid.spi.failover.jobstealing.GridJobStealingFailoverSpi">
                <property name="maximumFailoverAttempts" value="5"/>
            </bean>
        </property>

        <property name="swapSpaceSpi">
            <bean class="org.gridgain.grid.spi.swapspace.noop.GridNoopSwapSpaceSpi"/>
        </property>
    </bean>
</beans>

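Peter, can you add your task implementation code to the question?

I have just added my core task class. I think it is pretty standard. As I mentioned, the unhandled GridComputeExecutionRejectedException occurs only when job stealing is used.

Can you set a breakpoint at java.lang.Throwable#printStackTrace(PrintStreamOrWriter) to find out where this exception is printed to the console? It looks like the exception is not thrown; it is created and put into GridComputeJobResult, but something prints it to the console.

Since this is a production issue I cannot really set a breakpoint. However, I was able to catch and log the full exception (see my edit). I also realized that I set activeJobsThreshold to 28, which does not equal the total number of threads (100 by default). Could that be a problem?

Please check the log before the stack trace: are there any warnings or errors?
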
2014-10-26 23:57:33,190 [http-bio-8080-exec-13] ERROR errors.GrailsExceptionResolver  - GridComputeExecutionRejectedException occurred when processing request: [POST] /evoRun/runEvolution
Job was cancelled before execution [jobSes=GridJobSessionImpl [ses=GridTaskSessionImpl [taskName=edu.banda.coel.server.grid.GridCollectionTask, dep=LocalDeployment [super=GridDeployment [ts=1414392425356, depMode=SHARED, clsLdr=sun.misc.Launcher$AppClassLoader@2e2e1b6c, clsLdrId=4faab505941-ea582293-39ba-4648-9022-596e6626954b, userVer=0, loc=true, sampleClsName=java.lang.String, pendingUndeploy=false, undeployed=false, usage=0]], taskClsName=edu.banda.coel.server.grid.GridCollectionTask, sesId=7f4e9505941-b2e9befc-051f-4e17-ba8d-bbafbe9cd7a3, startTime=1414392785621, endTime=9223372036854775807, taskNodeId=b2e9befc-051f-4e17-ba8d-bbafbe9cd7a3, clsLdr=sun.misc.Launcher$AppClassLoader@2e2e1b6c, closed=false, cpSpi=null, failSpi=null, loadSpi=null, usage=1, fullSup=false, subjId=b2e9befc-051f-4e17-ba8d-bbafbe9cd7a3], jobId=55ee9505941-8522cc8b-10fb-4afd-945f-caa0e0c561f0], job=edu.banda.coel.server.grid.GridCollectionInputTask$1@380042f5]
For more information see:
    Troubleshooting:      http://bit.ly/GridGain-Troubleshooting
    Documentation Center: http://bit.ly/GridGain-Documentation
. Stacktrace follows:
class org.gridgain.grid.compute.GridComputeExecutionRejectedException: Job was cancelled before execution [jobSes=GridJobSessionImpl [ses=GridTaskSessionImpl [taskName=edu.banda.coel.server.grid.GridCollectionTask, dep=LocalDeployment [super=GridDeployment [ts=1414392425356, depMode=SHARED, clsLdr=sun.misc.Launcher$AppClassLoader@2e2e1b6c, clsLdrId=4faab505941-ea582293-39ba-4648-9022-596e6626954b, userVer=0, loc=true, sampleClsName=java.lang.String, pendingUndeploy=false, undeployed=false, usage=0]], taskClsName=edu.banda.coel.server.grid.GridCollectionTask, sesId=7f4e9505941-b2e9befc-051f-4e17-ba8d-bbafbe9cd7a3, startTime=1414392785621, endTime=9223372036854775807, taskNodeId=b2e9befc-051f-4e17-ba8d-bbafbe9cd7a3, clsLdr=sun.misc.Launcher$AppClassLoader@2e2e1b6c, closed=false, cpSpi=null, failSpi=null, loadSpi=null, usage=1, fullSup=false, subjId=b2e9befc-051f-4e17-ba8d-bbafbe9cd7a3], jobId=55ee9505941-8522cc8b-10fb-4afd-945f-caa0e0c561f0], job=edu.banda.coel.server.grid.GridCollectionInputTask$1@380042f5]
For more information see:
    Troubleshooting:      http://bit.ly/GridGain-Troubleshooting
    Documentation Center: http://bit.ly/GridGain-Documentation

    at org.gridgain.grid.kernal.processors.job.GridJobProcessor.onBeforeActivateJob(GridJobProcessor.java:1190)
    at org.gridgain.grid.kernal.processors.job.GridJobProcessor.access$1500(GridJobProcessor.java:62)
    at org.gridgain.grid.kernal.processors.job.GridJobProcessor$CollisionJobContext.activate(GridJobProcessor.java:1469)
    at org.gridgain.grid.spi.collision.jobstealing.GridJobStealingCollisionSpi.checkBusy(GridJobStealingCollisionSpi.java:640)
    at org.gridgain.grid.spi.collision.jobstealing.GridJobStealingCollisionSpi.onCollision(GridJobStealingCollisionSpi.java:589)
    at org.gridgain.grid.kernal.managers.collision.GridCollisionManager.onCollision(GridCollisionManager.java:124)
    at org.gridgain.grid.kernal.processors.job.GridJobProcessor.handleCollisions(GridJobProcessor.java:669)
    at org.gridgain.grid.kernal.processors.job.GridJobProcessor.processJobExecuteRequest(GridJobProcessor.java:1089)
    at org.gridgain.grid.kernal.processors.job.GridJobProcessor$JobExecutionListener.onMessage(GridJobProcessor.java:1732)
    at org.gridgain.grid.kernal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:654)
    at org.gridgain.grid.kernal.managers.communication.GridIoManager.access$1800(GridIoManager.java:62)
    at org.gridgain.grid.kernal.managers.communication.GridIoManager$6.body(GridIoManager.java:615)
    at org.gridgain.grid.util.worker.GridWorker.run(GridWorker.java:151)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
public abstract class GridCollectionInputTask<IN,OUT,JOB_OUT> extends GridComputeTaskSplitAdapter<Collection<IN>, OUT> {

    /** Auto-injected grid logger. */
    @GridLoggerResource
    private GridLogger log = null;

    private final ArgumentCallable<IN,JOB_OUT> callable;

    public GridCollectionInputTask(ArgumentCallable<IN,JOB_OUT> callable) {
        this.callable = callable;
    }

    @Override
    protected Collection<? extends GridComputeJob> split(int gridSize, Collection<IN> inputs) throws GridException {
        List<GridComputeJob> jobs = new ArrayList<GridComputeJob>(inputs.size());

        for (IN input : inputs) {
            jobs.add(new GridComputeJobAdapter(input) {

                @SuppressWarnings("unchecked")
                @Override
                public JOB_OUT execute() {
                    return callable.call((IN) argument(0));
                }
            });
        }
        return jobs;
    }

    @Override
    public OUT reduce(List<GridComputeJobResult> results) throws GridException {
        Collection<JOB_OUT> jobResults = new ArrayList<JOB_OUT>();
        for (GridComputeJobResult res : results)
            jobResults.add((JOB_OUT) res.getData());
        return createTaskOutput(jobResults);
    }

    protected abstract OUT createTaskOutput(Collection<JOB_OUT> jobResults);
}
2014-10-29 19:43:07,896 [http-bio-8080-exec-32] ERROR impl.EvolutionServiceImpl  - Evolution run failed!
edu.banda.coel.CoelRuntimeException: 'GridFitnessEvaluatorBOTaskAdapter' failed on grid.
    at edu.banda.coel.server.grid.ComputationalGrid.runOnGridSync(ComputationalGrid.java:231)
        ...
    at edu.banda.coel.server.service.impl.EvolutionServiceImpl.evolve(EvolutionServiceImpl.java:125)
    at com.banda.math.domain.evo.EvoRunController.runEvolution(EvoRunController.groovy:119)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: class org.gridgain.grid.GridTopologyException: Failed to failover a job to another node (failover SPI returned null) [job=edu.banda.coel.server.grid.GridCollectionInputTask$1@47ba5075, node=GridTcpDiscoveryNode [id=368ffe13-76c7-42f6-9339-a34c772c0931, addrs=[xxx.xxx.xxx.xxx, 127.0.0.1], sockAddrs=[xxx.xxx.xxx.xxx/xxx.xxx.xxx.xxx:47500, /xxx.xxx.xxx.xxx:47500, /127.0.0.1:47500], discPort=47500, order=24, loc=false, ver=6.5.0#20140925-sha1:48190079]]
    at org.gridgain.grid.kernal.processors.task.GridTaskWorker.failover(GridTaskWorker.java:984)
    at org.gridgain.grid.kernal.processors.task.GridTaskWorker.onResponse(GridTaskWorker.java:757)
    at org.gridgain.grid.kernal.processors.task.GridTaskProcessor.processJobExecuteResponse(GridTaskProcessor.java:906)
    at org.gridgain.grid.kernal.processors.task.GridTaskProcessor$JobMessageListener.onMessage(GridTaskProcessor.java:1138)
    at org.gridgain.grid.kernal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:654)
    at org.gridgain.grid.kernal.managers.communication.GridIoManager.access$1800(GridIoManager.java:62)
    at org.gridgain.grid.kernal.managers.communication.GridIoManager$6.body(GridIoManager.java:615)
    at org.gridgain.grid.util.worker.GridWorker.run(GridWorker.java:151)
    ... 3 more
Caused by: class org.gridgain.grid.compute.GridComputeExecutionRejectedException: Job was cancelled before execution [jobSes=GridJobSessionImpl [ses=GridTaskSessionImpl [taskName=edu.banda.coel.server.grid.GridCollectionTask, dep=LocalDeployment [super=GridDeployment [ts=1414636288878, depMode=SHARED, clsLdr=sun.misc.Launcher$AppClassLoader@684be8b8, clsLdrId=3bab4ee5941-368ffe13-76c7-42f6-9339-a34c772c0931, userVer=0, loc=true, sampleClsName=java.lang.String, pendingUndeploy=false, undeployed=false, usage=0]], taskClsName=edu.banda.coel.server.grid.GridCollectionTask, sesId=cc04ede5941-e05a00ce-2864-46a8-bf7c-4452f2a6d46e, startTime=1414636742023, endTime=9223372036854775807, taskNodeId=e05a00ce-2864-46a8-bf7c-4452f2a6d46e, clsLdr=sun.misc.Launcher$AppClassLoader@684be8b8, closed=false, cpSpi=null, failSpi=null, loadSpi=null, usage=1, fullSup=false, subjId=e05a00ce-2864-46a8-bf7c-4452f2a6d46e], jobId=21b4ede5941-368ffe13-76c7-42f6-9339-a34c772c0931], job=edu.banda.coel.server.grid.GridCollectionInputTask$1@1886b071]
For more information see:
    Troubleshooting:      http://bit.ly/GridGain-Troubleshooting
    Documentation Center: http://bit.ly/GridGain-Documentation

    at org.gridgain.grid.kernal.processors.job.GridJobProcessor.onBeforeActivateJob(GridJobProcessor.java:1190)
    at org.gridgain.grid.kernal.processors.job.GridJobProcessor.access$1500(GridJobProcessor.java:62)
    at org.gridgain.grid.kernal.processors.job.GridJobProcessor$CollisionJobContext.activate(GridJobProcessor.java:1469)
    at org.gridgain.grid.spi.collision.jobstealing.GridJobStealingCollisionSpi.checkBusy(GridJobStealingCollisionSpi.java:640)
    at org.gridgain.grid.spi.collision.jobstealing.GridJobStealingCollisionSpi.onCollision(GridJobStealingCollisionSpi.java:589)
    at org.gridgain.grid.kernal.managers.collision.GridCollisionManager.onCollision(GridCollisionManager.java:124)
    at org.gridgain.grid.kernal.processors.job.GridJobProcessor.handleCollisions(GridJobProcessor.java:669)
    at org.gridgain.grid.kernal.processors.job.GridJobProcessor.access$3000(GridJobProcessor.java:62)
    at org.gridgain.grid.kernal.processors.job.GridJobProcessor$JobEventListener.onJobFinished(GridJobProcessor.java:1636)
    at org.gridgain.grid.kernal.processors.job.GridJobWorker.finishJob(GridJobWorker.java:807)
    at org.gridgain.grid.kernal.processors.job.GridJobWorker.execute0(GridJobWorker.java:533)
    at org.gridgain.grid.kernal.processors.job.GridJobWorker.body(GridJobWorker.java:429)
    ... 4 more