GridGain: GridComputeExecutionRejectedException not handled when using GridJobStealingCollisionSpi

I have been using GridGain successfully for more than three years, and apart from a few bumps it has worked very smoothly. At least I have always been able to figure out what went wrong (thanks in part to the very solid documentation and examples). Well, until now.

In one of my projects I am trying to implement work stealing on a compute grid backed by GridGain 6.5.0. The configuration went smoothly, but from time to time I get a GridComputeExecutionRejectedException that bubbles all the way up to the client. Strangely, GridComputeExecutionRejectedException is supposed to be detected and routed by the failover policy provided in the result method of the standard GridComputeTaskAdapter (which I extend).

I also found that the code section of GridJobStealingCollisionSpi responsible for activating jobs carries a comment saying that we also need to make sure the job has not been rejected by another thread. Is the scenario described in that comment actually happening? (I know there is a synchronized block in the code that should prevent it.)

Anyway, I would greatly appreciate any help.

My configuration file is as follows:
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:util="http://www.springframework.org/schema/util"
       xsi:schemaLocation="
           http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-3.1.xsd
           http://www.springframework.org/schema/util http://www.springframework.org/schema/util/spring-util-3.1.xsd">
    <bean id="grid.cfg" class="org.gridgain.grid.GridConfiguration">
        <property name="marshaller">
            <bean class="org.gridgain.grid.marshaller.optimized.GridOptimizedMarshaller">
                <property name="requireSerializable" value="false"/>
            </bean>
        </property>
        <property name="includeEventTypes">
            <util:constant static-field="org.gridgain.grid.events.GridEventType.EVTS_TASK_EXECUTION"/>
        </property>
        <property name="discoverySpi">
            <bean class="org.gridgain.grid.spi.discovery.tcp.GridTcpDiscoverySpi">
                <property name="ipFinder">
                    <bean class="org.gridgain.grid.spi.discovery.tcp.ipfinder.sharedfs.GridTcpDiscoverySharedFsIpFinder"/>
                </property>
            </bean>
        </property>
        <property name="loadBalancingSpi">
            <bean class="org.gridgain.grid.spi.loadbalancing.adaptive.GridAdaptiveLoadBalancingSpi">
                <property name="loadProbe">
                    <bean class="org.gridgain.grid.spi.loadbalancing.adaptive.GridAdaptiveProcessingTimeLoadProbe"/>
                </property>
            </bean>
        </property>
        <property name="collisionSpi">
            <bean class="org.gridgain.grid.spi.collision.jobstealing.GridJobStealingCollisionSpi">
                <property name="activeJobsThreshold" value="28"/>
                <property name="waitJobsThreshold" value="0"/>
                <property name="messageExpireTime" value="3000"/>
                <property name="maximumStealingAttempts" value="5"/>
                <property name="stealingEnabled" value="true"/>
            </bean>
        </property>
        <property name="failoverSpi">
            <bean class="org.gridgain.grid.spi.failover.jobstealing.GridJobStealingFailoverSpi">
                <property name="maximumFailoverAttempts" value="5"/>
            </bean>
        </property>
        <property name="swapSpaceSpi">
            <bean class="org.gridgain.grid.spi.swapspace.noop.GridNoopSwapSpaceSpi"/>
        </property>
    </bean>
</beans>
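Peter, could you add your task implementation code to the question?

I have just added my core task class. I think it is pretty standard. As I mentioned, the unhandled GridComputeExecutionRejectedException only shows up when job stealing is used.

Could you set a breakpoint at java.lang.Throwable#printStackTrace(PrintStreamOrWriter) to find out where this exception is printed to the console? It looks like the exception is not thrown: it is created and put into the GridComputeJobResult, but something prints it to the console.

Since this is a production issue I cannot really set breakpoints, but I was able to catch and log the full exception (see my edit). I also realized that I set activeJobsThreshold to 28, which does not match the total number of threads (100 by default). Could that be a problem?

Please check the log before the stack trace: are there any warnings or errors?
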
014-10-26 23:57:33,190 [http-bio-8080-exec-13] ERROR errors.GrailsExceptionResolver - GridComputeExecutionRejectedException occurred when processing request: [POST] /evoRun/runEvolution
Job was cancelled before execution [jobSes=GridJobSessionImpl [ses=GridTaskSessionImpl [taskName=edu.banda.coel.server.grid.GridCollectionTask, dep=LocalDeployment [super=GridDeployment [ts=1414392425356, depMode=SHARED, clsLdr=sun.misc.Launcher$AppClassLoader@2e2e1b6c, clsLdrId=4faab505941-ea582293-39ba-4648-9022-596e6626954b, userVer=0, loc=true, sampleClsName=java.lang.String, pendingUndeploy=false, undeployed=false, usage=0]], taskClsName=edu.banda.coel.server.grid.GridCollectionTask, sesId=7f4e9505941-b2e9befc-051f-4e17-ba8d-bbafbe9cd7a3, startTime=1414392785621, endTime=9223372036854775807, taskNodeId=b2e9befc-051f-4e17-ba8d-bbafbe9cd7a3, clsLdr=sun.misc.Launcher$AppClassLoader@2e2e1b6c, closed=false, cpSpi=null, failSpi=null, loadSpi=null, usage=1, fullSup=false, subjId=b2e9befc-051f-4e17-ba8d-bbafbe9cd7a3], jobId=55ee9505941-8522cc8b-10fb-4afd-945f-caa0e0c561f0], job=edu.banda.coel.server.grid.GridCollectionInputTask$1@380042f5]
For more information see:
Troubleshooting: http://bit.ly/GridGain-Troubleshooting
Documentation Center: http://bit.ly/GridGain-Documentation
. Stacktrace follows:
class org.gridgain.grid.compute.GridComputeExecutionRejectedException: Job was cancelled before execution [jobSes=GridJobSessionImpl [ses=GridTaskSessionImpl [taskName=edu.banda.coel.server.grid.GridCollectionTask, dep=LocalDeployment [super=GridDeployment [ts=1414392425356, depMode=SHARED, clsLdr=sun.misc.Launcher$AppClassLoader@2e2e1b6c, clsLdrId=4faab505941-ea582293-39ba-4648-9022-596e6626954b, userVer=0, loc=true, sampleClsName=java.lang.String, pendingUndeploy=false, undeployed=false, usage=0]], taskClsName=edu.banda.coel.server.grid.GridCollectionTask, sesId=7f4e9505941-b2e9befc-051f-4e17-ba8d-bbafbe9cd7a3, startTime=1414392785621, endTime=9223372036854775807, taskNodeId=b2e9befc-051f-4e17-ba8d-bbafbe9cd7a3, clsLdr=sun.misc.Launcher$AppClassLoader@2e2e1b6c, closed=false, cpSpi=null, failSpi=null, loadSpi=null, usage=1, fullSup=false, subjId=b2e9befc-051f-4e17-ba8d-bbafbe9cd7a3], jobId=55ee9505941-8522cc8b-10fb-4afd-945f-caa0e0c561f0], job=edu.banda.coel.server.grid.GridCollectionInputTask$1@380042f5]
For more information see:
Troubleshooting: http://bit.ly/GridGain-Troubleshooting
Documentation Center: http://bit.ly/GridGain-Documentation
at org.gridgain.grid.kernal.processors.job.GridJobProcessor.onBeforeActivateJob(GridJobProcessor.java:1190)
at org.gridgain.grid.kernal.processors.job.GridJobProcessor.access$1500(GridJobProcessor.java:62)
at org.gridgain.grid.kernal.processors.job.GridJobProcessor$CollisionJobContext.activate(GridJobProcessor.java:1469)
at org.gridgain.grid.spi.collision.jobstealing.GridJobStealingCollisionSpi.checkBusy(GridJobStealingCollisionSpi.java:640)
at org.gridgain.grid.spi.collision.jobstealing.GridJobStealingCollisionSpi.onCollision(GridJobStealingCollisionSpi.java:589)
at org.gridgain.grid.kernal.managers.collision.GridCollisionManager.onCollision(GridCollisionManager.java:124)
at org.gridgain.grid.kernal.processors.job.GridJobProcessor.handleCollisions(GridJobProcessor.java:669)
at org.gridgain.grid.kernal.processors.job.GridJobProcessor.processJobExecuteRequest(GridJobProcessor.java:1089)
at org.gridgain.grid.kernal.processors.job.GridJobProcessor$JobExecutionListener.onMessage(GridJobProcessor.java:1732)
at org.gridgain.grid.kernal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:654)
at org.gridgain.grid.kernal.managers.communication.GridIoManager.access$1800(GridIoManager.java:62)
at org.gridgain.grid.kernal.managers.communication.GridIoManager$6.body(GridIoManager.java:615)
at org.gridgain.grid.util.worker.GridWorker.run(GridWorker.java:151)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
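import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

import org.gridgain.grid.GridException;
import org.gridgain.grid.compute.GridComputeJob;
import org.gridgain.grid.compute.GridComputeJobAdapter;
import org.gridgain.grid.compute.GridComputeJobResult;
import org.gridgain.grid.compute.GridComputeTaskSplitAdapter;
import org.gridgain.grid.logger.GridLogger;
import org.gridgain.grid.resources.GridLoggerResource;

// ArgumentCallable is the project's own callback type; its source is not part of this post.
public abstract class GridCollectionInputTask<IN,OUT,JOB_OUT> extends GridComputeTaskSplitAdapter<Collection<IN>, OUT> {

    /** Auto-injected grid logger. */
    @GridLoggerResource
    private GridLogger log = null;

    /** Function applied to every input element on the grid. */
    private final ArgumentCallable<IN,JOB_OUT> callable;

    public GridCollectionInputTask(ArgumentCallable<IN,JOB_OUT> callable) {
        this.callable = callable;
    }

    /** Creates one job per input element. */
    @Override
    protected Collection<? extends GridComputeJob> split(int gridSize, Collection<IN> inputs) throws GridException {
        List<GridComputeJob> jobs = new ArrayList<GridComputeJob>(inputs.size());
        for (IN input : inputs) {
            jobs.add(new GridComputeJobAdapter(input) {
                @SuppressWarnings("unchecked")
                @Override
                public JOB_OUT execute() {
                    return callable.call((IN) argument(0));
                }
            });
        }
        return jobs;
    }

    /** Collects the per-job outputs and hands them to the subclass. */
    @SuppressWarnings("unchecked")
    @Override
    public OUT reduce(List<GridComputeJobResult> results) throws GridException {
        Collection<JOB_OUT> jobResults = new ArrayList<JOB_OUT>();
        for (GridComputeJobResult res : results)
            jobResults.add((JOB_OUT) res.getData());
        return createTaskOutput(jobResults);
    }

    protected abstract OUT createTaskOutput(Collection<JOB_OUT> jobResults);
}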
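
For readers who want to see the routing I am expecting, here is a minimal, self-contained sketch of the decision described in the question: a job result without an error lets the task keep waiting, a rejection is routed to failover, and any other error fails the task. Everything in it is an assumption made for illustration. Policy, RejectedException and result are hypothetical stand-ins, not GridGain classes, and this is not the actual code of GridComputeTaskAdapter.

```java
/**
 * Stand-in model of the routing the question expects from the task adapter's
 * result method. None of these types are GridGain classes.
 */
public class ResultPolicySketch {

    /** Hypothetical counterpart of a job result policy. */
    public enum Policy { WAIT, FAILOVER }

    /** Hypothetical stand-in for GridComputeExecutionRejectedException. */
    public static class RejectedException extends Exception {
        public RejectedException(String msg) {
            super(msg);
        }
    }

    /**
     * Decides what to do with one job result.
     *
     * @param jobError error attached to the job result, or null if the job succeeded
     * @return WAIT for a successful job, FAILOVER for a rejected one
     * @throws IllegalStateException for any other error, which fails the whole task
     */
    public static Policy result(Exception jobError) {
        if (jobError == null)
            return Policy.WAIT;

        // A rejected job never ran, so it is safe to send it to another node.
        if (jobError instanceof RejectedException)
            return Policy.FAILOVER;

        throw new IllegalStateException("Job failed", jobError);
    }

    public static void main(String[] args) {
        System.out.println(result(null));
        System.out.println(result(new RejectedException("Job was cancelled before execution")));
    }
}
```

With this model, a rejection should never reach the caller as long as failover finds somewhere to send the job; what I see in the logs is that it reaches the client anyway.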
2014-10-29 19:43:07,896 [http-bio-8080-exec-32] ERROR impl.EvolutionServiceImpl - Evolution run failed!
edu.banda.coel.CoelRuntimeException: 'GridFitnessEvaluatorBOTaskAdapter' failed on grid.
at edu.banda.coel.server.grid.ComputationalGrid.runOnGridSync(ComputationalGrid.java:231)
...
at edu.banda.coel.server.service.impl.EvolutionServiceImpl.evolve(EvolutionServiceImpl.java:125)
at com.banda.math.domain.evo.EvoRunController.runEvolution(EvoRunController.groovy:119)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: class org.gridgain.grid.GridTopologyException: Failed to failover a job to another node (failover SPI returned null) [job=edu.banda.coel.server.grid.GridCollectionInputTask$1@47ba5075, node=GridTcpDiscoveryNode [id=368ffe13-76c7-42f6-9339-a34c772c0931, addrs=[xxx.xxx.xxx.xxx, 127.0.0.1], sockAddrs=[xxx.xxx.xxx.xxx/xxx.xxx.xxx.xxx:47500, /xxx.xxx.xxx.xxx:47500, /127.0.0.1:47500], discPort=47500, order=24, loc=false, ver=6.5.0#20140925-sha1:48190079]]
at org.gridgain.grid.kernal.processors.task.GridTaskWorker.failover(GridTaskWorker.java:984)
at org.gridgain.grid.kernal.processors.task.GridTaskWorker.onResponse(GridTaskWorker.java:757)
at org.gridgain.grid.kernal.processors.task.GridTaskProcessor.processJobExecuteResponse(GridTaskProcessor.java:906)
at org.gridgain.grid.kernal.processors.task.GridTaskProcessor$JobMessageListener.onMessage(GridTaskProcessor.java:1138)
at org.gridgain.grid.kernal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:654)
at org.gridgain.grid.kernal.managers.communication.GridIoManager.access$1800(GridIoManager.java:62)
at org.gridgain.grid.kernal.managers.communication.GridIoManager$6.body(GridIoManager.java:615)
at org.gridgain.grid.util.worker.GridWorker.run(GridWorker.java:151)
... 3 more
Caused by: class org.gridgain.grid.compute.GridComputeExecutionRejectedException: Job was cancelled before execution [jobSes=GridJobSessionImpl [ses=GridTaskSessionImpl [taskName=edu.banda.coel.server.grid.GridCollectionTask, dep=LocalDeployment [super=GridDeployment [ts=1414636288878, depMode=SHARED, clsLdr=sun.misc.Launcher$AppClassLoader@684be8b8, clsLdrId=3bab4ee5941-368ffe13-76c7-42f6-9339-a34c772c0931, userVer=0, loc=true, sampleClsName=java.lang.String, pendingUndeploy=false, undeployed=false, usage=0]], taskClsName=edu.banda.coel.server.grid.GridCollectionTask, sesId=cc04ede5941-e05a00ce-2864-46a8-bf7c-4452f2a6d46e, startTime=1414636742023, endTime=9223372036854775807, taskNodeId=e05a00ce-2864-46a8-bf7c-4452f2a6d46e, clsLdr=sun.misc.Launcher$AppClassLoader@684be8b8, closed=false, cpSpi=null, failSpi=null, loadSpi=null, usage=1, fullSup=false, subjId=e05a00ce-2864-46a8-bf7c-4452f2a6d46e], jobId=21b4ede5941-368ffe13-76c7-42f6-9339-a34c772c0931], job=edu.banda.coel.server.grid.GridCollectionInputTask$1@1886b071]
For more information see:
Troubleshooting: http://bit.ly/GridGain-Troubleshooting
Documentation Center: http://bit.ly/GridGain-Documentation
at org.gridgain.grid.kernal.processors.job.GridJobProcessor.onBeforeActivateJob(GridJobProcessor.java:1190)
at org.gridgain.grid.kernal.processors.job.GridJobProcessor.access$1500(GridJobProcessor.java:62)
at org.gridgain.grid.kernal.processors.job.GridJobProcessor$CollisionJobContext.activate(GridJobProcessor.java:1469)
at org.gridgain.grid.spi.collision.jobstealing.GridJobStealingCollisionSpi.checkBusy(GridJobStealingCollisionSpi.java:640)
at org.gridgain.grid.spi.collision.jobstealing.GridJobStealingCollisionSpi.onCollision(GridJobStealingCollisionSpi.java:589)
at org.gridgain.grid.kernal.managers.collision.GridCollisionManager.onCollision(GridCollisionManager.java:124)
at org.gridgain.grid.kernal.processors.job.GridJobProcessor.handleCollisions(GridJobProcessor.java:669)
at org.gridgain.grid.kernal.processors.job.GridJobProcessor.access$3000(GridJobProcessor.java:62)
at org.gridgain.grid.kernal.processors.job.GridJobProcessor$JobEventListener.onJobFinished(GridJobProcessor.java:1636)
at org.gridgain.grid.kernal.processors.job.GridJobWorker.finishJob(GridJobWorker.java:807)
at org.gridgain.grid.kernal.processors.job.GridJobWorker.execute0(GridJobWorker.java:533)
at org.gridgain.grid.kernal.processors.job.GridJobWorker.body(GridJobWorker.java:429)
... 4 more