Hadoop MapReduce job hangs: "Container" issue


When I run a MapReduce job, it hangs and eventually fails (after about 20 minutes).

This is the error message I see on port 8088:

exited with exitCode: -100 due to: Container expired since it was unused.Failing this attempt.. Failing the application. 
Any ideas about this problem?

I am running Hadoop 2.2.

Update:

The problem seems to be related to this:

Container killed by the framework, either due to being released by the application or being 'lost' due to node failures etc. have a special exit code of -100.
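
To make the exit code easier to spot when scanning many failed attempts, the diagnostics string can be split into its exit code and reason. This is only a minimal sketch: `parse_diagnostics` and `ABORTED_EXIT_CODE` are hypothetical names chosen for illustration, and the regular expression assumes the message has the same shape as the one quoted in this question.

```python
import re

# Assumption: -100 is the special exit code described in the quoted text,
# meaning the container was released or lost rather than crashing on its own.
ABORTED_EXIT_CODE = -100


def parse_diagnostics(message):
    """Return (exit_code, reason) from a diagnostics string, or None if absent."""
    match = re.search(
        r"exitCode:\s*(-?\d+)\s+due to:\s*(.*?)\.?\s*Failing this attempt",
        message,
    )
    if match is None:
        return None
    return int(match.group(1)), match.group(2).strip()


# The message copied from the question.
diagnostics = (
    "exited with exitCode: -100 due to: Container expired since it was "
    "unused.Failing this attempt.. Failing the application."
)

result = parse_diagnostics(diagnostics)
if result is not None:
    code, reason = result
    print(code, reason)
    if code == ABORTED_EXIT_CODE:
        # The container was handed out but never used, so the next thing
        # to check is whether it could be launched on the node at all.
        print("Container was allocated but never started.")
```

A reason of "Container expired since it was unused" suggests the container was allocated but never launched, so the job's own code is unlikely to be the place to look first.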

Update 2:

These errors are from the ResourceManager log:

2013-12-18 04:28:42,544 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: completedContainer queue=root usedCapacity=0.0 absoluteUsedCapacity=0.0 used=<memory:0, vCores:0> cluster=<memory:16384, vCores:16>
2013-12-18 04:28:42,544 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Re-sorting completed queue: root.default stats: default: capacity=1.0, absoluteCapacity=1.0, usedResources=<memory:0, vCores:0>usedCapacity=0.0, absoluteUsedCapacity=0.0, numApps=1, numContainers=0
2013-12-18 04:28:42,544 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Application appattempt_1387307711170_0002_000002 released container container_1387307711170_0002_02_000001 on node: host: slave-2:42143 #containers=0 available=8192 used=0 with event: EXPIRE
2013-12-18 04:28:42,544 INFO org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: Unregistering app attempt : appattempt_1387307711170_0002_000002
2013-12-18 04:28:42,545 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1387307711170_0002_000002 State change from ALLOCATED to FAILED
2013-12-18 04:28:42,545 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Application application_1387307711170_0002 failed 2 times due to AM Container for appattempt_1387307711170_0002_000002 exited with  exitCode: -100 due to: Container expired since it was unused.Failing this attempt.. Failing the application.
    2013-12-18 04:28:42,546 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Removing info for app: application_1387307711170_0002
    2013-12-18 04:28:42,546 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1387307711170_0002 State change from ACCEPTED to FAILED
    2013-12-18 04:28:42,546 WARN org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=hduser   OPERATION=Application Finished - Failed TARGET=RMAppManager     RESULT=FAILURE  DESCRIPTION=App failed with state: FAILED       PERMISSIONS=Application application_1387307711170_0002 failed 2 times due to AM Container for appattempt_1387307711170_0002_000002 exited with  exitCode: -100 due to: Container expired since it was unused.Failing this attempt.. Failing the application.    APPID=application_1387307711170_0002
2013-12-18 04:28:42,546 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAppManager$ApplicationSummary: appId=application_1387307711170_0002,name=streamjob5941238512810428268.jar,user=hduser,queue=default,state=FAILED,trackingUrl=master-1:8088/cluster/app/application_1387307711170_0002,appMasterHost=N/A,startTime=1387339379570,finishTime=1387340922546,finalStatus=FAILED
2013-12-18 04:28:42,546 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Application appattempt_1387307711170_0002_000002 is done. finalState=FAILED
2013-12-18 04:28:42,546 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo: Application application_1387307711170_0002 requests cleared
2013-12-18 04:28:42,546 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: Application removed - appId: application_1387307711170_0002 user: hduser queue: default #user-pending-applications: 0 #user-active-applications: 0 #queue-pending-applications: 0 #queue-active-applications: 0
2013-12-18 04:28:42,547 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Application removed - appId: application_1387307711170_0002 user: hduser leaf-queue of parent: root #applications: 0
2013-12-18 04:28:43,136 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: slave-2/10.239.132.243:42143. Already tried 39 time(s); maxRetries=45
2013-12-18 04:29:03,157 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: slave-2/10.239.132.243:42143. Already tried 40 time(s); maxRetries=45
2013-12-18 04:29:23,158 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: slave-2/10.239.132.243:42143. Already tried 41 time(s); maxRetries=45
2013-12-18 04:29:43,179 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: slave-2/10.239.132.243:42143. Already tried 42 time(s); maxRetries=45
2013-12-18 04:30:03,183 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: slave-2/10.239.132.243:42143. Already tried 43 time(s); maxRetries=45
2013-12-18 04:30:23,185 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: slave-2/10.239.132.243:42143. Already tried 44 time(s); maxRetries=45
2013-12-18 04:30:43,208 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Error launching appattempt_1387307711170_0002_000002. Got exception: org.apache.hadoop.net.ConnectTimeoutException: Call From ip-10-73-169-19/10.73.169.19 to slave-2:42143 failed on socket timeout exception: org.apache.hadoop.net.ConnectTimeoutException: 20000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=slave-2/10.239.132.243:42143]; For more details see:  http://wiki.apache.org/hadoop/SocketTimeout
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
        at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783)
        at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:749)
        at org.apache.hadoop.ipc.Client.call(Client.java:1351)
        at org.apache.hadoop.ipc.Client.call(Client.java:1300)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
        at com.sun.proxy.$Proxy69.startContainers(Unknown Source)
        at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:96)
        at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:118)
        at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:249)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:724)
Caused by: org.apache.hadoop.net.ConnectTimeoutException: 20000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=slave-2/10.239.132.243:42143]
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:532)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:493)
        at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:547)
        at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:642)
        at org.apache.hadoop.ipc.Client$Connection.access$2600(Client.java:314)
2013-12-18 04:30:43,208 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: LAUNCH_FAILED at FAILED
        at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
        at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
        at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
        at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:625)
        at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:104)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:566)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:547)
        at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134)
        at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:81)
        at java.lang.Thread.run(Thread.java:724)
2013-12-18 19:15:17,626 INFO org.apache.hadoop.yarn.server.resourcemanager.security.AMRMTokenSecretManager: Rolling master-key for amrm-tokens
2013-12-18 19:15:17,632 INFO org.apache.hadoop.yarn.server.resourcemanager.security.RMContainerTokenSecretManager: Rolling master-key for container-tokens
2013-12-18 19:15:17,633 INFO org.apache.hadoop.yarn.server.resourcemanager.security.RMContainerTokenSecretManager: Going to activate master-key with key-id 422264835 in 900000ms
2013-12-18 19:15:17,637 INFO org.apache.hadoop.yarn.server.resourcemanager.security.NMTokenSecretManagerInRM: Rolling master-key for nm-tokens
2013-12-18 19:15:17,637 INFO org.apache.hadoop.yarn.server.resourcemanager.security.NMTokenSecretManagerInRM: Going to activate master-key with key-id 1883530799 in 900000ms
2013-12-18 19:15:25,884 INFO org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: Updating the current master key for generating delegation tokens
2013-12-18 19:15:25,885 INFO org.apache.hadoop.yarn.server.resourcemanager.security.RMDelegationTokenSecretManager: storing master key with keyID 3
2013-12-18 19:30:17,633 INFO org.apache.hadoop.yarn.server.resourcemanager.security.RMContainerTokenSecretManager: Activating next master key with id: 422264835
2013-12-18 19:30:17,637 INFO org.apache.hadoop.yarn.server.resourcemanager.security.NMTokenSecretManagerInRM: Activating next master key with id: 1883530799
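
The log above shows the ResourceManager (calling from 10.73.169.19) repeatedly timing out while connecting to slave-2/10.239.132.243:42143 before the AM launch fails. One way to narrow this down is to pull the retried address out of the log and test whether a plain TCP connection to it succeeds from the ResourceManager host. The sketch below is an assumption-laden diagnostic, not part of Hadoop: `retry_targets` and `can_connect` are hypothetical helper names, and the regular expression only covers the "Retrying connect to server" line format seen in this log.

```python
import re
import socket

# Matches e.g. "Retrying connect to server: slave-2/10.239.132.243:42143."
RETRY_LINE = re.compile(
    r"Retrying connect to server: (?P<host>[^/\s]+)/(?P<ip>[\d.]+):(?P<port>\d+)"
)


def retry_targets(log_text):
    """Collect the distinct (host, ip, port) triples that were retried."""
    found = {
        (m["host"], m["ip"], int(m["port"]))
        for m in RETRY_LINE.finditer(log_text)
    }
    return sorted(found)


def can_connect(address, port, timeout=5.0):
    """Return True if a TCP connection to address:port opens within timeout."""
    try:
        with socket.create_connection((address, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# One line copied from the log in this question.
sample = (
    "2013-12-18 04:30:23,185 INFO org.apache.hadoop.ipc.Client: Retrying "
    "connect to server: slave-2/10.239.132.243:42143. Already tried 44 "
    "time(s); maxRetries=45"
)

for host, ip, port in retry_targets(sample):
    print(f"retried target: {host} ({ip}) port {port}")
```

Running `can_connect("10.239.132.243", 42143)` on the ResourceManager host would then show whether that address and port are reachable at all. If it returns False, possible causes worth checking (none of which the log confirms) include a firewall rule between the two subnets or `slave-2` resolving to an address the ResourceManager cannot route to.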