Apache Spark executor blocked at "UserGroupInformation.doAs"

Yesterday I built a Spark cluster, and today I want to run a WordCount program on it. My environment: jdk1.8.121 + scala2.10.4 + hadoop2.6.5 + spark1.6.2. Cluster: master + slave01 + slave02. Client: client. Additional environment: master, slave01, slave02 and client are all in the same LAN [master, slave01 and slave02 can SSH into each other without a password], and the login user on all of them is root.

The demo code is as follows:

    def main(args: Array[String]) = {
      val inputPath = "hdfs://master/970655147/input/01WordCount/"
      // 1. local mode
      // val conf = new SparkConf().setMaster("local").setAppName("WordCount")
      // 2. standalone mode
      val conf = new SparkConf().setMaster("spark://master:7077").setAppName("WordCount")
        .set("spark.executor.memory", "64M")
        .set("spark.executor.cores", "1")
      val sc = new SparkContext(conf)
      val line = sc.textFile(inputPath)
      line.foreach(println)
      sc.stop
    }
  • First, I ran it in local mode and the program worked fine.
  • Second, I ran it on the cluster [deployed from IDEA], but it failed. The interaction seems to block: according to the logs, the master allocates executors for the app and the executors start running, but they appear to be stuck at "UserGroupInformation.doAs(UserGroupInformation.java:1643); SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:59)", so the executors never register with the driver and the driver ends up with no resources.
  • Next I tried spark-submit [spark-submit --master spark://master:7077 --class com.hx.test.Test01WordCount HelloSpark.jar] from master or slave02, but the result was the same. Please give me some advice, thanks.
  • Some of the log information is shown below:


  • Launch command of the ExecutorBackend

    root@slave02:~# jps 
    7984 CoarseGrainedExecutorBackend 
    6468 NodeManager 
    8037 Jps 
    955 Worker 
    7981 CoarseGrainedExecutorBackend 
    7982 CoarseGrainedExecutorBackend 
    6366 DataNode 
    7983 CoarseGrainedExecutorBackend 
    root@slave02:~# ps -ef | grep 7983 
    root 7983 955 14 06:21 ? 00:00:03 /usr/local/ProgramFiles/jdk1.8.0_121/bin/java -cp /usr/local/ProgramFiles/spark-1.6.2-bin-hadoop2.6/conf/:/usr/local/ProgramFiles/spark-1.6.2-bin-hadoop2.6/lib/spark-assembly-1.6.2-hadoop2.6.0.jar:/usr/local/ProgramFiles/spark-1.6.2-bin-hadoop2.6/lib/datanucleus-core-3.2.10.jar:/usr/local/ProgramFiles/spark-1.6.2-bin-hadoop2.6/lib/datanucleus-api-jdo-3.2.6.jar:/usr/local/ProgramFiles/spark-1.6.2-bin-hadoop2.6/lib/datanucleus-rdbms-3.2.9.jar:/usr/local/ProgramFiles/hadoop-2.6.5/etc/hadoop/ -Xms64M -Xmx64M -Dspark.driver.port=37230 org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url spark://CoarseGrainedScheduler@192.168.0.191:37230 --executor-id 1 --hostname 192.168.0.182 --cores 1 --app-id app-20170408062155-0015 --worker-url spark://Worker@192.168.0.182:46466 
    root 8050 4249 4 06:22 pts/1 00:00:00 grep --color=auto 7983 
    root@slave02:~# 
    
  • Executor error log

    root@slave02:/usr/local/ProgramFiles/spark-1.6.2-bin-hadoop2.6# cat work/app-20170408062155-0015/0/stderr 
    17/04/08 06:22:20 INFO executor.CoarseGrainedExecutorBackend: Registered signal handlers for [TERM, HUP, INT] 
    17/04/08 06:22:24 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 
    17/04/08 06:22:28 INFO spark.SecurityManager: Changing view acls to: root 
    17/04/08 06:22:28 INFO spark.SecurityManager: Changing modify acls to: root 
    17/04/08 06:22:28 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root) 
    17/04/08 06:23:06 INFO spark.SecurityManager: Changing view acls to: root 
    17/04/08 06:23:06 INFO spark.SecurityManager: Changing modify acls to: root 
    17/04/08 06:23:06 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root) 
    17/04/08 06:23:24 INFO slf4j.Slf4jLogger: Slf4jLogger started 
    17/04/08 06:23:29 INFO Remoting: Starting remoting 
    Exception in thread "main" 17/04/08 06:23:46 INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon. 
    17/04/08 06:23:47 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports. 
    java.lang.reflect.UndeclaredThrowableException 
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1643) 
    at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:68) 
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:151) 
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:253) 
    at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala) 
    Caused by: java.util.concurrent.TimeoutException: Futures timed out after [10000 milliseconds] 
    at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219) 
    at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223) 
    at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107) 
    at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53) 
    at scala.concurrent.Await$.result(package.scala:107) 
    at akka.remote.Remoting.start(Remoting.scala:179) 
    at akka.remote.RemoteActorRefProvider.init(RemoteActorRefProvider.scala:184) 
    at akka.actor.ActorSystemImpl.liftedTree2$1(ActorSystem.scala:620) 
    at akka.actor.ActorSystemImpl._start$lzycompute(ActorSystem.scala:617) 
    at akka.actor.ActorSystemImpl._start(ActorSystem.scala:617) 
    at akka.actor.ActorSystemImpl.start(ActorSystem.scala:634) 
    at akka.actor.ActorSystem$.apply(ActorSystem.scala:142) 
    at akka.actor.ActorSystem$.apply(ActorSystem.scala:119) 
    at org.apache.spark.util.AkkaUtils$.org$apache$spark$util$AkkaUtils$$doCreateActorSystem(AkkaUtils.scala:121) 
    at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:53) 
    at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:52) 
    at org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:2024) 
    at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141) 
    at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:2015) 
    at org.apache.spark.util.AkkaUtils$.createActorSystem(AkkaUtils.scala:55) 
    at org.apache.spark.SparkEnv$.create(SparkEnv.scala:266) 
    at org.apache.spark.SparkEnv$.createExecutorEnv(SparkEnv.scala:217) 
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:186) 
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:69) 
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:68) 
    at java.security.AccessController.doPrivileged(Native Method) 
    at javax.security.auth.Subject.doAs(Subject.java:422) 
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) 
    ... 4 more 
    root@slave02:/usr/local/ProgramFiles/spark-1.6.2-bin-hadoop2.6# cat work/app-20170408062155-0015/0/stdout 
    root@slave02:/usr/local/ProgramFiles/spark-1.6.2-bin-hadoop2.6# 
    
  • Driver log

    Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties 
    17/04/08 21:21:44 INFO SparkContext: Running Spark version 1.6.2 
    17/04/08 21:21:45 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 
    17/04/08 21:21:45 INFO SecurityManager: Changing view acls to: root 
    17/04/08 21:21:45 INFO SecurityManager: Changing modify acls to: root 
    17/04/08 21:21:45 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root) 
    17/04/08 21:21:46 INFO Utils: Successfully started service 'sparkDriver' on port 37230. 
    17/04/08 21:21:47 INFO Slf4jLogger: Slf4jLogger started 
    17/04/08 21:21:47 INFO Remoting: Starting remoting 
    17/04/08 21:21:48 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@192.168.0.191:43974] 
    17/04/08 21:21:48 INFO Utils: Successfully started service 'sparkDriverActorSystem' on port 43974. 
    17/04/08 21:21:48 INFO SparkEnv: Registering MapOutputTracker 
    17/04/08 21:21:48 INFO SparkEnv: Registering BlockManagerMaster 
    17/04/08 21:21:48 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-ef79b656-b7f4-4cb3-be3e-0f8bb61baa9d 
    17/04/08 21:21:48 INFO MemoryStore: MemoryStore started with capacity 431.3 MB 
    17/04/08 21:21:48 INFO SparkEnv: Registering OutputCommitCoordinator 
    17/04/08 21:21:54 INFO Utils: Successfully started service 'SparkUI' on port 4040. 
    17/04/08 21:21:54 INFO SparkUI: Started SparkUI at http://192.168.0.191:4040 
    17/04/08 21:21:54 INFO AppClient$ClientEndpoint: Connecting to master spark://master:7077... 
    17/04/08 21:21:55 INFO SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20170408062155-0015 
    17/04/08 21:21:55 INFO AppClient$ClientEndpoint: Executor added: app-20170408062155-0015/0 on worker-20170408024004-192.168.0.182-46466 (192.168.0.182:46466) with 1 cores 
    17/04/08 21:21:55 INFO SparkDeploySchedulerBackend: Granted executor ID app-20170408062155-0015/0 on hostPort 192.168.0.182:46466 with 1 cores, 64.0 MB RAM 
    17/04/08 21:21:55 INFO AppClient$ClientEndpoint: Executor added: app-20170408062155-0015/1 on worker-20170408024004-192.168.0.182-46466 (192.168.0.182:46466) with 1 cores 
    17/04/08 21:21:55 INFO SparkDeploySchedulerBackend: Granted executor ID app-20170408062155-0015/1 on hostPort 192.168.0.182:46466 with 1 cores, 64.0 MB RAM 
    17/04/08 21:21:55 INFO AppClient$ClientEndpoint: Executor added: app-20170408062155-0015/2 on worker-20170408024004-192.168.0.182-46466 (192.168.0.182:46466) with 1 cores 
    17/04/08 21:21:55 INFO SparkDeploySchedulerBackend: Granted executor ID app-20170408062155-0015/2 on hostPort 192.168.0.182:46466 with 1 cores, 64.0 MB RAM 
    17/04/08 21:21:55 INFO AppClient$ClientEndpoint: Executor added: app-20170408062155-0015/3 on worker-20170408024004-192.168.0.182-46466 (192.168.0.182:46466) with 1 cores 
    17/04/08 21:21:55 INFO SparkDeploySchedulerBackend: Granted executor ID app-20170408062155-0015/3 on hostPort 192.168.0.182:46466 with 1 cores, 64.0 MB RAM 
    17/04/08 21:21:55 INFO AppClient$ClientEndpoint: Executor added: app-20170408062155-0015/4 on worker-20170408024003-192.168.0.181-45183 (192.168.0.181:45183) with 1 cores 
    17/04/08 21:21:55 INFO SparkDeploySchedulerBackend: Granted executor ID app-20170408062155-0015/4 on hostPort 192.168.0.181:45183 with 1 cores, 64.0 MB RAM 
    17/04/08 21:21:55 INFO AppClient$ClientEndpoint: Executor added: app-20170408062155-0015/5 on worker-20170408024003-192.168.0.181-45183 (192.168.0.181:45183) with 1 cores 
    17/04/08 21:21:55 INFO SparkDeploySchedulerBackend: Granted executor ID app-20170408062155-0015/5 on hostPort 192.168.0.181:45183 with 1 cores, 64.0 MB RAM 
    17/04/08 21:21:55 INFO AppClient$ClientEndpoint: Executor added: app-20170408062155-0015/6 on worker-20170408024003-192.168.0.181-45183 (192.168.0.181:45183) with 1 cores 
    17/04/08 21:21:55 INFO SparkDeploySchedulerBackend: Granted executor ID app-20170408062155-0015/6 on hostPort 192.168.0.181:45183 with 1 cores, 64.0 MB RAM 
    17/04/08 21:21:55 INFO AppClient$ClientEndpoint: Executor added: app-20170408062155-0015/7 on worker-20170408024003-192.168.0.181-45183 (192.168.0.181:45183) with 1 cores 
    17/04/08 21:21:55 INFO SparkDeploySchedulerBackend: Granted executor ID app-20170408062155-0015/7 on hostPort 192.168.0.181:45183 with 1 cores, 64.0 MB RAM 
    17/04/08 21:21:55 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 42255. 
    17/04/08 21:21:55 INFO NettyBlockTransferService: Server created on 42255 
    17/04/08 21:21:56 INFO BlockManagerMaster: Trying to register BlockManager 
    17/04/08 21:21:57 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.0.191:42255 with 431.3 MB RAM, BlockManagerId(driver, 192.168.0.191, 42255) 
    17/04/08 21:21:57 INFO BlockManagerMaster: Registered BlockManager 
    17/04/08 21:21:58 INFO AppClient$ClientEndpoint: Executor updated: app-20170408062155-0015/0 is now RUNNING 
    17/04/08 21:21:58 INFO AppClient$ClientEndpoint: Executor updated: app-20170408062155-0015/1 is now RUNNING 
    17/04/08 21:21:58 INFO AppClient$ClientEndpoint: Executor updated: app-20170408062155-0015/2 is now RUNNING 
    17/04/08 21:21:58 INFO AppClient$ClientEndpoint: Executor updated: app-20170408062155-0015/3 is now RUNNING 
    17/04/08 21:22:00 INFO AppClient$ClientEndpoint: Executor updated: app-20170408062155-0015/4 is now RUNNING 
    17/04/08 21:22:01 INFO AppClient$ClientEndpoint: Executor updated: app-20170408062155-0015/5 is now RUNNING 
    17/04/08 21:22:01 INFO AppClient$ClientEndpoint: Executor updated: app-20170408062155-0015/6 is now RUNNING 
    17/04/08 21:22:01 INFO AppClient$ClientEndpoint: Executor updated: app-20170408062155-0015/7 is now RUNNING 
    17/04/08 21:22:03 INFO SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0 
    17/04/08 21:22:05 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 107.7 KB, free 107.7 KB) 
    17/04/08 21:22:06 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 9.8 KB, free 117.5 KB) 
    17/04/08 21:22:06 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.0.191:42255 (size: 9.8 KB, free: 431.2 MB) 
    17/04/08 21:22:06 INFO SparkContext: Created broadcast 0 from textFile at Test01WordCount.scala:30 
    17/04/08 21:22:21 INFO FileInputFormat: Total input paths to process : 1 
    17/04/08 21:22:21 INFO SparkContext: Starting job: foreach at Test01WordCount.scala:33 
    17/04/08 21:22:21 INFO DAGScheduler: Got job 0 (foreach at Test01WordCount.scala:33) with 2 output partitions 
    17/04/08 21:22:21 INFO DAGScheduler: Final stage: ResultStage 0 (foreach at Test01WordCount.scala:33) 
    17/04/08 21:22:21 INFO DAGScheduler: Parents of final stage: List() 
    17/04/08 21:22:21 INFO DAGScheduler: Missing parents: List() 
    17/04/08 21:22:21 INFO DAGScheduler: Submitting ResultStage 0 (hdfs://master/970655147/input/01WordCount/ MapPartitionsRDD[1] at textFile at Test01WordCount.scala:30), which has no missing parents 
    17/04/08 21:22:21 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 3.0 KB, free 120.5 KB) 
    17/04/08 21:22:21 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 1842.0 B, free 122.3 KB) 
    17/04/08 21:22:21 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 192.168.0.191:42255 (size: 1842.0 B, free: 431.2 MB) 
    17/04/08 21:22:21 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1006 
    17/04/08 21:22:21 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 0 (hdfs://master/970655147/input/01WordCount/ MapPartitionsRDD[1] at textFile at Test01WordCount.scala:30) 
    17/04/08 21:22:21 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks 
    17/04/08 21:22:36 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 
    17/04/08 21:23:04 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 
    17/04/08 21:23:06 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 
    17/04/08 21:23:21 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 
    17/04/08 21:23:36 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 
    17/04/08 21:23:51 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 
    17/04/08 21:24:02 INFO AppClient$ClientEndpoint: Executor updated: app-20170408062155-0015/1 is now EXITED (Command exited with code 1) 
    17/04/08 21:24:02 INFO SparkDeploySchedulerBackend: Executor app-20170408062155-0015/1 removed: Command exited with code 1 
    17/04/08 21:24:06 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 
    17/04/08 21:24:21 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 
    17/04/08 21:24:36 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 
    17/04/08 21:24:51 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 
    17/04/08 21:25:06 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 
    17/04/08 21:25:21 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 
    17/04/08 21:25:36 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 
    17/04/08 21:25:51 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 
    17/04/08 21:26:02 WARN NettyRpcEndpointRef: Error sending message [message = RemoveExecutor(1,Command exited with code 1)] in 1 attempts 
    org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [120 seconds]. This timeout is controlled by spark.rpc.askTimeout 
    at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:48) 
    at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:63) 
    at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59) 
    at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33) 
    at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:76) 
    at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:101) 
    at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:77) 
    at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.removeExecutor(CoarseGrainedSchedulerBackend.scala:370) 
    at org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend.executorRemoved(SparkDeploySchedulerBackend.scala:144) 
    at org.apache.spark.deploy.client.AppClient$ClientEndpoint$$anonfun$receive$1.applyOrElse(AppClient.scala:184) 
    at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:116) 
    at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:204) 
    at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100) 
    at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215) 
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
    at java.lang.Thread.run(Thread.java:745) 
    Caused by: java.util.concurrent.TimeoutException: Futures timed out after [120 seconds] 
    at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219) 
    at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223) 
    at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107) 
    at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53) 
    at scala.concurrent.Await$.result(package.scala:107) 
    at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75) 
    ... 12 more 
    
  • Some reference links:


  • http://apache-spark-user-list.1001560.n3.nabble.com/Submitting-Spark-job-on-Unix-cluster-from-dev-environment-Windows-td16989.html
  • https://issues.streamsets.com/browse/SDC-4249
  • https://community.cloudera.com/t5/Advanced-Analytics-Apache-Spark/Unable-to-create-SparkContext-to-Spark-1-3-Standalone-service-in/td-p/29176
  • I searched these posts before, but they did not solve the problem; I also collected some logs from tests in my cluster. Or maybe I am thinking about this the wrong way, please help me check.
  • ufw status and ssh connection

    root@master:/usr/local/ProgramFiles# ufw status
    Status: inactive
    root@master:/usr/local/ProgramFiles# ssh slave01
    Welcome to Ubuntu 16.04.2 LTS (GNU/Linux 4.4.0-62-generic x86_64)
    
     * Documentation:  https://help.ubuntu.com
     * Management:     https://landscape.canonical.com
     * Support:        https://ubuntu.com/advantage
    Last login: Sat Apr  8 21:33:44 2017 from 192.168.0.119
    root@slave01:~# ufw status
    Status: inactive
    root@slave01:~# ssh slave02
    Welcome to Ubuntu 16.04.2 LTS (GNU/Linux 4.4.0-62-generic x86_64)
    
     * Documentation:  https://help.ubuntu.com
     * Management:     https://landscape.canonical.com
     * Support:        https://ubuntu.com/advantage
    Last login: Sat Apr  8 21:10:33 2017 from 192.168.0.119
    root@slave02:~# ufw status
    Status: inactive
    root@slave02:~# 
    
  • Network connectivity via IP or FQDN

    2.1. nc in master
    root@master:/usr/local/ProgramFiles# netcat -l 12306
    root@master:/usr/local/ProgramFiles# nc -l 12306
    root@master:/usr/local/ProgramFiles# nc -l 12306
    root@master:/usr/local/ProgramFiles# nc -l 12306
    
    2.2. nc in slave01
    root@slave01:~# nc -vz 192.168.0.180 12306
    Connection to 192.168.0.180 12306 port [tcp/*] succeeded!
    root@slave01:~# nc -vz master 12306
    Connection to master 12306 port [tcp/*] succeeded!
    
    2.3. nc in slave02
    root@slave02:/usr/local/ProgramFiles# nc -vz 192.168.0.180 12306
    Connection to 192.168.0.180 12306 port [tcp/*] succeeded!
    root@slave02:/usr/local/ProgramFiles# nc -vz master 12306
    Connection to master 12306 port [tcp/*] succeeded!
    root@slave02:/usr/local/ProgramFiles# 
    
  • Reinstalling other versions of Spark:


    I reinstalled other versions of Spark, but the same problem remained, so there may be something wrong in the environment.


    Please give me some advice, thanks. Today I wanted to change the Java and Scala environment. I found a post whose author built Spark with jdk1.7.0_80 and scala2.11.8, so I downloaded jdk1.7.0_40 and scala2.11.8 and installed them on my cluster [master, slave01, slave02].

    I updated the environment variables in Hadoop's hadoop-env.sh and Spark's spark-env.sh, then stopped Spark and Hadoop and started Hadoop and Spark again. Then I ran the Spark shell with "./bin/spark-shell --master spark://master:7077 --executor-memory 64M".

    The Spark shell still did not run properly, but the logs looked different from before, so I checked the executor logs.

    And I found:

        Exception in thread "main" java.lang.IllegalArgumentException: System memory 64880640 must be at least 4.718592E8. Please use a larger heap size.
                at org.apache.spark.memory.UnifiedMemoryManager$.getMaxMemory(UnifiedMemoryManager.scala:198)
                at org.apache.spark.memory.UnifiedMemoryManager$.apply(UnifiedMemoryManager.scala:180)
                at org.apache.spark.SparkEnv$.create(SparkEnv.scala:354)
                at org.apache.spark.SparkEnv$.createExecutorEnv(SparkEnv.scala:217)
                at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:186)
                at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:69)
                at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:68)
                at java.security.AccessController.doPrivileged(Native Method)
                at javax.security.auth.Subject.doAs(Subject.java:415)
                at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
                at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:68)
                at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:151)
                at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:253)
                at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
    
    So it now reaches "CoarseGrainedExecutorBackend.scala:186"; according to this error log it is no longer blocked at "UserGroupInformation.doAs(UserGroupInformation.java:1643)". I changed --executor-memory to 512M.
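
    As a side note (my own reconstruction, not from the original post): as far as I understand, these numbers come from Spark's UnifiedMemoryManager, which reserves 300 MB and requires the executor heap to be at least 1.5x that, which is exactly the 4.718592E8 bytes in the message. A rough paraphrase of that check:

        // Reserved system memory defaults to 300 MB; the heap must be at least 1.5x that.
        val reservedMemory = 300L * 1024 * 1024              // 314572800 bytes
        val minSystemMemory = (reservedMemory * 1.5).toLong  // 471859200 bytes = 4.718592E8
        val systemMemory = 64880640L                         // the value reported above for a 64M heap
        require(systemMemory >= minSystemMemory,
          s"System memory $systemMemory must be at least $minSystemMemory. Please use a larger heap size.")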

    Then I ran the spark-shell command again. This time I got in successfully, so I tried to run "WordCount" on the cluster.

    First I updated the ".set("spark.executor.memory", "64M")" line, then built the jar, put it on the cluster, and it ran normally.
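
    For reference, a minimal sketch of the updated configuration (my reconstruction of the change described above, not the exact final code):

        import org.apache.spark.{SparkConf, SparkContext}

        val conf = new SparkConf()
          .setMaster("spark://master:7077")
          .setAppName("WordCount")
          .set("spark.executor.memory", "512M")  // was "64M", below the ~450 MB minimum shown above
          .set("spark.executor.cores", "1")
        val sc = new SparkContext(conf)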


    Got it, so it seems the problem is solved.

    Then I wanted to find out why this problem happened, whether it came from the JDK or from Scala.

    So I tested switching the environment variables in Spark's spark-env.sh and Hadoop's hadoop-env.sh between 'jdk1.8.0_121 & scala2.10.4' and 'jdk1.7.0_40 & scala2.11.8', but this time I found that both environments worked for spark-shell and for 'WordCount'.

    Even after I removed 'jdk1.7.0_40 & scala2.11.8' and restored all the configuration to the state it was in when I first hit the problem, it still worked.

    Oh my, this is a mysterious problem... Even though I did not find the original cause, I am still satisfied; at least the cluster works now.


    Thanks, @Kaushal

    "WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources"
    means your cluster has no free resources to launch the application. Open the cluster UI and check whether another application is running and using up all of the cluster's resources.

    @Kaushal Thanks for your reply. In my cluster there is only this one application [I am just testing], so that is only the most superficial cause. The deeper cause lies in these steps: 1. the master allocates executors for the app; 2. the workers launch the executors; 3. the executors register with the driver [at this stage the executors are blocked in 'CoarseGrainedExecutorBackend$.main']; 4. the driver schedules the executors to run the driver program, and so on.

    Are you sure your HDFS URL
    "hdfs://master/970655147/input/01WordCount/"
    is correct, or do you need to specify the HDFS port?

    Hmm, yes, I use the default port; running locally, the path can be read from the Spark program, and it can be read from Hadoop programs too. @Kaushal Hi, thanks, this problem is hard to solve, please see the comments below.
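
    (My own addition, not part of the original thread.) A minimal sketch of how that HDFS URL could be checked from the client machine, assuming the hadoop-2.6.5 client jars are on the classpath; the object name CheckHdfsPath is made up for this example:

        import java.net.URI
        import org.apache.hadoop.conf.Configuration
        import org.apache.hadoop.fs.{FileSystem, Path}

        object CheckHdfsPath {
          def main(args: Array[String]): Unit = {
            // Path from the question; with no port given, the NameNode's default RPC port (8020) is assumed.
            val inputPath = "hdfs://master/970655147/input/01WordCount/"
            val fs = FileSystem.get(URI.create(inputPath), new Configuration())
            // This fails (or keeps retrying) if the NameNode at hdfs://master cannot be reached.
            println(s"$inputPath exists: ${fs.exists(new Path(inputPath))}")
            fs.close()
          }
        }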