Apache Spark executor blocked at "UserGroupInformation.doAs"

Yesterday I built a Spark cluster, and today I want to run a WordCount program on it. My environment: jdk1.8.121 + scala2.10.4 + hadoop2.6.5 + spark1.6.2. Cluster: master + slave01 + slave02. Client: client. Additional environment: master, slave01, slave02 and client are all in the same LAN [master, slave01 and slave02 can SSH into each other without a password], and the login user on all of them is root.

The demo code is as follows:

    def main(args: Array[String]) = {
      val inputPath = "hdfs://master/970655147/input/01WordCount/"
      // 1. local mode
      // val conf = new SparkConf().setMaster("local").setAppName("WordCount")
      // 2. standalone mode
      val conf = new SparkConf().setMaster("spark://master:7077").setAppName("WordCount")
        .set("spark.executor.memory", "64M")
        .set("spark.executor.cores", "1")
      val sc = new SparkContext(conf)
      val line = sc.textFile(inputPath)
      line.foreach(println)
      sc.stop
    }
  • First, I ran it in local mode and the program worked fine.
  • Second, I ran it on the cluster [deployed from IDEA], but it failed. The interaction seems to block: according to the logs, the master allocates executors for the app and the executors start running, but they appear to be stuck at "UserGroupInformation.doAs(UserGroupInformation.java:1643); SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:59)", so the executors never register with the driver and the driver ends up with no resources.
  • Next I tried spark-submit [spark-submit --master spark://master:7077 --class com.hx.test.Test01WordCount HelloSpark.jar] from master or slave02, but the result was the same. Please give me some advice, thanks.
  • Some of the log information is shown below:


  • Launch command of the ExecutorBackend

    root@slave02:~# jps 
    7984 CoarseGrainedExecutorBackend 
    6468 NodeManager 
    8037 Jps 
    955 Worker 
    7981 CoarseGrainedExecutorBackend 
    7982 CoarseGrainedExecutorBackend 
    6366 DataNode 
    7983 CoarseGrainedExecutorBackend 
    root@slave02:~# ps -ef | grep 7983 
    root 7983 955 14 06:21 ? 00:00:03 /usr/local/ProgramFiles/jdk1.8.0_121/bin/java -cp /usr/local/ProgramFiles/spark-1.6.2-bin-hadoop2.6/conf/:/usr/local/ProgramFiles/spark-1.6.2-bin-hadoop2.6/lib/spark-assembly-1.6.2-hadoop2.6.0.jar:/usr/local/ProgramFiles/spark-1.6.2-bin-hadoop2.6/lib/datanucleus-core-3.2.10.jar:/usr/local/ProgramFiles/spark-1.6.2-bin-hadoop2.6/lib/datanucleus-api-jdo-3.2.6.jar:/usr/local/ProgramFiles/spark-1.6.2-bin-hadoop2.6/lib/datanucleus-rdbms-3.2.9.jar:/usr/local/ProgramFiles/hadoop-2.6.5/etc/hadoop/ -Xms64M -Xmx64M -Dspark.driver.port=37230 org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url spark://CoarseGrainedScheduler@192.168.0.191:37230 --executor-id 1 --hostname 192.168.0.182 --cores 1 --app-id app-20170408062155-0015 --worker-url spark://Worker@192.168.0.182:46466 
    root 8050 4249 4 06:22 pts/1 00:00:00 grep --color=auto 7983 
    root@slave02:~# 
    
  • Executor error log

    root@slave02:/usr/local/ProgramFiles/spark-1.6.2-bin-hadoop2.6# cat work/app-20170408062155-0015/0/stderr 
    17/04/08 06:22:20 INFO executor.CoarseGrainedExecutorBackend: Registered signal handlers for [TERM, HUP, INT] 
    17/04/08 06:22:24 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 
    17/04/08 06:22:28 INFO spark.SecurityManager: Changing view acls to: root 
    17/04/08 06:22:28 INFO spark.SecurityManager: Changing modify acls to: root 
    17/04/08 06:22:28 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root) 
    17/04/08 06:23:06 INFO spark.SecurityManager: Changing view acls to: root 
    17/04/08 06:23:06 INFO spark.SecurityManager: Changing modify acls to: root 
    17/04/08 06:23:06 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root) 
    17/04/08 06:23:24 INFO slf4j.Slf4jLogger: Slf4jLogger started 
    17/04/08 06:23:29 INFO Remoting: Starting remoting 
    Exception in thread "main" 17/04/08 06:23:46 INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon. 
    17/04/08 06:23:47 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports. 
    java.lang.reflect.UndeclaredThrowableException 
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1643) 
    at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:68) 
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:151) 
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:253) 
    at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala) 
    Caused by: java.util.concurrent.TimeoutException: Futures timed out after [10000 milliseconds] 
    at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219) 
    at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223) 
    at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107) 
    at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53) 
    at scala.concurrent.Await$.result(package.scala:107) 
    at akka.remote.Remoting.start(Remoting.scala:179) 
    at akka.remote.RemoteActorRefProvider.init(RemoteActorRefProvider.scala:184) 
    at akka.actor.ActorSystemImpl.liftedTree2$1(ActorSystem.scala:620) 
    at akka.actor.ActorSystemImpl._start$lzycompute(ActorSystem.scala:617) 
    at akka.actor.ActorSystemImpl._start(ActorSystem.scala:617) 
    at akka.actor.ActorSystemImpl.start(ActorSystem.scala:634) 
    at akka.actor.ActorSystem$.apply(ActorSystem.scala:142) 
    at akka.actor.ActorSystem$.apply(ActorSystem.scala:119) 
    at org.apache.spark.util.AkkaUtils$.org$apache$spark$util$AkkaUtils$$doCreateActorSystem(AkkaUtils.scala:121) 
    at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:53) 
    at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:52) 
    at org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:2024) 
    at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141) 
    at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:2015) 
    at org.apache.spark.util.AkkaUtils$.createActorSystem(AkkaUtils.scala:55) 
    at org.apache.spark.SparkEnv$.create(SparkEnv.scala:266) 
    at org.apache.spark.SparkEnv$.createExecutorEnv(SparkEnv.scala:217) 
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:186) 
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:69) 
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:68) 
    at java.security.AccessController.doPrivileged(Native Method) 
    at javax.security.auth.Subject.doAs(Subject.java:422) 
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) 
    ... 4 more 
    root@slave02:/usr/local/ProgramFiles/spark-1.6.2-bin-hadoop2.6# cat work/app-20170408062155-0015/0/stdout 
    root@slave02:/usr/local/ProgramFiles/spark-1.6.2-bin-hadoop2.6# 
    
  • Driver log

    Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties 
    17/04/08 21:21:44 INFO SparkContext: Running Spark version 1.6.2 
    17/04/08 21:21:45 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 
    17/04/08 21:21:45 INFO SecurityManager: Changing view acls to: root 
    17/04/08 21:21:45 INFO SecurityManager: Changing modify acls to: root 
    17/04/08 21:21:45 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root) 
    17/04/08 21:21:46 INFO Utils: Successfully started service 'sparkDriver' on port 37230. 
    17/04/08 21:21:47 INFO Slf4jLogger: Slf4jLogger started 
    17/04/08 21:21:47 INFO Remoting: Starting remoting 
    17/04/08 21:21:48 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@192.168.0.191:43974] 
    17/04/08 21:21:48 INFO Utils: Successfully started service 'sparkDriverActorSystem' on port 43974. 
    17/04/08 21:21:48 INFO SparkEnv: Registering MapOutputTracker 
    17/04/08 21:21:48 INFO SparkEnv: Registering BlockManagerMaster 
    17/04/08 21:21:48 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-ef79b656-b7f4-4cb3-be3e-0f8bb61baa9d 
    17/04/08 21:21:48 INFO MemoryStore: MemoryStore started with capacity 431.3 MB 
    17/04/08 21:21:48 INFO SparkEnv: Registering OutputCommitCoordinator 
    17/04/08 21:21:54 INFO Utils: Successfully started service 'SparkUI' on port 4040. 
    17/04/08 21:21:54 INFO SparkUI: Started SparkUI at http://192.168.0.191:4040 
    17/04/08 21:21:54 INFO AppClient$ClientEndpoint: Connecting to master spark://master:7077... 
    17/04/08 21:21:55 INFO SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20170408062155-0015 
    17/04/08 21:21:55 INFO AppClient$ClientEndpoint: Executor added: app-20170408062155-0015/0 on worker-20170408024004-192.168.0.182-46466 (192.168.0.182:46466) with 1 cores 
    17/04/08 21:21:55 INFO SparkDeploySchedulerBackend: Granted executor ID app-20170408062155-0015/0 on hostPort 192.168.0.182:46466 with 1 cores, 64.0 MB RAM 
    17/04/08 21:21:55 INFO AppClient$ClientEndpoint: Executor added: app-20170408062155-0015/1 on worker-20170408024004-192.168.0.182-46466 (192.168.0.182:46466) with 1 cores 
    17/04/08 21:21:55 INFO SparkDeploySchedulerBackend: Granted executor ID app-20170408062155-0015/1 on hostPort 192.168.0.182:46466 with 1 cores, 64.0 MB RAM 
    17/04/08 21:21:55 INFO AppClient$ClientEndpoint: Executor added: app-20170408062155-0015/2 on worker-20170408024004-192.168.0.182-46466 (192.168.0.182:46466) with 1 cores 
    17/04/08 21:21:55 INFO SparkDeploySchedulerBackend: Granted executor ID app-20170408062155-0015/2 on hostPort 192.168.0.182:46466 with 1 cores, 64.0 MB RAM 
    17/04/08 21:21:55 INFO AppClient$ClientEndpoint: Executor added: app-20170408062155-0015/3 on worker-20170408024004-192.168.0.182-46466 (192.168.0.182:46466) with 1 cores 
    17/04/08 21:21:55 INFO SparkDeploySchedulerBackend: Granted executor ID app-20170408062155-0015/3 on hostPort 192.168.0.182:46466 with 1 cores, 64.0 MB RAM 
    17/04/08 21:21:55 INFO AppClient$ClientEndpoint: Executor added: app-20170408062155-0015/4 on worker-20170408024003-192.168.0.181-45183 (192.168.0.181:45183) with 1 cores 
    17/04/08 21:21:55 INFO SparkDeploySchedulerBackend: Granted executor ID app-20170408062155-0015/4 on hostPort 192.168.0.181:45183 with 1 cores, 64.0 MB RAM 
    17/04/08 21:21:55 INFO AppClient$ClientEndpoint: Executor added: app-20170408062155-0015/5 on worker-20170408024003-192.168.0.181-45183 (192.168.0.181:45183) with 1 cores 
    17/04/08 21:21:55 INFO SparkDeploySchedulerBackend: Granted executor ID app-20170408062155-0015/5 on hostPort 192.168.0.181:45183 with 1 cores, 64.0 MB RAM 
    17/04/08 21:21:55 INFO AppClient$ClientEndpoint: Executor added: app-20170408062155-0015/6 on worker-20170408024003-192.168.0.181-45183 (192.168.0.181:45183) with 1 cores 
    17/04/08 21:21:55 INFO SparkDeploySchedulerBackend: Granted executor ID app-20170408062155-0015/6 on hostPort 192.168.0.181:45183 with 1 cores, 64.0 MB RAM 
    17/04/08 21:21:55 INFO AppClient$ClientEndpoint: Executor added: app-20170408062155-0015/7 on worker-20170408024003-192.168.0.181-45183 (192.168.0.181:45183) with 1 cores 
    17/04/08 21:21:55 INFO SparkDeploySchedulerBackend: Granted executor ID app-20170408062155-0015/7 on hostPort 192.168.0.181:45183 with 1 cores, 64.0 MB RAM 
    17/04/08 21:21:55 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 42255. 
    17/04/08 21:21:55 INFO NettyBlockTransferService: Server created on 42255 
    17/04/08 21:21:56 INFO BlockManagerMaster: Trying to register BlockManager 
    17/04/08 21:21:57 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.0.191:42255 with 431.3 MB RAM, BlockManagerId(driver, 192.168.0.191, 42255) 
    17/04/08 21:21:57 INFO BlockManagerMaster: Registered BlockManager 
    17/04/08 21:21:58 INFO AppClient$ClientEndpoint: Executor updated: app-20170408062155-0015/0 is now RUNNING 
    17/04/08 21:21:58 INFO AppClient$ClientEndpoint: Executor updated: app-20170408062155-0015/1 is now RUNNING 
    17/04/08 21:21:58 INFO AppClient$ClientEndpoint: Executor updated: app-20170408062155-0015/2 is now RUNNING 
    17/04/08 21:21:58 INFO AppClient$ClientEndpoint: Executor updated: app-20170408062155-0015/3 is now RUNNING 
    17/04/08 21:22:00 INFO AppClient$ClientEndpoint: Executor updated: app-20170408062155-0015/4 is now RUNNING 
    17/04/08 21:22:01 INFO AppClient$ClientEndpoint: Executor updated: app-20170408062155-0015/5 is now RUNNING 
    17/04/08 21:22:01 INFO AppClient$ClientEndpoint: Executor updated: app-20170408062155-0015/6 is now RUNNING 
    17/04/08 21:22:01 INFO AppClient$ClientEndpoint: Executor updated: app-20170408062155-0015/7 is now RUNNING 
    17/04/08 21:22:03 INFO SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0 
    17/04/08 21:22:05 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 107.7 KB, free 107.7 KB) 
    17/04/08 21:22:06 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 9.8 KB, free 117.5 KB) 
    17/04/08 21:22:06 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.0.191:42255 (size: 9.8 KB, free: 431.2 MB) 
    17/04/08 21:22:06 INFO SparkContext: Created broadcast 0 from textFile at Test01WordCount.scala:30 
    17/04/08 21:22:21 INFO FileInputFormat: Total input paths to process : 1 
    17/04/08 21:22:21 INFO SparkContext: Starting job: foreach at Test01WordCount.scala:33 
    17/04/08 21:22:21 INFO DAGScheduler: Got job 0 (foreach at Test01WordCount.scala:33) with 2 output partitions 
    17/04/08 21:22:21 INFO DAGScheduler: Final stage: ResultStage 0 (foreach at Test01WordCount.scala:33) 
    17/04/08 21:22:21 INFO DAGScheduler: Parents of final stage: List() 
    17/04/08 21:22:21 INFO DAGScheduler: Missing parents: List() 
    17/04/08 21:22:21 INFO DAGScheduler: Submitting ResultStage 0 (hdfs://master/970655147/input/01WordCount/ MapPartitionsRDD[1] at textFile at Test01WordCount.scala:30), which has no missing parents 
    17/04/08 21:22:21 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 3.0 KB, free 120.5 KB) 
    17/04/08 21:22:21 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 1842.0 B, free 122.3 KB) 
    17/04/08 21:22:21 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 192.168.0.191:42255 (size: 1842.0 B, free: 431.2 MB) 
    17/04/08 21:22:21 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1006 
    17/04/08 21:22:21 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 0 (hdfs://master/970655147/input/01WordCount/ MapPartitionsRDD[1] at textFile at Test01WordCount.scala:30) 
    17/04/08 21:22:21 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks 
    17/04/08 21:22:36 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 
    17/04/08 21:23:04 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 
    17/04/08 21:23:06 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 
    17/04/08 21:23:21 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 
    17/04/08 21:23:36 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 
    17/04/08 21:23:51 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 
    17/04/08 21:24:02 INFO AppClient$ClientEndpoint: Executor updated: app-20170408062155-0015/1 is now EXITED (Command exited with code 1) 
    17/04/08 21:24:02 INFO SparkDeploySchedulerBackend: Executor app-20170408062155-0015/1 removed: Command exited with code 1 
    17/04/08 21:24:06 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 
    17/04/08 21:24:21 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 
    17/04/08 21:24:36 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 
    17/04/08 21:24:51 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 
    17/04/08 21:25:06 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 
    17/04/08 21:25:21 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 
    17/04/08 21:25:36 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 
    17/04/08 21:25:51 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 
    17/04/08 21:26:02 WARN NettyRpcEndpointRef: Error sending message [message = RemoveExecutor(1,Command exited with code 1)] in 1 attempts 
    org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [120 seconds]. This timeout is controlled by spark.rpc.askTimeout 
    at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:48) 
    at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:63) 
    at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59) 
    at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33) 
    at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:76) 
    at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:101) 
    at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:77) 
    at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.removeExecutor(CoarseGrainedSchedulerBackend.scala:370) 
    at org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend.executorRemoved(SparkDeploySchedulerBackend.scala:144) 
    at org.apache.spark.deploy.client.AppClient$ClientEndpoint$$anonfun$receive$1.applyOrElse(AppClient.scala:184) 
    at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:116) 
    at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:204) 
    at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100) 
    at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215) 
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
    at java.lang.Thread.run(Thread.java:745) 
    Caused by: java.util.concurrent.TimeoutException: Futures timed out after [120 seconds] 
    at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219) 
    at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223) 
    at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107) 
    at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53) 
    at scala.concurrent.Await$.result(package.scala:107) 
    at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75) 
    ... 12 more 
    
  • Some reference links:


  • http://apache-spark-user-list.1001560.n3.nabble.com/Submitting-Spark-job-on-Unix-cluster-from-dev-environment-Windows-td16989.html
  • https://issues.streamsets.com/browse/SDC-4249
  • https://community.cloudera.com/t5/Advanced-Analytics-Apache-Spark/Unable-to-create-SparkContext-to-Spark-1-3-Standalone-service-in/td-p/29176
  • I searched these posts before, but they did not solve the problem; I also collected some logs from tests in my cluster. Or maybe I am thinking about this the wrong way, please help me check.
  • ufw status and ssh connection

    root@master:/usr/local/ProgramFiles# ufw status
    Status: inactive
    root@master:/usr/local/ProgramFiles# ssh slave01
    Welcome to Ubuntu 16.04.2 LTS (GNU/Linux 4.4.0-62-generic x86_64)
    
     * Documentation:  https://help.ubuntu.com
     * Management:     https://landscape.canonical.com
     * Support:        https://ubuntu.com/advantage
    Last login: Sat Apr  8 21:33:44 2017 from 192.168.0.119
    root@slave01:~# ufw status
    Status: inactive
    root@slave01:~# ssh slave02
    Welcome to Ubuntu 16.04.2 LTS (GNU/Linux 4.4.0-62-generic x86_64)
    
     * Documentation:  https://help.ubuntu.com
     * Management:     https://landscape.canonical.com
     * Support:        https://ubuntu.com/advantage
    Last login: Sat Apr  8 21:10:33 2017 from 192.168.0.119
    root@slave02:~# ufw status
    Status: inactive
    root@slave02:~# 
    
  • Network connectivity via IP or FQDN

    2.1. nc in master
    root@master:/usr/local/ProgramFiles# netcat -l 12306
    root@master:/usr/local/ProgramFiles# nc -l 12306
    root@master:/usr/local/ProgramFiles# nc -l 12306
    root@master:/usr/local/ProgramFiles# nc -l 12306
    
    2.2. nc in slave01
    root@slave01:~# nc -vz 192.168.0.180 12306
    Connection to 192.168.0.180 12306 port [tcp/*] succeeded!
    root@slave01:~# nc -vz master 12306
    Connection to master 12306 port [tcp/*] succeeded!
    
    2.3. nc in slave02
    root@slave02:/usr/local/ProgramFiles# nc -vz 192.168.0.180 12306
    Connection to 192.168.0.180 12306 port [tcp/*] succeeded!
    root@slave02:/usr/local/ProgramFiles# nc -vz master 12306
    Connection to master 12306 port [tcp/*] succeeded!
    root@slave02:/usr/local/ProgramFiles# 
    
  • Reinstalling other versions of Spark:


    I reinstalled other versions of Spark, but the same problem remained, so there may be something wrong in the environment.


    Please give me some advice, thanks. Today I wanted to change the Java and Scala environment. I found a post whose author built Spark with jdk1.7.0_80 and scala2.11.8, so I downloaded jdk1.7.0_40 and scala2.11.8 and installed them on my cluster [master, slave01, slave02].

    I updated the environment variables in Hadoop's hadoop-env.sh and Spark's spark-env.sh, then stopped Spark and Hadoop and started Hadoop and Spark again. Then I ran the Spark shell with "./bin/spark-shell --master spark://master:7077 --executor-memory 64M".

    The Spark shell still did not run properly, but the logs looked different from before, so I checked the executor logs.

    And I found:

        Exception in thread "main" java.lang.IllegalArgumentException: System memory 64880640 must be at least 4.718592E8. Please use a larger heap size.
                at org.apache.spark.memory.UnifiedMemoryManager$.getMaxMemory(UnifiedMemoryManager.scala:198)
                at org.apache.spark.memory.UnifiedMemoryManager$.apply(UnifiedMemoryManager.scala:180)
                at org.apache.spark.SparkEnv$.create(SparkEnv.scala:354)
                at org.apache.spark.SparkEnv$.createExecutorEnv(SparkEnv.scala:217)
                at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:186)
                at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:69)
                at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:68)
                at java.security.AccessController.doPrivileged(Native Method)
                at javax.security.auth.Subject.doAs(Subject.java:415)
                at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
                at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:68)
                at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:151)
                at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:253)
                at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
    
    So it now reaches "CoarseGrainedExecutorBackend.scala:186"; according to this error log it is no longer blocked at "UserGroupInformation.doAs(UserGroupInformation.java:1643)". I changed --executor-memory to 512M.
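
    As a side note (my own reconstruction, not from the original post): as far as I understand, these numbers come from Spark's UnifiedMemoryManager, which reserves 300 MB and requires the executor heap to be at least 1.5x that, which is exactly the 4.718592E8 bytes in the message. A rough paraphrase of that check:

        // Reserved system memory defaults to 300 MB; the heap must be at least 1.5x that.
        val reservedMemory = 300L * 1024 * 1024              // 314572800 bytes
        val minSystemMemory = (reservedMemory * 1.5).toLong  // 471859200 bytes = 4.718592E8
        val systemMemory = 64880640L                         // the value reported above for a 64M heap
        require(systemMemory >= minSystemMemory,
          s"System memory $systemMemory must be at least $minSystemMemory. Please use a larger heap size.")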

    Then I ran the spark-shell command again. This time I got in successfully, so I tried to run "WordCount" on the cluster.

    First I updated the ".set("spark.executor.memory", "64M")" line, then built the jar, put it on the cluster, and it ran normally.
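
    For reference, a minimal sketch of the updated configuration (my reconstruction of the change described above, not the exact final code):

        import org.apache.spark.{SparkConf, SparkContext}

        val conf = new SparkConf()
          .setMaster("spark://master:7077")
          .setAppName("WordCount")
          .set("spark.executor.memory", "512M")  // was "64M", below the ~450 MB minimum shown above
          .set("spark.executor.cores", "1")
        val sc = new SparkContext(conf)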


    Got it, so it seems the problem is solved.

    Then I wanted to find out why this problem happened, whether it came from the JDK or from Scala.

    So I tested switching the environment variables in Spark's spark-env.sh and Hadoop's hadoop-env.sh between 'jdk1.8.0_121 & scala2.10.4' and 'jdk1.7.0_40 & scala2.11.8', but this time I found that both environments worked for spark-shell and for 'WordCount'.

    Even after I removed 'jdk1.7.0_40 & scala2.11.8' and restored all the configuration to the state it was in when I first hit the problem, it still worked.

    Oh my, this is a mysterious problem... Even though I did not find the original cause, I am still satisfied; at least the cluster works now.


    Thanks, @Kaushal

    "WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources"
    means your cluster has no free resources to launch the application. Open the cluster UI and check whether another application is running and using up all of the cluster's resources.

    @Kaushal Thanks for your reply. In my cluster there is only this one application [I am just testing], so that is only the most superficial cause. The deeper cause lies in these steps: 1. the master allocates executors for the app; 2. the workers launch the executors; 3. the executors register with the driver [at this stage the executors are blocked in 'CoarseGrainedExecutorBackend$.main']; 4. the driver schedules the executors to run the driver program, and so on.

    Are you sure your HDFS URL
    "hdfs://master/970655147/input/01WordCount/"
    is correct, or do you need to specify the HDFS port?

    Hmm, yes, I use the default port; running locally, the path can be read from the Spark program, and it can be read from Hadoop programs too. @Kaushal Hi, thanks, this problem is hard to solve, please see the comments below.
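
    (My own addition, not part of the original thread.) A minimal sketch of how that HDFS URL could be checked from the client machine, assuming the hadoop-2.6.5 client jars are on the classpath; the object name CheckHdfsPath is made up for this example:

        import java.net.URI
        import org.apache.hadoop.conf.Configuration
        import org.apache.hadoop.fs.{FileSystem, Path}

        object CheckHdfsPath {
          def main(args: Array[String]): Unit = {
            // Path from the question; with no port given, the NameNode's default RPC port (8020) is assumed.
            val inputPath = "hdfs://master/970655147/input/01WordCount/"
            val fs = FileSystem.get(URI.create(inputPath), new Configuration())
            // This fails (or keeps retrying) if the NameNode at hdfs://master cannot be reached.
            println(s"$inputPath exists: ${fs.exists(new Path(inputPath))}")
            fs.close()
          }
        }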