Apache Spark standalone cluster: cannot start worker from another machine

I have been setting up a Spark standalone cluster. I have two machines; the first (ubuntu0) serves as both master and worker, and the second (ubuntu1) is a worker only. Passwordless SSH has been configured correctly on both machines and verified by manually SSHing in each direction.

Now, when I run ./start-all.sh, both the master and the worker on the first machine (ubuntu0) start correctly. This is confirmed by (1) the WebUI being reachable (localhost:8081 for me) and (2) the worker being registered/shown in the WebUI. However, the worker on the second machine (ubuntu1) does not start. The error displayed is:

ubuntu1: ssh: connect to host ubuntu1 port 22: Connection timed out
Now this is strange, because I have configured passwordless SSH correctly on both sides. Given that, I went to the second machine and tried to start the worker manually with these commands:

./spark-class org.apache.spark.deploy.worker.Worker spark://ubuntu0:7707
./spark-class org.apache.spark.deploy.worker.Worker spark://<ip>:7707
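
For completeness: "Connection timed out" is a network-level failure rather than an SSH-key problem, so it is also worth checking from the master that ubuntu1's port 22 is reachable at all (a quick sketch; nc/netcat assumed to be installed):

ssh -o ConnectTimeout=5 ubuntu1 hostname   # should print "ubuntu1" with no password prompt
nc -zv -w 5 ubuntu1 22                     # plain TCP reachability check of the SSH port
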
Here are the contents of spark-env.sh on both my master and slave/worker:

SPARK_MASTER_IP=192.168.3.222
STANDALONE_SPARK_MASTER_HOST=`hostname -f`

How should I resolve this? Thanks in advance.

For those who still hit errors when starting workers on different machines, I just want to share that using IP addresses in conf/slaves worked for me (a short sketch follows).
Hope this helps.
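
As an illustration, with ubuntu0 at 192.168.3.222 (the address in the question's spark-env.sh) and a hypothetical 192.168.3.223 for ubuntu1, conf/slaves on the master would simply list one worker address per line:

# conf/slaves -- one worker per line, by IP instead of hostname
# (192.168.3.223 is a hypothetical address for ubuntu1)
192.168.3.222
192.168.3.223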

Using hostnames in conf/slaves worked well for me. Here are some steps I would take:

  • Check the SSH public keys
  • scp /etc/spark/conf.dist/spark-env.sh to your workers (see the sketch below)
The relevant part of my settings in spark-env.sh:

export STANDALONE_SPARK_MASTER_HOST=`hostname`
export SPARK_MASTER_IP=$STANDALONE_SPARK_MASTER_HOST
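
For the scp step above, a minimal sketch, assuming the workers use the same install path and the hostnames from the question:

# copy the master's spark-env.sh to each worker
scp /etc/spark/conf.dist/spark-env.sh ubuntu1:/etc/spark/conf.dist/spark-env.sh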


I guess you are missing something in your configuration; that is what I gathered from your log:

  • Check your /etc/hosts and make sure ubuntu1 is in the master's host list with an IP that matches the slave's actual IP address (see the sketch just below)
  • Add export SPARK_LOCAL_IP='ubuntu1' to the slave's spark-env.sh file
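
A sketch of what the first point means, using the master IP from the question and a hypothetical address for ubuntu1:

# /etc/hosts on the master -- ubuntu1's entry must match the worker's real IP
192.168.3.222   ubuntu0
192.168.3.223   ubuntu1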

  • I had a similar issue today, running Spark 1.5.1 on RHEL 6.7. I have two machines, whose hostnames are:
    - master.domain.com
    - slave.domain.com

    I installed a standalone version of Spark (pre-built for Hadoop 2.6) and Oracle JDK 8u66.

    Spark download:

    wget http://d3kbcqa49mib13.cloudfront.net/spark-1.5.1-bin-hadoop2.6.tgz
    
    Java download:

    wget --no-cookies --no-check-certificate --header "Cookie: gpw_e24=http%3A%2F%2Fwww.oracle.com%2F; oraclelicense=accept-securebackup-cookie" "http://download.oracle.com/otn-pub/java/jdk/8u66-b17/jdk-8u66-linux-x64.tar.gz"
    
    After unpacking Spark and Java in my home directory, I did the following:

    On 'master.domain.com' I ran:

    ./sbin/start-master.sh

    The webUI became available (no slave running yet).

    On 'slave.domain.com' I tried:
    ./sbin/start-slave.sh spark://master.domain.com:7077
    which FAILED as follows:

    Spark Command: /root/java/bin/java -cp /root/spark-1.5.1-bin-hadoop2.6/sbin/../conf/:/root/spark-1.5.1-bin-hadoop2.6/lib/spark-assembly-1.5.1-hadoop2.6.0.jar:/root/spark-1.5.1-bin-hadoop2.6/lib/datanucleus-rdbms-3.2.9.jar:/root/spark-1.5.1-bin-hadoop2.6/lib/datanucleus-api-jdo-3.2.6.jar:/root/spark-1.5.1-bin-hadoop2.6/lib/datanucleus-core-3.2.10.jar -Xms1g -Xmx1g org.apache.spark.deploy.worker.Worker --webui-port 8081 spark://master.domain.com:7077
    ========================================
    Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
    15/11/06 11:03:51 INFO Worker: Registered signal handlers for [TERM, HUP, INT]
    15/11/06 11:03:51 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    15/11/06 11:03:51 INFO SecurityManager: Changing view acls to: root
    15/11/06 11:03:51 INFO SecurityManager: Changing modify acls to: root
    15/11/06 11:03:51 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
    15/11/06 11:03:52 INFO Slf4jLogger: Slf4jLogger started
    15/11/06 11:03:52 INFO Remoting: Starting remoting
    15/11/06 11:03:52 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkWorker@10.80.70.38:50573]
    15/11/06 11:03:52 INFO Utils: Successfully started service 'sparkWorker' on port 50573.
    15/11/06 11:03:52 INFO Worker: Starting Spark worker 10.80.70.38:50573 with 8 cores, 6.7 GB RAM
    15/11/06 11:03:52 INFO Worker: Running Spark version 1.5.1
    15/11/06 11:03:52 INFO Worker: Spark home: /root/spark-1.5.1-bin-hadoop2.6
    15/11/06 11:03:53 INFO Utils: Successfully started service 'WorkerUI' on port 8081.
    15/11/06 11:03:53 INFO WorkerWebUI: Started WorkerWebUI at http://10.80.70.38:8081
    15/11/06 11:03:53 INFO Worker: Connecting to master master.domain.com:7077...
    15/11/06 11:04:05 INFO Worker: Retrying connection to master (attempt # 1)
    15/11/06 11:04:05 ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[sparkWorker-akka.actor.default-dispatcher-4,5,main]
    java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.FutureTask@48711bf5 rejected from java.util.concurrent.ThreadPoolExecutor@14db705b[Running, pool size = 1, active threads = 0, queued tasks = 0, completed tasks = 1]
        at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2047)
        at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:823)
        at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1369)
        at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:112)
        at org.apache.spark.deploy.worker.Worker$$anonfun$org$apache$spark$deploy$worker$Worker$$tryRegisterAllMasters$1.apply(Worker.scala:211)
        at org.apache.spark.deploy.worker.Worker$$anonfun$org$apache$spark$deploy$worker$Worker$$tryRegisterAllMasters$1.apply(Worker.scala:210)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
        at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
        at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
        at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)
        at org.apache.spark.deploy.worker.Worker.org$apache$spark$deploy$worker$Worker$$tryRegisterAllMasters(Worker.scala:210)
        at org.apache.spark.deploy.worker.Worker$$anonfun$org$apache$spark$deploy$worker$Worker$$reregisterWithMaster$1.apply$mcV$sp(Worker.scala:288)
        at org.apache.spark.util.Utils$.tryOrExit(Utils.scala:1119)
        at org.apache.spark.deploy.worker.Worker.org$apache$spark$deploy$worker$Worker$$reregisterWithMaster(Worker.scala:234)
        at org.apache.spark.deploy.worker.Worker$$anonfun$receive$1.applyOrElse(Worker.scala:521)
        at org.apache.spark.rpc.akka.AkkaRpcEnv.org$apache$spark$rpc$akka$AkkaRpcEnv$$processMessage(AkkaRpcEnv.scala:177)
        at org.apache.spark.rpc.akka.AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1$$anonfun$receiveWithLogging$1$$anonfun$applyOrElse$4.apply$mcV$sp(AkkaRpcEnv.scala:126)
        at org.apache.spark.rpc.akka.AkkaRpcEnv.org$apache$spark$rpc$akka$AkkaRpcEnv$$safelyCall(AkkaRpcEnv.scala:197)
        at org.apache.spark.rpc.akka.AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1$$anonfun$receiveWithLogging$1.applyOrElse(AkkaRpcEnv.scala:125)
        at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
        at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
        at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
        at org.apache.spark.util.ActorLogReceive$$anon$1.apply(ActorLogReceive.scala:59)
        at org.apache.spark.util.ActorLogReceive$$anon$1.apply(ActorLogReceive.scala:42)
        at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
        at org.apache.spark.util.ActorLogReceive$$anon$1.applyOrElse(ActorLogReceive.scala:42)
        at akka.actor.Actor$class.aroundReceive(Actor.scala:467)
        at org.apache.spark.rpc.akka.AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1.aroundReceive(AkkaRpcEnv.scala:92)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
        at akka.actor.ActorCell.invoke(ActorCell.scala:487)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
        at akka.dispatch.Mailbox.run(Mailbox.scala:220)
        at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
    15/11/06 11:04:05 INFO ShutdownHookManager: Shutdown hook called
    
    Starting the slave with spark://<IP>:7077 FAILED as above as well.

    Starting the slave with spark://master:7077 WORKED, and the worker shows up in the master's web UI:

    Spark Command: /root/java/bin/java -cp /root/spark-1.5.1-bin-hadoop2.6/sbin/../conf/:/root/spark-1.5.1-bin-hadoop2.6/lib/spark-assembly-1.5.1-hadoop2.6.0.jar:/root/spark-1.5.1-bin-hadoop2.6/lib/datanucleus-rdbms-3.2.9.jar:/root/spark-1.5.1-bin-hadoop2.6/lib/datanucleus-api-jdo-3.2.6.jar:/root/spark-1.5.1-bin-hadoop2.6/lib/datanucleus-core-3.2.10.jar -Xms1g -Xmx1g org.apache.spark.deploy.worker.Worker --webui-port 8081 spark://master:7077
    ========================================
    Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
    15/11/06 11:08:15 INFO Worker: Registered signal handlers for [TERM, HUP, INT]
    15/11/06 11:08:15 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    15/11/06 11:08:15 INFO SecurityManager: Changing view acls to: root
    15/11/06 11:08:15 INFO SecurityManager: Changing modify acls to: root
    15/11/06 11:08:15 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
    15/11/06 11:08:16 INFO Slf4jLogger: Slf4jLogger started
    15/11/06 11:08:16 INFO Remoting: Starting remoting
    15/11/06 11:08:17 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkWorker@10.80.70.38:40780]
    15/11/06 11:08:17 INFO Utils: Successfully started service 'sparkWorker' on port 40780.
    15/11/06 11:08:17 INFO Worker: Starting Spark worker 10.80.70.38:40780 with 8 cores, 6.7 GB RAM
    15/11/06 11:08:17 INFO Worker: Running Spark version 1.5.1
    15/11/06 11:08:17 INFO Worker: Spark home: /root/spark-1.5.1-bin-hadoop2.6
    15/11/06 11:08:17 INFO Utils: Successfully started service 'WorkerUI' on port 8081.
    15/11/06 11:08:17 INFO WorkerWebUI: Started WorkerWebUI at http://10.80.70.38:8081
    15/11/06 11:08:17 INFO Worker: Connecting to master master:7077...
    15/11/06 11:08:17 INFO Worker: Successfully registered with master spark://master:7077
    
    Note: I did not add any extra configuration in conf/spark-env.sh.

    Note 2: when you look at the master's webUI, the Spark master URL shown at the top is exactly the one that worked for me, so I'd suggest using that one.
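
    If you want to grab that URL from a shell instead, the standalone master also exposes its state as JSON on the UI port (8080 by default); the url field there is the same value:

    # query the master's JSON status endpoint (default UI port 8080 assumed)
    curl -s http://master.domain.com:8080/json/ | grep '"url"'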


    I hope this helps ;)

    I just ran into the same problem and can confirm that you do indeed have to start the worker with the master URL shown in the web UI (i.e. with the same value as $SPARK_MASTER_IP in the master URL).
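
    A minimal sketch of keeping the two sides consistent (the hostname master reused from the previous answer):

    # on the master, in conf/spark-env.sh: pin the host the master advertises
    export SPARK_MASTER_IP=master
    # on the worker: register with exactly the URL the master's web UI displays
    ./sbin/start-slave.sh spark://master:7077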