Cluster deploy mode is currently not supported for Python applications on standalone clusters

I am trying to run a sample Python program on my Spark cluster. The cluster consists of one master node and two worker nodes. However, when I try to run the sample code, it fails with:
$ spark-submit --master spark://sparkmaster:7077 --deploy-mode cluster test01.py
Exception in thread "main" org.apache.spark.SparkException: Cluster deploy mode is currently not supported for python applications on standalone clusters.
What does this mean? Is my cluster a standalone cluster? Is it still standalone even though it consists of three machines? And how can I run a Python program in cluster mode on something other than a standalone cluster?

If I instead run

spark-submit test01.py

it crashes with this error:
21/03/30 11:07:27 WARN Utils: Service 'sparkDriver' could not bind on a random free port. You may check whether configuring an appropriate binding address.
21/03/30 11:07:27 WARN Utils: Service 'sparkDriver' could not bind on a random free port. You may check whether configuring an appropriate binding address.
21/03/30 11:07:27 ERROR SparkContext: Error initializing SparkContext.
java.net.BindException: Cannot assign requested address: Service 'sparkDriver' failed after 16 retries (on a random free port)! Consider explicitly setting the appropriate binding address for the service 'sparkDriver' (for example spark.driver.bindAddress for SparkDriver) to the correct binding address.
at sun.nio.ch.Net.bind0(Native Method)
at sun.nio.ch.Net.bind(Net.java:461)
at sun.nio.ch.Net.bind(Net.java:453)
at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:222)
at io.netty.channel.socket.nio.NioServerSocketChannel.doBind(NioServerSocketChannel.java:134)
at io.netty.channel.AbstractChannel$AbstractUnsafe.bind(AbstractChannel.java:550)
at io.netty.channel.DefaultChannelPipeline$HeadContext.bind(DefaultChannelPipeline.java:1334)
at io.netty.channel.AbstractChannelHandlerContext.invokeBind(AbstractChannelHandlerContext.java:506)
at io.netty.channel.AbstractChannelHandlerContext.bind(AbstractChannelHandlerContext.java:491)
at io.netty.channel.DefaultChannelPipeline.bind(DefaultChannelPipeline.java:973)
at io.netty.channel.AbstractChannel.bind(AbstractChannel.java:248)
at io.netty.bootstrap.AbstractBootstrap$2.run(AbstractBootstrap.java:356)
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:748)
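The BindException above is the operating system refusing to bind the driver's server socket to the address Spark resolved for the host, which is why forcing spark.driver.bindAddress (or SPARK_LOCAL_IP) to a reachable address fixes it. A minimal, Spark-free sketch of the same failure using Python's standard socket module (the helper name can_bind and the sample addresses are illustrative):

```python
import socket

def can_bind(address: str) -> bool:
    """Return True if a TCP server socket can bind to the given address.

    This mirrors what the JVM does for the 'sparkDriver' service: ask the OS
    to bind a listening socket, and fail if the address is not assigned to
    any local network interface.
    """
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.bind((address, 0))  # port 0 = any free port, like Spark's retry loop
            return True
    except OSError:
        return False

# The loopback address is always bindable, which is why setting
# spark.driver.bindAddress to 127.0.0.1 silences the error.
print(can_bind("127.0.0.1"))    # True on any host with a loopback interface
print(can_bind("203.0.113.7"))  # False unless this address is assigned locally
```

If the hostname of the machine resolves to an address that no local interface owns (a common misconfiguration in /etc/hosts), Spark hits exactly this OSError sixteen times and gives up, producing the stack trace above.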
I wrote test01.py as follows:
from pyspark.sql import SparkSession

logFile = "README.md"  # Should be some file on your system
spark = SparkSession.builder \
    .appName("SimpleApp") \
    .config("spark.driver.bindAddress", "127.0.0.1") \
    .getOrCreate()
logData = spark.read.text(logFile).cache()
numAs = logData.filter(logData.value.contains('a')).count()
numBs = logData.filter(logData.value.contains('b')).count()
print("Lines with a: %i, lines with b: %i" % (numAs, numBs))
spark.stop()
It worked. Unfortunately, there is no trace of this job on the Spark master's web UI. Was it actually distributed?

Hey, there is nothing wrong with your configuration. As the error says, this is simply a limitation of Apache Spark.

Spark needs resources to run. In standalone mode, you start the workers and the Spark master yourself, and the persistence layer can be anything: HDFS, a local filesystem, Cassandra, and so on. In YARN mode, you ask the YARN Hadoop cluster to manage resource allocation and bookkeeping.

When you use local[2] as the master, you ask Spark to use two cores and to run the driver and the workers in the same JVM. In local mode, all tasks related to the Spark job run in that single JVM.
So, in standalone mode, you define the "containers" in which the workers and the Spark master run on your machines (so you can have two workers, and your tasks can be distributed across the JVMs of those two workers?), whereas in local mode you simply run everything in the same JVM on your local machine. So how do you run Python jobs? Is Python supported? Python itself is supported; what is not supported is cluster deploy mode against a standalone master, so submit with the default client deploy mode instead (the driver then runs on the machine where you launch spark-submit, while the executors run on the workers). If you are in the cloud, you can drive the submission from a CI/CD tool such as Jenkins or Azure DevOps, or from a cloud function; if you are interested, refer to this link.
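To make the limitation concrete, here is a pure-Python sketch (no Spark required; the function name and logic are illustrative, the real check lives inside Spark's SparkSubmit class) of the validation spark-submit performs before launching anything: a standalone master (spark://...) combined with cluster deploy mode and a Python application is rejected, which is exactly the exception from the question.

```python
def validate_submit(master: str, deploy_mode: str, is_python: bool) -> str:
    """Illustrative mimic of the spark-submit check behind the exception above."""
    standalone = master.startswith("spark://")
    if standalone and deploy_mode == "cluster" and is_python:
        raise ValueError(
            "Cluster deploy mode is currently not supported for python "
            "applications on standalone clusters."
        )
    return "ok"

# The failing combination from the question:
try:
    validate_submit("spark://sparkmaster:7077", "cluster", is_python=True)
except ValueError as e:
    print(e)

# The supported way to run the same app on the same master:
print(validate_submit("spark://sparkmaster:7077", "client", is_python=True))
```

In client deploy mode the executors still run on the two workers, so the job is distributed; only the driver stays on the submitting machine, and the application will show up on the master's web UI.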