
Cluster deploy mode is currently not supported for Python applications on standalone clusters


I am trying to run a sample Python program on my Spark cluster. The cluster consists of one master node and two worker nodes. However, when I try to run the sample code, it complains:

$ spark-submit --master spark://sparkmaster:7077 --deploy-mode cluster test01.py
Exception in thread "main" org.apache.spark.SparkException: Cluster deploy mode is currently not supported for python applications on standalone clusters.
What does this mean? Is my cluster a standalone cluster? Is it still standalone even though it consists of three machines? And how can I get a Python program to run in cluster mode rather than standalone?


If I instead just run

spark-submit test01.py
it crashes with the following error:

21/03/30 11:07:27 WARN Utils: Service 'sparkDriver' could not bind on a random free port. You may check whether configuring an appropriate binding address.
21/03/30 11:07:27 WARN Utils: Service 'sparkDriver' could not bind on a random free port. You may check whether configuring an appropriate binding address.
21/03/30 11:07:27 ERROR SparkContext: Error initializing SparkContext.
java.net.BindException: Cannot assign requested address: Service 'sparkDriver' failed after 16 retries (on a random free port)! Consider explicitly setting the appropriate binding address for the service 'sparkDriver' (for example spark.driver.bindAddress for SparkDriver) to the correct binding address.
        at sun.nio.ch.Net.bind0(Native Method)
        at sun.nio.ch.Net.bind(Net.java:461)
        at sun.nio.ch.Net.bind(Net.java:453)
        at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:222)
        at io.netty.channel.socket.nio.NioServerSocketChannel.doBind(NioServerSocketChannel.java:134)
        at io.netty.channel.AbstractChannel$AbstractUnsafe.bind(AbstractChannel.java:550)
        at io.netty.channel.DefaultChannelPipeline$HeadContext.bind(DefaultChannelPipeline.java:1334)
        at io.netty.channel.AbstractChannelHandlerContext.invokeBind(AbstractChannelHandlerContext.java:506)
        at io.netty.channel.AbstractChannelHandlerContext.bind(AbstractChannelHandlerContext.java:491)
        at io.netty.channel.DefaultChannelPipeline.bind(DefaultChannelPipeline.java:973)
        at io.netty.channel.AbstractChannel.bind(AbstractChannel.java:248)
        at io.netty.bootstrap.AbstractBootstrap$2.run(AbstractBootstrap.java:356)
        at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
        at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500)
        at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        at java.lang.Thread.run(Thread.java:748)

I wrote test01.py as follows:

from pyspark.sql import SparkSession

logFile = "README.md"  # Should be some file on your system
spark = SparkSession.builder\
    .appName("SimpleApp")\
    .config("spark.driver.bindAddress", "127.0.0.1")\
    .getOrCreate()

logData = spark.read.text(logFile).cache()

numAs = logData.filter(logData.value.contains('a')).count()
numBs = logData.filter(logData.value.contains('b')).count()

print("Lines with a: %i, lines with b: %i" % (numAs, numBs))

spark.stop()
and it ran successfully. Unfortunately, there is no trace of this job on the Spark master's web page.


So is it actually distributed?
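(As a quick sanity check, not part of the original script: the effective master URL can be printed from the running session. When the job is launched with a bare spark-submit test01.py, it will typically report local[*], meaning nothing was handed to the standalone master.)

# Hypothetical one-line check that could be added to test01.py:
print("Effective master:", spark.sparkContext.master)  # e.g. "local[*]" when no --master is passed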

Hey, there is nothing wrong with your configuration. As the error says, this is simply a limitation of Apache Spark.

Spark needs resources to run. In standalone mode you start the workers and the Spark master yourself, and the persistence layer can be anything you like: HDFS, a plain file system, Cassandra, and so on. In YARN mode you ask the YARN/Hadoop cluster to manage resource allocation and bookkeeping.
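As a rough illustration (a sketch, not part of the original answer; the YARN variant assumes HADOOP_CONF_DIR/YARN_CONF_DIR are already configured), the choice of resource manager ultimately shows up as the master setting of the session:

from pyspark.sql import SparkSession

# Standalone mode: talk to the master you started yourself (URL from the question).
spark = (SparkSession.builder
         .appName("StandaloneExample")
         .master("spark://sparkmaster:7077")
         .getOrCreate())

# YARN mode would instead be:
# spark = SparkSession.builder.appName("YarnExample").master("yarn").getOrCreate()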

When you set the master to local[2], you ask Spark to use two cores and to run the driver and the workers in the same JVM. In local mode, all tasks related to the Spark job run in that single JVM.
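For comparison, a minimal local-mode sketch; local[2] simply means two cores inside the single driver JVM:

from pyspark.sql import SparkSession

# Driver, scheduler and executors all share one JVM, using 2 cores.
spark = (SparkSession.builder
         .appName("LocalExample")
         .master("local[2]")
         .getOrCreate())

print(spark.sparkContext.defaultParallelism)  # 2 in local[2]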


So in standalone mode you define the "containers" in which the workers and the Spark master run on your machines (so you can have two workers, and your tasks can be distributed across the JVMs of those two workers?), whereas in local mode you simply run everything in a single JVM on your local machine.

So how do you run Python jobs? Is Python supported? If you are in the cloud, you would typically bring in a CI/CD tool such as Jenkins or Azure DevOps, or a cloud function; if you are interested, refer to this link,
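As for the original error, the usual workaround is simply to keep client deploy mode (the only deploy mode that standalone clusters support for Python applications) and point the application at the standalone master. A minimal sketch based on the test01.py from the question; the master URL is taken from the question, everything else is illustrative:

from pyspark.sql import SparkSession

# Client deploy mode: the driver stays on the machine where you launch the
# script, executors are requested from the standalone master, and the job
# therefore shows up on the master's web UI.
spark = (SparkSession.builder
         .appName("SimpleApp")
         .master("spark://sparkmaster:7077")
         .getOrCreate())

# ... same counting logic as in test01.py ...

spark.stop()

Equivalently, the master can be passed on the command line, i.e. the first command from the question with --deploy-mode client (or no --deploy-mode at all, since client is the default) instead of --deploy-mode cluster.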