Scala: Unable to override the default value of "spark.sql.shuffle.partitions" with Spark Structured Streaming

I want to override the spark.sql.shuffle.partitions parameter directly in the code:

import org.apache.spark.sql.SparkSession

val sparkSession = SparkSession
  .builder()
  .appName("SPARK")
  .getOrCreate()

sparkSession.conf.set("spark.sql.shuffle.partitions", 2)

However, this setting does not take effect, because I get the following warning message in the logs:

WARN  OffsetSeqMetadata:66 - Updating the value of conf 'spark.sql.shuffle.partitions' in current session from '2' to '200'.

When passing the same parameter via the spark-submit shell script:

#!/bin/bash

/app/spark-2/bin/spark-submit \
--queue root.dev \
--master yarn \
--deploy-mode cluster \
--driver-memory 5G \
--executor-memory 4G \
--executor-cores 2 \
--num-executors 4 \
--conf spark.app.name=SPARK \
--conf spark.executor.memoryOverhead=2048 \
--conf spark.yarn.maxAppAttempts=1 \
--conf spark.sql.shuffle.partitions=2 \
--class com.dev.MainClass

Any ideas?

The checkpoint files of a Spark Structured Streaming job store some of the sparkSession configuration values.

For example, in the "offsets" folder of the checkpoint directory, the content of the latest batch file may look like this:

v1
{"batchWatermarkMs":0,"batchTimestampMs":1619782960476,"conf":{"spark.sql.streaming.stateStore.providerClass":"org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider","spark.sql.streaming.join.stateFormatVersion":"2","spark.sql.streaming.stateStore.compression.codec":"lz4","spark.sql.streaming.flatMapGroupsWithState.stateFormatVersion":"2","spark.sql.streaming.multipleWatermarkPolicy":"min","spark.sql.streaming.aggregation.stateFormatVersion":"2","spark.sql.shuffle.partitions":"200"}}
4

Among other things, it stores the value of the config spark.sql.shuffle.partitions, which in my example is set to the default value of 200.
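To check which value a checkpoint has recorded, the metadata line can be inspected programmatically. The following is a minimal sketch, assuming the file content has already been read into a string; the object name CheckpointConf, the helper shufflePartitions, and the abbreviated sample are hypothetical and not part of Spark.

```scala
object CheckpointConf {
  // Matches "spark.sql.shuffle.partitions":"<digits>" inside the metadata JSON line.
  private val ShufflePartitions =
    "\"spark\\.sql\\.shuffle\\.partitions\"\\s*:\\s*\"(\\d+)\"".r

  /** Returns the shuffle partition count recorded in an offsets file, if present. */
  def shufflePartitions(offsetsFileContent: String): Option[Int] =
    ShufflePartitions.findFirstMatchIn(offsetsFileContent).map(_.group(1).toInt)

  // Abbreviated sample in the same three-line layout as the file shown above.
  val sample: String =
    """v1
      |{"batchWatermarkMs":0,"conf":{"spark.sql.shuffle.partitions":"200"}}
      |4""".stripMargin

  def main(args: Array[String]): Unit =
    println(shufflePartitions(sample)) // Some(200)
}
```

A regular expression is used here only to keep the sketch free of dependencies; a JSON parser would be the more robust choice for real files.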

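In the Spark source code (see OffsetSeqMetadata, the class that logs the warning above), you will see that this config value is replaced by the one stored in the checkpoint metadata whenever it is available there.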
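The effect can be illustrated with a simplified model. This is not Spark's actual implementation; the object ConfRestoreSketch and the function restore are hypothetical names, and the only assumption is the behaviour described here: a value found in the checkpoint metadata wins over the value set in the current session, and a warning is logged when the two differ.

```scala
object ConfRestoreSketch {
  /**
   * Simplified model, not Spark's actual code: every key found in the
   * checkpoint metadata overrides the value set in the current session.
   * Returns the effective conf plus one warning per value that changed.
   */
  def restore(
      session: Map[String, String],
      checkpoint: Map[String, String]): (Map[String, String], Seq[String]) = {
    val warnings = checkpoint.toSeq.collect {
      case (key, stored) if session.get(key).exists(_ != stored) =>
        s"Updating the value of conf '$key' in current session from '${session(key)}' to '$stored'."
    }
    // On duplicate keys, the right-hand map (the checkpoint) takes precedence.
    (session ++ checkpoint, warnings)
  }

  def main(args: Array[String]): Unit = {
    val (effective, warnings) = restore(
      session = Map("spark.sql.shuffle.partitions" -> "2"),
      checkpoint = Map("spark.sql.shuffle.partitions" -> "200"))
    warnings.foreach(println)
    println(effective("spark.sql.shuffle.partitions")) // 200
  }
}
```

This mirrors the situation in the question: the session asks for 2, the checkpoint recorded 200, and the restarted query ends up running with 200.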


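If you really need to change the number of partitions, either delete all checkpoint files or manually change the value to 2 in the last checkpoint file.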
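The manual edit could be sketched as a pure string transformation. This assumes the file content has been read into memory and that the value appears in the quoted form shown earlier; the names OffsetsFileEdit and withShufflePartitions are hypothetical. Reading from and writing back to the checkpoint storage is left out, and it seems prudent to back up the checkpoint first. For stateful queries the existing state was partitioned under the old value, so treat a manual change as an experiment rather than a guaranteed fix.

```scala
import scala.util.matching.Regex

object OffsetsFileEdit {
  // Captures the text around the digits so that only the number is replaced.
  private val ShufflePartitions =
    "(\"spark\\.sql\\.shuffle\\.partitions\"\\s*:\\s*\")\\d+(\")".r

  /** Returns the file content with the recorded partition count replaced. */
  def withShufflePartitions(content: String, partitions: Int): String =
    ShufflePartitions.replaceAllIn(
      content,
      // quoteReplacement keeps any '$' or '\' in the replacement literal.
      m => Regex.quoteReplacement(s"${m.group(1)}$partitions${m.group(2)}"))

  def main(args: Array[String]): Unit = {
    val before = """{"conf":{"spark.sql.shuffle.partitions":"200"}}"""
    println(withShufflePartitions(before, 2))
    // {"conf":{"spark.sql.shuffle.partitions":"2"}}
  }
}
```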

Thanks for the explanation. If I want to change the value manually, do I only need to modify it in the latest file in the "offsets" folder?