Apache Spark: Spark Structured Streaming gives the error org.apache.spark.sql.AnalysisException: 'foreachBatch' does not support partitioning

Tags: apache-spark, databricks, spark-structured-streaming, azure-databricks

I designed the following Structured Streaming code in Databricks to write to Azure Data Lake:

import org.apache.spark.sql.DataFrame

// Upsert each micro-batch into the "silver" Delta table, keeping only the
// latest record per smtUidNr (deduplicated by msgTs).
def upsertToDelta(microBatchOutputDF: DataFrame, batchId: Long): Unit = {
  microBatchOutputDF.createOrReplaceTempView("updates")

  microBatchOutputDF.sparkSession.sql(s"""
    MERGE INTO silver AS r
    USING (
      SELECT smtUidNr, dcl, inv, evt, smt, msgTs, msgInfSrcCd
      FROM (
        SELECT *,
               RANK()       OVER (PARTITION BY smtUidNr ORDER BY msgTs DESC) AS rank,
               ROW_NUMBER() OVER (PARTITION BY smtUidNr ORDER BY msgTs DESC) AS row_num
        FROM updates
      )
      WHERE rank = 1 AND row_num = 1
    ) AS u
    ON u.smtUidNr = r.smtUidNr
    WHEN MATCHED AND u.msgTs > r.msgTs THEN
      UPDATE SET *
    WHEN NOT MATCHED THEN
      INSERT *
  """)
}

splitDF.writeStream
  .format("delta")
  .foreachBatch(upsertToDelta _)
  .outputMode("append")
  .partitionBy("year", "month", "day")
  .option("checkpointLocation", "abfss://checkpoint@mcfdatalake.dfs.core.windows.net/kjd/test/")
  .start("abfss://dump@mcfdatalake.dfs.core.windows.net/main_data/")
When I try to execute this, I get the following error:

org.apache.spark.sql.AnalysisException: 'foreachBatch' does not support partitioning;
What is the alternative for partitioning when using foreachBatch?

Do the partitioning inside foreachBatch instead. With foreachBatch, your function takes over the sink logic for each micro-batch, so sink-level options such as partitionBy on the stream writer do not apply; remove partitionBy from the streaming writer and perform any partitioned write within the batch function itself.
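
For illustration, a minimal sketch of that approach, reusing the question's paths and upsertToDelta; the wrapper name writeBatch is hypothetical:

// Hypothetical wrapper: partitioning is handled inside the batch function,
// where partitionBy is an ordinary batch-write option, not on the stream writer.
def writeBatch(microBatchOutputDF: DataFrame, batchId: Long): Unit = {
  // Partitioned append of the raw micro-batch.
  microBatchOutputDF.write
    .format("delta")
    .mode("append")
    .partitionBy("year", "month", "day")
    .save("abfss://dump@mcfdatalake.dfs.core.windows.net/main_data/")

  // Then run the MERGE from the question.
  upsertToDelta(microBatchOutputDF, batchId)
}

splitDF.writeStream
  .foreachBatch(writeBatch _)
  .outputMode("append")
  .option("checkpointLocation", "abfss://checkpoint@mcfdatalake.dfs.core.windows.net/kjd/test/")
  .start()  // no path or format here: foreachBatch is the sink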


You can also write each batch to a Delta table and run a separate query on that Delta table to merge it into the other table.

If you don't mind, could you share code for that?
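
Not the original poster's code, but a minimal sketch of that second approach, assuming the stream stages partitioned data at the question's main_data path and a separate scheduled job merges it into silver:

// Stage 1: plain partitioned streaming append (no foreachBatch, so
// partitionBy is supported by the built-in Delta sink).
splitDF.writeStream
  .format("delta")
  .outputMode("append")
  .partitionBy("year", "month", "day")
  .option("checkpointLocation", "abfss://checkpoint@mcfdatalake.dfs.core.windows.net/kjd/test/")
  .start("abfss://dump@mcfdatalake.dfs.core.windows.net/main_data/")

// Stage 2: a separate (e.g. scheduled) batch query that merges the staged
// data into silver, using the same dedup-by-msgTs MERGE as the question.
spark.sql("""
  MERGE INTO silver AS r
  USING (
    SELECT * FROM delta.`abfss://dump@mcfdatalake.dfs.core.windows.net/main_data/`
  ) AS u
  ON u.smtUidNr = r.smtUidNr
  WHEN MATCHED AND u.msgTs > r.msgTs THEN UPDATE SET *
  WHEN NOT MATCHED THEN INSERT *
""")

In practice the MERGE source would be restricted to newly staged data (for example by batch id or timestamp) rather than rescanning the whole staged table on every run.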