Continuous trigger not found in Apache Spark Structured Streaming

Tags: apache-spark, spark-structured-streaming

Runtime: Spark 2.3.0, Scala 2.11 (Databricks 4.1 ML beta)

```scala
import org.apache.spark.sql.streaming.Trigger
import scala.concurrent.duration._

// Kafka setup and df definition go here

val query = df.writeStream.format("parquet")
  .option("path", ...)
  .option("checkpointLocation", ...)
  .trigger(Continuous(30000))
  .outputMode(OutputMode.Append)
  .start()
```

Throws error: `not found: value Continuous`

Other attempts that did not work:

```scala
.trigger(continuous = "30 seconds")  // as per the Databricks blog
// throws the same error as above

.trigger(Trigger.Continuous("1 second"))  // as per the Spark docs
// throws java.lang.IllegalStateException: Unknown type of trigger: ContinuousTrigger(1000)
```
References:

(Databricks blog)

(Spark docs)

(Scaladoc)

Spark 2.3.0 does not support parquet with continuous streaming; you have to use a Kafka-based, console, or memory sink.

Quoting the blog post:

You can set the optional Continuous Trigger in queries that satisfy the following conditions: read from supported sources like Kafka and write to supported sinks like Kafka, memory, and console.
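For reference, a minimal sketch of a query that does satisfy those conditions (Kafka source, Kafka sink, continuous trigger), assuming a running Kafka broker; the broker address, topic names, and checkpoint path are placeholders:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.Trigger

// Placeholder broker address, topics, and checkpoint path -- adjust for your setup.
val spark = SparkSession.builder.appName("continuous-demo").getOrCreate()

val df = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092")
  .option("subscribe", "input-topic")
  .load()

val query = df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
  .writeStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092")
  .option("topic", "output-topic")
  .option("checkpointLocation", "/tmp/checkpoints/continuous-demo")
  .trigger(Trigger.Continuous("30 seconds")) // continuous mode: supported source + sink
  .start()

query.awaitTermination()
```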

Try using

```scala
.trigger(Trigger.ProcessingTime("1 second"))
```

This works; I ran into the same problem and solved it the same way.
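A sketch of that workaround applied to the question's query, assuming `df` is the Kafka-sourced DataFrame from the question (output and checkpoint paths are placeholders): with a processing-time trigger the query runs as ordinary micro-batches, where the parquet sink is supported.

```scala
import org.apache.spark.sql.streaming.{OutputMode, Trigger}

// Assumes `df` is the Kafka-sourced DataFrame defined earlier;
// output and checkpoint paths are placeholders.
val query = df.writeStream
  .format("parquet")
  .option("path", "/tmp/parquet-out")
  .option("checkpointLocation", "/tmp/checkpoints/parquet-out")
  .trigger(Trigger.ProcessingTime("1 second")) // micro-batch mode: parquet sink is supported
  .outputMode(OutputMode.Append)
  .start()
```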

As the Spark code below shows, only sinks that implement the StreamWriteSupport interface can be used with a ContinuousTrigger:

```scala
(sink, trigger) match {
  case (v2Sink: StreamWriteSupport, trigger: ContinuousTrigger) =>
    UnsupportedOperationChecker.checkForContinuous(analyzedPlan, outputMode)
    new StreamingQueryWrapper(new ContinuousExecution(
      sparkSession,
      userSpecifiedName.orNull,
      checkpointLocation,
      analyzedPlan,
      v2Sink,
      trigger,
      triggerClock,
      outputMode,
      extraOptions,
      deleteCheckpointOnStop))
  case _ =>
    new StreamingQueryWrapper(new MicroBatchExecution(
      sparkSession,
      userSpecifiedName.orNull,
      checkpointLocation,
      analyzedPlan,
      sink,
      trigger,
      triggerClock,
      outputMode,
      extraOptions,
      deleteCheckpointOnStop))
}
```
Only three sinks implement this interface: ConsoleSinkProvider, KafkaSourceProvider, and MemorySinkV2.
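The match expression above can be mimicked in plain Scala with simplified stand-in types (hypothetical, not the real Spark classes) to show the dispatch: continuous execution is chosen only when the sink implements StreamWriteSupport, and a continuous trigger with any other sink fails, which matches the IllegalStateException seen in the question.

```scala
// Simplified stand-ins for the Spark types involved (hypothetical,
// for illustration only -- not the real Spark API).
trait Sink
trait StreamWriteSupport extends Sink          // e.g. Kafka, console, memory sinks
case class FileSink(path: String) extends Sink // e.g. the parquet sink

sealed trait Trigger
case class ContinuousTrigger(intervalMs: Long) extends Trigger
case class ProcessingTimeTrigger(intervalMs: Long) extends Trigger

object ExecutionPicker {
  // Mirrors the (sink, trigger) match: continuous execution only for
  // StreamWriteSupport sinks paired with a continuous trigger.
  def pick(sink: Sink, trigger: Trigger): String = (sink, trigger) match {
    case (_: StreamWriteSupport, _: ContinuousTrigger) => "ContinuousExecution"
    case (_, _: ContinuousTrigger) =>
      // a continuous trigger with any other sink is rejected at runtime
      throw new IllegalStateException(s"Unknown type of trigger: $trigger")
    case _ => "MicroBatchExecution"
  }
}
```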

In Spark 3.0.1, continuous processing mode is still experimental, and only special types of queries are supported, depending on the source and sink.

Based on the documentation on supported queries, writing parquet appears to be unsupported:

从Spark 2.4开始,在连续处理模式下仅支持以下类型的查询

Operations: Only map-like Dataset/DataFrame operations are supported in continuous mode, that is, only projections (select, map, flatMap, mapPartitions, etc.) and selections (where, filter, etc.).
   All SQL functions are supported except aggregation functions (since aggregations are not yet supported), current_timestamp() and current_date() (deterministic computations using time is challenging).
Sources:
   Kafka source: All options are supported.
   Rate source: Good for testing. Only options that are supported in the continuous mode are numPartitions and rowsPerSecond.
Sinks:
   Kafka sink: All options are supported.
   Memory sink: Good for debugging.
   Console sink: Good for debugging. All options are supported. Note that the console will print every checkpoint interval that you have specified in the continuous trigger.
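Putting the supported pieces from that list together, a sketch pairing the rate source with the console sink; the app name, row rate, and trigger interval are arbitrary placeholder choices:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.Trigger

// Placeholder app name, row rate, and trigger interval.
val spark = SparkSession.builder.appName("continuous-rate-demo").getOrCreate()

val query = spark.readStream
  .format("rate")                  // supported source for continuous mode
  .option("rowsPerSecond", "10")
  .load()
  .where("value % 2 = 0")          // only map-like/selection operations are allowed
  .writeStream
  .format("console")               // supported sink, good for debugging
  .trigger(Trigger.Continuous("1 second")) // console prints each checkpoint interval
  .start()

query.awaitTermination()
```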