Apache Spark: Publishing a streaming Delta table in Databricks

Tags: apache-spark, pyspark, databricks, spark-structured-streaming, delta-lake



I am streaming data from a Delta table (source) into another Delta table (target):

%python

df = spark.readStream \
          .format("delta") \
          .load("path/to/source")  # placeholder path

query = (df
         .writeStream
         .format("delta")
         .option("mergeSchema", "true")
         .outputMode("append")
         .trigger(once=True)  # the job itself is scheduled every 30 min
         .option("checkpointLocation", "{0}/{1}/".format(checkpointsPath, key))
         .table(tableName)
        )

But it seems that at some point the job started processing "less" data than it should:

Do you know whether there is a maximum size (or similar limit) on how much streaming data gets processed per run?
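As a hedged note: the Delta streaming source does have per-batch rate limits, `maxFilesPerTrigger` (default 1000 files) and `maxBytesPerTrigger`, which can make a single micro-batch pick up less data than is available (depending on the Delta version, `trigger(once=True)` may ignore them and process everything). On a real cluster you would set them directly on `spark.readStream`; the sketch below uses a stand-in `FakeReader` class so it runs without a cluster and only illustrates how the options compose.

```python
# Sketch (assumption): demonstrating the Delta source rate-limit options
# maxFilesPerTrigger and maxBytesPerTrigger. On Databricks this would be:
#   df = (spark.readStream.format("delta")
#         .option("maxFilesPerTrigger", 100)
#         .load("path/to/source"))
# FakeReader stands in for spark.readStream so the snippet is runnable here.

class FakeReader:
    def __init__(self):
        self.options_set = {}

    def option(self, key, value):
        # DataStreamReader.option() chains fluently, so we return self.
        self.options_set[key] = value
        return self

def with_rate_limits(reader, max_files=1000, max_bytes="1g"):
    """Apply the Delta streaming-source rate-limit options to a reader."""
    return (reader
            .option("maxFilesPerTrigger", max_files)
            .option("maxBytesPerTrigger", max_bytes))

reader = with_rate_limits(FakeReader(), max_files=100)
print(reader.options_set)  # {'maxFilesPerTrigger': 100, 'maxBytesPerTrigger': '1g'}
```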

I am trying to debug by reading the logs, but I cannot find any problem.
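Besides the logs, one way to see whether the stream is really reading less is Structured Streaming's own metrics: `query.recentProgress` is a list of per-batch progress dicts that include `numInputRows`. A minimal sketch that maps batch IDs to input rows so a sudden drop is easy to spot; the sample progress entries below are made up for illustration (on Databricks you would pass `query.recentProgress` directly).

```python
# Each element of query.recentProgress is a dict of per-batch metrics.
# These sample entries are invented for illustration only.
sample_progress = [
    {"batchId": 0, "numInputRows": 1200},
    {"batchId": 1, "numInputRows": 300},
]

def rows_per_batch(progress):
    """Map batchId -> numInputRows so a sudden drop is easy to spot."""
    return {p["batchId"]: p["numInputRows"] for p in progress}

print(rows_per_batch(sample_progress))  # {0: 1200, 1: 300}
```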