
Apache Spark: subdirectories under the Spark Structured Streaming checkpoint directory


The checkpoint directory of a Spark Structured Streaming query creates four subdirectories. What is each of them used for?

/warehouse/test_topic/checkpointdir1/commits
/warehouse/test_topic/checkpointdir1/metadata
/warehouse/test_topic/checkpointdir1/offsets
/warehouse/test_topic/checkpointdir1/sources
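
For context, this layout is produced by any streaming query started with a checkpointLocation. A minimal Scala sketch that would create the directories above (the Kafka topic, bootstrap servers, and console sink are assumptions for illustration; any source/sink pair behaves the same):

import org.apache.spark.sql.SparkSession

object CheckpointLayoutDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("checkpoint-layout-demo")
      .getOrCreate()

    // A Kafka source; the servers and topic are made up for this sketch.
    val df = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "test_topic")
      .load()

    // The checkpointLocation option is what creates commits/, metadata,
    // offsets/ and sources/ under /warehouse/test_topic/checkpointdir1.
    val query = df.selectExpr("CAST(value AS STRING)")
      .writeStream
      .format("console")
      .option("checkpointLocation", "/warehouse/test_topic/checkpointdir1")
      .start()

    query.awaitTermination()
  }
}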

From the StreamExecution class documentation:

/**
   * A write-ahead-log that records the offsets that are present in each batch. In order to ensure
   * that a given batch will always consist of the same data, we write to this log *before* any
   * processing is done.  Thus, the Nth record in this log indicates data that is currently being
   * processed and the N-1th entry indicates which offsets have been durably committed to the sink.
   */
  val offsetLog = new OffsetSeqLog(sparkSession, checkpointFile("offsets"))
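
Concretely, each entry in this log is a small text file under offsets/, named after its batch id (0, 1, 2, ...): a version line, a line of batch metadata (watermark, batch timestamp, a few captured confs), then one line per source with that source's offsets. An illustrative file for a single Kafka source (the timestamp, topic, and offset values are made up):

v1
{"batchWatermarkMs":0,"batchTimestampMs":1560000000000,"conf":{"spark.sql.shuffle.partitions":"200"}}
{"test_topic":{"0":1042,"1":998}}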

  /**
   * A log that records the batch ids that have completed. This is used to check if a batch was
   * fully processed, and its output was committed to the sink, hence no need to process it again.
   * This is used (for instance) during restart, to help identify which batch to run next.
   */
  val commitLog = new CommitLog(sparkSession, checkpointFile("commits"))
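
The files under commits/ are likewise named after the batch id and are written only after the batch's output reaches the sink, so on restart the batch whose id appears in offsets/ but not in commits/ is the one that gets re-run. Their payload is minimal; in recent Spark versions it looks roughly like:

v1
{"nextBatchWatermarkMs":0}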

The metadata log is used to persist information associated with the query, e.g. in KafkaSource it is used to write the query's starting offsets (one offset per partition).
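
For what it is worth, in the checkpoints I have inspected, the KafkaSource starting offsets actually land under sources/0/ (that is the per-source metadata log), while the top-level metadata file holds the query's unique id: a random UUID written once when the query first starts, so the query keeps its identity across restarts. Roughly (the UUID, topic, and offsets are made up):

metadata:
{"id":"8a0f3b2c-1c2d-4e5f-9a6b-7c8d9e0f1a2b"}

sources/0/0:
v1
{"test_topic":{"0":0,"1":0}}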

Some clear information about the sources folder is still needed. I know this post is old, but is anyone still active on it? Please explain, thanks.

Per the explanation above, "metadata" stores the query's starting offsets, but when I check my folder its content is something like {"id":"3b7f07a0-2256-4097-83d4-dfeab6cf56cd"}. I don't know what this hex string represents. Could you explain it with an example?