Apache Spark: Can someone explain the RDD blocks shown for executors?

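Can someone explain why the number of RDD blocks increases when I run my Spark code a second time, even though the blocks were already stored in Spark memory during the first run? I am using threads for input. What exactly does "RDD blocks" mean?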

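I have been looking into this today, and it seems that the "RDD blocks" figure is actually the sum of RDD blocks and non-RDD blocks. See the code at the following URL:

If you go to the following link in the Apache Spark repository on GitHub:

you will see the following lines of code: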

    /**
     * Return the number of blocks stored in this block manager in O(RDDs) time.
     *
     * @note This is much faster than `this.blocks.size`, which is O(blocks) time.
     */
    def numBlocks: Int = _nonRddBlocks.size + numRddBlocks
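
To make the counting concrete, here is a minimal standalone sketch of the same idea. It is not Spark's actual implementation: `BlockCounter`, `RddBlock`, and `BroadcastBlock` are hypothetical names invented for illustration. It only assumes that RDD blocks are grouped by RDD id and that everything else is kept in a separate non-RDD collection, which is what makes the total computable in O(RDDs) time.

```scala
import scala.collection.mutable

// Hypothetical, simplified stand-ins for Spark's block identifiers.
sealed trait BlockId
final case class RddBlock(rddId: Int, partition: Int) extends BlockId
final case class BroadcastBlock(broadcastId: Long) extends BlockId

// Toy model: RDD blocks are grouped per RDD, non-RDD blocks are kept apart.
final class BlockCounter {
  private val rddBlocks = mutable.Map.empty[Int, mutable.Set[BlockId]]
  private val nonRddBlocks = mutable.Set.empty[BlockId]

  def add(id: BlockId): Unit = id match {
    case RddBlock(rddId, _) =>
      rddBlocks.getOrElseUpdate(rddId, mutable.Set.empty[BlockId]) += id
      ()
    case _ =>
      nonRddBlocks += id
      ()
  }

  // Models unpersisting an RDD: all of its blocks go away at once.
  def removeRdd(rddId: Int): Unit = {
    rddBlocks.remove(rddId)
    ()
  }

  // Models the cleanup of a broadcast variable.
  def removeBroadcast(broadcastId: Long): Unit = {
    nonRddBlocks -= BroadcastBlock(broadcastId)
    ()
  }

  // One size lookup per RDD, not one step per block.
  def numRddBlocks: Int = rddBlocks.valuesIterator.map(_.size).sum

  // Same shape as the quoted line: non-RDD blocks plus RDD blocks.
  def numBlocks: Int = nonRddBlocks.size + numRddBlocks
}

object BlockCounterDemo {
  def main(args: Array[String]): Unit = {
    val counter = new BlockCounter
    (0 until 4).foreach(p => counter.add(RddBlock(0, p))) // one cached RDD, 4 partitions
    counter.add(BroadcastBlock(0L))                       // one broadcast block
    println(s"rdd blocks = ${counter.numRddBlocks}, total = ${counter.numBlocks}")
    counter.removeRdd(0)
    println(s"after removing RDD 0: total = ${counter.numBlocks}")
  }
}
```

In this toy model, a total that includes broadcast blocks is larger than the number of cached RDD partitions alone, which matches the observation that the reported figure mixes both kinds of block.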

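Non-RDD blocks are the blocks created by broadcast variables, since these are stored in memory as cached blocks. The driver ships tasks to the executors through broadcast variables. These system-created broadcast variables are removed by the ContextCleaner service, which in turn removes the corresponding non-RDD blocks. RDD blocks, on the other hand, are removed by unpersisting the RDD with RDD.unpersist().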
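
As a rough sketch of the unpersist step, the snippet below caches an RDD, forces it to materialize, and then releases its blocks explicitly. It assumes Spark is on the classpath and uses a local master; the object name `UnpersistSketch`, the helper `cachedPartitions`, and the data are made up for illustration. `getRDDStorageInfo` is a developer API, so treat the numbers it reports as a diagnostic aid only.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object UnpersistSketch {
  // Hypothetical helper: cached partitions summed over all persisted RDDs.
  // Each cached partition corresponds to one RDD block.
  def cachedPartitions(sc: SparkContext): Int =
    sc.getRDDStorageInfo.map(_.numCachedPartitions).sum

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("unpersist-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val rdd = sc.parallelize(1 to 1000, numSlices = 4)
        .persist(StorageLevel.MEMORY_ONLY)
      rdd.count() // an action is needed before any block is actually cached
      println(s"cached partitions after count: ${cachedPartitions(sc)}")

      // Release the RDD's blocks and wait until they are gone.
      rdd.unpersist(blocking = true)
      println(s"cached partitions after unpersist: ${cachedPartitions(sc)}")
    } finally {
      sc.stop()
    }
  }
}
```

Calling `unpersist` explicitly, instead of waiting for the RDD to be garbage collected and cleaned up, makes it predictable when the RDD blocks disappear from the executors.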
