Apache Spark: Finding Spark's scheduler delay

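I want to be able to produce a table of metrics for each task, like the table the Spark UI shows when you visit a specific stage.

One of its columns is Scheduler Delay, and I cannot find it in any of the REST APIs that Spark provides.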

All the other columns are present (when I browse /api/v1/applications/[app id]/stages/[stage id]/[attempt]/taskList).

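How is the scheduler delay calculated, and is there a way to extract the data without scraping the Spark UI web page?

Correct, the scheduler delay is not exposed in the history API. For the UI, it is computed as follows: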

private[ui] def getSchedulerDelay(info: TaskInfo, metrics: TaskMetricsUIData, currentTime: Long): Long = {
    if (info.finished) {
        val totalExecutionTime = info.finishTime - info.launchTime
        val executorOverhead = (metrics.executorDeserializeTime + metrics.resultSerializationTime)
        math.max(0,totalExecutionTime - metrics.executorRunTime - executorOverhead - getGettingResultTime(info, currentTime))
    } else {
        // The task is still running and the metrics like executorRunTime are not available.
        0L
    }
}
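
Since the formula only uses per-task timings, the same value can in principle be recomputed outside the UI from whatever timings you have collected. The following is a minimal sketch of that arithmetic, not Spark's own code: `TaskTimings` and `SchedulerDelay` are hypothetical names introduced here, and it assumes you already have each task's launch time, finish time, and getting-result time in milliseconds (how you obtain them from your data source is left open).

```scala
// Hypothetical container for one task's timings; all values are in milliseconds.
// This is NOT a Spark class, just an illustration of the inputs the formula needs.
final case class TaskTimings(
    launchTime: Long,
    finishTime: Long,
    executorDeserializeTime: Long,
    executorRunTime: Long,
    resultSerializationTime: Long,
    gettingResultTime: Long)

object SchedulerDelay {
  // Mirrors the UI calculation for a finished task:
  // wall-clock time minus run time, (de)serialization overhead and result fetch time,
  // clamped at zero.
  def schedulerDelay(t: TaskTimings): Long = {
    val totalExecutionTime = t.finishTime - t.launchTime
    val executorOverhead = t.executorDeserializeTime + t.resultSerializationTime
    math.max(0L, totalExecutionTime - t.executorRunTime - executorOverhead - t.gettingResultTime)
  }

  def main(args: Array[String]): Unit = {
    // Made-up sample values for illustration only.
    val sample = TaskTimings(
      launchTime = 1000L,
      finishTime = 1500L,
      executorDeserializeTime = 20L,
      executorRunTime = 400L,
      resultSerializationTime = 5L,
      gettingResultTime = 0L)
    println(schedulerDelay(sample)) // 500 - 400 - 25 - 0 = 75
  }
}
```

As in the UI code, the result is only meaningful for finished tasks; for a task that is still running the UI simply reports 0.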

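See line 770, at least for Spark 1.6. If you are instead looking for the scheduling delay of a Spark Streaming batch, the Streaming UI uses a class in which the scheduling delay is defined as: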

/**
 * Time taken for the first job of this batch to start processing from the time this batch
 * was submitted to the streaming scheduler. Essentially, it is
 * `processingStartTime` - `submissionTime`.
 */
def schedulingDelay: Option[Long] = processingStartTime.map(_ - submissionTime)
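
The `Option` in that definition matters: a batch that has been submitted but has not started processing has no scheduling delay yet. A small self-contained sketch of the same definition follows; `BatchTimes` and `BatchDelayDemo` are hypothetical names used only for illustration, and the timestamps are hard-coded rather than taken from a running streaming job.

```scala
// Hypothetical stand-in for the batch record; timestamps are in milliseconds.
final case class BatchTimes(submissionTime: Long, processingStartTime: Option[Long]) {
  // Same definition as quoted above: defined only once processing has started.
  def schedulingDelay: Option[Long] = processingStartTime.map(_ - submissionTime)
}

object BatchDelayDemo {
  def main(args: Array[String]): Unit = {
    val started = BatchTimes(submissionTime = 10000L, processingStartTime = Some(10250L))
    val waiting = BatchTimes(submissionTime = 10000L, processingStartTime = None)

    println(started.schedulingDelay) // Some(250)
    println(waiting.schedulingDelay) // None
  }
}
```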