Apache Spark: Finding Spark's scheduler delay

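I want to be able to produce a table of metrics per task, like the one the Spark UI shows when you visit a specific stage.

One of its columns is Scheduler Delay, which I cannot find in any of the REST APIs that Spark provides.

All the other columns are present (when I browse /api/v1/applications/[app-id]/stages/[stage-id]/[stage-attempt-id]/taskList).

How is Scheduler Delay calculated, and is there a way to extract it without scraping the Spark UI web page?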

You are right, scheduler delay is not exposed by the history API. For the UI, it is calculated as follows:

private[ui] def getSchedulerDelay(info: TaskInfo, metrics: TaskMetricsUIData, currentTime: Long): Long = {
    if (info.finished) {
        val totalExecutionTime = info.finishTime - info.launchTime
        val executorOverhead = metrics.executorDeserializeTime + metrics.resultSerializationTime
        math.max(
            0,
            totalExecutionTime - metrics.executorRunTime - executorOverhead -
                getGettingResultTime(info, currentTime))
    } else {
        // The task is still running and the metrics like executorRunTime are not available.
        0L
    }
}
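
The same arithmetic can be reproduced outside the UI, assuming you can obtain the underlying timing values for a finished task yourself; which of them the REST API exposes depends on the Spark version. The sketch below is not Spark code: `TaskTimings` and `SchedulerDelay` are hypothetical names introduced for illustration, and all values are assumed to be in milliseconds.

```scala
// Hypothetical container for the timing values of one finished task (milliseconds).
// These names mirror the fields used by the UI method above; they are not a Spark API.
final case class TaskTimings(
    launchTime: Long,
    finishTime: Long,
    executorDeserializeTime: Long,
    executorRunTime: Long,
    resultSerializationTime: Long,
    gettingResultTime: Long)

object SchedulerDelay {
  // Scheduler delay is whatever remains of the task's wall-clock time after
  // subtracting run time, (de)serialization overhead, and result fetch time.
  // The result is clamped at zero, as in the UI calculation.
  def schedulerDelay(t: TaskTimings): Long = {
    val totalExecutionTime = t.finishTime - t.launchTime
    val executorOverhead = t.executorDeserializeTime + t.resultSerializationTime
    math.max(0L, totalExecutionTime - t.executorRunTime - executorOverhead - t.gettingResultTime)
  }
}
```

For example, a task that ran from 1000 to 1500 with 400 ms of executor run time, 20 ms of deserialization, 10 ms of result serialization, and 30 ms of result fetching gives `SchedulerDelay.schedulerDelay(TaskTimings(1000L, 1500L, 20L, 400L, 10L, 30L))`, which is 40. For a task that is still running, the UI reports 0 instead, because metrics such as `executorRunTime` are not yet available.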

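See line 770 of the source file, at least for Spark 1.6. If you are instead looking for the scheduling delay of a Spark Streaming batch, the relevant code uses a class in which schedulingDelay is defined: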

/**
 * Time taken for the first job of this batch to start processing from the time this batch
 * was submitted to the streaming scheduler. Essentially, it is
 * `processingStartTime` - `submissionTime`.
 */
def schedulingDelay: Option[Long] = processingStartTime.map(_ - submissionTime)
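
To illustrate how that `Option` behaves, here is a minimal standalone sketch. `BatchTimes` is a hypothetical stand-in for the Spark class, with only the two fields the definition needs, both assumed to be in milliseconds.

```scala
// Hypothetical stand-in for the batch class; not part of Spark.
// processingStartTime is None until the first job of the batch starts.
final case class BatchTimes(submissionTime: Long, processingStartTime: Option[Long]) {
  // Same definition as above: defined only once processing has started.
  def schedulingDelay: Option[Long] = processingStartTime.map(_ - submissionTime)
}
```

With these definitions, `BatchTimes(1000L, Some(1250L)).schedulingDelay` is `Some(250L)`, while a batch that has not started processing yields `None` rather than a misleading zero.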