A single long-running task in each executor in Apache Spark
Sorry if this question seems invalid; I have tried to find a general guide to debugging task processing times, but haven't found one. I think my problem is a known one, so any help with debugging or understanding it (related discussions or blog posts) would answer my question.

I run many Spark Streaming jobs, and almost all of them show the same problem: one task in each executor takes much longer than all the others, even though the input sizes of the tasks do not differ that much.

My job flat-maps a direct Kafka stream source with 40 partitions (mapParitionsWithPair(flatMap)) to generate more objects from the events, then reduces them (reduceByKey) and saves the aggregated values to a DB:
(Task timeline chart for the reduce stage)
This is an Apache Mesos-based cluster with two nodes and two cores per node, and the second stage of every job shows this uneven distribution of task processing times.
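For reference, the per-key aggregation the job performs can be mimicked in plain Python; this is a hypothetical sketch of reduceByKey semantics (function and sample data invented for illustration), not the actual Spark job:

```python
from collections import defaultdict

def reduce_by_key(pairs, combine):
    """Group (key, value) pairs and pairwise-combine the values per key,
    mimicking what Spark's reduceByKey does across partitions."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    result = {}
    for key, values in grouped.items():
        acc = values[0]
        for v in values[1:]:
            acc = combine(acc, v)  # pairwise merge, like the Kotlin lambda below
        result[key] = acc
    return result

# Events flat-mapped into (key, (count, amount)) pairs -- made-up sample data.
pairs = [("a", (1, 10)), ("b", (1, 5)), ("a", (2, 20))]
totals = reduce_by_key(pairs, lambda acc, e: (acc[0] + e[0], acc[1] + e[1]))
print(totals)  # {'a': (3, 30), 'b': (1, 5)}
```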
Updates:
- I replaced reduceByKey with a plain Java reduce operation (actually Kotlin sequence operations); the same problem still occurs.
- After rerunning the job, I realized this problem hurts larger inputs much less: 160K events are processed in 1.8 to 4.8 minutes (roughly 580 events per second in the worse case), and although some tasks still take longer, the final impact is far smaller than for small inputs, whose processing rate ranges between 660 and 54. Interestingly, the long-running tasks take about the same time (around 41 seconds) in both cases.
- The problem persists even after increasing RAM; the executors now have 30% of their RAM free.
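As a quick sanity check on the figures in the update (assuming 160K events over the stated 1.8-to-4.8-minute range), the worst-case throughput works out to roughly 556 events per second, in line with the ~580 figure quoted:

```python
events = 160_000
worst_minutes, best_minutes = 4.8, 1.8  # batch durations from the update above

worst_rate = events / (worst_minutes * 60)  # slowest observed run
best_rate = events / (best_minutes * 60)    # fastest observed run

print(f"worst: {worst_rate:.0f} events/s, best: {best_rate:.0f} events/s")
# worst: 556 events/s, best: 1481 events/s
```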
started processing partitioned input: thread 99
started processing partitioned input: thread 98
finished processing partitioned input: thread 99 took 40615ms
finished processing partitioned input: thread 98 took 40469ms
started processing partitioned input: thread 98
started processing partitioned input: thread 99
finished processing partitioned input: thread 98 took 40476ms
finished processing partitioned input: thread 99 took 40523ms
started processing partitioned input: thread 98
started processing partitioned input: thread 99
finished processing partitioned input: thread 98 40465ms
finished processing partitioned input: thread 99 40379ms
started processing partitioned input: thread 98
finished processing partitioned input: thread 98 468
started processing partitioned input: thread 99
finished processing partitioned input: thread 99 525
started processing partitioned input: thread 99
started processing partitioned input: thread 98
finished processing partitioned input: thread 98 738
finished processing partitioned input: thread 99 790
started processing partitioned input: thread 98
finished processing partitioned input: thread 98 took 558
started processing partitioned input: thread 99
finished processing partitioned input: thread 99 took 461
started processing partitioned input: thread 98
finished processing partitioned input: thread 98 took 483
started processing partitioned input: thread 99
finished processing partitioned input: thread 99 took 513
started processing partitioned input: thread 98
finished processing partitioned input: thread 98 took 485
started processing partitioned input: thread 99
finished processing partitioned input: thread 99 took 454
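The bimodal split in logs like the above (a few ~40 s tasks among sub-second ones) is easy to surface by parsing the durations per line; a small hypothetical helper, with the log format assumed from the snippet (sample lines copied from it):

```python
import re

LOG = """\
finished processing partitioned input: thread 99 took 40615ms
finished processing partitioned input: thread 98 took 40469ms
finished processing partitioned input: thread 98 took 558
finished processing partitioned input: thread 99 took 461
"""

# Extract (thread, millis); "took" and the "ms" suffix are optional in the logs.
pattern = re.compile(r"finished processing partitioned input: thread (\d+)(?: took)? (\d+)")
durations = [(t, int(ms)) for t, ms in pattern.findall(LOG)]

slow = [d for _, d in durations if d > 10_000]   # the ~40s outliers
fast = [d for _, d in durations if d <= 10_000]  # normal sub-second tasks
print(f"{len(slow)} slow tasks, {len(fast)} fast tasks")
# 2 slow tasks, 2 fast tasks
```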
The log above covers only the mapping of incoming input to objects to be saved in Cassandra and does not include the time spent saving to Cassandra. Below is the log of the save operation, which is consistently fast and never leaves the CPU idle:
18/02/07 07:41:47 INFO Executor: Running task 17.0 in stage 5.0 (TID 207)
18/02/07 07:41:47 INFO TorrentBroadcast: Started reading broadcast variable 5
18/02/07 07:41:47 INFO MemoryStore: Block broadcast_5_piece0 stored as bytes in memory (estimated size 7.8 KB, free 1177.1 MB)
18/02/07 07:41:47 INFO TorrentBroadcast: Reading broadcast variable 5 took 33 ms
18/02/07 07:41:47 INFO MemoryStore: Block broadcast_5 stored as values in memory (estimated size 16.4 KB, free 1177.1 MB)
18/02/07 07:41:47 INFO BlockManager: Found block rdd_30_2 locally
18/02/07 07:41:47 INFO BlockManager: Found block rdd_30_17 locally
18/02/07 07:42:02 INFO TableWriter: Wrote 28926 rows to keyspace.table in 15.749 s.
18/02/07 07:42:02 INFO Executor: Finished task 17.0 in stage 5.0 (TID 207). 923 bytes result sent to driver
18/02/07 07:42:02 INFO CoarseGrainedExecutorBackend: Got assigned task 209
18/02/07 07:42:02 INFO Executor: Running task 18.0 in stage 5.0 (TID 209)
18/02/07 07:42:02 INFO BlockManager: Found block rdd_30_18 locally
18/02/07 07:42:03 INFO TableWriter: Wrote 29288 rows to keyspace.table in 16.042 s.
18/02/07 07:42:03 INFO Executor: Finished task 2.0 in stage 5.0 (TID 203). 1713 bytes result sent to driver
18/02/07 07:42:03 INFO CoarseGrainedExecutorBackend: Got assigned task 211
18/02/07 07:42:03 INFO Executor: Running task 21.0 in stage 5.0 (TID 211)
18/02/07 07:42:03 INFO BlockManager: Found block rdd_30_21 locally
18/02/07 07:42:19 INFO TableWriter: Wrote 29315 rows to keyspace.table in 16.308 s.
18/02/07 07:42:19 INFO Executor: Finished task 21.0 in stage 5.0 (TID 211). 923 bytes result sent to driver
18/02/07 07:42:19 INFO CoarseGrainedExecutorBackend: Got assigned task 217
18/02/07 07:42:19 INFO Executor: Running task 24.0 in stage 5.0 (TID 217)
18/02/07 07:42:19 INFO BlockManager: Found block rdd_30_24 locally
18/02/07 07:42:19 INFO TableWriter: Wrote 29422 rows to keyspace.table in 16.783 s.
18/02/07 07:42:19 INFO Executor: Finished task 18.0 in stage 5.0 (TID 209). 923 bytes result sent to driver
18/02/07 07:42:19 INFO CoarseGrainedExecutorBackend: Got assigned task 218
18/02/07 07:42:19 INFO Executor: Running task 25.0 in stage 5.0 (TID 218)
18/02/07 07:42:19 INFO BlockManager: Found block rdd_30_25 locally
18/02/07 07:42:35 INFO TableWriter: Wrote 29427 rows to keyspace.table in 16.509 s.
18/02/07 07:42:35 INFO Executor: Finished task 24.0 in stage 5.0 (TID 217). 923 bytes result sent to driver
18/02/07 07:42:35 INFO CoarseGrainedExecutorBackend: Got assigned task 225
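The TableWriter lines above make it straightforward to compute the effective write throughput, e.g. 28926 rows in 15.749 s is roughly 1837 rows per second:

```python
# (rows, seconds) pairs taken from the TableWriter log lines above.
writes = [(28926, 15.749), (29288, 16.042), (29315, 16.308),
          (29422, 16.783), (29427, 16.509)]

for rows, secs in writes:
    print(f"{rows} rows in {secs}s -> {rows / secs:.0f} rows/s")
```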
@user8371915:
{ acc: Tuple2, rE: Tuple2 -> Tuple2(acc._1 + rE._1, acc._2 + rE._2) }
We need more information to help. Which stage do these tasks belong to? How many partitions does the reduceByKey output? Could there be any skew in your keys? Show us your Spark DAG.

@YuvalItzchakov I think I have already shared that information.. it is the second stage (3317), above the DAG (the third chart), the reduceByKey. The partition count is 40, defaulting to Kafka's 40 partitions; changing the reduceByKey partition count to 200 does not change the processing time of the long-running tasks. In the second chart I cannot see any skewed data judging by the output sizes.

It looks like those tasks are just going through GC cycles. How large is your executor heap?

@YuvalItzchakov The executor memory is 1 GB and the heap size is 381 MB. As you can see, the amount of input data is not that large, nor is the DAG that complex; the GC time of those tasks is also short.
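To check for key skew as suggested in the comments, one can count records per key before the reduce; a minimal, hypothetical sketch in plain Python (in a real job this would be a countByKey-style check on a sampled batch; the `skew_ratio` helper and sample keys are invented for illustration):

```python
from collections import Counter

def skew_ratio(keys):
    """Ratio of the most frequent key's count to the mean count per key.
    Values far above 1.0 suggest a skewed key distribution."""
    counts = Counter(keys)
    mean = sum(counts.values()) / len(counts)
    return max(counts.values()) / mean

balanced = ["a", "b", "c", "a", "b", "c"]   # every key equally frequent
skewed = ["a"] * 10 + ["b", "c"]            # one hot key dominates

print(skew_ratio(balanced))  # 1.0
print(skew_ratio(skewed))    # 2.5
```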