A single long-running task in each executor in Apache Spark
Sorry if this question seems invalid; I have tried to find a general guide to debugging task processing times, but haven't found one. I think my problem is a known one, so any help with debugging or understanding it (related discussions or blog posts) would answer my question.

I run many Spark Streaming jobs, and almost all of them show the same problem: one task in each executor takes much longer than all the others, even though the input sizes of the tasks do not differ that much.

My job flat-maps a direct Kafka stream source with 40 partitions (mapParitionsWithPair(flatMap)) to generate more objects from the events, then reduces them (reduceByKey) and saves the aggregated values to a DB:
(Task timeline chart for the reduce stage)
This is an Apache Mesos-based cluster with two nodes and two cores per node, and the second stage of every job shows this uneven distribution of task processing times.
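For reference, the per-key aggregation the job performs can be mimicked in plain Python; this is a hypothetical sketch of reduceByKey semantics (function and sample data invented for illustration), not the actual Spark job:

```python
from collections import defaultdict

def reduce_by_key(pairs, combine):
    """Group (key, value) pairs and pairwise-combine the values per key,
    mimicking what Spark's reduceByKey does across partitions."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    result = {}
    for key, values in grouped.items():
        acc = values[0]
        for v in values[1:]:
            acc = combine(acc, v)  # pairwise merge, like the Kotlin lambda below
        result[key] = acc
    return result

# Events flat-mapped into (key, (count, amount)) pairs -- made-up sample data.
pairs = [("a", (1, 10)), ("b", (1, 5)), ("a", (2, 20))]
totals = reduce_by_key(pairs, lambda acc, e: (acc[0] + e[0], acc[1] + e[1]))
print(totals)  # {'a': (3, 30), 'b': (1, 5)}
```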
Updates:
- I replaced reduceByKey with a plain Java reduce operation (actually Kotlin sequence operations); the same problem still occurs.
- After rerunning the job, I realized this problem hurts larger inputs much less: 160K events are processed in 1.8 to 4.8 minutes (roughly 580 events per second in the worse case), and although some tasks still take longer, the final impact is far smaller than for small inputs, whose processing rate ranges between 660 and 54. Interestingly, the long-running tasks take about the same time (around 41 seconds) in both cases.
- The problem persists even after increasing RAM; the executors now have 30% of their RAM free.
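As a quick sanity check on the figures in the update (assuming 160K events over the stated 1.8-to-4.8-minute range), the worst-case throughput works out to roughly 556 events per second, in line with the ~580 figure quoted:

```python
events = 160_000
worst_minutes, best_minutes = 4.8, 1.8  # batch durations from the update above

worst_rate = events / (worst_minutes * 60)  # slowest observed run
best_rate = events / (best_minutes * 60)    # fastest observed run

print(f"worst: {worst_rate:.0f} events/s, best: {best_rate:.0f} events/s")
# worst: 556 events/s, best: 1481 events/s
```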
started processing partitioned input: thread 99
started processing partitioned input: thread 98
finished processing partitioned input: thread 99 took 40615ms
finished processing partitioned input: thread 98 took 40469ms
started processing partitioned input: thread 98
started processing partitioned input: thread 99
finished processing partitioned input: thread 98 took 40476ms
finished processing partitioned input: thread 99 took 40523ms
started processing partitioned input: thread 98
started processing partitioned input: thread 99
finished processing partitioned input: thread 98 40465ms
finished processing partitioned input: thread 99 40379ms
started processing partitioned input: thread 98
finished processing partitioned input: thread 98 468
started processing partitioned input: thread 99
finished processing partitioned input: thread 99 525
started processing partitioned input: thread 99
started processing partitioned input: thread 98
finished processing partitioned input: thread 98 738
finished processing partitioned input: thread 99 790
started processing partitioned input: thread 98
finished processing partitioned input: thread 98 took 558
started processing partitioned input: thread 99
finished processing partitioned input: thread 99 took 461
started processing partitioned input: thread 98
finished processing partitioned input: thread 98 took 483
started processing partitioned input: thread 99
finished processing partitioned input: thread 99 took 513
started processing partitioned input: thread 98
finished processing partitioned input: thread 98 took 485
started processing partitioned input: thread 99
finished processing partitioned input: thread 99 took 454
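The bimodal split in logs like the above (a few ~40 s tasks among sub-second ones) is easy to surface by parsing the durations per line; a small hypothetical helper, with the log format assumed from the snippet (sample lines copied from it):

```python
import re

LOG = """\
finished processing partitioned input: thread 99 took 40615ms
finished processing partitioned input: thread 98 took 40469ms
finished processing partitioned input: thread 98 took 558
finished processing partitioned input: thread 99 took 461
"""

# Extract (thread, millis); "took" and the "ms" suffix are optional in the logs.
pattern = re.compile(r"finished processing partitioned input: thread (\d+)(?: took)? (\d+)")
durations = [(t, int(ms)) for t, ms in pattern.findall(LOG)]

slow = [d for _, d in durations if d > 10_000]   # the ~40s outliers
fast = [d for _, d in durations if d <= 10_000]  # normal sub-second tasks
print(f"{len(slow)} slow tasks, {len(fast)} fast tasks")
# 2 slow tasks, 2 fast tasks
```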
The log above covers only the mapping of incoming input to objects to be saved in Cassandra and does not include the time spent saving to Cassandra. Below is the log of the save operation, which is consistently fast and never leaves the CPU idle:
18/02/07 07:41:47 INFO Executor: Running task 17.0 in stage 5.0 (TID 207)
18/02/07 07:41:47 INFO TorrentBroadcast: Started reading broadcast variable 5
18/02/07 07:41:47 INFO MemoryStore: Block broadcast_5_piece0 stored as bytes in memory (estimated size 7.8 KB, free 1177.1 MB)
18/02/07 07:41:47 INFO TorrentBroadcast: Reading broadcast variable 5 took 33 ms
18/02/07 07:41:47 INFO MemoryStore: Block broadcast_5 stored as values in memory (estimated size 16.4 KB, free 1177.1 MB)
18/02/07 07:41:47 INFO BlockManager: Found block rdd_30_2 locally
18/02/07 07:41:47 INFO BlockManager: Found block rdd_30_17 locally
18/02/07 07:42:02 INFO TableWriter: Wrote 28926 rows to keyspace.table in 15.749 s.
18/02/07 07:42:02 INFO Executor: Finished task 17.0 in stage 5.0 (TID 207). 923 bytes result sent to driver
18/02/07 07:42:02 INFO CoarseGrainedExecutorBackend: Got assigned task 209
18/02/07 07:42:02 INFO Executor: Running task 18.0 in stage 5.0 (TID 209)
18/02/07 07:42:02 INFO BlockManager: Found block rdd_30_18 locally
18/02/07 07:42:03 INFO TableWriter: Wrote 29288 rows to keyspace.table in 16.042 s.
18/02/07 07:42:03 INFO Executor: Finished task 2.0 in stage 5.0 (TID 203). 1713 bytes result sent to driver
18/02/07 07:42:03 INFO CoarseGrainedExecutorBackend: Got assigned task 211
18/02/07 07:42:03 INFO Executor: Running task 21.0 in stage 5.0 (TID 211)
18/02/07 07:42:03 INFO BlockManager: Found block rdd_30_21 locally
18/02/07 07:42:19 INFO TableWriter: Wrote 29315 rows to keyspace.table in 16.308 s.
18/02/07 07:42:19 INFO Executor: Finished task 21.0 in stage 5.0 (TID 211). 923 bytes result sent to driver
18/02/07 07:42:19 INFO CoarseGrainedExecutorBackend: Got assigned task 217
18/02/07 07:42:19 INFO Executor: Running task 24.0 in stage 5.0 (TID 217)
18/02/07 07:42:19 INFO BlockManager: Found block rdd_30_24 locally
18/02/07 07:42:19 INFO TableWriter: Wrote 29422 rows to keyspace.table in 16.783 s.
18/02/07 07:42:19 INFO Executor: Finished task 18.0 in stage 5.0 (TID 209). 923 bytes result sent to driver
18/02/07 07:42:19 INFO CoarseGrainedExecutorBackend: Got assigned task 218
18/02/07 07:42:19 INFO Executor: Running task 25.0 in stage 5.0 (TID 218)
18/02/07 07:42:19 INFO BlockManager: Found block rdd_30_25 locally
18/02/07 07:42:35 INFO TableWriter: Wrote 29427 rows to keyspace.table in 16.509 s.
18/02/07 07:42:35 INFO Executor: Finished task 24.0 in stage 5.0 (TID 217). 923 bytes result sent to driver
18/02/07 07:42:35 INFO CoarseGrainedExecutorBackend: Got assigned task 225
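The TableWriter lines above make it straightforward to compute the effective write throughput, e.g. 28926 rows in 15.749 s is roughly 1837 rows per second:

```python
# (rows, seconds) pairs taken from the TableWriter log lines above.
writes = [(28926, 15.749), (29288, 16.042), (29315, 16.308),
          (29422, 16.783), (29427, 16.509)]

for rows, secs in writes:
    print(f"{rows} rows in {secs}s -> {rows / secs:.0f} rows/s")
```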
@user8371915:
{ acc: Tuple2, rE: Tuple2 -> Tuple2(acc._1 + rE._1, acc._2 + rE._2) }
We need more information to help. Which stage do these tasks belong to? How many partitions does the reduceByKey output? Could there be any skew in your keys? Show us your Spark DAG.

@YuvalItzchakov I think I have already shared that information.. it is the second stage (3317), above the DAG (the third chart), the reduceByKey. The partition count is 40, defaulting to Kafka's 40 partitions; changing the reduceByKey partition count to 200 does not change the processing time of the long-running tasks. In the second chart I cannot see any skewed data judging by the output sizes.

It looks like those tasks are just going through GC cycles. How large is your executor heap?

@YuvalItzchakov The executor memory is 1 GB and the heap size is 381 MB. As you can see, the amount of input data is not that large, nor is the DAG that complex; the GC time of those tasks is also short.
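To check for key skew as suggested in the comments, one can count records per key before the reduce; a minimal, hypothetical sketch in plain Python (in a real job this would be a countByKey-style check on a sampled batch; the `skew_ratio` helper and sample keys are invented for illustration):

```python
from collections import Counter

def skew_ratio(keys):
    """Ratio of the most frequent key's count to the mean count per key.
    Values far above 1.0 suggest a skewed key distribution."""
    counts = Counter(keys)
    mean = sum(counts.values()) / len(counts)
    return max(counts.values()) / mean

balanced = ["a", "b", "c", "a", "b", "c"]   # every key equally frequent
skewed = ["a"] * 10 + ["b", "c"]            # one hot key dominates

print(skew_ratio(balanced))  # 1.0
print(skew_ratio(skewed))    # 2.5
```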