Java Spark Streaming: is the mapPartitions/map function re-run?

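I built a YARN cluster (on Docker) to run a Java-based Spark Streaming job.

I don't understand why Spark re-runs the same mapPartitions function in my KafkaStreaming job.

Executor 1 reads the block locally.
I found that executor 2 fetches the data remotely twice.

Do I need to configure something so that Spark does not re-run the same function?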


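```
Executor 1, key lines in the log:

Found block input-0-1498129808200 locally
key=170622191007573

Executor 2, key lines in the log (the key is printed twice):

Found block rdd_3061_0 remotely
key=170622191007573
```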

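Because you are reading data from Kafka, Spark keeps listening to the stream. The job that processes the stream is therefore re-run as data is read from Kafka. Hope this answer helps.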

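I changed the method from mapPartitions to mapPartitionsToPair.reduceByKey, and now the log line is printed only once. I don't know why. Does anyone know how these two methods work internally? I still don't understand how or why "the job that processes the stream will re-run as the stream is read".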
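
A note on the comment above: the driver log in this post lists three Spark jobs for the batch at 1498129808500 ms (jobs 2044, 2045 and 2046, from two foreachPartitionAsync calls and one print), all built on the mapPartitions output. Spark evaluates transformations lazily and recomputes an unpersisted RDD once per job, which would be consistent with one key= line on executor 1 (task 83) and two on executor 2 (tasks 84 and 85). The sketch below is a plain-Java model of that behaviour, not Spark code: `LazyRecompute`, `mapPartitions`, `cache` and `callsForThreeActions` are hypothetical names that only imitate the idea.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import java.util.function.Supplier;

class LazyRecompute {

    /** Models an uncached RDD: a recipe that is re-evaluated every time an action asks for data. */
    static <A, B> Supplier<List<B>> mapPartitions(Supplier<List<A>> parent,
                                                  Function<List<A>, List<B>> f) {
        return () -> f.apply(parent.get());
    }

    /** Models cache(): evaluate the recipe once, then serve the stored result. */
    static <T> Supplier<List<T>> cache(final Supplier<List<T>> rdd) {
        return new Supplier<List<T>>() {
            private List<T> stored;

            @Override
            public synchronized List<T> get() {
                if (stored == null) {
                    stored = rdd.get();
                }
                return stored;
            }
        };
    }

    /** Runs three "output operations" on the same recipe and returns how often the function ran. */
    static int callsForThreeActions(boolean cached) {
        final AtomicInteger calls = new AtomicInteger();
        Supplier<List<String>> input = () -> Collections.singletonList("170622191007573");
        Supplier<List<String>> mapped = mapPartitions(input, keys -> {
            calls.incrementAndGet(); // stands in for log.info("key=" + ...)
            return new ArrayList<>(keys);
        });

        Supplier<List<String>> rdd = mapped;
        if (cached) {
            rdd = cache(mapped);
        }

        rdd.get(); // stands in for the first foreachPartitionAsync job
        rdd.get(); // stands in for the second foreachPartitionAsync job
        rdd.get(); // stands in for the print() job
        return calls.get();
    }

    public static void main(String[] args) {
        System.out.println("uncached calls = " + callsForThreeActions(false)); // prints 3
        System.out.println("cached calls   = " + callsForThreeActions(true));  // prints 1
    }
}
```

In the real job the analogous step would presumably be calling `cache()` (or `persist(...)`) on the `JavaDStream` returned by `mapPartitions` before attaching the output operations. The `reduceByKey` variant probably prints once for a related reason: it adds a shuffle, and Spark can reuse a stage's shuffle output across the jobs of a batch instead of re-running the map side. Both points are inferences from the log, since the output operations themselves are not shown in the post.
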
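The code:

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

import kafka.serializer.StringDecoder;
import org.apache.log4j.Logger; // assumed: the post does not show which Logger is imported
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.storage.StorageLevel;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaPairInputDStream;
import org.apache.spark.streaming.kafka.KafkaUtils;
import scala.Tuple2;

// Receiver-based Kafka stream; MEMORY_ONLY_SER_2 replicates each received block to two nodes.
JavaPairInputDStream<String, String> inputDStream = KafkaUtils.createStream(
        jssc, String.class, String.class, StringDecoder.class, StringDecoder.class,
        kafkaConfig, topic, StorageLevel.MEMORY_ONLY_SER_2());
...
JavaDStream<Data> mapPartitions = inputDStream.mapPartitions(
        new FlatMapFunction<Iterator<Tuple2<String, String>>, Data>() {
    private static final long serialVersionUID = -640088436146512943L;

    @Override
    public Iterator<Data> call(Iterator<Tuple2<String, String>> t) throws Exception {
        List<Data> result = new ArrayList<>();
        Logger log = Logger.getLogger(this.getClass());
        while (t.hasNext()) {
            Tuple2<String, String> tuple = t.next();
            // Logged once per record each time this function runs on a partition.
            log.info("key=" + tuple._1());
            Data d = new Data();
            String[] arr = tuple._2().split(",");
            d.setKey(tuple._1());
            d.setUser(arr[0]);
            ...... // do something
            result.add(d);
        }
        return result.iterator();
    }
});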

Driver log:

19:10:08.004 INFO org.apache.spark.storage.BlockManager:54 Removing RDD 3057
19:10:08.004 INFO org.apache.spark.rdd.MapPartitionsRDD:54 Removing RDD 3056 from persistence list
19:10:08.004 INFO org.apache.spark.storage.BlockManager:54 Removing RDD 3056
19:10:08.004 INFO org.apache.spark.rdd.BlockRDD:54 Removing RDD 3055 from persistence list
19:10:08.004 INFO org.apache.spark.storage.BlockManager:54 Removing RDD 3055
19:10:08.004 INFO org.apache.spark.streaming.kafka.KafkaInputDStream:54 Removing blocks of RDD BlockRDD[3055] at createStream at KafkaStreaming.java:146 of time 1498129808000 ms
19:10:08.005 INFO org.apache.spark.streaming.scheduler.ReceivedBlockTracker:54 Deleting batches: 1498129807000 ms
19:10:08.005 INFO org.apache.spark.streaming.scheduler.InputInfoTracker:54 remove old batch metadata: 1498129807000 ms
19:10:08.403 INFO org.apache.spark.storage.BlockManagerInfo:54 Added input-0-1498129808200 in memory on slave2:50830 (size: 245.0 B, free: 983.0 MB)
19:10:08.414 INFO org.apache.spark.storage.BlockManagerInfo:54 Added input-0-1498129808200 in memory on slave1:41063 (size: 245.0 B, free: 983.1 MB)
19:10:08.501 INFO org.apache.spark.streaming.scheduler.JobScheduler:54 Added jobs for time 1498129808500 ms
19:10:08.501 INFO org.apache.spark.streaming.scheduler.JobScheduler:54 Starting job streaming job 1498129808500 ms.0 from job set of time 1498129808500 ms
19:10:08.502 INFO org.apache.spark.streaming.scheduler.JobScheduler:54 Finished job streaming job 1498129808500 ms.0 from job set of time 1498129808500 ms
19:10:08.502 INFO org.apache.spark.streaming.scheduler.JobScheduler:54 Starting job streaming job 1498129808500 ms.1 from job set of time 1498129808500 ms
19:10:08.502 DEBUG example.spark.streaming.KafkaStreaming$VoidFunctionImpl:201 run foreachMapPartitionsRDD[3062] at mapPartitions at KafkaStreaming.java:170
19:10:08.502 INFO org.apache.spark.scheduler.DAGScheduler:54 Got job 2044 (foreachPartitionAsync at KafkaStreaming.java:239) with 1 output partitions
19:10:08.503 INFO org.apache.spark.scheduler.DAGScheduler:54 Final stage: ResultStage 15 (foreachPartitionAsync at KafkaStreaming.java:239)
19:10:08.503 INFO org.apache.spark.scheduler.DAGScheduler:54 Parents of final stage: List()
19:10:08.503 INFO org.apache.spark.scheduler.DAGScheduler:54 Missing parents: List()
19:10:08.503 INFO org.apache.spark.scheduler.DAGScheduler:54 Submitting ResultStage 15 (MapPartitionsRDD[3063] at mapPartitions at KafkaStreaming.java:172), which has no missing parents
19:10:08.503 INFO org.apache.spark.streaming.scheduler.JobScheduler:54 Finished job streaming job 1498129808500 ms.1 from job set of time 1498129808500 ms
19:10:08.503 INFO org.apache.spark.streaming.scheduler.JobScheduler:54 Starting job streaming job 1498129808500 ms.2 from job set of time 1498129808500 ms
19:10:08.505 INFO org.apache.spark.storage.memory.MemoryStore:54 Block broadcast_15 stored as values in memory (estimated size 3.4 KB, free 1105.8 MB)
19:10:08.506 INFO org.apache.spark.SparkContext:54 Starting job: print at KafkaStreaming.java:177
19:10:08.508 INFO org.apache.spark.storage.memory.MemoryStore:54 Block broadcast_15_piece0 stored as bytes in memory (estimated size 2.2 KB, free 1105.8 MB)
19:10:08.508 INFO org.apache.spark.storage.BlockManagerInfo:54 Added broadcast_15_piece0 in memory on slave2:53474 (size: 2.2 KB, free: 1105.9 MB)
19:10:08.508 INFO org.apache.spark.SparkContext:54 Created broadcast 15 from broadcast at DAGScheduler.scala:996
19:10:08.509 INFO org.apache.spark.scheduler.DAGScheduler:54 Submitting 1 missing tasks from ResultStage 15 (MapPartitionsRDD[3063] at mapPartitions at KafkaStreaming.java:172)
19:10:08.509 INFO org.apache.spark.scheduler.cluster.YarnClusterScheduler:54 Adding task set 15.0 with 1 tasks
19:10:08.509 INFO org.apache.spark.scheduler.FairSchedulableBuilder:54 Added task set TaskSet_15.0 tasks to pool default
19:10:08.510 INFO org.apache.spark.scheduler.DAGScheduler:54 Got job 2045 (foreachPartitionAsync at KafkaStreaming.java:202) with 1 output partitions
19:10:08.510 INFO org.apache.spark.scheduler.DAGScheduler:54 Final stage: ResultStage 16 (foreachPartitionAsync at KafkaStreaming.java:202)
19:10:08.510 INFO org.apache.spark.scheduler.DAGScheduler:54 Parents of final stage: List()
19:10:08.510 INFO org.apache.spark.scheduler.TaskSetManager:54 Starting task 0.0 in stage 15.0 (TID 83, slave2, executor 1, partition 0, NODE_LOCAL, 6301 bytes)
19:10:08.510 INFO org.apache.spark.scheduler.DAGScheduler:54 Missing parents: List()
19:10:08.510 INFO org.apache.spark.scheduler.DAGScheduler:54 Submitting ResultStage 16 (MapPartitionsRDD[3062] at mapPartitions at KafkaStreaming.java:170), which has no missing parents
19:10:08.511 INFO org.apache.spark.storage.memory.MemoryStore:54 Block broadcast_16 stored as values in memory (estimated size 3.0 KB, free 1105.8 MB)
19:10:08.514 INFO org.apache.spark.storage.memory.MemoryStore:54 Block broadcast_16_piece0 stored as bytes in memory (estimated size 2.0 KB, free 1105.8 MB)
19:10:08.514 INFO org.apache.spark.storage.BlockManagerInfo:54 Added broadcast_16_piece0 in memory on slave2:53474 (size: 2.0 KB, free: 1105.9 MB)
19:10:08.515 INFO org.apache.spark.SparkContext:54 Created broadcast 16 from broadcast at DAGScheduler.scala:996
19:10:08.515 INFO org.apache.spark.scheduler.DAGScheduler:54 Submitting 1 missing tasks from ResultStage 16 (MapPartitionsRDD[3062] at mapPartitions at KafkaStreaming.java:170)
19:10:08.515 INFO org.apache.spark.scheduler.cluster.YarnClusterScheduler:54 Adding task set 16.0 with 1 tasks
19:10:08.515 INFO org.apache.spark.scheduler.FairSchedulableBuilder:54 Added task set TaskSet_16.0 tasks to pool default
19:10:08.516 INFO org.apache.spark.scheduler.DAGScheduler:54 Got job 2046 (print at KafkaStreaming.java:177) with 1 output partitions
19:10:08.516 INFO org.apache.spark.scheduler.DAGScheduler:54 Final stage: ResultStage 17 (print at KafkaStreaming.java:177)
19:10:08.517 INFO org.apache.spark.storage.BlockManagerInfo:54 Added broadcast_15_piece0 in memory on slave2:50830 (size: 2.2 KB, free: 983.0 MB)
19:10:08.516 INFO org.apache.spark.scheduler.TaskSetManager:54 Starting task 0.0 in stage 16.0 (TID 84, slave2, executor 2, partition 0, NODE_LOCAL, 6301 bytes)
19:10:08.517 INFO org.apache.spark.scheduler.DAGScheduler:54 Parents of final stage: List()
19:10:08.518 INFO org.apache.spark.scheduler.DAGScheduler:54 Missing parents: List()
19:10:08.518 INFO org.apache.spark.scheduler.DAGScheduler:54 Submitting ResultStage 17 (MapPartitionsRDD[3063] at mapPartitions at KafkaStreaming.java:172), which has no missing parents
19:10:08.519 INFO org.apache.spark.storage.memory.MemoryStore:54 Block broadcast_17 stored as values in memory (estimated size 3.2 KB, free 1105.8 MB)
19:10:08.522 INFO org.apache.spark.storage.memory.MemoryStore:54 Block broadcast_17_piece0 stored as bytes in memory (estimated size 2.1 KB, free 1105.8 MB)
19:10:08.522 INFO org.apache.spark.storage.BlockManagerInfo:54 Added broadcast_17_piece0 in memory on slave2:53474 (size: 2.1 KB, free: 1105.9 MB)
19:10:08.523 INFO org.apache.spark.SparkContext:54 Created broadcast 17 from broadcast at DAGScheduler.scala:996
19:10:08.523 INFO org.apache.spark.scheduler.DAGScheduler:54 Submitting 1 missing tasks from ResultStage 17 (MapPartitionsRDD[3063] at mapPartitions at KafkaStreaming.java:172)
19:10:08.523 INFO org.apache.spark.scheduler.cluster.YarnClusterScheduler:54 Adding task set 17.0 with 1 tasks
19:10:08.523 INFO org.apache.spark.scheduler.FairSchedulableBuilder:54 Added task set TaskSet_17.0 tasks to pool default
19:10:08.524 INFO org.apache.spark.scheduler.TaskSetManager:54 Starting task 0.0 in stage 17.0 (TID 85, slave2, executor 2, partition 0, NODE_LOCAL, 6904 bytes)
19:10:08.526 INFO org.apache.spark.storage.BlockManagerInfo:54 Added rdd_3061_0 in memory on slave2:50830 (size: 245.0 B, free: 983.0 MB)
19:10:08.528 INFO org.apache.spark.storage.BlockManagerInfo:54 Added broadcast_16_piece0 in memory on slave1:41063 (size: 2.0 KB, free: 983.1 MB)
19:10:08.534 INFO org.apache.spark.storage.BlockManagerInfo:54 Added broadcast_17_piece0 in memory on slave1:41063 (size: 2.1 KB, free: 983.1 MB)
19:10:08.547 INFO org.apache.spark.scheduler.TaskSetManager:54 Finished task 0.0 in stage 17.0 (TID 85) in 23 ms on slave2 (executor 2) (1/1)
19:10:08.547 INFO org.apache.spark.scheduler.cluster.YarnClusterScheduler:54 Removed TaskSet 17.0, whose tasks have all completed, from pool default
19:10:08.547 INFO org.apache.spark.scheduler.DAGScheduler:54 ResultStage 17 (print at KafkaStreaming.java:177) finished in 0.023 s
19:10:08.547 INFO org.apache.spark.scheduler.DAGScheduler:54 Job 2046 finished: print at KafkaStreaming.java:177, took 0.041731 s
19:10:08.548 INFO org.apache.spark.streaming.scheduler.JobScheduler:54 Finished job streaming job 1498129808500 ms.2 from job set of time 1498129808500 ms
19:10:08.548 INFO org.apache.spark.streaming.scheduler.JobScheduler:54 Total delay: 0.048 s for time 1498129808500 ms (execution: 0.047 s)
19:10:08.548 INFO org.apache.spark.rdd.MapPartitionsRDD:54 Removing RDD 3060 from persistence list
19:10:08.548 INFO org.apache.spark.storage.BlockManager:54 Removing RDD 3060
19:10:08.549 INFO org.apache.spark.rdd.MapPartitionsRDD:54 Removing RDD 3059 from persistence list
19:10:08.549 INFO org.apache.spark.storage.BlockManager:54 Removing RDD 3059
19:10:08.549 INFO org.apache.spark.rdd.BlockRDD:54 Removing RDD 3058 from persistence list
19:10:08.549 INFO org.apache.spark.storage.BlockManager:54 Removing RDD 3058
19:10:08.549 INFO org.apache.spark.streaming.kafka.KafkaInputDStream:54 Removing blocks of RDD BlockRDD[3058] at createStream at KafkaStreaming.java:146 of time 1498129808500 ms
19:10:08.550 INFO org.apache.spark.streaming.scheduler.ReceivedBlockTracker:54 Deleting batches: 1498129807500 ms
19:10:08.550 INFO org.apache.spark.streaming.scheduler.InputInfoTracker:54 remove old batch metadata: 1498129807500 ms
19:10:08.588 INFO org.apache.spark.scheduler.TaskSetManager:54 Finished task 0.0 in stage 16.0 (TID 84) in 72 ms on slave1 (executor 2) (1/1)
19:10:08.588 INFO org.apache.spark.scheduler.cluster.YarnClusterScheduler:54 Removed TaskSet 16.0, whose tasks have all completed, from pool default
19:10:08.588 INFO org.apache.spark.scheduler.DAGScheduler:54 ResultStage 16 (foreachPartitionAsync at KafkaStreaming.java:202) finished in 0.073 s
19:10:08.620 INFO org.apache.spark.scheduler.TaskSetManager:54 Finished task 0.0 in stage 15.0 (TID 83) in 110 ms on slave2 (executor 1) (1/1)
19:10:08.620 INFO org.apache.spark.scheduler.cluster.YarnClusterScheduler:54 Removed TaskSet 15.0, whose tasks have all completed, from pool default
19:10:08.620 INFO org.apache.spark.scheduler.DAGScheduler:54 ResultStage 15 (foreachPartitionAsync at KafkaStreaming.java:239) finished in 0.111 s
19:10:09.002 INFO org.apache.spark.streaming.scheduler.JobScheduler:54 Added jobs for time 1498129809000 ms
19:10:09.002 INFO org.apache.spark.streaming.scheduler.JobScheduler:54 Starting job streaming job 1498129809000 ms.0 from job set of time 1498129809000 ms
19:10:09.003 INFO org.apache.spark.streaming.scheduler.JobScheduler:54 Finished job streaming job 1498129809000 ms.0 from job set of time 1498129809000 ms
19:10:09.003 INFO org.apache.spark.streaming.scheduler.JobScheduler:54 Starting job streaming job 1498129809000 ms.1 from job set of time 1498129809000 ms
19:10:09.003 DEBUG example.spark.streaming.KafkaStreaming$VoidFunctionImpl:201 run foreachMapPartitionsRDD[3065] at mapPartitions at KafkaStreaming.java:170
19:10:09.003 INFO org.apache.spark.streaming.scheduler.JobScheduler:54 Finished job streaming job 1498129809000 ms.1 from job set of time 1498129809000 ms
19:10:09.004 INFO org.apache.spark.streaming.scheduler.JobScheduler:54 Starting job streaming job 1498129809000 ms.2 from job set of time 1498129809000 ms
19:10:09.004 INFO org.apache.spark.streaming.scheduler.JobScheduler:54 Finished job streaming job 1498129809000 ms.2 from job set of time 1498129809000 ms
19:10:09.004 INFO org.apache.spark.rdd.MapPartitionsRDD:54 Removing RDD 3063 from persistence list
19:10:09.004 INFO org.apache.spark.streaming.scheduler.JobScheduler:54 Total delay: 0.004 s for time 1498129809000 ms (execution: 0.002 s)
19:10:09.004 INFO org.apache.spark.storage.BlockManager:54 Removing RDD 3063
19:10:09.004 INFO org.apache.spark.rdd.MapPartitionsRDD:54 Removing RDD 3062 from persistence list
19:10:09.004 INFO org.apache.spark.storage.BlockManager:54 Removing RDD 3062
19:10:09.004 INFO org.apache.spark.rdd.BlockRDD:54 Removing RDD 3061 from persistence list
19:10:09.005 INFO org.apache.spark.storage.BlockManager:54 Removing RDD 3061
19:10:09.005 INFO org.apache.spark.streaming.kafka.KafkaInputDStream:54 Removing blocks of RDD BlockRDD[3061] at createStream at KafkaStreaming.java:146 of time 1498129809000 ms
19:10:09.005 INFO org.apache.spark.streaming.scheduler.ReceivedBlockTracker:54 Deleting batches: 1498129808000 ms
19:10:09.005 INFO org.apache.spark.streaming.scheduler.InputInfoTracker:54 remove old batch metadata: 1498129808000 ms
19:10:09.006 INFO org.apache.spark.storage.BlockManagerInfo:54 Removed input-0-1498129808200 on slave2:50830 in memory (size: 245.0 B, free: 983.0 MB)
19:10:09.006 INFO org.apache.spark.storage.BlockManagerInfo:54 Removed input-0-1498129808200 on slave1:41063 in memory (size: 245.0 B, free: 983.1 MB)

Executor 1 log:

19:10:07.506 INFO org.apache.spark.storage.BlockManager:54 Removing RDD 3053
19:10:07.506 INFO org.apache.spark.storage.BlockManager:54 Removing RDD 3052
19:10:08.005 INFO org.apache.spark.storage.BlockManager:54 Removing RDD 3057
19:10:08.006 INFO org.apache.spark.storage.BlockManager:54 Removing RDD 3056
19:10:08.006 INFO org.apache.spark.storage.BlockManager:54 Removing RDD 3055
19:10:08.402 INFO org.apache.spark.storage.memory.MemoryStore:54 Block input-0-1498129808200 stored as bytes in memory (estimated size 245.0 B, free 983.0 MB)
19:10:08.416 INFO org.apache.spark.streaming.receiver.BlockGenerator:54 Pushed block input-0-1498129808200
19:10:08.511 INFO org.apache.spark.executor.CoarseGrainedExecutorBackend:54 Got assigned task 83
19:10:08.512 INFO org.apache.spark.executor.Executor:54 Running task 0.0 in stage 15.0 (TID 83)
19:10:08.513 INFO org.apache.spark.broadcast.TorrentBroadcast:54 Started reading broadcast variable 15
19:10:08.515 INFO org.apache.spark.storage.memory.MemoryStore:54 Block broadcast_15_piece0 stored as bytes in memory (estimated size 2.2 KB, free 983.0 MB)
19:10:08.518 INFO org.apache.spark.broadcast.TorrentBroadcast:54 Reading broadcast variable 15 took 5 ms
19:10:08.520 INFO org.apache.spark.storage.memory.MemoryStore:54 Block broadcast_15 stored as values in memory (estimated size 3.4 KB, free 983.0 MB)
19:10:08.523 INFO org.apache.spark.storage.BlockManager:54 Found block input-0-1498129808200 locally
19:10:08.525 INFO org.apache.spark.storage.memory.MemoryStore:54 Block rdd_3061_0 stored as bytes in memory (estimated size 245.0 B, free 983.0 MB)
19:10:08.528 DEBUG example.spark.streaming.KafkaStreaming$FlatMapFunctionImpl:284 key=170622191007573
19:10:08.549 INFO org.apache.spark.storage.BlockManager:54 Removing RDD 3060
19:10:08.549 INFO org.apache.spark.streaming.receiver.ReceiverSupervisorImpl:54 Received a new rate limit: 100.
19:10:08.549 INFO org.apache.spark.storage.BlockManager:54 Removing RDD 3059
19:10:08.549 INFO org.apache.spark.storage.BlockManager:54 Removing RDD 3058

Executor 2 log:

19:10:07.506 INFO org.apache.spark.storage.BlockManager:54 Removing RDD 3053
19:10:07.506 INFO org.apache.spark.storage.BlockManager:54 Removing RDD 3052
19:10:08.005 INFO org.apache.spark.storage.BlockManager:54 Removing RDD 3057
19:10:08.006 INFO org.apache.spark.storage.BlockManager:54 Removing RDD 3056
19:10:08.006 INFO org.apache.spark.storage.BlockManager:54 Removing RDD 3055
19:10:08.407 INFO org.apache.spark.storage.memory.MemoryStore:54 Block input-0-1498129808200 stored as bytes in memory (estimated size 245.0 B, free 983.1 MB)
19:10:08.519 INFO org.apache.spark.executor.CoarseGrainedExecutorBackend:54 Got assigned task 84
19:10:08.520 INFO org.apache.spark.executor.Executor:54 Running task 0.0 in stage 16.0 (TID 84)
19:10:08.521 INFO org.apache.spark.broadcast.TorrentBroadcast:54 Started reading broadcast variable 16
19:10:08.526 INFO org.apache.spark.executor.CoarseGrainedExecutorBackend:54 Got assigned task 85
19:10:08.527 INFO org.apache.spark.executor.Executor:54 Running task 0.0 in stage 17.0 (TID 85)
19:10:08.528 INFO org.apache.spark.storage.memory.MemoryStore:54 Block broadcast_16_piece0 stored as bytes in memory (estimated size 2.0 KB, free 983.1 MB)
19:10:08.530 INFO org.apache.spark.broadcast.TorrentBroadcast:54 Reading broadcast variable 16 took 9 ms
19:10:08.532 INFO org.apache.spark.storage.memory.MemoryStore:54 Block broadcast_16 stored as values in memory (estimated size 3.0 KB, free 983.1 MB)
19:10:08.532 INFO org.apache.spark.broadcast.TorrentBroadcast:54 Started reading broadcast variable 17
19:10:08.534 INFO org.apache.spark.storage.memory.MemoryStore:54 Block broadcast_17_piece0 stored as bytes in memory (estimated size 2.1 KB, free 983.1 MB)
19:10:08.536 INFO org.apache.spark.broadcast.TorrentBroadcast:54 Reading broadcast variable 17 took 4 ms
19:10:08.536 INFO org.apache.spark.storage.BlockManager:54 Found block rdd_3061_0 remotely
19:10:08.536 DEBUG example.spark.streaming.KafkaStreaming$FlatMapFunctionImpl:284 key=170622191007573
19:10:08.537 INFO org.apache.spark.storage.memory.MemoryStore:54 Block broadcast_17 stored as values in memory (estimated size 3.2 KB, free 983.1 MB)
19:10:08.541 INFO org.apache.spark.storage.BlockManager:54 Found block rdd_3061_0 remotely
19:10:08.542 DEBUG example.spark.streaming.KafkaStreaming$FlatMapFunctionImpl:284 key=170622191007573
19:10:08.547 INFO org.apache.spark.executor.Executor:54 Finished task 0.0 in stage 17.0 (TID 85). 1963 bytes result sent to driver
19:10:08.550 INFO org.apache.spark.storage.BlockManager:54 Removing RDD 3060
19:10:08.550 INFO org.apache.spark.storage.BlockManager:54 Removing RDD 3059
19:10:08.552 INFO org.apache.spark.storage.BlockManager:54 Removing RDD 3058