Scala Spark UnionRDD cannot be cast to HasOffsetRanges

I am receiving messages from several different Kafka topics, so I need to use the StreamingContext.union method to union the streams. However, I ran into a problem while trying to update the Kafka offsets to ZooKeeper.

The error is as follows:

java.lang.ClassCastException: org.apache.spark.rdd.UnionRDD cannot be cast to org.apache.spark.streaming.kafka.HasOffsetRanges
at com.qingqing.spark.util.KafkaManager.updateZKOffsets(KafkaManager.scala:75)
at com.qingqing.spark.BinlogConsumer$$anonfun$consumeBinlog$3.apply(BinlogConsumer.scala:43)
at com.qingqing.spark.BinlogConsumer$$anonfun$consumeBinlog$3.apply(BinlogConsumer.scala:41)
at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3.apply(DStream.scala:661)
at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3.apply(DStream.scala:661)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ForEachDStream.scala:50)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:50)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:50)
at org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:426)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:49)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:49)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:49)
at scala.util.Try$.apply(Try.scala:161)
at org.apache.spark.streaming.scheduler.Job.run(Job.scala:39)
at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply$mcV$sp(JobScheduler.scala:224)
at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:224)
at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:224)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler.run(JobScheduler.scala:223)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
My code is as follows:


Can anyone help me figure out this problem? Thanks in advance.
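
The original code is not shown above, but judging from the stack trace the failing pattern was presumably something like the sketch below: several direct streams are unioned via StreamingContext.union, and the unioned batch RDD is then cast to HasOffsetRanges. The topic names, variable names, and kafkaParams are illustrative assumptions, not taken from the post.

import kafka.serializer.StringDecoder
import org.apache.spark.streaming.kafka.{HasOffsetRanges, KafkaUtils}

// One direct stream per topic (illustrative topic names)
val streamA = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, Set("topicA"))
val streamB = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, Set("topicB"))

// Union the per-topic streams, then try to read the offset ranges of each batch
val unioned = ssc.union(Seq(streamA, streamB))

unioned.foreachRDD { rdd =>
  // This is where the ClassCastException is thrown: a batch of the unioned stream
  // is a UnionRDD, which does not implement HasOffsetRanges
  val ranges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  // ... update the offsets in ZooKeeper ...
}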

This is not how the direct stream API of Kafka and Spark Streaming is meant to be used. Because you are using the direct approach there are no receivers, so there is no need for multiple streams: a single stream can consume all of the topics. You need to extract the offsets for every topic you want to consume and then create a single DirectKafkaInputDStream with createDirectStream. Since I don't have your code, a rough sketch would look like this:

// Sketch: one starting offset per TopicAndPartition, then a single direct stream over all topics
val offsets: Map[TopicAndPartition, Long] =
  topics.map { topic => /* build the (TopicAndPartition, Long) tuple here for each topic */ }.toMap

val kafkaInputStream =
  KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder, (String, String)](
    ssc, kafkaParams, offsets, (mmd: MessageAndMetadata[String, String]) => (mmd.key, mmd.value))
For topics that don't have any stored offsets yet, simply start from offset 0.
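
A minimal sketch of that fallback, assuming storedOffsets holds whatever was previously persisted (for example, read back from ZooKeeper) and topicPartitions is the full set of partitions to consume; both names are placeholders, not from the original answer:

// Placeholders: topicPartitions: Set[TopicAndPartition] to consume,
// storedOffsets: Map[TopicAndPartition, Long] previously saved (e.g. in ZooKeeper)
val offsets: Map[TopicAndPartition, Long] =
  topicPartitions.map(tp => tp -> storedOffsets.getOrElse(tp, 0L)).toMap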

As for HasOffsetRanges, the cast only works if you do the transformation directly on the stream returned by createDirectStream, right after it is created, because only then does the underlying RDD actually implement that trait. So you need to transform on the stream immediately:

val streamAfterTransform = kafkaInputStream.transform { rdd =>
  val ranges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  // do stuff with ranges here (e.g. record them so the ZooKeeper offsets can be updated)
  rdd  // transform must return an RDD
}
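
To then actually commit those ranges, one common pattern from the Spark Streaming + Kafka integration guide is to stash the ranges in a driver-side variable inside transform and read them back in foreachRDD after the batch has been processed; the ZooKeeper write itself is left as a placeholder in this sketch:

import org.apache.spark.streaming.kafka.{HasOffsetRanges, OffsetRange}

// Driver-side holder for the offset ranges of the current batch
var offsetRanges = Array.empty[OffsetRange]

val stream = kafkaInputStream.transform { rdd =>
  offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  rdd
}

stream.foreachRDD { rdd =>
  // ... process the batch ...
  offsetRanges.foreach { range =>
    // update ZooKeeper with range.topic, range.partition and range.untilOffset here
  }
}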