Scala: consumed Kafka offsets are not stored in an HBase table when committing via Spark DStreams
I am trying to save Kafka consumer offsets, after processing them through my business logic, into an HBase table together with a success flag. The whole process is part of a Spark DStream, and I use the code below to implement it:
val hbaseTable = "table"
val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "server",
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "topic",
  "auto.offset.reset" -> "earliest",
  "enable.auto.commit" -> (false: java.lang.Boolean))
val topic = Array("topicName")
val ssc = new StreamingContext(sc, Seconds(40))
val stream = KafkaUtils.createDirectStream[String, String](
  ssc, PreferConsistent, Subscribe[String, String](topic, kafkaParams))
stream.foreachRDD((rdd, batchTime) => {
  val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  offsetRanges.foreach(offset => println(offset.topic, offset.partition,
    offset.fromOffset, offset.untilOffset))
  rdd.map(value => value.value()).saveAsTextFile("path")
  println("Saved Data into file")
  var commits: OffsetCommitCallback = null
  rdd.foreachPartition(message => {
    val hbaseConf = HBaseConfiguration.create()
    val conn = ConnectionFactory.createConnection(hbaseConf)
    val table = conn.getTable(TableName.valueOf(hbaseTable))
    commits = new OffsetCommitCallback() {
      def onComplete(offsets: java.util.Map[TopicPartition, OffsetAndMetadata],
                     exception: Exception) {
        message.foreach(value => {
          val key = value.key()
          val offset = value.offset()
          println(s"offset is: $offset")
          val partitionId = TaskContext.get.partitionId()
          println(s"partitionID is: $partitionId")
          val rowKey = key
          val put = new Put(rowKey.getBytes)
          if (exception != null) {
            println("Got Error Message: " + exception.getMessage)
            put.addColumn("o".getBytes, "flag".getBytes, "Error".getBytes)
            put.addColumn("o".getBytes, "error_message".getBytes,
              exception.getMessage.getBytes)
            table.put(put)
          } else {
            put.addColumn("o".getBytes, "flag".getBytes, "Success".getBytes)
            table.put(put)
            println(offsets.values())
          }
        })
        println("Inserted into HBase")
      }
    }
    table.close()
    conn.close()
  })
  stream.asInstanceOf[CanCommitOffsets].commitAsync(offsetRanges, commits)
})
ssc.start()
This code executes without errors. However, it neither saves the data into HBase nor produces any logs at the executor level (the printlns inside the per-partition iteration over the RDD). I have no idea what I am missing here. Any help would be greatly appreciated.

You never started `ssc`.

I forgot to mention that and have added it to the code above. But I did call it when actually running the code and, as described, it still does not achieve what I am trying to do.
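Beyond the missing `ssc.start()`, there is a driver/executor mismatch in the code above: `commits` is a driver-side `var`, but it is assigned inside `foreachPartition`, which runs on the executors; the assignment happens on a serialized copy of the closure, so on the driver `commits` is still `null` when `commitAsync` is called, and the callback (which contains all the HBase writes) never runs. A common pattern is to do the HBase writes directly inside `foreachPartition` and commit offsets from the driver with a callback defined there. The sketch below follows the question's names (`hbaseTable`, column family `"o"`, qualifier `"flag"`); it is an untested outline under those assumptions, not a verified drop-in fix:

```scala
stream.foreachRDD { (rdd, batchTime) =>
  val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges

  rdd.foreachPartition { messages =>
    // Runs on the executor: open the connection here, write, close.
    val conn = ConnectionFactory.createConnection(HBaseConfiguration.create())
    val table = conn.getTable(TableName.valueOf(hbaseTable))
    try {
      messages.foreach { record =>
        val put = new Put(record.key().getBytes)
        put.addColumn("o".getBytes, "flag".getBytes, "Success".getBytes)
        table.put(put)
      }
    } finally {
      table.close()
      conn.close()
    }
  }

  // Runs on the driver: commit only after the batch's writes have finished.
  // The callback is created on the driver, so it actually reaches commitAsync.
  stream.asInstanceOf[CanCommitOffsets].commitAsync(offsetRanges,
    new OffsetCommitCallback {
      override def onComplete(
          offsets: java.util.Map[TopicPartition, OffsetAndMetadata],
          exception: Exception): Unit = {
        if (exception != null) println("Offset commit failed: " + exception.getMessage)
        else println("Committed offsets: " + offsets)
      }
    })
}
ssc.start()
ssc.awaitTermination()
```

Note that with this split, the HBase row records the processing outcome per message, while the commit callback only reports whether the offset commit itself succeeded; if you need the commit status in HBase as well, that write would have to happen in the driver-side callback.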