Scala Apache Spark, from a single Kafka input topic to two output Kafka topics
In general, the workflow I need is:

1. Receive messages from the Kafka topic raw
2. Parse each String into a case class
3. Send all parse errors to the Kafka topic error
4. reduceByKey the parsed messages
5. Send the result to the Kafka topic parsed
6. Commit the offsets for the Kafka topic raw
7. Go to (1)

Input: String
Output:

I tried two approaches to implement this workflow:

1. Using Spark RDDs — does not work, because after reduceByKey I can no longer obtain the Kafka offset ranges to commit (RDD.asInstanceOf[HasOffsetRanges].offsetRanges): the cast fails after the first RDD.map { ... }, since only the RDD produced directly by the Kafka stream implements HasOffsetRanges.
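For reference, the pattern I am trying to follow is to capture the offset ranges on the input RDD before any transformation and commit them through the stream once the batch's output is written. A minimal sketch, assuming a direct stream from the Kafka 0.10 integration; parseBatch, writeToKafka, and merge are hypothetical stand-ins for the parsing, output, and reduce steps above:

import org.apache.spark.streaming.kafka010.{CanCommitOffsets, HasOffsetRanges}

stream.foreachRDD { rdd =>
  // Only the RDD created by the direct stream implements HasOffsetRanges,
  // so the cast must happen before any map/reduceByKey.
  val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges

  val (errors, parsed) = parseBatch(rdd)            // hypothetical: steps (2)-(3)
  writeToKafka("error", errors)                     // hypothetical Kafka writer
  writeToKafka("parsed", parsed.reduceByKey(merge)) // steps (4)-(5)

  // Step (6): commit only after both outputs have been written.
  stream.asInstanceOf[CanCommitOffsets].commitAsync(offsetRanges)
}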
2. Using Spark Datasets — does not compile. The flatMap below emits ParsedMessage on success and ParseError on failure, so the compiler infers their least upper bound, Product with Serializable with Message, instead of Message:
.flatMap {
  case raw: RawMessage =>
    implicit val mapperInfo: ProviderKey = ProviderKey(RawMessageParser.name, RawMessageParser.version, 0)
    Try(parse[List[ParsedMessage]](Left(raw.nxMessage))) match {
      case Success(msg) =>
        msg                                // happy path: emit every parsed message
      case Failure(ex) =>
        Seq(ParseError(raw.nxMessage, ex)) // failure: emit a single error record
    }
}
found : org.apache.spark.sql.Dataset[Product with Serializable with Message]
required: org.apache.spark.sql.Dataset[Message]
Note: Product with Serializable with Message <: Message, but class Dataset is invariant in type T.
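The error is Scala's least-upper-bound inference at work: both case classes extend Message, but as case classes they also extend Product and Serializable, and Dataset is invariant in its type parameter. One way around it is to ascribe the element type explicitly so the compiler never infers the LUB. A sketch under the assumption that Message is the common trait of ParsedMessage and ParseError, and that rawDs is the Dataset[RawMessage] being mapped; since Message is a trait, no product encoder can be derived for it, so the sketch uses Encoders.kryo as one possible Encoder[Message]:

import org.apache.spark.sql.{Dataset, Encoder, Encoders}
import scala.util.{Failure, Success, Try}

implicit val messageEncoder: Encoder[Message] = Encoders.kryo[Message]

val messages: Dataset[Message] = rawDs.flatMap { raw =>
  implicit val mapperInfo: ProviderKey = ProviderKey(RawMessageParser.name, RawMessageParser.version, 0)
  // Ascribing Seq[Message] upcasts both branches before the compiler
  // computes a common type, so flatMap is inferred at Dataset[Message].
  val out: Seq[Message] = Try(parse[List[ParsedMessage]](Left(raw.nxMessage))) match {
    case Success(msg) => msg
    case Failure(ex)  => Seq(ParseError(raw.nxMessage, ex))
  }
  out
}

With a Dataset[Message] in hand, the split into the two output topics becomes two filters: messages.filter(_.isInstanceOf[ParseError]) goes to error, and the remaining parsed messages, after the reduce, go to parsed.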