Scala Apache Flink: implementing a left outer join with coGroup
I have been trying to join two streams in Flink using a CoGroupFunction.
I have two streams; they are:
S1
val m = env
  .addSource(new FlinkKafkaConsumer010[String]("topic-1", schema, props))
  .map(gson.fromJson(_, classOf[Master]))
  .assignAscendingTimestamps(_.time)
S2
val d = env
  .addSource(new FlinkKafkaConsumer010[String]("topic-2", schema, props))
  .map(gson.fromJson(_, classOf[Detail]))
  .assignAscendingTimestamps(_.time)
My coGroup implementation is:
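import org.apache.flink.api.common.functions.CoGroupFunction
import org.apache.flink.util.Collector
// Needed so the java.lang.Iterable arguments can be used in Scala for loops
import scala.collection.JavaConverters._

class MasterDetailOuterJoin extends CoGroupFunction[Master, Detail,
    (Master, Option[Detail])] {
  override def coGroup(
      leftElements : java.lang.Iterable[Master],
      rightElements: java.lang.Iterable[Detail],
      out: Collector[(Master, Option[Detail])]): Unit = {
    for (leftElem <- leftElements.asScala) {
      var isMatch = false
      println(leftElem.orderNo)
      // Emit one pair per matching detail in the same key and window
      for (rightElem <- rightElements.asScala) {
        println(rightElem.orderNo)
        out.collect((leftElem, Some(rightElem)))
        isMatch = true
      }
      // Left outer join: keep the master even when no detail matched
      if (!isMatch) {
        out.collect((leftElem, None))
      }
    }
  }
}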
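However, nothing is printed, even when there is a matching master and detail!
I monitored the Kafka topics with the console consumer and, by the way, they work fine.
If I use an inner join instead, I do get results: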
m.keyBy(_.orderNo)
  .connect(d.keyBy(_.orderNo))
  .flatMap(new MasterDetailInnerJoin) // RichCoFlatMapFunction
  .map(gson.toJson(_, classOf[(Master, Detail)]))
  .print
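The pairing rule of the coGroup function can be checked on plain Scala collections, independently of Flink. The following is a minimal sketch rather than the poster's code: the Master and Detail case classes (with only orderNo and time fields) and the leftOuter helper are hypothetical stand-ins introduced for illustration.

```scala
// Hypothetical stand-ins for the question's Master and Detail types.
final case class Master(orderNo: String, time: Long)
final case class Detail(orderNo: String, time: Long)

object LeftOuterJoinLogic {
  // Same rule as the coGroup body: every left element is emitted, paired with
  // each right element of the group, or with None if the group has no right elements.
  def leftOuter(left: Iterable[Master],
                right: Iterable[Detail]): List[(Master, Option[Detail])] = {
    val details = right.toList // materialize once so it can be traversed repeatedly
    left.toList.flatMap { m =>
      if (details.isEmpty) List((m, Option.empty[Detail]))
      else details.map(d => (m, Option(d)))
    }
  }

  def main(args: Array[String]): Unit = {
    // prints List((Master(A,1),Some(Detail(A,2))))
    println(leftOuter(List(Master("A", 1L)), List(Detail("A", 2L))))
    // prints List((Master(B,3),None))
    println(leftOuter(List(Master("B", 3L)), Nil))
  }
}
```

Since the rule behaves as expected in isolation, silence from the Flink job suggests that coGroup is never being invoked at all, rather than that the join logic is wrong.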
It turns out that what I was missing was:
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
- and assigning a timestamp and watermark extractor to each stream
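A plausible reading of the fix: a coGroup over an event-time window only fires when watermarks advance, and in older Flink 1.x releases watermarks are not emitted unless event time is enabled. The sketch below wires the pieces together under several assumptions that are not in the original post: Master and Detail are hypothetical case classes, env.fromElements replaces the Kafka sources, the 10-second tumbling window is an arbitrary choice (the question does not show its window), and the Flink version is one where setStreamTimeCharacteristic is still required.

```scala
import org.apache.flink.api.common.functions.CoGroupFunction
import org.apache.flink.streaming.api.TimeCharacteristic
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows
import org.apache.flink.streaming.api.windowing.time.Time
import org.apache.flink.util.Collector
import scala.collection.JavaConverters._

// Hypothetical stand-ins for the question's types.
case class Master(orderNo: String, time: Long)
case class Detail(orderNo: String, time: Long)

class MasterDetailOuterJoin
    extends CoGroupFunction[Master, Detail, (Master, Option[Detail])] {
  override def coGroup(
      left: java.lang.Iterable[Master],
      right: java.lang.Iterable[Detail],
      out: Collector[(Master, Option[Detail])]): Unit = {
    val details = right.asScala.toList
    for (m <- left.asScala) {
      if (details.isEmpty) out.collect((m, None))            // unmatched master
      else details.foreach(d => out.collect((m, Some(d))))   // one pair per detail
    }
  }
}

object LeftOuterJoinSketch {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    // The missing piece from the answer: switch the job to event time.
    env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)

    // Each stream gets timestamps and watermarks (ascending timestamps assumed).
    val m = env
      .fromElements(Master("A", 1000L), Master("B", 2000L))
      .assignAscendingTimestamps(_.time)
    val d = env
      .fromElements(Detail("A", 1500L))
      .assignAscendingTimestamps(_.time)

    m.coGroup(d)
      .where(_.orderNo)
      .equalTo(_.orderNo)
      .window(TumblingEventTimeWindows.of(Time.seconds(10))) // assumed window size
      .apply(new MasterDetailOuterJoin)
      .print()

    env.execute("left-outer-join-sketch")
  }
}
```

With this input, master A should be emitted with its detail and master B with None. If the timestamps in a real stream are not strictly ascending, an extractor that tolerates out-of-order events would be the safer choice than assignAscendingTimestamps.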