Apache Flink: finding the stream of ungrouped events with CoGroupFunction


When using CoGroupFunction, how can we find the stream of events that did not match any other event?

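Consider people communicating by phone. In Tuple2<String, Integer>, f0 is the person's name and f1 is the phone number they are dialing or receiving a call from. We pair them up using coGroup, but we are missing the people whose calls come from, or go to, someone outside the observed streams.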

final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
DataStream<Tuple2<String, Integer>> callers = env.fromElements(
        new Tuple2<String, Integer>("alice->", 12), // alice dials 12
        new Tuple2<String, Integer>("bob->", 13),   // bob dials 13
        new Tuple2<String, Integer>("charlie->", 19))
        .assignTimestampsAndWatermarks(new TimestampExtractor(Time.seconds(5)));

DataStream<Tuple2<String, Integer>> callees = env.fromElements(
        new Tuple2<String, Integer>("->carl", 12), // carl received call
        new Tuple2<String, Integer>("->ted", 13),
        new Tuple2<String, Integer>("->chris", 7))
        .assignTimestampsAndWatermarks(new TimestampExtractor(Time.seconds(5)));

DataStream<Tuple1<String>> groupedStream = callers.coGroup(callees)
        .where(evt -> evt.f1).equalTo(evt -> evt.f1)
        .window(TumblingEventTimeWindows.of(Time.seconds(10)))
        .apply(new IntEqualCoGroupFunc());

groupedStream.print(); // prints 1> (alice->-->carl) \n 1> (bob->-->ted)

//DataStream<Tuple1<String>> notGroupedStream = ..; // people without pairs in last window
//notGroupedStream.print(); // should print charlie->-->someone \n someone->-->chris

env.execute();

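Honestly, the simplest solution seems to be to change IntEqualCoGroupFunc so that it returns (Boolean, String) instead of String.
This works because coGroup also processes elements that have no matching key: for those elements, one of the two Iterables passed to coGroup(Iterable first, Iterable second, Collector out) is empty. In your case, for key 7 it will receive an empty Iterable as first and ("->chris", 7) in second, because callees is the second input of callers.coGroup(callees).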

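The changed signature also lets you emit the results that have no matching key and then simply split them into a separate stream later in the pipeline.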
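The idea can be sketched without a Flink dependency. The following is a minimal plain-Java simulation, not Flink API: the names CoGroupSketch, Call, run and demo are hypothetical, a Consumer stands in for Flink's Collector, and a Map.Entry<Boolean, String> stands in for the (Boolean, String) result. It calls the function once per key seen on either side, passing an empty Iterable for the missing side, and then splits the results by the flag.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.function.Consumer;

class CoGroupSketch {

    /** Stand-in for Tuple2<String, Integer>: f0 is the name, f1 is the phone number. */
    static final class Call {
        final String f0;
        final int f1;
        Call(String f0, int f1) { this.f0 = f0; this.f1 = f1; }
    }

    /** Same shape as a coGroup callback; the Boolean flag marks matched pairs. */
    static void coGroup(Iterable<Call> outbound, Iterable<Call> inbound,
                        Consumer<Map.Entry<Boolean, String>> out) {
        boolean hasInbound = inbound.iterator().hasNext();
        boolean hasOutbound = false;
        for (Call o : outbound) {
            hasOutbound = true;
            if (!hasInbound) {
                // caller whose callee is outside the observed streams
                out.accept(new SimpleEntry<>(false, o.f0 + "->someone"));
            }
            for (Call i : inbound) {
                // matching pair (every pair for the key is emitted)
                out.accept(new SimpleEntry<>(true, o.f0 + "-" + i.f0));
            }
        }
        if (!hasOutbound) {
            for (Call i : inbound) {
                // callee whose caller is outside the observed streams
                out.accept(new SimpleEntry<>(false, "someone->-" + i.f0));
            }
        }
    }

    /** Groups both inputs by f1 and calls coGroup once per key present on either side. */
    static Map<Boolean, List<String>> run(List<Call> callers, List<Call> callees) {
        Map<Integer, List<Call>> left = new TreeMap<>();
        Map<Integer, List<Call>> right = new TreeMap<>();
        for (Call c : callers) left.computeIfAbsent(c.f1, k -> new ArrayList<>()).add(c);
        for (Call c : callees) right.computeIfAbsent(c.f1, k -> new ArrayList<>()).add(c);

        TreeSet<Integer> keys = new TreeSet<>(left.keySet());
        keys.addAll(right.keySet());

        // true -> matched pairs, false -> events without a partner
        Map<Boolean, List<String>> split = new HashMap<>();
        split.put(true, new ArrayList<>());
        split.put(false, new ArrayList<>());
        for (Integer key : keys) {
            List<Call> first = left.getOrDefault(key, Collections.emptyList());
            List<Call> second = right.getOrDefault(key, Collections.emptyList());
            coGroup(first, second, e -> split.get(e.getKey()).add(e.getValue()));
        }
        return split;
    }

    /** Runs the sketch on the sample data from the question. */
    public static Map<Boolean, List<String>> demo() {
        List<Call> callers = Arrays.asList(
                new Call("alice->", 12), new Call("bob->", 13), new Call("charlie->", 19));
        List<Call> callees = Arrays.asList(
                new Call("->carl", 12), new Call("->ted", 13), new Call("->chris", 7));
        return run(callers, callees);
    }

    public static void main(String[] args) {
        Map<Boolean, List<String>> split = demo();
        System.out.println("matched:   " + split.get(true));
        System.out.println("unmatched: " + split.get(false));
    }
}
```

In an actual Flink job the flag could presumably be carried as the f0 field of a Tuple2<Boolean, String>, with the matched and unmatched streams separated by two filter calls on that field.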

// Implementation of IntEqualCoGroupFunc
@Override
public void coGroup(Iterable<Tuple2<String, Integer>> outbound, Iterable<Tuple2<String, Integer>> inbound,
        Collector<Tuple1<String>> out) throws Exception {

    for (Tuple2<String, Integer> outboundObj : outbound) {
        for (Tuple2<String, Integer> inboundObj : inbound) {
            out.collect(Tuple1.of(outboundObj.f0 + "-" + inboundObj.f0)); //matching pair
            return;
        }
        out.collect(Tuple1.of(outboundObj.f0 + "->someone")); //inbound is empty
        return;
    }

    // outbound is empty
    for (Tuple2<String, Integer> inboundObj : inbound) {
        out.collect(Tuple1.of("someone->-" + inboundObj.f0));
        return;
    }
    //inbound also empty
    out.collect(Tuple1.of("someone->-->someone"));
}

With the sample input from the question, this prints:

2> (someone->-->chris)
2> (charlie->->someone)
1> (alice->-->carl)
1> (bob->-->ted)