Kafka Streams session windows: aggregation fails intermittently
We use Kafka Streams (0.10.2.0) to aggregate related events, and we group them with SessionWindows. About 50% of the time, the aggregation does not seem to happen. Here are the two scenarios:
Request-1: all events were aggregated successfully
- GroupID: abc
- Events: E1; eventTime=2017-05-31T14:36:56.653Z, E2; eventTime=2017-05-31T14:36:56.653Z, E3; eventTime=2017-05-31T14:36:56.653Z
Request-2: none of the events were aggregated
- GroupID: efg
- Events: E1; eventTime=2017-05-31T14:36:56.653Z, E2; eventTime=2017-05-31T14:36:56.653Z, E3; eventTime=2017-05-31T14:36:56.653Z
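For reference, windowing works on epoch-millisecond timestamps, so it can help to know what the event time above looks like in that form. The following is a minimal sketch using only the JDK; the class and method names are hypothetical and are not taken from the application's EventTimeExtractorImpl.

```java
import java.time.Instant;

public class EventTimeToMillis {
    /** Parses an ISO-8601 UTC instant and returns it as epoch milliseconds. */
    static long toEpochMillis(String eventTime) {
        return Instant.parse(eventTime).toEpochMilli();
    }

    public static void main(String[] args) {
        // The event time shared by E1, E2 and E3 in both requests.
        System.out.println(toEpochMillis("2017-05-31T14:36:56.653Z")); // prints 1496241416653
    }
}
```

This is the same value that appears after the `@` in the window keys printed by the aggregate step.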
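TimestampExtractor: the stream uses an event-time extractor to group events into windows.
Window type: session windows.
Window inactivity gap = 2 minutes
Stream configuration:
CACHE_MAX_BYTES_BUFFERING_CONFIG = 0
TIMESTAMP_EXTRACTOR_CLASS_CONFIG = EventTimeExtractorImpl
Code snippet: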
KStreamBuilder builder = new KStreamBuilder();
final KStream<String, GenericRecord> events = builder.stream(_appProperties.collationSourceTopic);
events.print();

// Group by the record key (the groupID) before windowing.
KGroupedStream<String, GenericRecord> groupedStream = events.groupByKey(Serdes.String(), GenericCdsSerde.GenericCdsSerde());

// Session windows with the configured inactivity gap, in minutes.
SessionWindows tmpSessionWindows = SessionWindows.with(TimeUnit.MINUTES.toMillis(Long.parseLong(_appProperties.collationWindowInMins)));

KTable<Windowed<String>, List<GenericRecord>> sessionizedAggregatedStream = groupedStream.aggregate(
    ArrayList::new,
    // Aggregator: append each new event to the session's list.
    (aggKey, newValue, aggValue) -> {
        try {
            aggValue.add(newValue);
        } catch (Exception e) {
            logger.error("failed aggr session windows", e);
            return null;
        }
        return aggValue;
    },
    // Merger: combine the lists of two sessions that are being merged.
    (aggKey, leftAggValue, rightAggValue) -> {
        try {
            leftAggValue.addAll(rightAggValue);
        } catch (Exception e) {
            logger.error("failed merging session windows", e);
            return null;
        }
        return leftAggValue;
    },
    tmpSessionWindows, /* session window */
    GenericListCdsSerde.GenericListCdsSerde(),
    "session-store");

sessionizedAggregatedStream.print();
sessionizedAggregatedStream.toStream().foreach((stringWindowed, s) ->
    logger.info("WindowedTable: window: " + stringWindowed.key()
        + "start ==> " + ((SessionWindow) stringWindowed.window()).start()
        + " end ==> " + stringWindowed.window().end()
        + " windowedValue: " + s));
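To make the expected behaviour concrete, here is a small plain-Java simulation of session-window merging for a single key. It is only a sketch of the semantics as I understand them (an event joins every existing session that lies within the inactivity gap of its timestamp, and those sessions are merged); it does not use Kafka Streams, and all names in it are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class SessionWindowSketch {
    /** One session: [start, end] in epoch millis plus the number of events merged into it. */
    static final class Session {
        long start;
        long end;
        int count;

        Session(long start, long end, int count) {
            this.start = start;
            this.end = end;
            this.count = count;
        }
    }

    /** Processes timestamps in arrival order and returns the resulting sessions for one key. */
    static List<Session> sessionize(long[] timestamps, long gapMs) {
        List<Session> sessions = new ArrayList<>();
        for (long ts : timestamps) {
            Session merged = new Session(ts, ts, 1);
            List<Session> kept = new ArrayList<>();
            for (Session s : sessions) {
                // A session is merged when it overlaps [ts - gap, ts + gap].
                boolean withinGap = s.end >= ts - gapMs && s.start <= ts + gapMs;
                if (withinGap) {
                    merged.start = Math.min(merged.start, s.start);
                    merged.end = Math.max(merged.end, s.end);
                    merged.count += s.count;
                } else {
                    kept.add(s);
                }
            }
            kept.add(merged);
            sessions = kept;
        }
        return sessions;
    }

    public static void main(String[] args) {
        long t = 1496241416653L; // 2017-05-31T14:36:56.653Z
        List<Session> result = sessionize(new long[] {t, t, t}, 2 * 60 * 1000L);
        Session only = result.get(0);
        System.out.println("sessions=" + result.size() + " count=" + only.count); // prints sessions=1 count=3
    }
}
```

Under these assumptions, three events with the same key and the same event time should always end up in one session whose start and end both equal that timestamp, which is what the logs for the successful group show.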
Logs for the events that were grouped successfully:
GroupID: ng28
Event-1 arrives:
2017-06-01 09:57:23,861 INFO StreamThread-1 c.s.c.c.s.EventTimeExtractor:49 - ======= read groupID: ng28 eventID: 1 eventTime: 2017-05-31T14:36:56.653Z
[KSTREAM-SOURCE-0000000000]: ng28, {"eventID":"1","groupID":"ng28","eventTime":"2017-05-31T14:36:56.653Z"}
[KSTREAM-AGGREGATE-0000000003]: [ng28@1496241416653], ([{"eventID":"1","groupID":"ng28","eventTime":"2017-05-31T14:36:56.653Z"}] 1496241416653 end ==> 1496241416653 windowedValue: [{"eventID":"1","groupID":"ng28","eventTime":"2017-05-31T14:36:56.653Z"}]
Event-2 arrives:
2017-06-01 09:57:27,158 INFO StreamThread-1 c.s.c.c.s.EventTimeExtractor:49 - ======= read groupID: ng28 eventID: 2 eventTime: 2017-05-31T14:36:56.653Z
[KSTREAM-SOURCE-0000000000]: ng28, {"eventID":"2","groupID":"ng28","eventTime":"2017-05-31T14:36:56.653Z"}
2017-06-01 09:57:27,158 INFO StreamThread-1 c.s.c.c.s.CSIStreamFactory:164 - --- raw stream: key: ng28 value: {"eventID":"2","groupID":"ng28","eventTime":"2017-05-31T14:36:56.653Z"}
[KSTREAM-AGGREGATE-0000000003]: [ng28@1496241416653], ([{"eventID":"1","groupID":"ng28","eventTime":"2017-05-31T14:36:56.653Z"},{"eventID":"2","groupID":"ng28","eventTime":"2017-05-31T14:36:56.653Z"}] 1496241416653 end ==> 1496241416653 windowedValue: [{"eventID":"1","groupID":"ng28","eventTime":"2017-05-31T14:36:56.653Z"},{"eventID":"2","groupID":"ng28","eventTime":"2017-05-31T14:36:56.653Z"}]
Event-3 arrives:
2017-06-01 09:57:31,481 INFO StreamThread-1 c.s.c.c.s.EventTimeExtractor:49 - ======= read groupID: ng28 eventID: 3 eventTime: 2017-05-31T14:36:56.653Z
[KSTREAM-SOURCE-0000000000]: ng28, {"eventID":"3","groupID":"ng28","eventTime":"2017-05-31T14:36:56.653Z"}
2017-06-01 09:57:31,482 INFO StreamThread-1 c.s.c.c.s.CSIStreamFactory:164 - --- raw stream: key: ng28 value: {"eventID":"3","groupID":"ng28","eventTime":"2017-05-31T14:36:56.653Z"}
[KSTREAM-AGGREGATE-0000000003]: [ng28@1496241416653], ([{"eventID":"1","clientEventID":"123","groupID":"ng28","eventTime":"2017-05-31T14:36:56.653Z"},{"eventID":"2","groupID":"ng28","eventTime":"2017-05-31T14:36:56.653Z"},{"eventID":"3","groupID":"ng28","eventTime":"2017-05-31T14:36:56.653Z"}] 1496241416653 end ==> 1496241416653 windowedValue: [{"eventID":"1","clientEventID":"123","groupID":"ng28","eventTime":"2017-05-31T14:36:56.653Z"},{"eventID":"2","groupID":"ng28","eventTime":"2017-05-31T14:36:56.653Z"},{"eventID":"3","groupID":"ng28","eventTime":"2017-05-31T14:36:56.653Z"}]
Logs for the events that failed to group:
GroupID: ng30; the event times of all parts are the same as for group ng28
Event-1 arrives:
2017-06-01 10:00:03,004 INFO StreamThread-1 c.s.c.c.s.EventTimeExtractor:49 - ======= read groupID: ng30 eventID: 1 eventTime: 2017-05-31T14:36:56.653Z
[KSTREAM-SOURCE-0000000000]: ng30, {"eventID":"1","groupID":"ng30","eventTime":"2017-