Java Flink CEP pattern does not match the first events after starting the job and always matches the previous set of events
I want to match a CEP pattern in Flink 1.4.0 streaming with the following code:
DataStream<Event> input = inputFromSocket.map(new IncomingMessageProcessor()).filter(new FilterEmptyAndInvalidEvents());
DataStream<Event> inputFiltered = input.assignTimestampsAndWatermarks(new BoundedOutOfOrdernessGenerator());
KeyedStream<Event, String> partitionedInput = inputFiltered.keyBy(new MyKeySelector());
Pattern<Event, ?> pattern = Pattern.<Event>begin("start")
        .where(new ActionCondition("action1"))
        .followedBy("middle").where(new ActionCondition("action2"))
        .followedBy("end").where(new ActionCondition("action3"));
pattern = pattern.within(Time.seconds(30));
PatternStream<Event> patternStream = CEP.pattern(partitionedInput, pattern);
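The input is pulled from my custom source (Google PubSub).
The first filter, FilterEmptyAndInvalidEvents(), only filters out malformed events and the like, which does not happen in this case. I can verify this from the log output.
So every event runs through the MyKeySelector.getKey() method.
The BoundedOutOfOrdernessGenerator only extracts the timestamp from one field: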
public class BoundedOutOfOrdernessGenerator implements AssignerWithPeriodicWatermarks<Event> {
    private static Logger LOG = LoggerFactory.getLogger(BoundedOutOfOrdernessGenerator.class);
    private final long maxOutOfOrderness = 5500; // 5.5 seconds
    private long currentMaxTimestamp;

    @Override
    public long extractTimestamp(Event element, long previousElementTimestamp) {
        long timestamp = element.getOccurrenceTimeStamp();
        currentMaxTimestamp = Math.max(timestamp, currentMaxTimestamp);
        return timestamp;
    }

    @Override
    public Watermark getCurrentWatermark() {
        // return the watermark as current highest timestamp minus the out-of-orderness bound
        Watermark newWatermark = new Watermark(currentMaxTimestamp - maxOutOfOrderness);
        return newWatermark;
    }
}
The ActionCondition just compares one field of the event, as follows:
public class ActionCondition extends SimpleCondition<Event> {
    private static Logger LOG = LoggerFactory.getLogger(ActionCondition.class);
    private String filterForCommand = "";

    public ActionCondition(String filterForCommand) {
        this.filterForCommand = filterForCommand;
    }

    @Override
    public boolean filter(Event value) throws Exception {
        LOG.info("Filtering event for {} action: {}", filterForCommand, value);
        if (value == null) {
            return false;
        }
        if (value.getAction() == null) {
            return false;
        }
        if (value.getAction().equals(filterForCommand)) {
            LOG.info("It's a hit for the {} action for event {}", filterForCommand, value);
            return true;
        } else {
            LOG.info("It's a miss for the {} action for event {}", filterForCommand, value);
            return false;
        }
    }
}
FilterEmptyAndInvalidEvents - Letting event Event::27ef8d25-8c3b-43fc-a228-fa0dda8e564d --- action: start, sender: RHHLWUi8sXH33AJIAAAA, timestamp: 1518194448701 through
MyKeySelector - Partioning event Event::27ef8d25-8c3b-43fc-a228-fa0dda8e564d --- action: start, sender: RHHLWUi8sXH33AJIAAAA, timestamp: 1518194448701 by key RHHLWUi8sXH33AJIAAAA
FilterEmptyAndInvalidEvents - Letting event Event::18b45a9c-b837-4b61-acf3-0b545a097203 --- action: click, sender: RHHLWUi8sXH33AJIAAAA, timestamp: 1518194448702 through
MyKeySelector - Partioning event Event::18b45a9c-b837-4b61-acf3-0b545a097203 --- action: click, sender: RHHLWUi8sXH33AJIAAAA, timestamp: 1518194448702 by key RHHLWUi8sXH33AJIAAAA
FilterEmptyAndInvalidEvents - Letting event Event::fe1486ab-d702-421d-be32-98dd38a1d306 --- action: connect, sender: RHHLWUi8sXH33AJIAAAA, timestamp: 1518194448703 through
MyKeySelector - Partioning event Event::fe1486ab-d702-421d-be32-98dd38a1d306 --- action: connect, sender: RHHLWUi8sXH33AJIAAAA, timestamp: 1518194448703 by key RHHLWUi8sXH33AJIAAAA
MyKeySelector - Partioning event Event::27ef8d25-8c3b-43fc-a228-fa0dda8e564d --- action: start, sender: RHHLWUi8sXH33AJIAAAA, timestamp: 1518194448701 by key RHHLWUi8sXH33AJIAAAA
MyKeySelector - Partioning event Event::18b45a9c-b837-4b61-acf3-0b545a097203 --- action: click, sender: RHHLWUi8sXH33AJIAAAA, timestamp: 1518194448702 by key RHHLWUi8sXH33AJIAAAA
MyKeySelector - Partioning event Event::fe1486ab-d702-421d-be32-98dd38a1d306 --- action: connect, sender: RHHLWUi8sXH33AJIAAAA, timestamp: 1518194448703 by key RHHLWUi8sXH33AJIAAAA
The TimeCharacteristic is set to EventTime via:
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
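The events contain correct timestamps.
If I now send another 3 events with the same actions (but with new timestamps etc.), the pattern matches the previous set of events instead of the new one.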
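Is there something wrong with my code? Or why does Flink always match the previous set of events against the pattern?

I did solve it. I kept searching at the stream source, but my event handling was actually completely fine. The problem was that my watermark generation did not happen continuously.
As you can see in the code above, the watermark only advanced when an event was received.
But after sending the first 3 events, there were no further events in my setup, so no new watermark was generated.
Since no new watermark was created with a timestamp greater than that of the last received event of the sequence, Flink never processed these elements. The reason is explained in the Flink CEP documentation on handling lateness in event time. The important sentence is:
"...when a watermark arrives, all the elements in this buffer with timestamps smaller than that of the watermark are processed."
So, since I generated watermarks in BoundedOutOfOrdernessGenerator with a delay of 5.5 seconds, the latest watermark was always 5.5 seconds behind the timestamp of the last event. Therefore, those events were never processed until later events pushed the watermark past them.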
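The effect can be reproduced with plain arithmetic. The following is a minimal sketch in plain Java, without any Flink dependency; the class and method names are hypothetical, and the 60-second gap before the second batch is an assumed value for illustration. It uses the three timestamps from the log output and the "timestamp smaller than the watermark" rule quoted from the documentation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Plain-Java sketch (not a Flink class) of events buffered behind a lagging watermark. */
final class WatermarkLagDemo {

    // Same bound as in the original generator: 5.5 seconds.
    static final long MAX_OUT_OF_ORDERNESS_MS = 5_500L;

    /** Watermark as the original generator computes it: highest timestamp seen minus the bound. */
    static long watermarkFor(long currentMaxTimestamp) {
        return currentMaxTimestamp - MAX_OUT_OF_ORDERNESS_MS;
    }

    /** Returns the buffered timestamps that are strictly smaller than the watermark. */
    static List<Long> released(List<Long> buffered, long watermark) {
        List<Long> out = new ArrayList<>();
        for (long ts : buffered) {
            if (ts < watermark) {
                out.add(ts);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Timestamps of the three events shown in the log output.
        List<Long> firstBatch = Arrays.asList(1518194448701L, 1518194448702L, 1518194448703L);

        // After the first batch the watermark trails the newest event by 5.5 s.
        long wmAfterFirstBatch = watermarkFor(1518194448703L);
        System.out.println("released after batch 1: " + released(firstBatch, wmAfterFirstBatch));

        // Assumed second batch arriving 60 s later: only its arrival moves the
        // watermark far enough to release the first batch.
        long wmAfterSecondBatch = watermarkFor(1518194448703L + 60_000L);
        System.out.println("released after batch 2: " + released(firstBatch, wmAfterSecondBatch));
    }
}
```

The first call releases nothing, because the watermark is still below all three timestamps. The second call releases all three events of the first batch, which is consistent with the pattern always appearing to match the previous set of events.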
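So, one way to solve this is to generate watermarks periodically, assuming a specific delay with which events arrive. For this, we need to call setAutoWatermarkInterval on the ExecutionConfig: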
final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
..
ExecutionConfig executionConfig = env.getConfig();
executionConfig.setAutoWatermarkInterval(1000L);
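This enables Flink to call the watermark generator periodically at the given interval (in this case every second) and fetch a new watermark.
In addition, we need to adjust the timestamp/watermark generator so that it emits new watermarks even when no new events flow in. For this, I adapted the BoundedOutOfOrdernessTimestampExtractor that ships with Flink: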
public class BoundedOutOfOrdernessGenerator implements AssignerWithPeriodicWatermarks<Event> {
    private static final long serialVersionUID = 1L;

    /** The current maximum timestamp seen so far. */
    private long currentMaxTimestamp;

    /** The timestamp of the last emitted watermark. */
    private long lastEmittedWatermark = Long.MIN_VALUE;

    /**
     * The (fixed) interval between the maximum seen timestamp seen in the records
     * and that of the watermark to be emitted.
     */
    private final long maxOutOfOrderness;

    public BoundedOutOfOrdernessGenerator() {
        Time maxOutOfOrderness = Time.seconds(5);
        if (maxOutOfOrderness.toMilliseconds() < 0) {
            throw new RuntimeException("Tried to set the maximum allowed " + "lateness to " + maxOutOfOrderness
                    + ". This parameter cannot be negative.");
        }
        this.maxOutOfOrderness = maxOutOfOrderness.toMilliseconds();
        this.currentMaxTimestamp = Long.MIN_VALUE + this.maxOutOfOrderness;
    }

    public long getMaxOutOfOrdernessInMillis() {
        return maxOutOfOrderness;
    }

    /**
     * Extracts the timestamp from the given element.
     *
     * @param element The element that the timestamp is extracted from.
     * @return The new timestamp.
     */
    public long extractTimestamp(Event element) {
        long timestamp = element.getOccurrenceTimeStamp();
        return timestamp;
    }

    @Override
    public final Watermark getCurrentWatermark() {
        Instant instant = Instant.now();
        long nowTimestampMillis = instant.toEpochMilli();
        long latenessTimestamp = nowTimestampMillis - maxOutOfOrderness;
        if (latenessTimestamp >= currentMaxTimestamp) {
            currentMaxTimestamp = latenessTimestamp;
        }
        // this guarantees that the watermark never goes backwards.
        long potentialWM = currentMaxTimestamp - maxOutOfOrderness;
        if (potentialWM >= lastEmittedWatermark) {
            lastEmittedWatermark = potentialWM;
        }
        return new Watermark(lastEmittedWatermark);
    }

    @Override
    public final long extractTimestamp(Event element, long previousElementTimestamp) {
        long timestamp = extractTimestamp(element);
        if (timestamp > currentMaxTimestamp) {
            currentMaxTimestamp = timestamp;
        }
        return timestamp;
    }
}
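As you can see in getCurrentWatermark(), I take the current epoch timestamp, subtract the maximum lateness we expect, and then create the watermark from that timestamp.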
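To make the behavior during idle periods easier to check, here is a minimal plain-Java sketch of the same watermark logic with the clock passed in as a parameter. The class name, the method names, and the injectable clock are assumptions for illustration and are not part of Flink or of the generator above; the arithmetic mirrors getCurrentWatermark() and extractTimestamp().

```java
import java.util.function.LongSupplier;

/** Plain-Java sketch (not a Flink class) of the wall-clock fallback for idle streams. */
final class IdleAwareWatermarkSketch {

    private final long maxOutOfOrdernessMs;
    // Hypothetical injection point; the generator above calls Instant.now() directly.
    private final LongSupplier clock;
    private long currentMaxTimestamp;
    private long lastEmittedWatermark = Long.MIN_VALUE;

    IdleAwareWatermarkSketch(long maxOutOfOrdernessMs, LongSupplier clock) {
        this.maxOutOfOrdernessMs = maxOutOfOrdernessMs;
        this.clock = clock;
        this.currentMaxTimestamp = Long.MIN_VALUE + maxOutOfOrdernessMs;
    }

    /** Mirrors extractTimestamp(): remember the highest event timestamp seen. */
    void onEvent(long timestamp) {
        currentMaxTimestamp = Math.max(currentMaxTimestamp, timestamp);
    }

    /** Mirrors getCurrentWatermark(): fall back to wall-clock time when no events arrive. */
    long currentWatermark() {
        long latenessTimestamp = clock.getAsLong() - maxOutOfOrdernessMs;
        currentMaxTimestamp = Math.max(currentMaxTimestamp, latenessTimestamp);
        // Keeping the maximum guarantees that the watermark never goes backwards.
        lastEmittedWatermark = Math.max(lastEmittedWatermark, currentMaxTimestamp - maxOutOfOrdernessMs);
        return lastEmittedWatermark;
    }

    public static void main(String[] args) {
        long[] now = {1_000_000L}; // fake wall clock in milliseconds
        IdleAwareWatermarkSketch gen = new IdleAwareWatermarkSketch(5_000L, () -> now[0]);

        gen.onEvent(1_000_000L);
        System.out.println("watermark right after the event: " + gen.currentWatermark());

        now[0] += 20_000L; // 20 s pass without any new event
        System.out.println("watermark after 20 s of idleness: " + gen.currentWatermark());
    }
}
```

In this sketch the watermark moves from 995000 (still below the event timestamp 1000000) to 1010000 after 20 idle seconds, so the buffered event can be processed without waiting for further input. Two things follow from the arithmetic: while the stream is idle the watermark trails the wall clock by twice the bound, because the bound is subtracted once for latenessTimestamp and once for the watermark, and the approach only makes sense when event timestamps are close to wall-clock epoch time.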
Together, these two changes got Flink working for me.