Java Flink CEP pattern does not match the first events after starting the job and always matches the previous set of events

I want to match a CEP pattern in Flink 1.4.0 Streaming with the following code:

    DataStream<Event> input = inputFromSocket.map(new IncomingMessageProcessor()).filter(new FilterEmptyAndInvalidEvents());

    DataStream<Event> inputFiltered = input.assignTimestampsAndWatermarks(new BoundedOutOfOrdernessGenerator());
    KeyedStream<Event, String> partitionedInput = inputFiltered.keyBy(new MyKeySelector());

    Pattern<Event, ?> pattern = Pattern.<Event>begin("start")
            .where(new ActionCondition("action1"))
            .followedBy("middle").where(new ActionCondition("action2"))
            .followedBy("end").where(new ActionCondition("action3"));

    pattern = pattern.within(Time.seconds(30));

    PatternStream<Event> patternStream = CEP.pattern(partitionedInput, pattern);
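The input is pulled from my custom source (Google PubSub). The first filter, FilterEmptyAndInvalidEvents(), only filters out malformed events and the like, which does not occur in this case; I can verify that from the log output. So every event runs through the MyKeySelector.getKey() method.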

The BoundedOutOfOrdernessGenerator just extracts the timestamp from one field:

    import org.apache.flink.streaming.api.functions.AssignerWithPeriodicWatermarks;
    import org.apache.flink.streaming.api.watermark.Watermark;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    public class BoundedOutOfOrdernessGenerator implements AssignerWithPeriodicWatermarks<Event> {
        private static final Logger LOG = LoggerFactory.getLogger(BoundedOutOfOrdernessGenerator.class);
        private final long maxOutOfOrderness = 5500; // 5.5 seconds

        // Highest event timestamp seen so far; it only changes when an event arrives.
        private long currentMaxTimestamp;

        @Override
        public long extractTimestamp(Event element, long previousElementTimestamp) {
            long timestamp = element.getOccurrenceTimeStamp();
            currentMaxTimestamp = Math.max(timestamp, currentMaxTimestamp);
            return timestamp;
        }

        @Override
        public Watermark getCurrentWatermark() {
            // return the watermark as current highest timestamp minus the out-of-orderness bound
            return new Watermark(currentMaxTimestamp - maxOutOfOrderness);
        }
    }
ActionCondition just compares one field of the event, like this:

    import org.apache.flink.cep.pattern.conditions.SimpleCondition;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    public class ActionCondition extends SimpleCondition<Event> {
        private static final Logger LOG = LoggerFactory.getLogger(ActionCondition.class);

        // The action value this condition accepts, e.g. "action1"
        private final String filterForCommand;

        public ActionCondition(String filterForCommand) {
            this.filterForCommand = filterForCommand;
        }

        @Override
        public boolean filter(Event value) throws Exception {
            LOG.info("Filtering event for {} action: {}", filterForCommand, value);

            if (value == null || value.getAction() == null) {
                return false;
            }

            if (value.getAction().equals(filterForCommand)) {
                LOG.info("It's a hit for the {} action for event {}", filterForCommand, value);
                return true;
            }
            LOG.info("It's a miss for the {} action for event {}", filterForCommand, value);
            return false;
        }
    }

The log output looks like this:

    FilterEmptyAndInvalidEvents   - Letting event Event::27ef8d25-8c3b-43fc-a228-fa0dda8e564d --- action: start, sender: RHHLWUi8sXH33AJIAAAA, timestamp: 1518194448701 through
    MyKeySelector  - Partioning event Event::27ef8d25-8c3b-43fc-a228-fa0dda8e564d --- action: start, sender: RHHLWUi8sXH33AJIAAAA, timestamp: 1518194448701 by key RHHLWUi8sXH33AJIAAAA
    FilterEmptyAndInvalidEvents   - Letting event Event::18b45a9c-b837-4b61-acf3-0b545a097203 --- action: click, sender: RHHLWUi8sXH33AJIAAAA, timestamp: 1518194448702 through
    MyKeySelector  - Partioning event Event::18b45a9c-b837-4b61-acf3-0b545a097203 --- action: click, sender: RHHLWUi8sXH33AJIAAAA, timestamp: 1518194448702 by key RHHLWUi8sXH33AJIAAAA
    FilterEmptyAndInvalidEvents   - Letting event Event::fe1486ab-d702-421d-be32-98dd38a1d306 --- action: connect, sender: RHHLWUi8sXH33AJIAAAA, timestamp: 1518194448703 through
    MyKeySelector  - Partioning event Event::fe1486ab-d702-421d-be32-98dd38a1d306 --- action: connect, sender: RHHLWUi8sXH33AJIAAAA, timestamp: 1518194448703 by key RHHLWUi8sXH33AJIAAAA
    MyKeySelector  - Partioning event Event::27ef8d25-8c3b-43fc-a228-fa0dda8e564d --- action: start, sender: RHHLWUi8sXH33AJIAAAA, timestamp: 1518194448701 by key RHHLWUi8sXH33AJIAAAA
    MyKeySelector  - Partioning event Event::18b45a9c-b837-4b61-acf3-0b545a097203 --- action: click, sender: RHHLWUi8sXH33AJIAAAA, timestamp: 1518194448702 by key RHHLWUi8sXH33AJIAAAA
    MyKeySelector  - Partioning event Event::fe1486ab-d702-421d-be32-98dd38a1d306 --- action: connect, sender: RHHLWUi8sXH33AJIAAAA, timestamp: 1518194448703 by key RHHLWUi8sXH33AJIAAAA

The TimeCharacteristic is set to EventTime via

    env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);

and the events contain correct timestamps.

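If I now send another 3 events (with new timestamps etc.) with the actions

  • action1
  • action2
  • action3

the pattern is matched for the first set of events. I know it matches the first set because, for debugging purposes, I tagged every event with a GUID and printed the GUIDs of the matched events.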

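When sending the third, fourth, ... set of these 3 events, it is always the previous set that gets matched. So there seems to be some kind of "offset" in the pattern detection. It does not seem to be a timing issue, though: if I wait a long time after sending the first set (and see that Flink has partitioned the events), the first set is still not matched.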


Is there something wrong with my code, or why does Flink always match the previous set of events against the pattern?

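I did solve it. I had always been looking for the problem at the stream source, but my event processing was actually completely fine. The problem was that my watermark did not advance continuously: as you can see in the code above, it only moved forward when an event was received.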

After the first 3 events had been sent, no further events arrived in my setup, so the watermark never advanced again.

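Because no watermark was ever created with a timestamp larger than that of the last event received in the sequence, Flink never processed those elements. The Flink CEP documentation (on handling lateness in event time) explains why.

The important sentence is:

"… when a watermark arrives, all the elements in this buffer with timestamps smaller than that of the watermark are processed"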
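To make that rule concrete, the following is a minimal plain-Java model of such a buffer. It is a sketch, not Flink's implementation: the class WatermarkBuffer and its methods are hypothetical, and it follows the quoted sentence literally by releasing only timestamps strictly smaller than the watermark.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical model of the CEP input buffer described in the quote; not Flink code.
class WatermarkBuffer {
    // Buffered event timestamps, kept in ascending order.
    private final PriorityQueue<Long> buffer = new PriorityQueue<>();

    // An incoming element is only buffered, never processed immediately.
    void add(long timestamp) {
        buffer.add(timestamp);
    }

    // On a watermark, release (in timestamp order) every element smaller than it.
    List<Long> onWatermark(long watermark) {
        List<Long> released = new ArrayList<>();
        while (!buffer.isEmpty() && buffer.peek() < watermark) {
            released.add(buffer.poll());
        }
        return released;
    }

    int pending() {
        return buffer.size();
    }

    public static void main(String[] args) {
        WatermarkBuffer b = new WatermarkBuffer();
        // The three timestamps from the log, added out of order.
        b.add(1518194448703L);
        b.add(1518194448701L);
        b.add(1518194448702L);
        // A watermark 5.5 s behind the newest event releases nothing.
        System.out.println(b.onWatermark(1518194443203L)); // []
        // A watermark past all three releases them in timestamp order.
        System.out.println(b.onWatermark(1518194448704L)); // [1518194448701, 1518194448702, 1518194448703]
    }
}
```

As long as the watermark stays behind the buffered timestamps, the elements simply wait; nothing is lost, but nothing reaches the pattern either.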

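So, because BoundedOutOfOrdernessGenerator puts the watermark 5.5 seconds behind the highest timestamp it has seen, the latest watermark was always 5.5 seconds before the timestamp of the last event, and the last set of events was never processed. Only when the next set arrived (more than 5.5 seconds later) did the watermark move past the previous set's timestamps; Flink then processed and matched that previous set, while the new set in turn waited in the buffer. That explains the "offset" I was seeing.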
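The stuck watermark and the one-set offset can be replayed with the original generator's arithmetic (highest timestamp seen minus 5,500 ms). This is a plain-Java sketch without Flink; the class name is hypothetical, the first set uses the timestamps from the log above, and the second set is an assumed one sent 20 seconds later.

```java
// Replays the watermark arithmetic of the original BoundedOutOfOrdernessGenerator
// (highest timestamp seen minus 5,500 ms) without any Flink dependency.
class EventDrivenWatermarkReplay {
    static final long MAX_OUT_OF_ORDERNESS = 5500;

    private long currentMaxTimestamp; // starts at 0, as in the original generator

    // Mirrors extractTimestamp(): only an incoming event can move the maximum forward.
    void onEvent(long timestamp) {
        currentMaxTimestamp = Math.max(timestamp, currentMaxTimestamp);
    }

    // Mirrors getCurrentWatermark(): polling it again returns the same value
    // until the next event arrives.
    long currentWatermark() {
        return currentMaxTimestamp - MAX_OUT_OF_ORDERNESS;
    }

    public static void main(String[] args) {
        long[] firstSet = {1518194448701L, 1518194448702L, 1518194448703L}; // from the log
        long[] secondSet = new long[3];
        for (int i = 0; i < 3; i++) {
            secondSet[i] = firstSet[i] + 20_000; // assumption: sent 20 s later
        }

        EventDrivenWatermarkReplay gen = new EventDrivenWatermarkReplay();
        for (long ts : firstSet) gen.onEvent(ts);
        long wm1 = gen.currentWatermark();
        System.out.println("after set 1: wm=" + wm1
                + ", set 1 released: " + (firstSet[2] < wm1));

        for (long ts : secondSet) gen.onEvent(ts);
        long wm2 = gen.currentWatermark();
        System.out.println("after set 2: wm=" + wm2
                + ", set 1 released: " + (firstSet[2] < wm2)
                + ", set 2 released: " + (secondSet[0] < wm2));
    }
}
```

Running main shows that set 1 is not released after it is sent (polling currentWatermark() again returns the same value) and is released only once set 2 arrives, while set 2 in turn stays buffered.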

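One way to solve this is to generate watermarks periodically, assuming a certain maximum delay for events. The interval at which Flink polls the watermark generator is set on the ExecutionConfig with setAutoWatermarkInterval: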

    final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    // ...
    ExecutionConfig executionConfig = env.getConfig();
    executionConfig.setAutoWatermarkInterval(1000L);
    
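This makes Flink call the generator's getCurrentWatermark() at the given interval (here, every second) and pick up the new watermark. Setting the time characteristic to EventTime already enables this polling with a default interval of 200 ms, so the essential part of the fix is the generator itself.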

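In addition, the timestamp/watermark generator has to be changed so that it emits an advancing watermark even when no new events flow in. For this, I adapted Flink's BoundedOutOfOrdernessTimestampExtractor: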

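    import java.time.Instant;

    import org.apache.flink.streaming.api.functions.AssignerWithPeriodicWatermarks;
    import org.apache.flink.streaming.api.watermark.Watermark;
    import org.apache.flink.streaming.api.windowing.time.Time;

    public class BoundedOutOfOrdernessGenerator implements AssignerWithPeriodicWatermarks<Event> {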
    
        private static final long serialVersionUID = 1L;
    
        /** The current maximum timestamp seen so far. */
        private long currentMaxTimestamp;
    
        /** The timestamp of the last emitted watermark. */
        private long lastEmittedWatermark = Long.MIN_VALUE;
    
        /**
         * The (fixed) interval between the maximum timestamp seen in the records
         * and that of the watermark to be emitted.
         */
        private final long maxOutOfOrderness;
    
        public BoundedOutOfOrdernessGenerator() {
            Time maxOutOfOrderness = Time.seconds(5);
    
            if (maxOutOfOrderness.toMilliseconds() < 0) {
                throw new RuntimeException("Tried to set the maximum allowed " + "lateness to " + maxOutOfOrderness
                        + ". This parameter cannot be negative.");
            }
            this.maxOutOfOrderness = maxOutOfOrderness.toMilliseconds();
            this.currentMaxTimestamp = Long.MIN_VALUE + this.maxOutOfOrderness;
        }
    
        public long getMaxOutOfOrdernessInMillis() {
            return maxOutOfOrderness;
        }
    
        /**
         * Extracts the timestamp from the given element.
         *
         * @param element The element that the timestamp is extracted from.
         * @return The new timestamp.
         */
        public long extractTimestamp(Event element) {
            long timestamp = element.getOccurrenceTimeStamp();
            return timestamp;
        }
    
        @Override
        public final Watermark getCurrentWatermark() {
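            // Use "now minus the expected lateness" as a lower bound for the maximum timestamp,
            // so the watermark keeps advancing even while no events arrive.
            Instant instant = Instant.now();
            long nowTimestampMillis = instant.toEpochMilli();
            long latenessTimestamp = nowTimestampMillis - maxOutOfOrderness;

            if (latenessTimestamp >= currentMaxTimestamp) {
                currentMaxTimestamp = latenessTimestamp;
            }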
    
            // this guarantees that the watermark never goes backwards.
            long potentialWM = currentMaxTimestamp - maxOutOfOrderness;
            if (potentialWM >= lastEmittedWatermark) {
                lastEmittedWatermark = potentialWM;
            }
            return new Watermark(lastEmittedWatermark);
        }
    
        @Override
        public final long extractTimestamp(Event element, long previousElementTimestamp) {
            long timestamp = extractTimestamp(element);
            if (timestamp > currentMaxTimestamp) {
                currentMaxTimestamp = timestamp;
            }
            return timestamp;
        }
    }
    
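As you can see in getCurrentWatermark(), I take the current epoch time in milliseconds, subtract the maximum lateness we expect, and use the result as a lower bound for currentMaxTimestamp, from which the watermark is derived. The watermark therefore keeps advancing with the wall clock even when no events arrive.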
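The adapted getCurrentWatermark() logic can be checked without Flink as well. The sketch below reproduces its arithmetic in plain Java; the class WallClockWatermarkModel and the injectable LongSupplier clock (standing in for Instant.now()) are hypothetical. One consequence worth knowing: while no events arrive, currentMaxTimestamp becomes now - maxOutOfOrderness and the lateness is then subtracted again, so the watermark trails the wall clock by 2 x maxOutOfOrderness (10 seconds here).

```java
import java.util.function.LongSupplier;

// Plain-Java model of the adapted generator's arithmetic; the injectable clock
// (hypothetical) replaces Instant.now() so the behaviour can be checked deterministically.
class WallClockWatermarkModel {
    private final long maxOutOfOrderness;
    private final LongSupplier clock;
    private long currentMaxTimestamp;
    private long lastEmittedWatermark = Long.MIN_VALUE;

    WallClockWatermarkModel(long maxOutOfOrdernessMillis, LongSupplier clock) {
        if (maxOutOfOrdernessMillis < 0) {
            throw new IllegalArgumentException("maxOutOfOrderness cannot be negative");
        }
        this.maxOutOfOrderness = maxOutOfOrdernessMillis;
        this.clock = clock;
        this.currentMaxTimestamp = Long.MIN_VALUE + maxOutOfOrdernessMillis;
    }

    // Mirrors extractTimestamp(): events can still push the maximum forward.
    long extractTimestamp(long eventTimestamp) {
        currentMaxTimestamp = Math.max(currentMaxTimestamp, eventTimestamp);
        return eventTimestamp;
    }

    long getCurrentWatermark() {
        // Wall clock minus the allowed lateness is a floor for the maximum timestamp...
        currentMaxTimestamp = Math.max(currentMaxTimestamp, clock.getAsLong() - maxOutOfOrderness);
        // ...and the lateness is subtracted once more to get the watermark.
        long potential = currentMaxTimestamp - maxOutOfOrderness;
        lastEmittedWatermark = Math.max(lastEmittedWatermark, potential); // never goes backwards
        return lastEmittedWatermark;
    }

    public static void main(String[] args) {
        long eventTs = 1518194448703L; // last event from the log
        long[] now = {eventTs};        // simulated wall clock
        WallClockWatermarkModel gen = new WallClockWatermarkModel(5000, () -> now[0]);
        gen.extractTimestamp(eventTs);
        for (int second = 0; second <= 12; second += 3) {
            now[0] = eventTs + second * 1000L;
            long wm = gen.getCurrentWatermark();
            System.out.println("t+" + second + "s: watermark=" + wm
                    + ", event released: " + (eventTs < wm));
        }
    }
}
```

With the clock advanced in 3-second steps after the last logged event, the event is not released from t+0s through t+9s and is released at t+12s, i.e. roughly 10 seconds after it arrived rather than never. The approach also assumes event timestamps roughly follow the machine's clock; events stamped well behind it could fall below the watermark and be treated as late.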

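With both changes in place, Flink processes each set of events once the wall-clock-driven watermark has passed their timestamps, instead of waiting for the next set to arrive.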