Apache Flink: event-time windowing on a Kafka stream results in "Timestamp monotony violated" errors

apache-flink

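I am reading from a Kafka topic, data, that is partitioned on the equipmentId field. There are 15 partitions, one per equipment ID.

The data in the topic looks like this:

    {
      "timestamp": "2018-05-03T14:32:04.910Z",
      "series": "production-output",
      "equipmentId": "5454-07",
      "value": 1
    }

Within a given equipmentId partition, a record belongs to one of two series: production-output or production-input. My goal is to sum production-output per minute based on event time.

This is my code so far: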

    env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
    env.setParallelism(15);

    // Add kafka consumer to DataStream
    DataStream<String> stream = env.addSource(kafkaConsumer);

    DataStream keyedStream = stream
            .map(new SeriesMap())
            // Filter "production-output" seriesType
            .filter(new FilterFunction<Tuple4<Long, String, String, Double>>() {
                @Override
                public boolean filter(Tuple4<Long, String, String, Double> data) throws Exception {
                    if (data.f1.equals("production-output")) {
                        return true;
                    }
                    return false;
                }
            })
            // Key on "equipmentId"
            .keyBy(2);

    DataStreamSink sink = keyedStream
            .assignTimestampsAndWatermarks(new AscendingTimestampExtractor<Tuple4<Long, String, String, Double>>() {
                @Override
                public long extractAscendingTimestamp(Tuple4<Long, String, String, Double> data) {
                    return data.f0;
                }
            })
            // Key on "equipmentId"
            .keyBy(2)
            .timeWindow(Time.seconds(1))
            .sum(3)
            .print();

The output looks like this:

15> (1525358087756,production-output,5454-07,1.0)
2> (1525358080269,production-output,5454-05,1.0)
2> (1525358085361,production-output,5454-05,1.0)
2> (1525358088469,production-output,5454-05,1.0)
2> (1525358097630,production-output,5454-05,1.0)
13> (1525358222081,production-output,5454-06,1.0)
13> (1525358223162,production-output,5454-06,1.0)
...
13> (1525358230305,production-output,5454-06,1.0)
13> (1525358234453,production-output,5454-06,1.0)
15> (1525358231998,production-output,5454-01,1.0)
15> (1525358231783,production-output,5454-10,1.0)
15> (1525358232803,production-output,5454-01,1.0)
15> (1525358233811,production-output,5454-01,1.0)
...
15> (1525358238878,production-output,5454-10,1.0)

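So stream 15 is receiving data for equipment 5454-10, 5454-01 and 5454-07, while streams 4, 5, 6, 7, 8, 10, 11, 12 and 14 do not appear in the output at all.

Not every machine has data, so I thought I might be running into [...]

However, what I think is actually happening is that a single thread is being assigned more than one key.

Any help is greatly appreciated.

Note: I can guarantee that timestamps are in sequential order within each partition.

Update: I did what Joshua DeWald suggested and called assignTimestampsAndWatermarks on the source. I no longer see the original "Timestamp monotony violated" problem, but now I am running into [...]

Thanks
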
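I think you will get this error unless you can guarantee that timestamps move forward across all partitions, because you are extracting timestamps and watermarks outside of the source. Once records from several partitions are interleaved in one parallel task, ordering within each partition no longer implies ordering in the combined stream.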
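
To make the interleaving problem concrete, here is a minimal plain-Java sketch of the kind of check an ascending-timestamp extractor performs. It does not use Flink; the class `MonotonyCheck` and the method `violations` are hypothetical names introduced only for illustration. The sample timestamps are four consecutive lines printed by stream 15 in the question, and treating their print order as arrival order is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: mimics an "ascending timestamps" check on one parallel task. */
class MonotonyCheck {

    /** Returns the indices of elements whose timestamp is lower than the highest one seen so far. */
    static List<Integer> violations(long[] timestamps) {
        List<Integer> result = new ArrayList<>();
        long current = Long.MIN_VALUE;
        for (int i = 0; i < timestamps.length; i++) {
            if (timestamps[i] < current) {
                result.add(i);           // this element would trigger the warning
            } else {
                current = timestamps[i]; // timestamp moved forward (or stayed equal)
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Stream 15 in the question, in printed order: 5454-01, 5454-10, 5454-01, 5454-01
        long[] interleaved = {1525358231998L, 1525358231783L, 1525358232803L, 1525358233811L};
        // The same lines restricted to equipment 5454-01 only
        long[] singleKey = {1525358231998L, 1525358232803L, 1525358233811L};

        System.out.println(violations(interleaved)); // [1]
        System.out.println(violations(singleKey));   // []
    }
}
```

Each key is ordered on its own, but the 5454-10 record arrives with a timestamp lower than the 5454-01 record before it, so a check that only sees the merged stream reports a violation.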
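You could potentially use your SeriesMap class as the Kafka deserialization schema and then call assignTimestampsAndWatermarks on the Kafka source itself. The Kafka source then has no problem with timestamps moving forward separately within each partition, and the global watermark it emits is the minimum of the watermarks seen across all partitions.

In other words, done this way, your global event time moves forward at the pace of the slowest partition. One thing to be aware of is that every partition must emit at least some data, otherwise the progress of time stalls.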
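
The "minimum across partitions" rule can be sketched in plain Java as follows. This is a simplified model, not Flink code: `PartitionWatermarks`, `observe` and `globalWatermark` are hypothetical names, and the per-partition watermark is taken to be the highest timestamp seen on that partition, which is an assumption made to keep the example short.

```java
import java.util.Arrays;

/** Hypothetical model of per-partition watermarks combined into one global watermark. */
class PartitionWatermarks {
    private final long[] watermarks;

    PartitionWatermarks(int partitionCount) {
        watermarks = new long[partitionCount];
        // A partition that has emitted nothing yet holds the watermark at the minimum value.
        Arrays.fill(watermarks, Long.MIN_VALUE);
    }

    /** Records a timestamp seen on a partition; a partition's watermark never moves backwards. */
    void observe(int partition, long timestamp) {
        watermarks[partition] = Math.max(watermarks[partition], timestamp);
    }

    /** The global watermark is the minimum over all partitions. */
    long globalWatermark() {
        long min = Long.MAX_VALUE;
        for (long w : watermarks) {
            min = Math.min(min, w);
        }
        return min;
    }

    public static void main(String[] args) {
        PartitionWatermarks w = new PartitionWatermarks(3);
        w.observe(0, 1525358087756L);
        w.observe(1, 1525358097630L);
        // Partition 2 is still silent, so event time has not started to advance.
        System.out.println(w.globalWatermark() == Long.MIN_VALUE); // true

        w.observe(2, 1525358080269L);
        System.out.println(w.globalWatermark()); // 1525358080269

        w.observe(2, 1525358234453L);
        System.out.println(w.globalWatermark()); // 1525358087756
    }
}
```

The first check shows the stall described above: until the silent partition emits a record, the global watermark stays at its initial value no matter how far the other partitions have advanced.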

Note that time in Flink is global, not per key.