Java: Missing elements when using Flink window and fold functions?


java apache-flink


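When I try to aggregate elements using the window and fold functions, some elements are missing from the aggregation. The elements are consumed from Kafka

(value:0, value:1, value:2, value:3)

and aggregated into odd and even values.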

The output is:

{even=[0, 2, 4], odd=[1, 3]}
{even=[6, 8], odd=[5, 7, 9]}
{even=[14, 16, 18], odd=[15, 17]}
{even=[20, 22], odd=[19, 21, 23]}
{even=[24, 26, 28], odd=[25, 27]}

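The numbers 10 to 13 are missing, and this happens for random sets of numbers. Can anyone suggest what is missing in the code below, and how I can make sure all elements are processed?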

public static class Splitter implements FlatMapFunction<String,
        Tuple3<String, String, List<String>>> {
    private static final long serialVersionUID = 1L;

    @Override
    public void flatMap(String value,
            Collector<Tuple3<String, String, List<String>>> out) throws Exception {
        // Kafka records look like "value:0", "value:1", ...
        String[] vals = value.split(":");
        if (vals.length < 2) {
            return; // skip malformed records instead of failing on vals[1]
        }

        // Tag each number as "even" or "odd"; "test" is the field used by keyBy(0)
        if (Integer.parseInt(vals[1]) % 2 == 0) {
            out.collect(new Tuple3<String, String, List<String>>(
                    "test", "even", Arrays.asList(vals[1])));
        } else {
            out.collect(new Tuple3<String, String, List<String>>(
                    "test", "odd", Arrays.asList(vals[1])));
        }
    }
}


    // Inside main(), after env and kafkaStream have been set up
    DataStream<Map<String, List<String>>> streamValue = kafkaStream
            .flatMap(new Splitter())
            .keyBy(0)
            .window(TumblingEventTimeWindows.of(Time.milliseconds(3000)))
            .trigger(CustomizedCountTrigger.of(5L)) //.trigger(CountTrigger.of(2))
            .fold(new HashMap<String, List<String>>(),
                    new FoldFunction<Tuple3<String, String, List<String>>, Map<String, List<String>>>() {
                private static final long serialVersionUID = 1L;

                @Override
                public Map<String, List<String>> fold(Map<String, List<String>> accumulator,
                        Tuple3<String, String, List<String>> value) throws Exception {
                    // Append this element's number to the "even" or "odd" list
                    if (accumulator.get(value.f1) != null) {
                        List<String> list = new ArrayList<>();
                        list.addAll(accumulator.get(value.f1));
                        list.addAll(value.f2);
                        accumulator.put(value.f1, list);
                    } else {
                        accumulator.put(value.f1, value.f2);
                    }
                    return accumulator;
                }
            });

    streamValue.print();
    env.execute("window test");
}


import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.api.common.state.ReducingState;
import org.apache.flink.api.common.state.ReducingStateDescriptor;
import org.apache.flink.api.common.typeutils.base.LongSerializer;
import org.apache.flink.streaming.api.windowing.triggers.Trigger;
import org.apache.flink.streaming.api.windowing.triggers.TriggerResult;
import org.apache.flink.streaming.api.windowing.windows.Window;

public class CustomizedCountTrigger<W extends Window> extends Trigger<Object, W> {

    private static final long serialVersionUID = 1L;
    private final long maxCount;

    // Element counter, kept per key and per window
    private final ReducingStateDescriptor<Long> stateDesc =
            new ReducingStateDescriptor<>("count", new Sum(), LongSerializer.INSTANCE);

    private CustomizedCountTrigger(long maxCount) {
        this.maxCount = maxCount;
    }

    @Override
    public TriggerResult onElement(Object element, long timestamp, W window,
            TriggerContext ctx) throws Exception {
        ReducingState<Long> count = ctx.getPartitionedState(stateDesc);
        count.add(1L);
        // Emit and empty the window every maxCount elements
        if (count.get() >= maxCount) {
            count.clear();
            return TriggerResult.FIRE_AND_PURGE;
        }
        return TriggerResult.CONTINUE;
    }

    @Override
    public TriggerResult onProcessingTime(long time, W window,
            TriggerContext ctx) throws Exception {
        return TriggerResult.CONTINUE; // processing-time timers never fire the window
    }

    @Override
    public TriggerResult onEventTime(long time, W window,
            TriggerContext ctx) throws Exception {
        return TriggerResult.CONTINUE; // event-time timers never fire the window
    }

    @Override
    public void clear(W window, TriggerContext ctx) throws Exception {
        ctx.getPartitionedState(stateDesc).clear();
    }

    @Override
    public String toString() {
        return "CountTrigger(" + maxCount + ")";
    }

    public static <W extends Window> CustomizedCountTrigger<W> of(long maxCount) {
        return new CustomizedCountTrigger<>(maxCount);
    }

    private static class Sum implements ReduceFunction<Long> {
        private static final long serialVersionUID = 1L;

        @Override
        public Long reduce(Long value1, Long value2) throws Exception {
            return value1 + value2;
        }
    }
}

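So, I started writing the first part of this before I noticed that your custom trigger makes the TumblingEventTime window you are using somewhat irrelevant, but I wanted to include my original thoughts anyway, since I'm not entirely sure why you would use it without relying on the event-time window. My response after realizing this is below the original one.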
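Are you running this with a parallelism of one or more than one? I ask because if it runs in parallel (and the Kafka topic also has multiple partitions), messages may be received and processed out of order. A message with a later timestamp can then advance the watermark, which causes a window to be processed. If the next message has an event time earlier than the current watermark (i.e. it is "late"), that message is dropped.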
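For example, suppose there are 20 elements, each with an event time like this:
message1: eventTime: 1000
message2: eventTime: 2000
and so on,
and your event-time window is 5001 ms.
Now message1 through message9 are sent in order. The first window is processed and contains messages 1 to 5 (message6 is what causes that window to be processed). Now, if message11 arrives before message10, it causes the window containing messages 6 to 9 to be processed. When message10 arrives next, the watermark has already passed message10's event time, so it is dropped as a "late event".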
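The scenario above can be replayed with a small plain-Java model of event-time tumbling windows. This is a sketch, not Flink's implementation, and the class and method names are made up. It assumes messageN carries eventTime N * 1000 ms, windows start at multiples of the window size, the watermark trails the largest event time seen by 1 ms, and there is no allowed lateness.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model of event-time tumbling windows (NOT Flink code).
// Assumptions: messageN has eventTime N * 1000 ms, windows are aligned to multiples
// of the window size, the watermark trails the largest event time seen by 1 ms,
// and there is no allowed lateness.
class LateEventDemo {

    // Returns the fired windows (message ids per window); dropped late ids go into lateIds.
    static List<List<Integer>> run(int[] arrivalOrder, long windowSize, List<Integer> lateIds) {
        Map<Long, List<Integer>> open = new TreeMap<>(); // window start -> buffered message ids
        List<List<Integer>> fired = new ArrayList<>();
        long watermark = Long.MIN_VALUE;

        for (int id : arrivalOrder) {
            long eventTime = id * 1000L;
            long start = eventTime - (eventTime % windowSize);
            long maxTimestamp = start + windowSize - 1;   // last timestamp inside the window
            if (maxTimestamp <= watermark) {              // its window has already fired
                lateIds.add(id);                          // late event: dropped
                continue;
            }
            open.computeIfAbsent(start, k -> new ArrayList<>()).add(id);

            watermark = Math.max(watermark, eventTime - 1);
            // Fire (and discard) every window whose last timestamp the watermark has reached
            Iterator<Map.Entry<Long, List<Integer>>> it = open.entrySet().iterator();
            while (it.hasNext()) {
                Map.Entry<Long, List<Integer>> window = it.next();
                if (window.getKey() + windowSize - 1 <= watermark) {
                    fired.add(window.getValue());
                    it.remove();
                }
            }
        }
        return fired;
    }

    public static void main(String[] args) {
        // message11 overtakes message10
        int[] arrivalOrder = {1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 10};
        List<Integer> late = new ArrayList<>();
        List<List<Integer>> fired = run(arrivalOrder, 5001L, late);
        System.out.println("fired windows:   " + fired);
        System.out.println("dropped as late: " + late);
    }
}
```

Running `main` reports the windows [1, 2, 3, 4, 5] and [6, 7, 8, 9] as fired and message 10 as dropped; with the messages sent in order, nothing is dropped.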
Proper answer
Instead of using an event-time window with a custom trigger, try using countWindow.
So replace this:
.window(TumblingEventTimeWindows.of(Time.milliseconds(3000)))
.trigger(CustomizedCountTrigger.of(5L)) //.trigger(CountTrigger.of(2))

with this:
.countWindow(5L)
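A rough plain-Java model of the suggested `countWindow(5L)` combined with the question's even/odd fold (not Flink code; the class and method names are hypothetical) shows why no element goes missing: a window closes only when it reaches five elements, never because time has passed, so nothing is left behind in an expired window. The flip side is that a trailing group of fewer than five elements is held until more elements arrive, so this alone does not cover a time-based condition.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model of keyBy(0).countWindow(5L) followed by the even/odd fold (NOT Flink code).
// Assumption: all records share the single key "test", as produced by the Splitter.
class CountWindowDemo {

    static List<Map<String, List<String>>> run(List<String> records, int windowSize) {
        List<Map<String, List<String>>> emitted = new ArrayList<>();
        List<String> buffer = new ArrayList<>();
        for (String record : records) {
            buffer.add(record.split(":")[1]);     // "value:7" -> "7"
            if (buffer.size() == windowSize) {    // window full: fire, then purge
                emitted.add(fold(buffer));
                buffer.clear();
            }
        }
        // A partially filled window is not emitted; it waits until more elements arrive.
        return emitted;
    }

    // Groups numbers under "even" / "odd" like the question's FoldFunction
    // (TreeMap is used only so that "even" is always printed first).
    static Map<String, List<String>> fold(List<String> numbers) {
        Map<String, List<String>> acc = new TreeMap<>();
        for (String n : numbers) {
            String parity = Integer.parseInt(n) % 2 == 0 ? "even" : "odd";
            acc.computeIfAbsent(parity, k -> new ArrayList<>()).add(n);
        }
        return acc;
    }

    public static void main(String[] args) {
        List<String> records = new ArrayList<>();
        for (int i = 0; i < 30; i++) {
            records.add("value:" + i);
        }
        for (Map<String, List<String>> window : run(records, 5)) {
            System.out.println(window);
        }
    }
}
```

The output has the same shape as in the question, but the third line is {even=[10, 12, 14], odd=[11, 13]}, so 10 to 13 are no longer lost.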

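Thank you very much for your time and explanation. I agree that using an eventTimeWindow can cause messages to be dropped. But my use case is as follows. First, I should clarify that I tried both parallelism(1) and parallelism(2), and the problem is the same: some events are dropped. My use case is to process a group of events whenever a business rule evaluates to true, e.g. when the total number of events is greater than 3, when the number of even events is greater than 5, or when a predefined time window (e.g. 2 seconds) has elapsed. Also, my understanding is that if you override the window's trigger with your own, the original trigger is no longer considered, including the elapsing of the window's time. My setup: env.setStreamTimeCharacteristic(TimeCharacteristic.IngestionTime); env.setParallelism(1);

I appreciate your point, and including a custom trigger does override the default trigger. But whatever calls the custom trigger is still active. So when the 3000 ms TumblingEventTime window completes, it calls the onEventTime method in your custom trigger. However, your onEventTime method only returns CONTINUE, without firing and/or purging (whereas the default trigger fires the window at that point), which as far as I can tell makes the event-time window essentially pointless. Also, I don't see you assigning event times to your events, so I'm guessing you meant to use processing time instead? But even if you are, your custom trigger is told to CONTINUE there too, so nothing happens. If you do want to use event time, have the onEventTime method in your custom trigger return TriggerResult.FIRE_AND_PURGE. If you want to use processing time, have the onProcessingTime method in your custom trigger return TriggerResult.FIRE_AND_PURGE, and change TumblingEventTimeWindows.of() to TumblingProcessingTimeWindows.of().

Thanks a lot, @Jicaar. This helped me understand it better.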
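The effect of onEventTime returning CONTINUE versus FIRE_AND_PURGE can be sketched with a plain-Java model of one key's tumbling windows (not Flink code; the class, method, and the 215 ms arrival rate are hypothetical). It assumes, as the comment above describes, that when a window ends the trigger's onEventTime decides whether the buffered elements are emitted, and that the window's state is cleared afterwards either way. Under that assumption, with one element every 215 ms, the first 3000 ms window receives elements 0 to 13: the count trigger emits 0 to 4 and 5 to 9, and 10 to 13 are still buffered when the window ends. This matches the gap in the question's output.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of ONE key's tumbling windows with a "fire every maxCount elements"
// trigger (NOT Flink code). Assumptions: element i arrives in order with timestamp
// i * millisBetween, the watermark moves with each element, and at the end of a window
// the trigger's onEventTime result decides whether the buffered elements are emitted;
// the window state is cleared afterwards either way.
class WindowEndDemo {

    // fireAtWindowEnd = true models onEventTime returning FIRE_AND_PURGE; false models CONTINUE.
    static List<List<Integer>> run(int count, long millisBetween, long windowSize,
                                   int maxCount, boolean fireAtWindowEnd) {
        List<List<Integer>> emitted = new ArrayList<>();
        List<Integer> buffer = new ArrayList<>();   // elements held in the current window
        long currentStart = -1;
        for (int i = 0; i < count; i++) {
            long ts = i * millisBetween;
            long start = ts - (ts % windowSize);
            if (currentStart != -1 && start != currentStart) {
                // The previous window has ended: onEventTime is consulted
                if (fireAtWindowEnd && !buffer.isEmpty()) {
                    emitted.add(new ArrayList<>(buffer));
                }
                buffer.clear();                      // window state is discarded regardless
            }
            currentStart = start;
            buffer.add(i);
            if (buffer.size() >= maxCount) {         // onElement: count reached -> FIRE_AND_PURGE
                emitted.add(new ArrayList<>(buffer));
                buffer.clear();
            }
        }
        // The last window is still open when the input ends, so its contents are not emitted here.
        return emitted;
    }

    public static void main(String[] args) {
        // Hypothetical rate: one element every 215 ms, 3000 ms windows, fire every 5 elements
        System.out.println("CONTINUE:       " + run(30, 215L, 3000L, 5, false));
        System.out.println("FIRE_AND_PURGE: " + run(30, 215L, 3000L, 5, true));
    }
}
```

With the CONTINUE variant, elements 10 to 13 and 24 to 27 never appear in the output; with FIRE_AND_PURGE they are emitted as smaller groups of four when their windows end.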