Google Cloud Dataflow: allowed lateness in the global window

google-cloud-dataflow

The gaming pipeline example contains a global window that is defined with an allowed lateness:

public PCollection<KV<String, Integer>> apply(PCollection<GameActionInfo> input) {
    return input.apply("LeaderboardUserGlobalWindow",
        Window.<GameActionInfo>into(new GlobalWindows())
            // Get periodic results every ten minutes.
            .triggering(Repeatedly.forever(AfterProcessingTime.pastFirstElementInPane()
                .plusDelayOf(TEN_MINUTES)))
            .accumulatingFiredPanes()
            .withAllowedLateness(allowedLateness))
        // Extract and sum username/score pairs from the event data.
        .apply("ExtractUserScore", new ExtractAndSumScore("user"));
}

If a message (X) arrives at minute 35 of processing time with an event timestamp of minute 5, I would expect the following output:

old: A(10min)
old: B(20min)
old: C(30min)
new: A1(35min, equals all elements in A+X)

Is my understanding correct?

Yes, your understanding is correct.

In addition, any data older than 36 hours will be dropped.

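Anil, thanks for your reply. Is there any way for me to determine which kind of output a result is, and in particular which period it relates to? At the moment I write A, B, C and A1 to the same BigQuery table. They all carry the same window timestamp, which is the timestamp of the single window, so that does not help me. They all have EARLY timing, and the index is just an incrementing counter, which does not help me either.

Michael, this is an interesting question. IIUC, you are asking how to distinguish the elements emitted by each trigger firing, given that their window timestamps are identical?

Yes, that is exactly my question. If there is no way to distinguish them, then allowed lateness on an unbounded window is useless, because you never know which trigger firing a result set belongs to?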
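One possible way to tell the firings apart, offered only as a sketch: since every pane of the single global window shares the same window timestamp, each output row could additionally be stamped with the pane index and the processing time at which it was emitted. The code below is a plain-Java model of that idea, not Dataflow SDK code. `PaneTagging`, `Row` and `tag` are hypothetical names, the scores are made-up values, and in a real pipeline the tagging would presumably sit in a step placed after the aggregation.

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;
import java.time.ZoneOffset;
import java.util.ArrayList;
import java.util.List;

// Plain-Java model, NOT the Dataflow SDK API. All names are hypothetical.
class PaneTagging {

    /** One output row, as it might be written to a results table. */
    static final class Row {
        final String user;
        final int score;
        final long paneIndex;    // increments with every firing of the window
        final Instant emittedAt; // processing time at which the row was produced

        Row(String user, int score, long paneIndex, Instant emittedAt) {
            this.user = user;
            this.score = score;
            this.paneIndex = paneIndex;
            this.emittedAt = emittedAt;
        }

        @Override
        public String toString() {
            return user + " score=" + score + " pane=" + paneIndex
                + " emittedAt=" + emittedAt;
        }
    }

    /** Attaches the pane index and the current processing time to one fired result. */
    static Row tag(String user, int score, long paneIndex, Clock clock) {
        return new Row(user, score, paneIndex, clock.instant());
    }

    public static void main(String[] args) {
        Instant start = Instant.parse("2016-01-01T00:00:00Z");
        long[] firingMinutes = {10, 20, 30, 35}; // firings A, B, C and A1
        int[] scores = {5, 8, 12, 7};            // made-up per-firing results
        List<Row> rows = new ArrayList<>();
        for (int i = 0; i < firingMinutes.length; i++) {
            // A fixed clock stands in for the wall clock at the moment of firing.
            Clock clock = Clock.fixed(
                start.plus(Duration.ofMinutes(firingMinutes[i])), ZoneOffset.UTC);
            rows.add(tag("user1", scores[i], i, clock));
        }
        for (Row row : rows) {
            System.out.println(row);
        }
    }
}
```

With an `emittedAt` column in the table, rows that share a window timestamp can still be ordered and grouped by the firing that produced them. Whether that is enough to recover "which period" a late update belongs to depends on how the results are consumed downstream.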
