Java Beam: writing per-window element counts with window bounds

Tags: java, google-bigquery, google-cloud-dataflow, apache-beam

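For a simple proof of concept, I am trying to window click data into two-minute windows. From there I only want to write each window's count, together with the window's bounds, to BigQuery. When running the pipeline, I keep getting the following error: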

org.apache.beam.sdk.Pipeline$PipelineExecutionException: java.lang.RuntimeException: java.io.IOException: Insert failed: [{"errors":[{"debugInfo":"","location":"windowend","message":"This field is not a record.","reason":"invalid"}],"index":0}]
The pipeline looks like this:

// Creating the pipeline
Pipeline p = Pipeline.create(options);

// Window items
PCollection<TableRow> counts = p.apply("ReadFromPubSub", PubsubIO.readStrings().fromTopic(options.getTopic()))
        .apply("AddEventTimestamps", WithTimestamps.of(TotalCountPipeline::ExtractTimeStamp).withAllowedTimestampSkew(Duration.standardDays(10000)))
        .apply("Window", Window.<String>into(
                FixedWindows.of(Duration.standardHours(options.getWindowSize())))
                .triggering(
                        AfterWatermark.pastEndOfWindow()
                                .withLateFirings(AfterPane.elementCountAtLeast(1)))
                .withAllowedLateness(Duration.standardDays(10000))
                .accumulatingFiredPanes())
        .apply("CalculateSum", Combine.globally(Count.<String>combineFn()).withoutDefaults())
        .apply("BigQueryFormat", ParDo.of(new FormatCountsFn()));

// Writing to BigQuery
counts.apply("WriteToBigQuery",BigQueryIO.writeTableRows()
                .to(options.getOutputTable())
                .withSchema(getSchema())
                .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));

// Execute pipeline
p.run().waitUntilFinish();
I suspect this has to do with the BigQuery formatting function, which is implemented as follows:

static class FormatCountsFn extends DoFn<Long, TableRow> {
    @ProcessElement
    public void processElement(ProcessContext c, BoundedWindow window) {
        TableRow row =
                new TableRow()
                        .set("windowStart", window.maxTimestamp().toDateTime())
                        .set("count", c.element().intValue());
        c.output(row);
    }
}

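The formatting function follows an example I found elsewhere. Can anyone shed some light on this? I can't seem to figure it out.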

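Apparently, the answer to this question has nothing to do with Beam windowing, only with BigQuery. Writing a date-time value to a BigQuery row requires a string in the proper yyyy-MM-dd HH:mm:ss format, as opposed to the DateTime object I was supplying.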
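As a minimal sketch of that fix, the helper below turns an epoch-millisecond value into a string in the yyyy-MM-dd HH:mm:ss pattern. It uses only `java.time` so it runs on its own. The class name `BigQueryTimestampFormat` and the method `toBigQueryTimestamp` are hypothetical names introduced for illustration, and rendering in UTC is an assumption rather than something stated in the question.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class BigQueryTimestampFormat {

    // Pattern from the answer above. Note the lowercase "mm" for minutes;
    // uppercase "MM" would print the month instead.
    // Assumption: values are rendered in UTC.
    private static final DateTimeFormatter BQ_FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss").withZone(ZoneOffset.UTC);

    // Hypothetical helper: formats epoch milliseconds as a plain string
    // that can be stored in a TableRow field instead of a DateTime object.
    static String toBigQueryTimestamp(long epochMillis) {
        return BQ_FORMAT.format(Instant.ofEpochMilli(epochMillis));
    }

    public static void main(String[] args) {
        // Bounds of a two-minute window starting at the epoch: [0 ms, 120000 ms).
        long windowStartMillis = 0L;
        long windowMaxMillis = 119_999L; // last millisecond inside the window

        System.out.println(toBigQueryTimestamp(windowStartMillis));
        System.out.println(toBigQueryTimestamp(windowMaxMillis));
    }
}
```

Inside `FormatCountsFn`, the same helper could presumably be fed `window.maxTimestamp().getMillis()`, so that the row receives a string rather than the result of `toDateTime()`. To get the window start, the window would need to be cast to `IntervalWindow` and its `start()` used, which assumes the pipeline uses interval-based windows such as `FixedWindows`.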
