Google cloud platform 将突变组流到扳手中

Google cloud platform 将突变组流到扳手中,google-cloud-platform,google-cloud-dataflow,apache-beam,google-cloud-spanner,apache-beam-io,Google Cloud Platform,Google Cloud Dataflow,Apache Beam,Google Cloud Spanner,Apache Beam Io,我正试图用SpaneRio将突变组流到Spaner中。 我们的目标是每10秒编写一次新的MuationGroups,因为我们将使用扳手查询接近时间的KPI 当我不使用任何窗口时,会出现以下错误: Exception in thread "main" java.lang.IllegalStateException: GroupByKey cannot be applied to non-bounded PCollection in the GlobalWindow without a trigge

我正试图用SpaneRio将突变组流到Spaner中。 我们的目标是每10秒编写一次新的MuationGroups,因为我们将使用扳手查询接近时间的KPI

当我不使用任何窗口时,会出现以下错误:

Exception in thread "main" java.lang.IllegalStateException: GroupByKey cannot be applied to non-bounded PCollection in the GlobalWindow without a trigger. Use a Window.into or Window.triggering transform prior to GroupByKey.
    at org.apache.beam.sdk.transforms.GroupByKey.applicableTo(GroupByKey.java:173)
    at org.apache.beam.sdk.transforms.GroupByKey.expand(GroupByKey.java:204)
    at org.apache.beam.sdk.transforms.GroupByKey.expand(GroupByKey.java:120)
    at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:537)
    at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:472)
    at org.apache.beam.sdk.values.PCollection.apply(PCollection.java:286)
    at org.apache.beam.sdk.transforms.Combine$PerKey.expand(Combine.java:1585)
    at org.apache.beam.sdk.transforms.Combine$PerKey.expand(Combine.java:1470)
    at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:537)
    at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:491)
    at org.apache.beam.sdk.values.PCollection.apply(PCollection.java:299)
    at org.apache.beam.sdk.io.gcp.spanner.SpannerIO$WriteGrouped.expand(SpannerIO.java:868)
    at org.apache.beam.sdk.io.gcp.spanner.SpannerIO$WriteGrouped.expand(SpannerIO.java:823)
    at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:537)
    at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:472)
    at org.apache.beam.sdk.values.PCollection.apply(PCollection.java:286)
    at quantum.base.transform.entity.spanner.SpannerProtoWrite.expand(SpannerProtoWrite.java:52)
    at quantum.base.transform.entity.spanner.SpannerProtoWrite.expand(SpannerProtoWrite.java:20)
    at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:537)
    at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:491)
    at org.apache.beam.sdk.values.PCollection.apply(PCollection.java:299)
    at quantum.entitybuilder.pipeline.EntityBuilderPipeline$Write$SpannerWrite.expand(EntityBuilderPipeline.java:388)
    at quantum.entitybuilder.pipeline.EntityBuilderPipeline$Write$SpannerWrite.expand(EntityBuilderPipeline.java:372)
    at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:537)
    at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:491)
    at org.apache.beam.sdk.values.PCollection.apply(PCollection.java:299)
    at quantum.entitybuilder.pipeline.EntityBuilderPipeline.main(EntityBuilderPipeline.java:122)
:entityBuilder FAILED
由于上述错误,我假设输入集合需要打开窗口并触发,因为SpanRio使用GroupByKey(这也是我的用例所需要的):

进一步调查后,这似乎是由于Panerio中的
.apply(Wait.on(input))
造成的:它有一个全局端输入,似乎不适用于我的固定窗口,因为
Wait.java的文档状态为:

If signal is globally windowed, main input must also be. This typically would be useful
 *       only in a batch pipeline, because the global window of an infinite PCollection never
 *       closes, so the wait signal will never be ready.
作为临时解决办法,我尝试了以下方法:

  • 添加带有触发器的全局窗口,而不是固定窗口:

        .apply("globalwindow", Window.<MutationGroup>into(new GlobalWindows())
                .triggering(Repeatedly.forever(AfterProcessingTime
                        .pastFirstElementInPane()
                        .plusDelayOf(Duration.standardSeconds(10))
                ).orFinally(AfterWatermark.pastEndOfWindow()))
                .discardingFiredPanes()
                .withAllowedLateness(Duration.ZERO))
    
请注意,一切都与DirectRunner一起工作,我正在尝试使用DataflowRunner

有没有人对我可以试着让它运行的事情有任何其他建议?我很难想象,我是唯一一个试图将变异组流式转换成扳手的人


提前谢谢

目前,梁流不支持SpanRio连接器。请遵循为扳手IO连接器添加流式支持的步骤

If signal is globally windowed, main input must also be. This typically would be useful
 *       only in a batch pipeline, because the global window of an infinite PCollection never
 *       closes, so the wait signal will never be ready.
    .apply("globalwindow", Window.<MutationGroup>into(new GlobalWindows())
            .triggering(Repeatedly.forever(AfterProcessingTime
                    .pastFirstElementInPane()
                    .plusDelayOf(Duration.standardSeconds(10))
            ).orFinally(AfterWatermark.pastEndOfWindow()))
            .discardingFiredPanes()
            .withAllowedLateness(Duration.ZERO))
logger:  "org.apache.beam.sdk.coders.SerializableCoder"
message:  "Can't verify serialized elements of type SpannerSchema have well defined equals method. This may produce incorrect results on some PipelineRunner
logger:  "org.apache.beam.sdk.coders.SerializableCoder"   
message:  "Can't verify serialized elements of type BoundedSource have well defined equals method. This may produce incorrect results on some PipelineRunner"