Google Cloud Dataflow: no translator registered for GroupByKey.GroupByKeyOnly

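I hit this error when trying to run a CoGroupByKey with Dataflow. At a high level, I want to join two PCollections, one of KV elements and the other also of KV elements. I am doing a standard join with TupleTags, a KeyedPCollectionTuple, and CoGroupByKey, closely following the usual example for this kind of join.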
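For readers unfamiliar with the operation, the following is a plain-Java sketch of the result shape a co-group join produces: for each key, the values from each input are collected side by side. It does not use the Dataflow SDK at all, and every name in it (CoGroupSketch, Joined, coGroup) is made up for illustration; in the SDK the equivalent roles are played by TupleTag, KeyedPCollectionTuple, and CoGroupByKey.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: models the result shape of a co-group join in memory.
// None of these names come from the Dataflow SDK.
final class CoGroupSketch {

    // Holds, for one key, the values found in each of the two inputs.
    static final class Joined<A, B> {
        final List<A> left = new ArrayList<>();
        final List<B> right = new ArrayList<>();
    }

    // Groups two lists of key/value pairs by key, keeping each side separate.
    static <K extends Comparable<K>, A, B> Map<K, Joined<A, B>> coGroup(
            List<Map.Entry<K, A>> first, List<Map.Entry<K, B>> second) {
        Map<K, Joined<A, B>> out = new TreeMap<>();
        for (Map.Entry<K, A> e : first) {
            out.computeIfAbsent(e.getKey(), k -> new Joined<>()).left.add(e.getValue());
        }
        for (Map.Entry<K, B> e : second) {
            out.computeIfAbsent(e.getKey(), k -> new Joined<>()).right.add(e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical sample data: two inputs keyed by user name.
        List<Map.Entry<String, String>> emails = List.of(
                Map.entry("amy", "amy@example.com"),
                Map.entry("bob", "bob@example.com"));
        List<Map.Entry<String, Integer>> orders = List.of(
                Map.entry("amy", 3), Map.entry("amy", 7), Map.entry("carl", 1));

        coGroup(emails, orders).forEach((key, j) ->
                System.out.println(key + " -> " + j.left + " / " + j.right));
    }
}
```

In this sketch a key that appears in only one input still shows up in the result, with an empty list for the other side.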

FYI, I am using the Java library with BlockingDataflowPipelineRunner.

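Edit: after digging through the source code, I found that this happens because DataflowPipelineTranslator.java does not register a translator for GroupByKeyOnly for DataflowPipelineRunner, so any pipeline run with DataflowPipelineOptions (or any extension of it) will have no translator for GroupByKeyOnly...?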

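GroupByKeyOnly should not appear in the set of transforms in a graph handed to DataflowPipelineRunner. It is probably there because the pipeline was constructed without the runner set on its PipelineOptions, and [Blocking]DataflowPipelineRunner.run(pipeline) was then called directly. The intended pattern is not to call DataflowPipeline/DataflowPipelineRunner methods directly, for example: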

// fromArgs(args) returns a builder; create() produces the PipelineOptions
PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();

// Make sure that runner is set before calling Pipeline.create(options)
Pipeline p = Pipeline.create(options);

// Apply all your transforms
p.apply(... transforms ...);

PipelineResult result = p.run();

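With the pattern above, you can swap runners by changing your application's command-line arguments, for instance by passing --runner=BlockingDataflowPipelineRunner. Using BlockingDataflowPipelineRunner ensures that the job reaches a terminal state before p.run() returns.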
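The failure mode this answer describes can be illustrated with a toy model. The sketch below is not SDK code, and every name in it is hypothetical. It assumes, as the answer suggests, that the shape of the graph depends on which runner was set in the options at construction time, and that the service translator has no entry for the expanded GroupByKeyOnly form.

```java
import java.util.List;
import java.util.Set;

// Toy model of the failure described above. All names are hypothetical;
// this is not how the SDK is implemented, only an illustration of the idea.
final class RunnerMismatchSketch {

    // Transforms that the modelled service translator knows how to translate.
    static final Set<String> SERVICE_TRANSLATORS = Set.of("Read", "ParDo", "GroupByKey");

    // Assumption: the graph is shaped at construction time by the runner in the
    // options. Without the service runner, GroupByKey is expanded further.
    static List<String> buildGraph(boolean serviceRunnerSetInOptions) {
        String grouping = serviceRunnerSetInOptions ? "GroupByKey" : "GroupByKey.GroupByKeyOnly";
        return List.of("Read", "ParDo", grouping);
    }

    // Mirrors the lookup that fails in the stack trace: an unknown transform aborts translation.
    static void translate(List<String> graph) {
        for (String transform : graph) {
            if (!SERVICE_TRANSLATORS.contains(transform)) {
                throw new IllegalStateException("no translator registered for " + transform);
            }
        }
    }

    public static void main(String[] args) {
        // Runner set before construction: the graph translates cleanly.
        translate(buildGraph(true));
        try {
            // Graph built without the service runner, then handed to it anyway.
            translate(buildGraph(false));
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Under these assumptions, setting the runner in the options before Pipeline.create(options) is what keeps the two sides in agreement.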

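Which version of the Dataflow SDK are you using?
com.google.cloud.dataflow:google-cloud-dataflow-java-sdk-all:1.4.0
Can you try the newer 1.5.0 SDK?
It doesn't seem to work.
Could you post more of your example here, or as a gist?
CoGroupByKey itself works, so the problem may lie elsewhere in the pipeline.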
Exception in thread "main" java.lang.IllegalStateException: no translator registered for GroupByKey.GroupByKeyOnly
    at com.google.cloud.dataflow.sdk.runners.DataflowPipelineTranslator$Translator.visitTransform(DataflowPipelineTranslator.java:500)
    at com.google.cloud.dataflow.sdk.runners.TransformTreeNode.visit(TransformTreeNode.java:219)
    at com.google.cloud.dataflow.sdk.runners.TransformTreeNode.visit(TransformTreeNode.java:215)
    at com.google.cloud.dataflow.sdk.runners.TransformTreeNode.visit(TransformTreeNode.java:215)
    at com.google.cloud.dataflow.sdk.runners.TransformTreeNode.visit(TransformTreeNode.java:215)
    at com.google.cloud.dataflow.sdk.runners.TransformHierarchy.visit(TransformHierarchy.java:102)
    at com.google.cloud.dataflow.sdk.Pipeline.traverseTopologically(Pipeline.java:259)
    at com.google.cloud.dataflow.sdk.runners.DataflowPipelineTranslator$Translator.translate(DataflowPipelineTranslator.java:455)
    at com.google.cloud.dataflow.sdk.runners.DataflowPipelineTranslator.translate(DataflowPipelineTranslator.java:146)
    at com.google.cloud.dataflow.sdk.runners.DataflowPipelineRunner.run(DataflowPipelineRunner.java:325)
    at com.google.cloud.dataflow.sdk.runners.BlockingDataflowPipelineRunner.run(BlockingDataflowPipelineRunner.java:95)