Google Cloud Dataflow: java.io.IOException: INVALID_ARGUMENT: Unable to parse key at com.google.cloud.dataflow.sdk.runners.worker.ApplianceShuffleWriter.write


I get the following exception when running a job that reads from g3 and then groups the data by key. The exception occurs during the read phase.

java.io.IOException: INVALID_ARGUMENT: Unable to parse key
    at com.google.cloud.dataflow.sdk.runners.worker.ApplianceShuffleWriter.write(Native Method)
    at com.google.cloud.dataflow.sdk.runners.worker.ShuffleSink$ShuffleSinkWriter.outputChunk(ShuffleSink.java:293)
    at com.google.cloud.dataflow.sdk.runners.worker.ShuffleSink$ShuffleSinkWriter.close(ShuffleSink.java:288)
    at com.google.cloud.dataflow.sdk.util.common.worker.WriteOperation.finish(WriteOperation.java:100)
    at com.google.cloud.dataflow.sdk.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:79)
    at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorker.executeWork(DataflowWorker.java:288)
    at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorker.doWork(DataflowWorker.java:221)
    at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorker.getAndPerformWork(DataflowWorker.java:173)
    at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerHarness$WorkerThread.doWork(DataflowWorkerHarness.java:193)
    at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerHarness$WorkerThread.call(DataflowWorkerHarness.java:173)
    at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerHarness$WorkerThread.call(DataflowWorkerHarness.java:160)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)


Any ideas?

This exception is thrown when you apply GroupByKey and some of the keys are null.

This code throws the exception:

pCollection
            .apply(ParDo.of(new DoFn<MyObject, KV<String, MyObject>>() {
                @Override
                public void processElement(ProcessContext c) throws Exception {
                    // Emitting a null key: the shuffle writer cannot parse it, so GroupByKey fails.
                    c.output(KV.of((String) null, c.element()));
                }
            }))
            .apply(GroupByKey.<String, MyObject>create());
A null key cannot be written. So when the key can be null, you have to do something like this:

pCollection
            .apply(ParDo.of(new DoFn<MyObject, KV<String, MyObject>>() {
                @Override
                public void processElement(ProcessContext c) throws Exception {
                    String key = c.element().getKeyField();
                    if (key == null) {
                        // Handle the null case somehow...
                        key = ... // not null value
                    }
                    c.output(KV.of(key, c.element()));
                }
            }))
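
One possible way to fill in the "not null value" step is to map null keys to a fixed placeholder before grouping. The sketch below is plain Java with no Dataflow dependency, so it only illustrates the idea. `NullKeyGuard`, `safeKey`, `groupBySafeKey`, and the `NULL_KEY_SENTINEL` value are hypothetical names introduced here, and the placeholder is assumed never to collide with a real key.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class NullKeyGuard {
    // Hypothetical placeholder; choose a value that can never be a real key.
    static final String NULL_KEY_SENTINEL = "__NULL_KEY__";

    // Returns the key unchanged, or the placeholder when the key is null.
    static String safeKey(String key) {
        return key == null ? NULL_KEY_SENTINEL : key;
    }

    // Plain-Java stand-in for a group-by-key step, used only to show the effect.
    static Map<String, List<String>> groupBySafeKey(List<Map.Entry<String, String>> pairs) {
        Map<String, List<String>> grouped = new LinkedHashMap<>();
        for (Map.Entry<String, String> pair : pairs) {
            grouped.computeIfAbsent(safeKey(pair.getKey()), k -> new ArrayList<>())
                   .add(pair.getValue());
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> pairs = new ArrayList<>();
        pairs.add(new SimpleEntry<>("a", "first"));
        pairs.add(new SimpleEntry<>(null, "orphan")); // the problematic record
        pairs.add(new SimpleEntry<>("a", "second"));
        System.out.println(groupBySafeKey(pairs));
    }
}
```

In the pipeline itself, the same `safeKey` call would go inside `processElement`, right before `c.output(KV.of(key, c.element()))`. Whether a placeholder is appropriate depends on the job: all null-keyed records end up in a single group, so dropping or separately routing them may be a better fit in some cases.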

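Does the coder on the pCollection know how to interpret a null key? Does the encoding represent it as an empty byte array?

I updated the answer so that a null key is never written. If a null key is written, you get the exception from the question.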