Java GCP: Replacing old Dataflow functionality with a coder in Apache Beam?

Tags: java, google-cloud-platform, google-cloud-dataflow, apache-beam

I am reading JSON from "storage" and dumping it into "bigQuery". To do this, I wrote the following:

PCollectionTuple collectionTuple = p
              .apply(TextIO.Read.named("Read Input Files : " + dataFeedCode).from(file)
              .withCoder(TableRowJsonCoder.of()))
              .apply(ParDo.named("Perform Transformation")
              .of(performTransformation(fieldMappings, inValidRecords, actualFileName))
              .withOutputTags(validRecords, TupleTagList.of(inValidRecords)));
Now I have to convert this code to an Apache Beam implementation, so I wrote the code below:

 PCollectionTuple collectionTuple = p.apply(TextIO.read().from(file))
                        .apply(MapElements.via(new ParseTableRowJson()))
                        .apply(ParDo.of(performTransformation(fieldMappings, inValidRecords, actualFileName))
                                .withOutputTags(validRecords, TupleTagList.of(inValidRecords)));
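
(An aside, not from the original post: the "dumping into bigQuery" step is not shown above. A minimal Beam 2.x sketch of that write could look like the following, where the table spec is a placeholder and the validRecords tag is assumed to carry TableRow elements.)

 collectionTuple
         .get(validRecords)                      // PCollection<TableRow> behind the valid tag (assumed)
         .setCoder(TableRowJsonCoder.of())       // a multi-output ParDo's outputs may need an explicit coder
         .apply("Write To BigQuery", BigQueryIO.writeTableRows()
                 .to("my-project:my_dataset.my_table")  // placeholder table spec
                 .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
                 .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));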

static class ParseTableRowJson extends SimpleFunction<String, TableRow> {
        @Override
        public TableRow apply(String input) {
            try {
                //return Transport.getJsonFactory().fromString(input, TableRow.class);
                return TableRowJsonCoder.of().decode(new ByteArrayInputStream(
                        CharStreams.toString(CharSource.wrap(input).openStream()).getBytes()));
            } catch (IOException e) {
                throw new RuntimeException("Failed parsing table row json", e);
            }
        }
    }
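
The commented-out line above points at a simpler route: parse the JSON string directly instead of going through the coder's stream format. A hedged sketch of that idea (not the poster's code; the plain Jackson ObjectMapper is my substitution, mirroring what TableRowJsonCoder uses internally), nested inside the pipeline class like the original:

import com.fasterxml.jackson.databind.ObjectMapper;

    static class ParseTableRowJsonDirect extends SimpleFunction<String, TableRow> {
        // static: ObjectMapper is not serializable, so keep it out of the function's serialized state
        private static final ObjectMapper MAPPER = new ObjectMapper();

        @Override
        public TableRow apply(String input) {
            try {
                // Parse the raw JSON line directly; no coder stream format means
                // no length prefix is expected and no EOFException can arise from it.
                return MAPPER.readValue(input, TableRow.class);
            } catch (IOException e) {
                throw new RuntimeException("Failed parsing table row json", e);
            }
        }
    }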

Thanks in advance for your support.

Since you are already using MapElements and ParDo as the documentation suggests, you could then try making ParseTableRowJson a DoFn instead of a SimpleFunction, for example by converting the MapElements transform into a ParDo.

Note that decoding a raw JSON line with TableRowJsonCoder.of().decode(inputStream) fails as written: decode(InputStream) uses the coder's nested encoding, which expects a varint length prefix before the UTF-8 bytes, and a plain JSON string has none, so StringUtf8Coder reads past the end of the stream:

    at com.morrisons.datafeed.dataflow.pipeline.DataFeedWriteToBigQueryPipeline$ParseTableRowJson.apply(DataFeedWriteToBigQueryPipeline.java:302)
    at com.morrisons.datafeed.dataflow.pipeline.DataFeedWriteToBigQueryPipeline$ParseTableRowJson.apply(DataFeedWriteToBigQueryPipeline.java:1)
    at org.apache.beam.sdk.transforms.Contextful.lambda$fn$79bf234f$1(Contextful.java:112)
    at org.apache.beam.sdk.transforms.MapElements$1.processElement(MapElements.java:129)
    at org.apache.beam.sdk.transforms.MapElements$1$DoFnInvoker.invokeProcessElement(Unknown Source)
    at org.apache.beam.runners.core.SimpleDoFnRunner.invokeProcessElement(SimpleDoFnRunner.java:177)
    at org.apache.beam.runners.core.SimpleDoFnRunner.processElement(SimpleDoFnRunner.java:141)
    at com.google.cloud.dataflow.worker.SimpleParDoFn.processElement(SimpleParDoFn.java:323)
    at com.google.cloud.dataflow.worker.util.common.worker.ParDoOperation.process(ParDoOperation.java:43)
    at com.google.cloud.dataflow.worker.util.common.worker.OutputReceiver.process(OutputReceiver.java:48)
    at com.google.cloud.dataflow.worker.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:200)
    at com.google.cloud.dataflow.worker.util.common.worker.ReadOperation.start(ReadOperation.java:158)
    at com.google.cloud.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:75)
    at com.google.cloud.dataflow.worker.BatchDataflowWorker.executeWork(BatchDataflowWorker.java:383)
    at com.google.cloud.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:355)
    at com.google.cloud.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:286)
    at com.google.cloud.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.doWork(DataflowBatchWorkerHarness.java:134)
    at com.google.cloud.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:114)
    at com.google.cloud.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:101)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.EOFException
    at java.io.DataInputStream.readFully(DataInputStream.java:197)
    at java.io.DataInputStream.readFully(DataInputStream.java:169)
    at org.apache.beam.sdk.coders.StringUtf8Coder.readString(StringUtf8Coder.java:63)
    at org.apache.beam.sdk.coders.StringUtf8Coder.decode(StringUtf8Coder.java:106)
    ... 25 more
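
To put the suggestion into code, here is a minimal sketch of ParseTableRowJson converted to a DoFn (my reading of the answer, not the poster's final code; the switch to Coder.Context.OUTER is an extra assumption that makes the coder read the raw bytes instead of expecting the varint length prefix behind the EOFException above; the Context overload is deprecated in Beam 2.x but still available):

import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import org.apache.beam.sdk.coders.Coder;
import org.apache.beam.sdk.io.gcp.bigquery.TableRowJsonCoder;
import org.apache.beam.sdk.transforms.DoFn;
import com.google.api.services.bigquery.model.TableRow;

    static class ParseTableRowJsonFn extends DoFn<String, TableRow> {
        @ProcessElement
        public void processElement(ProcessContext c) throws IOException {
            // OUTER context reads the remaining bytes as-is, so a raw JSON line
            // decodes without a length prefix.
            c.output(TableRowJsonCoder.of().decode(
                    new ByteArrayInputStream(c.element().getBytes(StandardCharsets.UTF_8)),
                    Coder.Context.OUTER));
        }
    }

The MapElements step in the pipeline would then become:

    .apply(ParDo.of(new ParseTableRowJsonFn()))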