Google Cloud Dataflow: error reading from GCS

Tags: google-cloud-dataflow

Does the following exception look familiar to anyone? The exact same pipeline and data ran fine last week, but today it failed a few times with this exception. I don't see any trace of my own code in the stack trace. I'm wondering whether it could be related to something like a GCS read quota.

Also, since the pipeline runs fine on the DirectRunner, how do I debug this kind of exception when it only shows up on Dataflow?

{
 insertId:  "7289985381136617647:828219:0:906922"
 jsonPayload: {
  exception:  "java.io.IOException: Failed to advance reader of source: gs://fiona_dataflow/tmp/BigQueryExtractTemp/5c813875537d4c1a89b74a800bb37c50/000000000864.avro range [0, 808559590)
    at com.google.cloud.dataflow.worker.WorkerCustomSources$BoundedReaderIterator.advance(WorkerCustomSources.java:605)
    at com.google.cloud.dataflow.worker.util.common.worker.ReadOperation$SynchronizedReaderIterator.advance(ReadOperation.java:398)
    at com.google.cloud.dataflow.worker.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:193)
    at com.google.cloud.dataflow.worker.util.common.worker.ReadOperation.start(ReadOperation.java:158)
    at com.google.cloud.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:75)
    at com.google.cloud.dataflow.worker.BatchDataflowWorker.executeWork(BatchDataflowWorker.java:383)
    at com.google.cloud.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:355)
    at com.google.cloud.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:286)
    at com.google.cloud.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.doWork(DataflowBatchWorkerHarness.java:134)
    at com.google.cloud.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:114)
    at com.google.cloud.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:101)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException
    at com.geotab.bigdata.streaming.mapserver.backfill.MapServerBatchBeamApplication.lambda$main$fd9fc9ef$1(MapServerBatchBeamApplication.java:82)
    at org.apache.beam.sdk.io.gcp.bigquery.BigQuerySourceBase$1.apply(BigQuerySourceBase.java:211)
    at org.apache.beam.sdk.io.gcp.bigquery.BigQuerySourceBase$1.apply(BigQuerySourceBase.java:205)
    at org.apache.beam.sdk.io.AvroSource$AvroBlock.readNextRecord(AvroSource.java:579)
    at org.apache.beam.sdk.io.BlockBasedSource$BlockBasedReader.readNextRecord(BlockBasedSource.java:223)
    at org.apache.beam.sdk.io.FileBasedSource$FileBasedReader.advanceImpl(FileBasedSource.java:473)
    at org.apache.beam.sdk.io.OffsetBasedSource$OffsetBasedReader.advance(OffsetBasedSource.java:267)
    at com.google.cloud.dataflow.worker.WorkerCustomSources$BoundedReaderIterator.advance(WorkerCustomSources.java:602)
    ... 14 more
"   
  job:  "2018-04-23_07_30_32-17662367668739576363"   
  logger:  "com.google.cloud.dataflow.worker.WorkItemStatusClient"   
  message:  "Uncaught exception occurred during work unit execution. This will be retried."   
  stage:  "s19"   
  thread:  "27"   
  work:  "1213589185295287945"   
  worker:  "mapserverbatchbeamapplica-04230730-s20x-harness-713d"   
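
The Caused by frame points at the lambda on line 82 of MapServerBatchBeamApplication.java, which BigQuerySourceBase applies as the parse function of a BigQueryIO.read(...) while decoding the Avro files exported from BigQuery. That code is not included above, so the sketch below is only a hypothetical reconstruction (the field names, key type, and table spec are invented) of where a null field in an exported record could produce an NPE of this shape, with a null check and logging added to show one way to surface the offending record on the workers:

import org.apache.avro.generic.GenericRecord;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.coders.DoubleCoder;
import org.apache.beam.sdk.coders.KvCoder;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.SchemaAndRecord;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.values.KV;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class ParseFnNullCheckSketch {
  private static final Logger LOG = LoggerFactory.getLogger(ParseFnNullCheckSketch.class);

  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).withValidation().create());

    p.apply("ReadFromBigQuery",
        BigQueryIO.read((SchemaAndRecord input) -> {
              GenericRecord record = input.getRecord();
              // "device_id" and "latitude" are invented names; the real fields read at
              // MapServerBatchBeamApplication.java:82 are not shown in the question.
              Object deviceId = record.get("device_id");
              Object latitude = record.get("latitude");
              if (deviceId == null || latitude == null) {
                // Without this guard a null would propagate as the NullPointerException
                // that the worker reports as "Failed to advance reader of source".
                LOG.warn("Record with missing fields: {}", record);
                return KV.of("missing", 0.0);
              }
              return KV.of(deviceId.toString(), Double.parseDouble(latitude.toString()));
            })
            .from("my_project:my_dataset.my_table")  // hypothetical table spec
            .withCoder(KvCoder.of(StringUtf8Coder.of(), DoubleCoder.of())));

    p.run();
  }
}

With a guard like this in place, the warning ends up in the worker logs (Stackdriver), which is also the most direct way to inspect this class of exception on Dataflow when it does not reproduce on the DirectRunner.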

I have seen this error before, attributed to the job running in a different region than the bucket it reads from. For example, if you have just uploaded data in the EU and try to access it from the US shortly afterwards, bandwidth limits can cause your reads to be throttled by the network. Essentially you may have to wait a while (even days) before data uploaded in the EU can be read from the US. Does that fit your situation?

@LefterisS Not quite, the data and the cluster are in the same region. Thanks for the reply. To me it sounds more like an internal, transient Dataflow service issue, since it has not shown up in my last few runs.
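
As a side note on the region discussion: if one wanted to double-check that the BigQuery dataset and the temp bucket really are co-located, a quick sketch with the Google Cloud client libraries could look like the following (the bucket name comes from the stack trace, the dataset name is a placeholder):

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.Dataset;
import com.google.cloud.storage.Bucket;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageOptions;

public class RegionCheckSketch {
  public static void main(String[] args) {
    BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();
    Storage storage = StorageOptions.getDefaultInstance().getService();

    // "my_dataset" is a placeholder; "fiona_dataflow" is the temp bucket from the stack trace.
    Dataset dataset = bigquery.getDataset("my_dataset");
    Bucket bucket = storage.get("fiona_dataflow");

    System.out.println("Dataset location: " + dataset.getLocation());
    System.out.println("Bucket location:  " + bucket.getLocation());
  }
}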