Google Cloud Dataflow job gets stuck reading the 35,000th file from Google Cloud Storage

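A TSV file contains a list of 0.4M file names (.mp3). After parsing it, the job reads each mp3 file and does some processing. When I tested with a list of 5 files in the TSV, it worked fine. But when testing with the 0.4M files, it gets stuck while reading the 35,000th file, with a 500 error. It seems to retry many times and finally fails.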

FYI, the mp3 files are located at "gs://bucket_name/same_subdir/id_string.mp3", where the ids are ordered like 10000110003.

Please use [...] instead of the storage client.
Please retry your calls; for retryable errors, use exponential backoff.
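To illustrate the retry suggestion above, here is a minimal sketch of exponential backoff using only the standard library. The names `TransientError` and `retry_with_backoff`, and the delay values, are assumptions made for illustration and are not part of Beam or the storage client. In a real pipeline you would presumably catch `google.api_core.exceptions.InternalServerError` (the exception in the traceback) instead of the stand-in class.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical stand-in for a retryable failure, e.g. an HTTP 500."""


def retry_with_backoff(fn, retryable=(TransientError,), max_attempts=5,
                       base_delay=1.0, max_delay=32.0, jitter=True,
                       sleep=time.sleep):
    """Call fn(), retrying retryable errors with exponentially growing waits."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except retryable:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Wait base_delay, then 2x, 4x, ... capped at max_delay.
            delay = min(base_delay * (2 ** attempt), max_delay)
            if jitter:
                # Random offset so many workers do not retry in lockstep.
                delay += random.uniform(0, 1)
            sleep(delay)
```

Usage would look something like `retry_with_backoff(lambda: client.get_bucket(name), retryable=(InternalServerError,))`. Note that errors not listed in `retryable` propagate immediately, so permission problems are not hidden behind retries.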

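I solved this problem by explicitly providing authentication credentials in the pipeline. It seems to me that the workers lose permission when they retry after a failure.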

Traceback (most recent call last):
  File "apache_beam/runners/common.py", line 744, in apache_beam.runners.common.DoFnRunner.process
  File "apache_beam/runners/common.py", line 423, in apache_beam.runners.common.SimpleInvoker.invoke_process
  File "main2_mod.py", line 57, in process
  File "/usr/local/lib/python3.7/site-packages/google/cloud/storage/client.py", line 227, in get_bucket
    bucket.reload(client=self)
  File "/usr/local/lib/python3.7/site-packages/google/cloud/storage/_helpers.py", line 130, in reload
    _target_object=self,
  File "/usr/local/lib/python3.7/site-packages/google/cloud/_http.py", line 293, in api_request
    raise exceptions.from_http_response(response)
google.api_core.exceptions.InternalServerError: 500 GET https://www.googleapis.com/storage/v1/b/my_db?projection=noAcl: Backend Error

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/dataflow_worker/batchworker.py", line 649, in do_work
    work_executor.execute()
  File "/usr/local/lib/python3.7/site-packages/dataflow_worker/executor.py", line 176, in execute
    op.start()
  File "dataflow_worker/native_operations.py", line 38, in dataflow_worker.native_operations.NativeReadOperation.start
  File "dataflow_worker/native_operations.py", line 39, in dataflow_worker.native_operations.NativeReadOperation.start
  File "dataflow_worker/native_operations.py", line 44, in dataflow_worker.native_operations.NativeReadOperation.start
  File "dataflow_worker/native_operations.py", line 54, in dataflow_worker.native_operations.NativeReadOperation.start
  File "apache_beam/runners/worker/operations.py", line 246, in apache_beam.runners.worker.operations.Operation.output
  File "apache_beam/runners/worker/operations.py", line 142, in apache_beam.runners.worker.operations.SingletonConsumerSet.receive
  File "apache_beam/runners/worker/operations.py", line 560, in apache_beam.runners.worker.operations.DoOperation.process
  File "apache_beam/runners/worker/operations.py", line 561, in apache_beam.runners.worker.operations.DoOperation.process
  File "apache_beam/runners/common.py", line 740, in apache_beam.runners.common.DoFnRunner.receive
  File "apache_beam/runners/common.py", line 746, in apache_beam.runners.common.DoFnRunner.process
  File "apache_beam/runners/common.py", line 785, in apache_beam.runners.common.DoFnRunner._reraise_augmented
  File "apache_beam/runners/common.py", line 744, in apache_beam.runners.common.DoFnRunner.process
  File "apache_beam/runners/common.py", line 422, in apache_beam.runners.common.SimpleInvoker.invoke_process
  File "apache_beam/runners/common.py", line 870, in apache_beam.runners.common._OutputProcessor.process_outputs
  File "apache_beam/runners/worker/operations.py", line 142, in apache_beam.runners.worker.operations.SingletonConsumerSet.receive
  File "apache_beam/runners/worker/operations.py", line 560, in apache_beam.runners.worker.operations.DoOperation.process
  File "apache_beam/runners/worker/operations.py", line 561, in apache_beam.runners.worker.operations.DoOperation.process
  File "apache_beam/runners/common.py", line 740, in apache_beam.runners.common.DoFnRunner.receive
  File "apache_beam/runners/common.py", line 746, in apache_beam.runners.common.DoFnRunner.process
  File "apache_beam/runners/common.py", line 800, in apache_beam.runners.common.DoFnRunner._reraise_augmented
  File "/usr/local/lib/python3.7/site-packages/future/utils/__init__.py", line 421, in raise_with_traceback
    raise exc.with_traceback(traceback)
  File "apache_beam/runners/common.py", line 744, in apache_beam.runners.common.DoFnRunner.process
  File "apache_beam/runners/common.py", line 423, in apache_beam.runners.common.SimpleInvoker.invoke_process
  File "main2_mod.py", line 57, in process
  File "/usr/local/lib/python3.7/site-packages/google/cloud/storage/client.py", line 227, in get_bucket
    bucket.reload(client=self)
  File "/usr/local/lib/python3.7/site-packages/google/cloud/storage/_helpers.py", line 130, in reload
    _target_object=self,
  File "/usr/local/lib/python3.7/site-packages/google/cloud/_http.py", line 293, in api_request
    raise exceptions.from_http_response(response)
google.api_core.exceptions.InternalServerError: 500 GET https://www.googleapis.com/storage/v1/b/cochlear_db?projection=noAcl: Backend Error [while running 'MP3 to npy']

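Thanks for your suggestion! I solved the problem (see my answer). I'll try your answer too!!
I tried it, but in this case I ran into a problem with autoscaling. For now, my answer works well.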
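# Get the mp3 from storage, passing credentials explicitly
from google.auth import compute_engine
from google.cloud import storage

credentials = compute_engine.Credentials()
project = "<PROJECT_NAME>"  # replace with your project ID

client = storage.Client(credentials=credentials, project=project)
bucket = client.get_bucket("<BUCKET_NAME>")  # replace with your bucket name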