Google Cloud Dataflow shows a cryptic error when downloading a file from GCP to the local system


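I am writing a Dataflow pipeline that processes videos from a Google Cloud Storage bucket. My pipeline downloads each work item to the local system and then re-uploads the results back to a GCP bucket. This follows on from a previous question.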

The pipeline works locally with the DirectRunner, but I am having trouble debugging it on the DataflowRunner.

The error is:

  File "run_clouddataflow.py", line 41, in process
  File "/usr/local/lib/python2.7/dist-packages/google/cloud/storage/blob.py", line 464, in download_to_file
    self._do_download(transport, file_obj, download_url, headers)
  File "/usr/local/lib/python2.7/dist-packages/google/cloud/storage/blob.py", line 418, in _do_download
    download.consume(transport)
  File "/usr/local/lib/python2.7/dist-packages/google/resumable_media/requests/download.py", line 101, in consume
    self._write_to_stream(result)
  File "/usr/local/lib/python2.7/dist-packages/google/resumable_media/requests/download.py", line 62, in _write_to_stream
    with response:
AttributeError: __exit__ [while running 'Run DeepMeerkat']

It is raised when trying to execute blob.download_to_file(file_obj):

storage_client=storage.Client()
bucket = storage_client.get_bucket(parsed.hostname)
blob=storage.Blob(parsed.path[1:],bucket)

#store local path
local_path="/tmp/" + parsed.path.split("/")[-1]

print('local path: ' + local_path)
with open(local_path, 'wb') as file_obj:
  blob.download_to_file(file_obj)

print("Downloaded" + local_path)
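
My guess is that workers are not allowed to write locally? Or perhaps there is no /tmp folder in the Dataflow container. Where should I write objects? It is hard to debug without access to the environment. Is it possible to access stdout from the workers for debugging (a serial console?)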

Edit #1

I have tried passing credentials explicitly:

  try:
      credentials, project = google.auth.default()
  except:
      os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = known_args.authtoken
      credentials, project = google.auth.default()

as well as writing to cwd() instead of /tmp/:

local_path=parsed.path.split("/")[-1]

print('local path: ' + local_path)
with open(local_path, 'wb') as file_obj:
  blob.download_to_file(file_obj)

I still get the same cryptic error when downloading the blob from GCP.

The full pipeline script is below, and setup.py is


I spoke with the google-cloud-storage package maintainer, and this is a known issue. Updating the pinned versions in my setup.py to

REQUIRED_PACKAGES = ["google-cloud-storage==1.3.2","google-auth","requests>=2.18.0"]

fixed the problem.
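
For context, the `AttributeError: __exit__` in the traceback is raised at `with response:`, which only works if the `requests` Response object supports the context-manager protocol. To my knowledge that support arrived in requests 2.18.0, which would explain why the `requests>=2.18.0` pin helps. The sketch below is one way to check this on a worker; the helper names `version_tuple`, `supports_with` and `check_requests` are hypothetical and not part of any Google library.

```python
import itertools
import logging


def version_tuple(version):
    """Turn a version string such as '2.18.0' into a comparable tuple (2, 18, 0)."""
    parts = []
    for piece in version.split("."):
        # Keep only the leading digits, so '0rc1' is read as 0.
        digits = "".join(itertools.takewhile(str.isdigit, piece))
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def supports_with(obj):
    """Return True if obj can be used in a `with` statement."""
    cls = type(obj)
    return hasattr(cls, "__enter__") and hasattr(cls, "__exit__")


def check_requests(minimum="2.18.0"):
    """Log the installed requests version and report whether it meets `minimum`."""
    try:
        import requests
    except ImportError:
        logging.warning("requests is not installed")
        return False
    ok = version_tuple(requests.__version__) >= version_tuple(minimum)
    logging.info("requests %s installed, need >= %s: %s",
                 requests.__version__, minimum, ok)
    return ok
```

Calling `check_requests()` at the start of the `process` method would show in the job logs which `requests` version the worker actually picked up.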


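You can log and read logging data in the Cloud UI. Would that be enough? Also, you should be able to write to local disk. I'll get back to you.

Thanks Pablo. I am checking the new google.auth module; it may also be that the workers are not inheriting my credentials from Dataflow.

I just added try: credentials, project = google.auth.default() except: os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = known_args.authtoken; credentials, project = google.auth.default() to the edit. Updated the script to reflect the edit.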
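
As a rough sketch of the two suggestions in the first comment (log through Python's standard `logging` module so the messages appear in the job logs, and write to a scratch location on local disk), the helper below builds a local path under the system temp directory instead of hard-coding /tmp/. The name `local_path_for` is hypothetical, and the `gs://` URL in the usage note is a made-up example.

```python
import logging
import os
import tempfile

try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2


def local_path_for(gcs_url, scratch_dir=None):
    """Map a gs://bucket/path/name URL to a local file path in a scratch directory."""
    parsed = urlparse(gcs_url)
    filename = os.path.basename(parsed.path)
    if not filename:
        raise ValueError("no object name in %r" % gcs_url)
    # Fall back to the platform's temp directory when none is given.
    scratch_dir = scratch_dir or tempfile.gettempdir()
    local_path = os.path.join(scratch_dir, filename)
    # logging (rather than print) is assumed to be what the worker logs pick up.
    logging.info("bucket=%s local path=%s", parsed.hostname, local_path)
    return local_path
```

For example, `local_path_for("gs://my-bucket/videos/clip.mp4")` returns a path ending in `clip.mp4` inside the temp directory, which can then be passed to `open(local_path, 'wb')` as in the question.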