Python: GCS broken pipe error when trying to upload a large file
Tags: python, google-cloud-platform, google-cloud-storage, airflow

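I am trying to upload a .csv.gz file to GCS after decompressing it to .csv; decompression grows the file from 500 MB to roughly 5 GB. I can extract the .csv.gz file to a temporary path, but the upload of the extracted file to GCS fails with the following error: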

[2019-11-11 13:59:58,180] {models.py:1796} ERROR - [Errno 32] Broken pipe
Traceback (most recent call last):
  File "/usr/local/lib/airflow/airflow/models.py", line 1664, in _run_raw_task
    result = task_copy.execute(context=context)
  File "/home/airflow/gcs/dags/operators/s3_to_gcs_transform_operator.py", line 220, in execute
    gcs_hook.upload(dest_gcs_bucket, dest_gcs_object, target_file, gzip=True)
  File "/home/airflow/gcs/dags/hooks/gcs_hook_conn.py", line 208, in upload
    .insert(bucket=bucket, name=object, media_body=media) \
  File "/opt/python3.6/lib/python3.6/site-packages/googleapiclient/_helpers.py", line 130, in positional_wrapper
    return wrapped(*args, **kwargs)
  File "/opt/python3.6/lib/python3.6/site-packages/googleapiclient/http.py", line 835, in execute
    method=str(self.method), body=self.body, headers=self.headers)
  File "/opt/python3.6/lib/python3.6/site-packages/googleapiclient/http.py", line 179, in _retry_request
    raise exception
  File "/opt/python3.6/lib/python3.6/site-packages/googleapiclient/http.py", line 162, in _retry_request
    resp, content = http.request(uri, method, *args, **kwargs)
  File "/opt/python3.6/lib/python3.6/site-packages/google_auth_httplib2.py", line 198, in request
    uri, method, body=body, headers=request_headers, **kwargs)
  File "/usr/local/lib/airflow/airflow/contrib/hooks/gcp_api_base_hook.py", line 155, in new_request
    redirections, connection_type)
  File "/opt/python3.6/lib/python3.6/site-packages/httplib2/__init__.py", line 1924, in request
    cachekey,
  File "/opt/python3.6/lib/python3.6/site-packages/httplib2/__init__.py", line 1595, in _request
    conn, request_uri, method, body, headers
  File "/opt/python3.6/lib/python3.6/site-packages/httplib2/__init__.py", line 1502, in _conn_request
    conn.request(method, request_uri, body, headers)
  File "/opt/python3.6/lib/python3.6/http/client.py", line 1239, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/opt/python3.6/lib/python3.6/http/client.py", line 1285, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/opt/python3.6/lib/python3.6/http/client.py", line 1234, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/opt/python3.6/lib/python3.6/http/client.py", line 1065, in _send_output
    self.send(chunk)
  File "/opt/python3.6/lib/python3.6/http/client.py", line 986, in send
    self.sock.sendall(data)
  File "/opt/python3.6/lib/python3.6/ssl.py", line 975, in sendall
    v = self.send(byte_view[count:])
  File "/opt/python3.6/lib/python3.6/ssl.py", line 944, in send
    return self._sslobj.write(data)
  File "/opt/python3.6/lib/python3.6/ssl.py", line 642, in write
    return self._sslobj.write(data)
BrokenPipeError: [Errno 32] Broken pipe

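As far as I understand, the error could be caused by the following:

Your server process has received a SIGPIPE writing to a socket. This usually happens when you write to a socket that has been fully closed on the other (client) side. This might happen when a client program doesn't wait until all the data from the server has been received and simply closes the socket (using the close function).

But I don't know whether that is the problem here, or how to fix it. Can anyone help?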
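
For reference, the condition described in the quote can be reproduced locally without GCS. The sketch below is only an illustration: the function name `write_to_closed_peer` is made up for this example, and it uses a local socket pair rather than the TLS connection from the traceback. Python ignores SIGPIPE by default, so the failed write surfaces as an exception instead of killing the process.

```python
import socket


def write_to_closed_peer():
    """Send on a socket whose peer has already closed; return the error raised, if any."""
    writer, reader = socket.socketpair()
    reader.close()  # the receiving side goes away before any data arrives
    try:
        # A few writes are attempted in case the first one is accepted by the OS.
        for _ in range(10):
            writer.sendall(b"x" * 4096)
    except ConnectionError as exc:  # BrokenPipeError is a subclass of ConnectionError
        return exc
    finally:
        writer.close()
    return None


if __name__ == "__main__":
    print(repr(write_to_closed_peer()))
```

In the upload case the roles are reversed relative to the quote: the Airflow worker is the writer, and the remote end closed the connection while the single large request body was still being sent.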

You should try uploading the large file in chunks:

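from google.cloud import storage

# Upload in 128 MiB chunks; chunk_size must be a multiple of 256 KiB.
CHUNK_SIZE = 128 * 1024 * 1024

client = storage.Client()
bucket = client.bucket('destination')
# Setting chunk_size on the blob makes the client send the file in pieces of that size.
blob = bucket.blob('really-big-blob', chunk_size=CHUNK_SIZE)
blob.upload_from_filename('/path/to/really-big-file')
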
You can also take a look at this similar SO question.

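Please update the question with your code once you have it running; I would like to see the changes you made.

Yes, I fixed this by setting resumable=True and specifying chunksize in the MediaFileUpload() call inside gcs_hook.upload().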
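
The fix described in the last comment might look roughly like the sketch below. The asker's hook code was never posted, so this is an assumption-laden illustration rather than the actual change: `upload_resumable`, `aligned_chunk_size`, and the 64 MiB default are hypothetical names and values, and `service` is assumed to be a Cloud Storage JSON API client built with googleapiclient, as the traceback suggests. `MediaFileUpload` and `next_chunk()` are the googleapiclient calls for resumable uploads.

```python
GCS_CHUNK_UNIT = 256 * 1024  # GCS resumable uploads expect chunks in multiples of 256 KiB


def aligned_chunk_size(requested_bytes):
    """Round a requested chunk size down to a multiple of 256 KiB (minimum one unit)."""
    units = max(1, requested_bytes // GCS_CHUNK_UNIT)
    return units * GCS_CHUNK_UNIT


def upload_resumable(service, bucket, object_name, filename,
                     mime_type="application/octet-stream",
                     chunk_size=64 * 1024 * 1024):
    """Hypothetical replacement for the body of gcs_hook.upload()."""
    # Imported here so the helper above can be used without the client library installed.
    from googleapiclient.http import MediaFileUpload

    media = MediaFileUpload(filename, mimetype=mime_type,
                            chunksize=aligned_chunk_size(chunk_size),
                            resumable=True)
    request = service.objects().insert(bucket=bucket, name=object_name,
                                       media_body=media)

    response = None
    while response is None:
        # next_chunk() sends one chunk; response stays None until the upload completes.
        status, response = request.next_chunk()
        if status is not None:
            print("Uploaded {:.0%}".format(status.progress()))
    return response
```

With resumable=True the file is sent as a series of smaller requests instead of one multi-gigabyte request body, so a dropped connection affects a single chunk rather than the whole upload.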