Python: reading files from a GCP storage bucket - max retries exceeded

Tags: python, google-cloud-platform, python-requests, multiprocessing

I am reading XML files stored in a GCP storage bucket using the multiprocessing library, as shown below:

def extract_entity(entity_number):
    # extract the entity ID from the database
    # os.environ["GOOGLE_APPLICATION_CREDENTIALS"]= "/Users/mygbucketaccess.json"
    # gauth = GoogleAuth()
    # gauth.LoadCredentialsFile("mycreds.txt")
    # print(entity_number)
    time.sleep(1.5)

    blob = bucket.get_blob('xlmfull/' + entity_number + '_full.xml')
    xml_file = blob.download_as_string()

    xml_data = str(xml_file, 'utf-8')
    y = BeautifulSoup(xml_data)
    ...
    ...
    # do some analysis
    return res
This is the pooling part:

from multiprocessing import Pool
start_time = time.time()
pool = Pool(processes=8)
results = pool.map(extract_entity, all_ids[0:100])
pool.close()
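The failure is quite random. Regardless of whether I 1) increase time.sleep or 2) put the authentication inside the function, I get the following error: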

---------------------------------------------------------------------------
RemoteTraceback                           Traceback (most recent call last)
RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/Users//opt/anaconda3/envs/husx/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/Users//opt/anaconda3/envs/husx/lib/python3.7/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "<ipython-input-65-b023de58d494>", line 10, in extract_entity
    blob = bucket.get_blob('xlmfull/' + entity_number + '_full.xml')
  File "/Users//opt/anaconda3/envs/husx/lib/python3.7/site-packages/google/cloud/storage/bucket.py", line 899, in get_blob
    blob.reload(client=client, timeout=timeout)
  File "/Users//opt/anaconda3/envs/husx/lib/python3.7/site-packages/google/cloud/storage/_helpers.py", line 150, in reload
    timeout=timeout,
  File "/Users//opt/anaconda3/envs/husx/lib/python3.7/site-packages/google/cloud/_http.py", line 426, in api_request
    return response.json()
  File "/Users//opt/anaconda3/envs/husx/lib/python3.7/site-packages/requests/models.py", line 898, in json
    return complexjson.loads(self.text, **kwargs)
  File "/Users//opt/anaconda3/envs/husx/lib/python3.7/json/__init__.py", line 348, in loads
    return _default_decoder.decode(s)
  File "/Users//opt/anaconda3/envs/husx/lib/python3.7/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/Users//opt/anaconda3/envs/husx/lib/python3.7/json/decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Unterminated string starting at: line 17 column 14 (char 720)
"""

The above exception was the direct cause of the following exception:

JSONDecodeError                           Traceback (most recent call last)
<ipython-input-66-d14f24547c6b> in <module>
      2 start_time = time.time()
      3 pool = Pool(processes=7)
----> 4 results = pool.map(extract_entity, all_ids[0:100])
      5 du = time.time() - start_time
      6 du

~/opt/anaconda3/envs/husx/lib/python3.7/multiprocessing/pool.py in map(self, func, iterable, chunksize)
    266         in a list that is returned.
    267         '''
--> 268         return self._map_async(func, iterable, mapstar, chunksize).get()
    269 
    270     def starmap(self, func, iterable, chunksize=None):

~/opt/anaconda3/envs/husx/lib/python3.7/multiprocessing/pool.py in get(self, timeout)
    655             return self._value
    656         else:
--> 657             raise self._value
    658 
    659     def _set(self, i, obj):

JSONDecodeError: Unterminated string starting at: line 17 column 14 (char 720)
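It seems that something interrupts the connection while the file is being read. So I uncommented the credentials inside the function and ran it with time.sleep(2), and now I get a "Max retries exceeded" error: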

RemoteTraceback                           Traceback (most recent call last)
RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/Users/alireza/opt/anaconda3/envs/husx/lib/python3.7/site-packages/urllib3/connectionpool.py", line 672, in urlopen
    chunked=chunked,
  File "/Users/alireza/opt/anaconda3/envs/husx/lib/python3.7/site-packages/urllib3/connectionpool.py", line 421, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/Users/alireza/opt/anaconda3/envs/husx/lib/python3.7/site-packages/urllib3/connectionpool.py", line 416, in _make_request
    httplib_response = conn.getresponse()
  File "/Users/alireza/opt/anaconda3/envs/husx/lib/python3.7/http/client.py", line 1344, in getresponse
    response.begin()
  File "/Users/alireza/opt/anaconda3/envs/husx/lib/python3.7/http/client.py", line 306, in begin
    version, status, reason = self._read_status()
  File "/Users/alireza/opt/anaconda3/envs/husx/lib/python3.7/http/client.py", line 267, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/Users/alireza/opt/anaconda3/envs/husx/lib/python3.7/socket.py", line 589, in readinto
    return self._sock.recv_into(b)
  File "/Users/a/opt/anaconda3/envs/husx/lib/python3.7/ssl.py", line 1071, in recv_into
    return self.read(nbytes, buffer)
  File "/Users/a/opt/anaconda3/envs/husx/lib/python3.7/ssl.py", line 929, in read
    return self._sslobj.read(len, buffer)
ssl.SSLError: [SSL: WRONG_VERSION_NUMBER] wrong version number (_ssl.c:2555)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/a/opt/anaconda3/envs/husx/lib/python3.7/site-packages/requests/adapters.py", line 449, in send
    timeout=timeout
  File "/Users/a/opt/anaconda3/envs/husx/lib/python3.7/site-packages/urllib3/connectionpool.py", line 720, in urlopen
    method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
  File "/Users/a/opt/anaconda3/envs/husx/lib/python3.7/site-packages/urllib3/util/retry.py", line 436, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='storage.googleapis.com', port=443): Max retries exceeded with url: /storage/v1/b//o/xlmfull%2F310000052_full.xml?projection=noAcl (Caused by SSLError(SSLError(1, '[SSL: WRONG_VERSION_NUMBER] wrong version number (_ssl.c:2555)')))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/a/opt/anaconda3/envs/husx/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/Users/a/opt/anaconda3/envs/husx/lib/python3.7/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "<ipython-input-80-8c760f72ecd4>", line 11, in extract_entity
    blob = bucket.get_blob('xlmfull/' + entity_number + '_full.xml')
  File "/Users/a/opt/anaconda3/envs/husx/lib/python3.7/site-packages/google/cloud/storage/bucket.py", line 899, in get_blob
    blob.reload(client=client, timeout=timeout)
  File "/Users/a/opt/anaconda3/envs/husx/lib/python3.7/site-packages/google/cloud/storage/_helpers.py", line 150, in reload
    timeout=timeout,
  File "/Users/a/opt/anaconda3/envs/husx/lib/python3.7/site-packages/google/cloud/_http.py", line 419, in api_request
    timeout=timeout,
  File "/Users/a/opt/anaconda3/envs/husx/lib/python3.7/site-packages/google/cloud/_http.py", line 277, in _make_request
    method, url, headers, data, target_object, timeout=timeout
  File "/Users/a/opt/anaconda3/envs/husx/lib/python3.7/site-packages/google/cloud/_http.py", line 315, in _do_request
    url=url, method=method, headers=headers, data=data, timeout=timeout
  File "/Users/a/opt/anaconda3/envs/husx/lib/python3.7/site-packages/google/auth/transport/requests.py", line 317, in request
    **kwargs
  File "/Users/a/opt/anaconda3/envs/husx/lib/python3.7/site-packages/requests/sessions.py", line 530, in request
    resp = self.send(prep, **send_kwargs)
  File "/Users/a/opt/anaconda3/envs/husx/lib/python3.7/site-packages/requests/sessions.py", line 643, in send
    r = adapter.send(request, **kwargs)
  File "/Users/a/opt/anaconda3/envs/husx/lib/python3.7/site-packages/requests/adapters.py", line 514, in send
    raise SSLError(e, request=request)
requests.exceptions.SSLError: HTTPSConnectionPool(host='storage.googleapis.com', port=443): Max retries exceeded with url: /storage/v1/b//o/xlmfull%2F310000052_full.xml?projection=noAcl (Caused by SSLError(SSLError(1, '[SSL: WRONG_VERSION_NUMBER] wrong version number (_ssl.c:2555)')))
"""

The above exception was the direct cause of the following exception:

SSLError                                  Traceback (most recent call last)
<ipython-input-81-5c48af30bc80> in <module>
      2 start_time = time.time()
      3 pool = Pool(processes=8)
----> 4 results = pool.map(extract_entity, range(0, 100))
      5 du = time.time() - start_time
      6 du

~/opt/anaconda3/envs/husx/lib/python3.7/multiprocessing/pool.py in map(self, func, iterable, chunksize)
    266         in a list that is returned.
    267         '''
--> 268         return self._map_async(func, iterable, mapstar, chunksize).get()
    269 
    270     def starmap(self, func, iterable, chunksize=None):

~/opt/anaconda3/envs/husx/lib/python3.7/multiprocessing/pool.py in get(self, timeout)
    655             return self._value
    656         else:
--> 657             raise self._value
    658 
    659     def _set(self, i, obj):

SSLError: None: Max retries exceeded with url: /storage/v1/b//o/xlmfull%2F310000052_full.xml?projection=noAcl (Caused by None)
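
One possible explanation, which the question itself does not confirm, is that the `bucket` object is created once in the parent process and then inherited by every forked worker, so all workers end up reading from the same underlying HTTPS connection. Interleaved reads on a shared socket could produce both the truncated JSON and the `WRONG_VERSION_NUMBER` SSL error. The sketch below gives each worker its own client through a `Pool` initializer. The names `make_bucket`, `init_worker`, `run`, and the `bucket_name` argument are hypothetical; only `storage.Client`, `Client.bucket`, `Bucket.get_blob`, and `Blob.download_as_string` come from the google-cloud-storage library.

```python
from multiprocessing import Pool

# One bucket handle per worker process; set by init_worker, never shared.
_bucket = None


def make_bucket(bucket_name):
    """Create a fresh client and bucket handle inside the calling process."""
    # Imported lazily so the module loads without google-cloud-storage installed.
    from google.cloud import storage
    return storage.Client().bucket(bucket_name)


def init_worker(bucket_name, factory=make_bucket):
    """Runs once in each worker, so no connection is inherited from the parent."""
    global _bucket
    _bucket = factory(bucket_name)


def extract_entity(entity_number):
    """Download one XML file as text; returns None if the blob does not exist."""
    blob = _bucket.get_blob('xlmfull/' + str(entity_number) + '_full.xml')
    if blob is None:
        return None
    return blob.download_as_string().decode('utf-8')


def run(all_ids, bucket_name, processes=8):
    """Map extract_entity over all_ids with a per-worker client."""
    with Pool(processes, initializer=init_worker,
              initargs=(bucket_name,)) as pool:
        return pool.map(extract_entity, all_ids)
```

Assuming the credentials are available through `GOOGLE_APPLICATION_CREDENTIALS`, a call such as `run(all_ids[0:100], "my-bucket")` (with a placeholder bucket name) would replace the original `pool.map` call. If the errors persist with per-worker clients, the cause lies elsewhere, for example in a proxy or network layer.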