Python Dataflow job fails after more than 6 hours with "worker lost contact with the service"?

Tags: python, google-cloud-platform, google-bigquery, google-cloud-dataflow, spacy

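I am using Dataflow to read data from BigQuery and then run NLP preprocessing in Python. I am on Python 3 with SDK 2.16.0. I am running 100 workers in europe-west6 and europe-west1 (private IPs, private access, and Cloud NAT). The BigQuery tables are in the US. Test jobs worked fine, but when I tried to process the full table (32 GB), the job failed after 6 hours 40 minutes, and it is hard to work out what the underlying error is.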

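First, Dataflow reports the following:
This is a bit confusing: in one case a work item failed, two other workers lost contact with the service, and one worker was reported dead.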

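Now let's look at the logs of the step that reads the BigQuery data:
The first suspicious thing is the message "Refreshing due to a 401 (attempt 1/2)", which appears every 3 seconds for the whole duration of the Dataflow job. I don't think it is related to the crash, but it is odd. The timestamps of the BigQuery problems (16:28:07 and 16:28:15) come after the problem reported for the workers (16:27:44).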
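To make the ordering of these events concrete, the sketch below puts the three times on one clock and compares them with the UTC timestamp from the worker log entry (2019-11-19T15:28:07Z). It assumes the times quoted in the text are local times at UTC+1, which is a guess based on the one-hour difference from the log timestamp; `LOCAL` and `local_time` are illustrative names.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the times quoted in the text are local (UTC+1); logs are UTC.
LOCAL = timezone(timedelta(hours=1))


def local_time(hms):
    """Build an aware datetime on 2019-11-19 from an HH:MM:SS string."""
    t = datetime.strptime(hms, "%H:%M:%S")
    return datetime(2019, 11, 19, t.hour, t.minute, t.second, tzinfo=LOCAL)


worker_issue = local_time("16:27:44")
read_errors = [local_time("16:28:07"), local_time("16:28:15")]

# Seconds between the worker issue and each BigQuery read error.
delays = [(err - worker_issue).total_seconds() for err in read_errors]
for err, delay in zip(read_errors, delays):
    print(f"{err.time()} is {delay:.0f}s after the worker issue")

# The log entry's UTC timestamp, truncated to whole seconds.
log_utc = datetime(2019, 11, 19, 15, 28, 7, tzinfo=timezone.utc)
same_event = log_utc.astimezone(LOCAL) == read_errors[0]
print("log entry matches first read error:", same_event)
```

Under that assumption the read errors follow the worker issue by 23 and 31 seconds, which is consistent with the 404s being a consequence of the worker problem rather than its cause.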

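It looks like the workers cannot find some Avro files on Cloud Storage. This may be related to the message "worker lost contact with the service".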
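One way to follow up on the missing files is to pull the bucket and object name out of the request URL reported in the `HttpNotFoundError`. The sketch below uses only the standard library and assumes the URL has the Cloud Storage JSON API layout `/storage/v1/b/<bucket>/o/<object>`; `parse_gcs_media_url` is a hypothetical helper name, not part of Beam or apitools.

```python
from urllib.parse import parse_qs, unquote, urlsplit


def parse_gcs_media_url(url):
    """Split a GCS JSON API object URL into (bucket, object name, generation)."""
    parts = urlsplit(url)
    # Path layout: /storage/v1/b/<bucket>/o/<percent-encoded object name>
    _, _, _, _, bucket, _, encoded_name = parts.path.split("/", 6)
    generation = parse_qs(parts.query).get("generation", [None])[0]
    return bucket, unquote(encoded_name), generation


# URL copied from the traceback in the question (bucket is redacted as "xxx").
url = (
    "https://www.googleapis.com/storage/v1/b/xxx/o/"
    "beam%2Ftemp%2Fstackoverflow-raphael-191119-084402.1574153042.687677"
    "%2F11710707918635668555%2F000000000009.avro"
    "?alt=media&generation=1574154204169350"
)
bucket, name, generation = parse_gcs_media_url(url)
print(bucket)
print(name)
print(generation)
```

With the bucket and object name in hand, one could check, for example with `gsutil ls`, whether the temporary Avro file was never written or was deleted while the job was still running.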

If I look at the "ERROR" entries, there are a lot of them:
An exception was raised when trying to execute the workitem 7962803802081012962 : Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/dataflow_worker/batchworker.py", line 649, in do_work
    work_executor.execute()
  File "/usr/local/lib/python3.6/site-packages/dataflow_worker/executor.py", line 176, in execute
    op.start()
  File "dataflow_worker/native_operations.py", line 38, in dataflow_worker.native_operations.NativeReadOperation.start
  File "dataflow_worker/native_operations.py", line 39, in dataflow_worker.native_operations.NativeReadOperation.start
  File "dataflow_worker/native_operations.py", line 44, in dataflow_worker.native_operations.NativeReadOperation.start
  File "dataflow_worker/native_operations.py", line 48, in dataflow_worker.native_operations.NativeReadOperation.start
  File "/usr/local/lib/python3.6/site-packages/dataflow_worker/nativefileio.py", line 204, in __iter__
    for record in self.read_next_block():
  File "/usr/local/lib/python3.6/site-packages/dataflow_worker/nativeavroio.py", line 198, in read_next_block
    fastavro_block = next(self._block_iterator)
  File "fastavro/_read.pyx", line 738, in fastavro._read.file_reader.next
  File "fastavro/_read.pyx", line 662, in _iter_avro_blocks
  File "fastavro/_read.pyx", line 595, in fastavro._read.null_read_block
  File "fastavro/_read.pyx", line 597, in fastavro._read.null_read_block
  File "fastavro/_read.pyx", line 304, in fastavro._read.read_bytes
  File "/usr/local/lib/python3.6/site-packages/apache_beam/io/filesystemio.py", line 113, in readinto
    data = self._downloader.get_range(start, end)
  File "/usr/local/lib/python3.6/site-packages/apache_beam/io/gcp/gcsio.py", line 522, in get_range
    self._downloader.GetRange(start, end - 1)
  File "/usr/local/lib/python3.6/site-packages/apitools/base/py/transfer.py", line 486, in GetRange
    response = self.__ProcessResponse(response)
  File "/usr/local/lib/python3.6/site-packages/apitools/base/py/transfer.py", line 424, in __ProcessResponse
    raise exceptions.HttpError.FromResponse(response)
apitools.base.py.exceptions.HttpNotFoundError: HttpError accessing <https://www.googleapis.com/storage/v1/b/xxx/o/beam%2Ftemp%2Fstackoverflow-raphael-191119-084402.1574153042.687677%2F11710707918635668555%2F000000000009.avro?alt=media&generation=1574154204169350>: response: <{'x-guploader-uploadid': 'AEnB2UpgIuanY0AawrT7fRC_VW3aRfWSdrrTwT_TqQx1fPAAAUohVoL-8Z8Zw_aYUQcSMNqKIh5R2TulvgHHsoxLWo2gl6wUEA', 'content-type': 'text/html; charset=UTF-8', 'date': 'Tue, 19 Nov 2019 15:28:07 GMT', 'vary': 'Origin, X-Origin', 'expires': 'Tue, 19 Nov 2019 15:28:07 GMT', 'cache-control': 'private, max-age=0', 'content-length': '142', 'server': 'UploadServer', 'status': '404'}>, content <No such object: nlp-text-classification/beam/temp/stackoverflow-xxxx-191119-084402.1574153042.687677/11710707918635668555/000000000009.avro>

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/dataflow_worker/batchworker.py", line 649, in do_work
    work_executor.execute()
  File "/usr/local/lib/python3.6/site-packages/dataflow_worker/executor.py", line 176, in execute
    op.start()
  File "dataflow_worker/native_operations.py", line 38, in dataflow_worker.native_operations.NativeReadOperation.start
  File "dataflow_worker/native_operations.py", line 39, in dataflow_worker.native_operations.NativeReadOperation.start
  File "dataflow_worker/native_operations.py", line 44, in dataflow_worker.native_operations.NativeReadOperation.start
  File "dataflow_worker/native_operations.py", line 48, in dataflow_worker.native_operations.NativeReadOperation.start
  File "/usr/local/lib/python3.6/site-packages/dataflow_worker/nativefileio.py", line 204, in __iter__
    for record in self.read_next_block():
  File "/usr/local/lib/python3.6/site-packages/dataflow_worker/nativeavroio.py", line 198, in read_next_block
    fastavro_block = next(self._block_iterator)
  File "fastavro/_read.pyx", line 738, in fastavro._read.file_reader.next
  File "fastavro/_read.pyx", line 662, in _iter_avro_blocks
  File "fastavro/_read.pyx", line 595, in fastavro._read.null_read_block
  File "fastavro/_read.pyx", line 597, in fastavro._read.null_read_block
  File "fastavro/_read.pyx", line 304, in fastavro._read.read_bytes
  File "/usr/local/lib/python3.6/site-packages/apache_beam/io/filesystemio.py", line 113, in readinto
    data = self._downloader.get_range(start, end)
  File "/usr/local/lib/python3.6/site-packages/apache_beam/io/gcp/gcsio.py", line 522, in get_range
    self._downloader.GetRange(start, end - 1)
  File "/usr/local/lib/python3.6/site-packages/apitools/base/py/transfer.py", line 486, in GetRange
    response = self.__ProcessResponse(response)
  File "/usr/local/lib/python3.6/site-packages/apitools/base/py/transfer.py", line 424, in __ProcessResponse
    raise exceptions.HttpError.FromResponse(response)
apitools.base.py.exceptions.HttpNotFoundError: HttpError accessing <https://www.googleapis.com/storage/v1/b/xxxx/o/beam%2Ftemp%2Fstackoverflow-raphael-191119-084402.1574153042.687677%2F11710707918635668555%2F000000000009.avro?alt=media&generation=1574154204169350>: response: <{'x-guploader-uploadid': 'AEnB2UpgIuanY0AawrT7fRC_VW3aRfWSdrrTwT_TqQx1fPAAAUohVoL-8Z8Zw_aYUQcSMNqKIh5R2TulvgHHsoxLWo2gl6wUEA', 'content-type': 'text/html; charset=UTF-8', 'date': 'Tue, 19 Nov 2019 15:28:07 GMT', 'vary': 'Origin, X-Origin', 'expires': 'Tue, 19 Nov 2019 15:28:07 GMT', 'cache-control': 'private, max-age=0', 'content-length': '142', 'server': 'UploadServer', 'status': '404'}>, content <No such object: nlp-text-classification/beam/temp/stackoverflow-xxxx-191119-084402.1574153042.687677/11710707918635668555/000000000009.avro>
timestamp: 2019-11-19T15:28:07.770312309Z
logger: root:batchworker.py:do_work
severity: ERROR
worker: stackoverflow-xxxx-191-11190044-7wyy-harness-2k89
step: Read Posts from BigQuery
thread: 73:140029564072960