
CoGroupByKey always fails on large data (Python SDK)

debugging google-cloud-dataflow


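I have about 4000 input files (7MB each on average).

My pipeline always fails at the CoGroupByKey step once the data size reaches about 4GB. When I limited the input to only 300 files, it ran fine.

When it fails, the logs on GCP Dataflow only show: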

Workflow failed. Causes: S24:CoGroup Geo data/GroupByKey/Read+CoGroup Geo data/GroupByKey/GroupByWindow+CoGroup Geo data/Map(_merge_tagged_vals_under_key) failed., The job failed because a work item has failed 4 times. Look in previous log entries for the cause of each one of the 4 failures. For more information, see https://cloud.google.com/dataflow/docs/guides/common-errors. The work item was attempted on these workers: 
  store-migration-10212040-aoi4-harness-m7j7
      Root cause: The worker lost contact with the service.,
  store-migration-xxxxx
      Root cause: The worker lost contact with the service.,
  store-migration-xxxxx
      Root cause: The worker lost contact with the service.,
  store-migration-xxxxx
      Root cause: The worker lost contact with the service.

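I dug through all the logs in Logs Explorer. Apart from the error above, there is no other indication of an error, not even from my logging.info calls and try…except code.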

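I think this is related to the memory of the instances, but I haven't dug into that direction, because it is something I don't want to have to worry about when using GCP services.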

Thanks.

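Interesting! Thanks for sharing. The message "The worker lost contact with the service." is common when memory pressure on the worker is too high. Can you share more details about your pipeline and the function that runs after the CoGBK?

Agree with Pablo, this looks like a memory issue. Do you have hot keys? Have you tried machines with more memory?

@Pablo I tried n1-highmem-4 and n1-highmem-8, but it still crashed. The GroupByKey in the job reports about 15GB of data, which is less than the memory of the -8 machine, and it still crashes there.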
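To follow up on the hot key question: one way to check is to count records per key on a sample of the input before it reaches CoGroupByKey. The following is only a minimal sketch in plain Python, assuming the sample fits in memory and each record is a (key, value) tuple. The function name find_hot_keys, the share_threshold parameter, and the store_N keys are hypothetical and do not come from the original pipeline.

```python
from collections import Counter
from typing import Hashable, Iterable, List, Tuple


def find_hot_keys(
    records: Iterable[Tuple[Hashable, object]],
    share_threshold: float = 0.1,
) -> List[Tuple[Hashable, int, float]]:
    """Return (key, count, share) for every key whose share of all
    records is strictly greater than share_threshold, largest first."""
    # Count how many records each key has in the sample.
    counts = Counter(key for key, _ in records)
    total = sum(counts.values())
    if total == 0:
        return []
    # most_common() yields keys in descending order of count.
    return [
        (key, n, n / total)
        for key, n in counts.most_common()
        if n / total > share_threshold
    ]


if __name__ == "__main__":
    # Hypothetical sample: one key dominates the data.
    sample = [("store_1", "a")] * 8 + [("store_2", "b"), ("store_3", "c")]
    for key, n, share in find_hot_keys(sample, share_threshold=0.5):
        print(f"{key}: {n} records ({share:.0%} of sample)")
```

If one key holds a large share of the records, all of its values end up grouped together for that key, which would be consistent with the memory pressure described above. Inside a Beam pipeline, a similar per-key count could probably be expressed with beam.combiners.Count.PerKey() on the keyed PCollection.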