
Python: Error deploying a Google Dataflow job on an App Engine cron


(Continued from a previous question.)

I am trying to deploy a Google Dataflow job so that it runs as a cron job on Google App Engine, following the method described here.
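For context, in this kind of setup a cron.yaml entry periodically hits an App Engine URL, and the handler behind that URL launches the pipeline. Here is a minimal sketch of such a handler, assuming a Flask app (Flask and gunicorn appear in the dependency list below); the file, route, and function names are hypothetical:

# main.py -- hypothetical App Engine handler that launches the pipeline.
# A cron.yaml entry with url: /run_pipeline triggers it on schedule.
from flask import Flask

from pipelines import script  # the pipeline module described below

app = Flask(__name__)

@app.route('/run_pipeline')
def run_pipeline():
    script.run()  # assumed to build the PipelineOptions and call pipeline.run()
    return 'Pipeline started', 200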

I have a Dataflow script (written in Python) in a pipelines/script.py file. Running this script locally (with the Apache Beam DirectRunner) or on Google Cloud (with the DataflowRunner) works fine. But when the job is deployed to run periodically on App Engine, it raises the following error when executed:

(4cb822d7f796239a): Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 582, in do_work
    work_executor.execute()
  File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/executor.py", line 166, in execute
    op.start()
  File "apache_beam/runners/worker/operations.py", line 294, in apache_beam.runners.worker.operations.DoOperation.start (apache_beam/runners/worker/operations.c:10607)
    def start(self):
  File "apache_beam/runners/worker/operations.py", line 295, in apache_beam.runners.worker.operations.DoOperation.start (apache_beam/runners/worker/operations.c:10501)
    with self.scoped_start_state:
  File "apache_beam/runners/worker/operations.py", line 300, in apache_beam.runners.worker.operations.DoOperation.start (apache_beam/runners/worker/operations.c:9702)
    pickler.loads(self.spec.serialized_fn))
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/internal/pickler.py", line 225, in loads
    return dill.loads(s)
  File "/usr/local/lib/python2.7/dist-packages/dill/dill.py", line 277, in loads
    return load(file)
  File "/usr/local/lib/python2.7/dist-packages/dill/dill.py", line 266, in load
    obj = pik.load()
  File "/usr/lib/python2.7/pickle.py", line 858, in load
    dispatch[key](self)
  File "/usr/lib/python2.7/pickle.py", line 1090, in load_global
    klass = self.find_class(module, name)
  File "/usr/local/lib/python2.7/dist-packages/dill/dill.py", line 423, in find_class
    return StockUnpickler.find_class(self, module, name)
  File "/usr/lib/python2.7/pickle.py", line 1124, in find_class
    __import__(module)
ImportError: No module named pipelines.spanner_backup
This is the stack trace visible when opening the job directly in the Dataflow panel of the Google Cloud console. However, if I click on "Stack trace" to view the error from the Stackdriver Error Reporting panel, I see the following trace instead:

Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 738, in run
    work, execution_context, env=self.environment)
  File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/workitem.py", line 130, in get_work_items
    work_item_proto.sourceOperationTask.split)
  File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/workercustomsources.py", line 142, in __init__
    source_spec[names.SERIALIZED_SOURCE_KEY]['value'])
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/internal/pickler.py", line 225, in loads
    return dill.loads(s)
  File "/usr/local/lib/python2.7/dist-packages/dill/dill.py", line 277, in loads
    return load(file)
  File "/usr/local/lib/python2.7/dist-packages/dill/dill.py", line 266, in load
    obj = pik.load()
  File "/usr/lib/python2.7/pickle.py", line 858, in load
    dispatch[key](self)
  File "/usr/lib/python2.7/pickle.py", line 1090, in load_global
    klass = self.find_class(module, name)
  File "/usr/local/lib/python2.7/dist-packages/dill/dill.py", line 423, in find_class
    return StockUnpickler.find_class(self, module, name)
  File "/usr/lib/python2.7/pickle.py", line 1124, in find_class
    __import__(module)
ImportError: No module named spanner.client
This suggests some import error when things are shared between the workers? google-cloud-spanner should be properly installed, though.
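Both traces fail inside pickle's find_class, i.e. while a worker unpickles a serialized function and tries to __import__ a module that is not available on the worker VM. A plausible project layout, reconstructed from the paths in the question and the first traceback (anything beyond script.py and spanner_backup.py is assumed):

pipelines/
    __init__.py          # makes pipelines an importable package
    script.py            # the pipeline definition submitted to Dataflow
    spanner_backup.py    # helper module the workers fail to import

With save_main_session enabled, references to such local modules (and to google.cloud.spanner) are pickled into the job, but the modules themselves only reach the workers if a setup file or requirements file ships them.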

I am using:

Flask==0.12.2 
apache-beam[gcp]==2.1.1 
gunicorn==19.7.1 
gevent==1.2.1
google-cloud-dataflow==2.1.1 
google-cloud-spanner==0.26
Am I missing something?

EDIT: my setup.py is shown below (as mentioned above, the corresponding GitHub link has the annotated version):
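The file itself is not reproduced here; what follows is a minimal sketch, assuming it follows the standard Apache Beam juliaset example with the REQUIRED_PACKAGES and CUSTOM_COMMANDS hooks referenced in the comments below:

# setup.py -- sketch based on Beam's juliaset example, not the original file.
import subprocess

from distutils.command.build import build as _build
import setuptools

# Python packages to install on each Dataflow worker.
REQUIRED_PACKAGES = ['google-cloud-spanner==0.26']

# Arbitrary shell commands (e.g. apt-get installs) run on worker startup.
CUSTOM_COMMANDS = []

class build(_build):
    """Adds the custom-command step to the regular build."""
    sub_commands = _build.sub_commands + [('CustomCommands', None)]

class CustomCommands(setuptools.Command):
    """Runs each entry of CUSTOM_COMMANDS as a subprocess."""
    def initialize_options(self):
        pass

    def finalize_options(self):
        pass

    def run(self):
        for command in CUSTOM_COMMANDS:
            subprocess.check_call(command)

setuptools.setup(
    name='pipelines',
    version='0.0.1',
    install_requires=REQUIRED_PACKAGES,
    packages=setuptools.find_packages(),
    cmdclass={'build': build, 'CustomCommands': CustomCommands},
)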


For the record, here is the solution to my problem. Thanks to Marcin Zabloki for helping me out.

It seems I was not properly linking the setup file to the pipeline. By replacing

# Imports required by both snippets in this answer (Beam 2.x):
from apache_beam.options.pipeline_options import (
    GoogleCloudOptions, PipelineOptions, SetupOptions, StandardOptions)

pipeline_options = PipelineOptions()
pipeline_options.view_as(SetupOptions).save_main_session = True
pipeline_options.view_as(SetupOptions).requirements_file = "requirements.txt"
google_cloud_options = pipeline_options.view_as(GoogleCloudOptions)
google_cloud_options.project = PROJECT_ID
google_cloud_options.job_name = JOB_NAME
google_cloud_options.staging_location = '%s/staging' % BUCKET_URL
google_cloud_options.temp_location = '%s/tmp' % BUCKET_URL
pipeline_options.view_as(StandardOptions).runner = 'DataflowRunner'

by

pipeline_options = PipelineOptions()
pipeline_options.view_as(SetupOptions).setup_file = "./setup.py"
google_cloud_options = pipeline_options.view_as(GoogleCloudOptions)
google_cloud_options.project = PROJECT_ID
google_cloud_options.job_name = JOB_NAME
google_cloud_options.staging_location = '%s/staging' % BUCKET_URL
google_cloud_options.temp_location = '%s/tmp' % BUCKET_URL
pipeline_options.view_as(StandardOptions).runner = 'DataflowRunner'

(with the modules to install listed in the setup.py file rather than in requirements.txt), and by loading the modules I use locally inside the ParDos rather than at the top of the file, I was able to deploy the script.


Not doing so seems to lead to strange, undefined behavior (such as functions not finding classes defined in the same file) rather than clear error messages.
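To illustrate the second part of the fix, here is a minimal sketch of importing a dependency inside a ParDo rather than at the top of the file; the DoFn and its body are hypothetical:

import apache_beam as beam

class ReadFromSpannerFn(beam.DoFn):
    """Hypothetical DoFn: the import runs on the worker, at execution time."""

    def process(self, element):
        # Importing here instead of at module level means the pickled DoFn
        # carries no module-level reference the worker cannot resolve.
        from google.cloud import spanner
        client = spanner.Client()
        # ... read from Spanner with the client, then emit results ...
        yield element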


Comments:

Do you have --save_main_session in your pipeline options? If so, try removing it.

Yes, it is needed to run the job with the DataflowRunner when submitting it from my computer. Removing it leads to the same error, however.

OK, please also add the contents of setup.py.

I added it. Should I add a "pip install ***" line to "CUSTOM_COMMANDS" in the setup.py file for every module the workers need?

Or you can try filling REQUIRED_PACKAGES with your modules, like this: REQUIRED_PACKAGES = ["google-cloud-spanner==0.26", "another-module==1.0"], etc...