Google Cloud Platform: ImportError for a Python Dataflow job in Cloud Composer
I can run a single file as a Dataflow job in Cloud Composer, but the job fails when I run it as a package. The package layout is:
pipeline_jobs/
-- __init__.py
-- run.py (main file)
-- setup.py
-- data_pipeline/
----- __init__.py
----- tasks.py
----- transform.py
----- util.py
I get this error:
WARNING - File "/tmp/dataflowd232f-run.py", line 14, in <module>
{gcp_dataflow_hook.py:120} WARNING - from data_pipeline.tasks import task
WARNING - ImportError: No module named data_pipeline.tasks.
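The traceback shows run.py executing from /tmp under a generated name, which suggests that only that one file was copied and the data_pipeline package next to it was not. Below is a minimal sketch that reproduces the same kind of failure locally with only the standard library. The temporary directories and the helper name can_import are illustrative and are not part of Airflow or Beam.

```python
import importlib
import importlib.util
import os
import sys
import tempfile


def can_import(module_name, search_dir):
    """Return True if module_name resolves when search_dir is first on sys.path."""
    saved_path = list(sys.path)
    saved_modules = set(sys.modules)
    sys.path.insert(0, search_dir)
    importlib.invalidate_caches()
    try:
        return importlib.util.find_spec(module_name) is not None
    except ModuleNotFoundError:
        # Raised when the parent package (data_pipeline) cannot be found.
        return False
    finally:
        sys.path[:] = saved_path
        # Forget anything imported during the probe so later probes start clean.
        for name in set(sys.modules) - saved_modules:
            del sys.modules[name]


with tempfile.TemporaryDirectory() as root:
    # Recreate the relevant part of the layout: pipeline_jobs/data_pipeline/tasks.py
    package_root = os.path.join(root, "pipeline_jobs")
    package_dir = os.path.join(package_root, "data_pipeline")
    os.makedirs(package_dir)
    for file_name in ("__init__.py", "tasks.py"):
        open(os.path.join(package_dir, file_name), "w").close()

    # Stand-in for /tmp, where run.py ends up alone without data_pipeline/.
    copy_dir = os.path.join(root, "tmp")
    os.makedirs(copy_dir)

    found_next_to_package = can_import("data_pipeline.tasks", package_root)
    found_from_tmp_copy = can_import("data_pipeline.tasks", copy_dir)

print(found_next_to_package)
print(found_from_tmp_copy)
```

If the second probe fails while the first succeeds, the import error comes from where the script runs, not from the package itself.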
Try putting the whole pipeline_jobs/ directory in the dags folder, then reference the Dataflow .py file as: /home/airflow/gcs/dags/pipeline_jobs/run.py
The DAG is in the GCS bucket in Cloud Composer. I have tried running the same job from gcs_bucket/dags/pipeline_jobs/run.py,
but got the same error.
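Cloud Composer mounts the environment's bucket on the workers under /home/airflow/gcs, so an object at gs://<bucket>/dags/... is also visible as a local file at /home/airflow/gcs/dags/.... Below is a minimal sketch of that mapping, so that py_file and setup_file can be written as local paths. The helper name composer_local_path is hypothetical, and whether local paths resolve the import error in this setup is an assumption to verify, not a confirmed fix.

```python
import posixpath

# Local mount point of the environment bucket on Composer workers.
COMPOSER_GCS_MOUNT = "/home/airflow/gcs"


def composer_local_path(gcs_uri):
    """Map gs://<bucket>/<object> to its path under the Composer bucket mount."""
    prefix = "gs://"
    if not gcs_uri.startswith(prefix):
        raise ValueError("expected a gs:// URI, got %r" % (gcs_uri,))
    bucket, _, object_name = gcs_uri[len(prefix):].partition("/")
    if not bucket or not object_name:
        raise ValueError("URI must include a bucket and an object: %r" % (gcs_uri,))
    return posixpath.join(COMPOSER_GCS_MOUNT, object_name)


# The placeholder bucket name is kept exactly as in the question.
py_file = composer_local_path("gs://<bucket>/dags/pipeline_jobs/run.py")
setup_file = composer_local_path("gs://<bucket>/dags/pipeline_jobs/setup.py")
print(py_file)
print(setup_file)
```

With local paths, run.py sits next to data_pipeline/ and setup.py on disk, which is the layout the import in run.py expects.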
from datetime import datetime, timedelta

from airflow import DAG
from airflow.contrib.operators.dataflow_operator import DataFlowPythonOperator

# Defaults shared by every task in the DAG; the operator reads
# 'dataflow_default_options' from here.
default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': datetime.strptime("2017-11-01", "%Y-%m-%d"),
    'py_options': [],
    'dataflow_default_options': {
        'start-date': '20171101',
        'end-date': '20171101',
        'project': '<project-id>',
        'region': '<location>',
        'temp_location': 'gs://<bucket>/flow/tmp',
        'staging_location': 'gs://<bucket>/flow/staging',
        'setup_file': 'gs://<bucket>/dags/pipeline_jobs/setup.py',
        'runner': 'DataFlowRunner',
        'job_name': 'job_name_lookup',
        'task-id': 'run_pipeline'
    },
}

dag = DAG(
    dag_id='pipeline_01',
    default_args=default_args,
    max_active_runs=1,
    concurrency=1
)

# Launches run.py as a Dataflow job; both run.py and setup.py are given as gs:// URIs.
task_1 = DataFlowPythonOperator(
    py_file='gs://<bucket>/dags/pipeline_jobs/run.py',
    gcp_conn_id='google_cloud_default',
    task_id='run_job',
    dag=dag)