Airflow - run a task if one of the others fails
I want a DAG where, if an error occurs in any task, it automatically executes a task that "resets" the table and ends the process. For example:
#Task that needs to be performed if any of the above fails
drop_bq_op = BigQueryOperator(
    task_id='drop_bq',
    use_legacy_sql=False,
    allow_large_results=True,
    bql="""DELETE FROM dataset.table1 WHERE ID IS NOT NULL""",
    bigquery_conn_id='gcp',
    dag=dag)
#task1
MsSql = MsSqlToGoogleCloudStorageOperator(
    task_id='import',
    mssql_conn_id=mssql,
    google_cloud_storage_conn_id='gcp',
    sql=sql_query,
    bucket=nm_bucket,
    filename=nm_arquivo,
    schema_filename=sc_arquivo,
    dag=dag)
#task2
Google = GoogleCloudStorageToBigQueryOperator(
    task_id='gcs_to_bq',
    bucket='bucket',
    source_objects=[nm_arquivo],
    destination_project_dataset_table=dataset_bq_tbl,
    schema_fields=sc_tbl_bq,
    source_format='NEWLINE_DELIMITED_JSON',
    create_disposition='CREATE_IF_NEEDED',
    write_disposition=wrt_disposition,
    time_partitioning=tp_particao,
    cluster_fields=nm_cluster,
    bigquery_conn_id='gcp',
    google_cloud_storage_conn_id='gcp',
    dag=dag)
#task3
task_3 = BigQueryOperator(
    task_id='test3',
    use_legacy_sql=False,
    allow_large_results=True,
    bql="""select ...""",
    bigquery_conn_id='gcp',
    dag=dag)
Update: I included the following code in the script:
import uuid

from google.cloud import bigquery
from airflow.operators.python_operator import PythonOperator
from airflow.utils.trigger_rule import TriggerRule

def delete_bigquery():
    """Delete the rows imported into BigQuery when an upstream task fails."""
    client = bigquery.Client()
    query = "DELETE FROM dataset.table1 WHERE ID IS NOT NULL"
    dataset = client.dataset('dataset')
    table = dataset.table(name='table1')
    job_name = 'delete_{}'.format(uuid.uuid4())
    job = client.run_async_query(job_name, query)
    job.destination = table
    job.write_disposition = 'WRITE_TRUNCATE'
    job.begin()
    return job.state

cleanup_task = PythonOperator(
    task_id="cleanup_task",
    python_callable=delete_bigquery,
    trigger_rule=TriggerRule.ONE_FAILED,
    dag=dag)
[gcs_to_bq.set_upstream(import), task_3.set_upstream(gcs_to_bq)] >> cleanup_task
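(Note: run_async_query() exists only in old releases of the google-cloud-bigquery client; on current versions a job is started with Client.query() instead. A minimal sketch of the same cleanup against the modern client, assuming the same dataset.table1 target:)

from google.cloud import bigquery

def delete_bigquery():
    """Modern-client equivalent of the cleanup above (sketch)."""
    client = bigquery.Client()
    # client.query() starts the job; result() blocks until it finishes,
    # so any BigQuery error surfaces as an exception and fails the task.
    job = client.query("DELETE FROM dataset.table1 WHERE ID IS NOT NULL")
    job.result()
    return job.state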
Now, when I load the DAG again, I get this error:

Broken DAG: [DAG.py] Relationships can only be set between Operators; received NoneType
- This is a typical use case for trigger_rule=TriggerRule.ONE_FAILED
- You can create a cleanup_task, hook it up downstream of all the tasks that need cleaning up, and assign it trigger_rule=TriggerRule.ONE_FAILED
I tried to keep the ordering of the tasks, so I wrote: "[gcs_to_bq.set_upstream(import), task3.set_upstream(gcs_to_bq)] >> cleanup_task", but I get the error message: "Relationships can only be set between Operators; received NoneType".
# refer code here https://github.com/apache/airflow/blob/master/airflow/utils/trigger_rule.py#L28
from airflow.utils.trigger_rule import TriggerRule
..
cleanup_task = PythonOperator(
    task_id="cleanup_task",
    ..
    trigger_rule=TriggerRule.ONE_FAILED,
    ..
    dag=dag)
..
# all tasks that must be cleaned-up should have `cleanup_task` in their downstream
[my_task_1, my_task_2, my_task_3] >> cleanup_task
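As for the follow-up error: set_upstream() returns None, so "[gcs_to_bq.set_upstream(import), task_3.set_upstream(gcs_to_bq)] >> cleanup_task" shifts a list of None values into cleanup_task, which is exactly what "Relationships can only be set between Operators; received NoneType" complains about. Declare the ordering between the operators themselves first, then fan them into cleanup_task. A minimal sketch reusing the task variables from the question (note that import is a Python keyword and cannot actually be used as a variable name):

# Linear ordering between the real operators.
MsSql >> Google >> task_3

# Fan every task that needs cleanup into cleanup_task; with
# TriggerRule.ONE_FAILED it runs as soon as any one of them fails.
[MsSql, Google, task_3] >> cleanup_task

ONE_FAILED fires as soon as at least one direct upstream has failed; it does not wait for the remaining upstream tasks to finish.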