Airflow - run a task if any of the others fails

I would like a DAG where, if an error occurs in any of its tasks, it automatically runs a task that "resets" the table and ends the process. For example:

#Task that needs to be performed if any of the tasks below fails
drop_bq_op = BigQueryOperator(
    task_id='drop_bq',
    use_legacy_sql=False,
    allow_large_results=True,
    bql="""DELETE FROM dataset.table1 WHERE ID IS NOT NULL""",
    bigquery_conn_id='gcp',
    dag=dag)

#task1
MsSql = MsSqlToGoogleCloudStorageOperator(
    task_id='import',
    mssql_conn_id=mssql,
    google_cloud_storage_conn_id='gcp',
    sql=sql_query,
    bucket=nm_bucket,
    filename=nm_arquivo,
    schema_filename=sc_arquivo,
    dag=dag)

#task2
Google = GoogleCloudStorageToBigQueryOperator(
    task_id='gcs_to_bq',
    bucket='bucket',
    source_objects=[nm_arquivo],
    destination_project_dataset_table=dataset_bq_tbl,
    schema_fields=sc_tbl_bq,
    source_format='NEWLINE_DELIMITED_JSON',
    create_disposition='CREATE_IF_NEEDED',
    write_disposition=wrt_disposition,
    time_partitioning=tp_particao,
    cluster_fields=nm_cluster,
    bigquery_conn_id='gcp',
    google_cloud_storage_conn_id='gcp',
    dag=dag
)

task_3 = BigQueryOperator(
    task_id='test3',
    use_legacy_sql=False,
    allow_large_results=True,
    bql="""select ...""",
    bigquery_conn_id='gcp',
    dag=dag)
Update: I included the following code in the script:

import uuid
from google.cloud import bigquery

def delete_bigquery():
    """Delete all rows from the BigQuery table."""
    client = bigquery.Client()
    query = "DELETE FROM dataset.table1 WHERE ID IS NOT NULL"
    dataset = client.dataset('dataset')
    table = dataset.table(name='table1')
    job_name = 'delete_{}'.format(uuid.uuid4())
    job = client.run_async_query(job_name, query)
    job.destination = table
    job.write_disposition = 'WRITE_TRUNCATE'
    job.begin()
    return job.state
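
(Side note: client.run_async_query() only exists in old google-cloud-bigquery releases and was removed when the client API was reorganized. A minimal sketch of the same delete using the newer client.query() API, assuming a recent library version, would be:)

from google.cloud import bigquery

def delete_bigquery():
    """Delete all rows from the BigQuery table via a DML query."""
    client = bigquery.Client()
    # a DML DELETE needs no destination table or write_disposition
    query_job = client.query("DELETE FROM dataset.table1 WHERE ID IS NOT NULL")
    query_job.result()  # block until the DML job finishes
    return query_job.state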

cleanup_task = PythonOperator(task_id="cleanup_task",
                              python_callable=delete_bigquery,
                              trigger_rule=TriggerRule.ONE_FAILED,
                              dag=dag)

[gcs_to_bq.set_upstream(import), task_3.set_upstream(gcs_to_bq)] >> cleanup_task
Now, when I load the DAG again, I get this error:

Broken DAG: [DAG.py] Relationships can only be set between Operators; received NoneType

  • This is a typical case for TriggerRule.ONE_FAILED

  • You can create the
    cleanup_task
    , hook it up to all of the upstream tasks that need cleaning up, and assign that trigger rule to it


I'm trying to keep the ordering of the tasks, so I wrote it like this: "[gcs_to_bq.set_upstream(import), task_3.set_upstream(gcs_to_bq)] >> cleanup_task", but I get the error message: "Relationships can only be set between Operators; received NoneType" (a corrected wiring sketch follows the answer's code below).
# refer code here https://github.com/apache/airflow/blob/master/airflow/utils/trigger_rule.py#L28
from airflow.utils.trigger_rule import TriggerRule
..
cleanup_task = PythonOperator(task_id="cleanup_task",
                              python_callable=delete_bigquery,
                              ..
                              trigger_rule=TriggerRule.ONE_FAILED,
                              ..
                              dag=dag)
..
# all tasks that must be cleaned-up should have `cleanup_task` in their downstream
[my_task_1, my_task_2, my_task_3] >> cleanup_task
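
As for the NoneType error in the comment above: set_upstream() returns None, so the list being shifted into cleanup_task contains None values rather than operators. To keep the original ordering and still attach the cleanup task, a sketch using the variable names from the question's DAG (MsSql, Google, task_3) would be:

# normal pipeline ordering: import -> gcs_to_bq -> test3
MsSql >> Google >> task_3

# every task that needs cleaning up points at cleanup_task; since cleanup_task
# has trigger_rule=TriggerRule.ONE_FAILED, it only runs when at least one of
# its direct upstream tasks has failed
[MsSql, Google, task_3] >> cleanup_task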