Airflow "success" and "failure" callbacks do not run the Databricks notebook
I want to customize my DAG so that it calls a Databricks notebook on success or failure. For the success and failure cases I created two different functions that call the Databricks notebook. The success/failure callback functions are being invoked, but the Databricks notebook is never executed. Below is the sample code.
# Imports assumed from the omitted "Remaining DAG code" section;
# exact module paths vary by Airflow version.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.kubernetes import secret
from airflow.models import Variable
from airflow.providers.databricks.operators.databricks import DatabricksSubmitRunOperator


def task_success_callback(context):
    """task_success callback"""
    context['task_instance'].task_id
    print("success case")
    dq_notebook_success_task_params = {
        'existing_cluster_id': Variable.get("DATABRICKS_CLUSTER_ID"),
        'notebook_task': {
            'notebook_path': '/AAA/Airflow/Operators/audit_file_operator',
            'base_parameters': {
                "root": "dbfs:/mnt/aaa",
                "audit_file_path": "/success_file_path/",
                "table_name": "sample_data_table",
                "audit_flag": "success"
            }
        }
    }
    DatabricksSubmitRunOperator(
        task_id="weather_table_task_id",
        databricks_conn_id='databricks_conn',
        json=dq_notebook_success_task_params,
        do_xcom_push=True,
        secrets=[
            secret.Secret(
                deploy_type='env',
                deploy_target=None,
                secret='adf-service-principal'
            ),
            secret.Secret(
                deploy_type='env',
                deploy_target=None,
                secret='postgres-credentials',
            ),
        ],
    )
def task_failure_callback(context):
    """task_failure callback"""
    context['task_instance'].task_id
    print("failure case")
    dq_notebook_failure_task_params = {
        'existing_cluster_id': Variable.get("DATABRICKS_CLUSTER_ID"),
        'notebook_task': {
            'notebook_path': '/AAA/Airflow/Operators/audit_file_operator',
            'base_parameters': {
                "root": "dbfs:/mnt/aaa",
                "audit_file_path": "/failure_file_path/",
                "table_name": "sample_data_table",
                "audit_flag": "failure"
            }
        }
    }
    DatabricksSubmitRunOperator(
        task_id="weather_table_task_id",
        databricks_conn_id='databricks_conn',
        json=dq_notebook_failure_task_params,
        do_xcom_push=True,
        secrets=[
            secret.Secret(
                deploy_type='env',
                deploy_target=None,
                secret='adf-service-principal'
            ),
            secret.Secret(
                deploy_type='env',
                deploy_target=None,
                secret='postgres-credentials',
            ),
        ],
    )
DEFAULT_ARGS = {
    "owner": "admin",
    "depends_on_past": False,
    "start_date": datetime(2020, 9, 23),
    "on_success_callback": task_success_callback,
    "on_failure_callback": task_failure_callback,
    "email": ["airflow@airflow.com"],
    "email_on_failure": False,
    "email_on_retry": False,
    "retries": 1,
    "retry_delay": timedelta(seconds=10),
}
==================
Remaining DAG code
==================
In Airflow, every operator has an execute() method that defines the operator's logic. When you build a workflow and the operator constructors are initialized, Airflow renders the templates and calls execute() for you. However, when you instantiate an operator inside a Python function, you have to take care of that step yourself.
So when you write:

def task_success_callback(context):
    DatabricksSubmitRunOperator(..)

all you are doing here is initializing the DatabricksSubmitRunOperator constructor. You never invoke the operator's logic.

What you need to do is:

def task_success_callback(context):
    op = DatabricksSubmitRunOperator(..)
    op.execute(context)
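
Putting it together, a minimal sketch of a corrected callback, assuming the Databricks provider package and the question's own connection and variable names (the task_id and the trimmed-down parameters here are illustrative, not from the original post):

from airflow.models import Variable
from airflow.providers.databricks.operators.databricks import DatabricksSubmitRunOperator

def task_success_callback(context):
    """Submit the audit notebook run after the task succeeds."""
    params = {
        'existing_cluster_id': Variable.get("DATABRICKS_CLUSTER_ID"),
        'notebook_task': {
            'notebook_path': '/AAA/Airflow/Operators/audit_file_operator',
            'base_parameters': {"audit_flag": "success"},
        },
    }
    op = DatabricksSubmitRunOperator(
        task_id="audit_success_task",  # illustrative task_id
        databricks_conn_id='databricks_conn',
        json=params,
    )
    # Instantiating the operator only constructs it; calling execute()
    # is what actually submits the run to Databricks.
    op.execute(context)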
Thanks Elad for the reply, it worked for me.
@Kiran if it solved it, please accept the answer :)
Elad, any idea how to pass parameters to the task_success_callback(context) function?
@Kiran please see:
Elad, thanks for your reply, but those are only static parameters; I need to pass these parameters dynamically. Can you help me with how to pass dynamic parameters?
Elad, here I want to pass the table name and audit file path to the callback functions for multiple table loads. @Elad, do you have any input on this?
import collections
from functools import partial

TableList = collections.namedtuple(
    "table_list",
    "table_name audit_file_name",
)

LIST_OF_TABLES = [
    TableList(
        table_name="table1",
        audit_file_name="/testdata/Audit_files/",
    ),
    TableList(
        table_name="table2",
        audit_file_name="/testdata/Audit_files/",
    ),
    TableList(
        table_name="table3",
        audit_file_name="/testdata/Audit_files/",
    ),
    TableList(
        table_name="table4",
        audit_file_name="/testdata/Audit_files/",
    ),
]
for table in LIST_OF_TABLES:
    DEFAULT_ARGS = {
        "owner": "admin",
        "depends_on_past": False,
        "start_date": datetime(2020, 9, 23),
        "on_success_callback": partial(task_success_callback, table.table_name, table.audit_file_name),
        "on_failure_callback": partial(task_failure_callback, table.table_name, table.audit_file_name),
        "email": ["airflow@airflow.com"],
        "email_on_failure": False,
        "email_on_retry": False,
        "retries": 1,
        "retry_delay": timedelta(seconds=10),
    }

    WORKFLOW = DAG(
        'test_dag',
        default_args=DEFAULT_ARGS,
        schedule_interval="30 3 * * 1",
        catchup=False,
    )
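
For the partial-based wiring above to work, the callbacks must accept the two extra positional arguments before the context dict that Airflow passes in. A minimal sketch, with illustrative parameter names:

from functools import partial

def task_success_callback(table_name, audit_file_name, context):
    """partial() freezes table_name and audit_file_name, so Airflow's
    context arrives as the last positional argument."""
    print(f"success case for {table_name}, audit file: {audit_file_name}")

# What Airflow effectively does when the callback fires:
callback = partial(task_success_callback, "table1", "/testdata/Audit_files/")
callback({"task_instance": None})  # Airflow supplies the real context dict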