Airflow: how to render a value from XCom with MySqlToGoogleCloudStorageOperator

I have the following code:

import_orders_op = MySqlToGoogleCloudStorageOperator(
    task_id='import_orders',
    mysql_conn_id='mysql_con',
    google_cloud_storage_conn_id='gcp_con',
    sql='SELECT * FROM orders where orders_id>{0};'.format(LAST_IMPORTED_ORDER_ID),
    bucket=GCS_BUCKET_ID,
    filename=file_name,
    dag=dag) 
I want to change the query to:

sql='SELECT * FROM orders where orders_id>{0} and orders_id<{1};'.format(LAST_IMPORTED_ORDER_ID, ...)
It gives:

Broken DAG: name 'task_instance' is not defined


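In the DAG file, you are not actively working with an existing task instance in the context of a DAG run.

You can only pull the value while the operator is running, not while it is being set up. The setup code is executed by the scheduler in a loop and runs on the order of 1000 times a day, even if the DAG runs only once a week or is disabled. That said, what you wrote is really close to something that could work, so maybe you had already considered this point about context.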
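To make the parse-time versus run-time distinction concrete, here is a minimal sketch in plain Python with no Airflow import. The value 100 for LAST_IMPORTED_ORDER_ID and the names QUERY, build_unquoted and build_quoted are made up for illustration. Unquoted, `{{ ... }}` is an ordinary Python set literal, so the name `task_instance` is looked up as soon as the line runs, which is what happens while the DAG file is being parsed. Quoted, it is just text that is left for the operator to render later.

```python
LAST_IMPORTED_ORDER_ID = 100  # hypothetical value, for illustration only

QUERY = 'SELECT * FROM orders WHERE orders_id > {0} AND orders_id < {1}'


def build_unquoted():
    # Without quotes, {{ ... }} is a set literal containing a set literal,
    # so Python evaluates `task_instance` immediately and raises NameError.
    return QUERY.format(
        LAST_IMPORTED_ORDER_ID,
        {{ task_instance.xcom_pull(task_ids='get_max_order_id') }},  # noqa: F821
    )


def build_quoted():
    # With quotes, the template is plain text; str.format() inserts it
    # verbatim and nothing is evaluated at this point.
    return QUERY.format(
        LAST_IMPORTED_ORDER_ID,
        "{{ task_instance.xcom_pull(task_ids='get_max_order_id') }}",
    )


try:
    build_unquoted()
    error_message = None
except NameError as exc:
    error_message = str(exc)

quoted_sql = build_quoted()

print(error_message)
print(quoted_sql)
```

The first print shows the same kind of error the scheduler reports as a broken DAG; the second shows a query string that still contains the unrendered template.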

Let's write it as a template:

# YOUR EXAMPLE FORMATTED A BIT MORE 80 COLS STYLE
…
sql='SELECT * FROM orders where orders_id>{0} and orders_id<{1}'.format(
    LAST_IMPORTED_ORDER_ID,
    {{ task_instance.xcom_pull(
        task_ids=['get_max_order_id'], key='result_status') }}),
…

# SHOULD HAVE BEEN AT LEAST: I hope you can spot the difference.
…
sql='SELECT * FROM orders where orders_id>{0} and orders_id<{1}'.format(
    LAST_IMPORTED_ORDER_ID,
    "{{ task_instance.xcom_pull("
    "task_ids=['get_max_order_id'], key='result_status') }}"),
…

# AND COULD HAVE BEEN MORE CLEARLY READABLE AS:
# (with no key argument, xcom_pull reads the default 'return_value' key,
# not the 'result_status' key used above)
…
sql='''
SELECT *
FROM orders
WHERE orders_id > {{ params.last_imported_id }}
  AND orders_id < {{ ti.xcom_pull('get_max_order_id') }}
''',
params={'last_imported_id': LAST_IMPORTED_ORDER_ID},
…
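I gather you are populating LAST_IMPORTED_ORDER_ID from an Airflow variable. You could skip doing that in the DAG file and instead change {{ params.last_imported_id }} to {{ var.value.last_imported_order_id }}, or whatever the name of the Airflow variable you are setting is.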
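As a rough illustration of what happens at run time, the sketch below renders such a templated query with Jinja2 directly, since that is the engine Airflow uses for templated fields. FakeTaskInstance, the context dictionary, and the values 100 and 250 are hypothetical stand-ins, not Airflow objects; in a real DAG the operator builds the context itself when the task runs.

```python
from types import SimpleNamespace

from jinja2 import Template

SQL_TEMPLATE = """
SELECT *
FROM orders
WHERE orders_id > {{ var.value.last_imported_order_id }}
  AND orders_id < {{ ti.xcom_pull('get_max_order_id') }}
"""


class FakeTaskInstance:
    """Hypothetical stand-in for a task instance; returns canned XCom values."""

    def __init__(self, xcoms):
        self._xcoms = xcoms

    def xcom_pull(self, task_ids=None, key="return_value"):
        # Look up the canned value by (task id, key).
        return self._xcoms[(task_ids, key)]


# Assumed context: one XCom pushed by 'get_max_order_id' and one variable.
context = {
    "ti": FakeTaskInstance({("get_max_order_id", "return_value"): 250}),
    "var": SimpleNamespace(
        value=SimpleNamespace(last_imported_order_id="100")
    ),
}

# Rendering happens here, long after the template string was defined.
rendered = Template(SQL_TEMPLATE).render(**context)
print(rendered)
```

Because both bounds are resolved only when the template is rendered, nothing in the DAG file itself needs to read the variable or the XCom.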
