Airflow 使自定义气流宏展开其他宏

Airflow 使自定义气流宏展开其他宏,airflow,apache-airflow,Airflow,Apache Airflow,有没有办法在气流中生成用户定义的宏,而气流本身是从其他宏计算出来的 来自气流导入DAG 从afflow.operators.bash_operator导入bash operator dag=dag( “简单”, 附表_interval='0 21***', 用户定义的宏={ “下一个执行日期”:“{dag.下一个计划(执行日期)}”, }, ) task=bash运算符( task_id='bash_op', bash_command='echo“{{next_execution_date}}”

有没有办法在气流中生成用户定义的宏,而气流本身是从其他宏计算出来的

来自气流导入DAG
从afflow.operators.bash_operator导入bash operator
dag=dag(
“简单”,
附表_interval='0 21***',
用户定义的宏={
“下一个执行日期”:“{dag.下一个计划(执行日期)}”,
},
)
task=bash运算符(
task_id='bash_op',
bash_command='echo“{{next_execution_date}}”,
dag=dag,
)
这里的用例是将新的Airflow v1.8
next\u execution\u date
宏向后移植到Airflow v1.7中。不幸的是,此模板是在没有宏展开的情况下呈现的:

$ airflow render simple bash_op 2017-08-09 21:00:00
    # ----------------------------------------------------------
    # property: bash_command
    # ----------------------------------------------------------
    echo "{{ dag.following_schedule(execution_date) }}"

默认情况下,用户定义的宏不作为模板处理。如果要在
用户定义的\u宏
中保留模板(或在
参数
变量中使用模板),则始终可以手动重新运行模板功能:

class DoubleTemplatedBashOperator(BashOperator):
    def pre_execute(self, context):
        context['ti'].render_templates()
这将适用于不引用其他参数或UDM的模板。这样,您就可以拥有“两个深度”模板

或者将UDM直接放在
操作符的命令中(最简单的解决方案):


以下是一些解决方案:

1.重写
BashOperator
以向上下文添加一些值 这种方法的优点是:可以在自定义操作符中捕获一些重复的代码

糟糕的是:在呈现模板化字段之前,必须编写自定义运算符向上下文添加值

2.在用户定义的宏中执行计算 不一定是价值观。它们可以是函数

在您的dag中:

def compute_next_execution_date(dag, execution_date):
    return dag.following_schedule(execution_date)

dag = DAG(
    'simple',
    schedule_interval='0 21 * * *',
    user_defined_macros={
        'next_execution_date': compute_next_execution_date,
    },
)

task = BashOperator(
    task_id='bash_op',
    bash_command='echo "{{ next_execution_date(dag, execution_date) }}"',
    dag=dag,
)
好的方面:您可以定义可重用函数来处理运行时可用的值(、作业实例属性、任务实例属性等),并使您的函数结果可用于呈现模板

糟糕的部分(但不是那么烦人):您必须在需要的每个dag中导入这样一个函数作为用户定义的宏

3.直接在模板中调用语句 此解决方案是最简单的(如所述),在您的案例中可能是最好的

BashOperator(
    task_id='bash_op',
    bash_command='echo "{{ dag.following_schedule(execution_date) }}"',
    dag=dag,
)

非常适合像这样的简单通话。它们是一些其他直接可用的对象(如
task
task\u instance
,等等);甚至有些标准模块也可以使用(比如
宏.time
,…)。

我会投票支持使用Airflow插件来插入预定义的宏。 使用此方法,您可以在任何运算符中使用预定义宏,而无需声明任何内容

下面是我们正在使用的一些自定义宏。 示例使用:
{{macros.dagtz\u next\u execution\u date(ti)}}

from airflow.plugins_manager import AirflowPlugin
from datetime import datetime, timedelta
from airflow.utils.db import provide_session
from airflow.models import DagRun
import pendulum


@provide_session
def _get_dag_run(ti, session=None):
    """Get DagRun obj of the TaskInstance ti

    Args:
        ti (TYPE): the TaskInstance object
        session (None, optional): Not in use

    Returns:
        DagRun obj: the DagRun obj of the TaskInstance ti
    """
    task = ti.task
    dag_run = None
    if hasattr(task, 'dag'):
        dag_run = (
            session.query(DagRun)
            .filter_by(
                dag_id=task.dag.dag_id,
                execution_date=ti.execution_date)
            .first()
        )
        session.expunge_all()
        session.commit()
    return dag_run


def ds_add_no_dash(ds, days):
    """
    Add or subtract days from a YYYYMMDD
    :param ds: anchor date in ``YYYYMMDD`` format to add to
    :type ds: str
    :param days: number of days to add to the ds, you can use negative values
    :type days: int
    >>> ds_add('20150101', 5)
    '20150106'
    >>> ds_add('20150106', -5)
    '20150101'
    """

    ds = datetime.strptime(ds, '%Y%m%d')
    if days:
        ds = ds + timedelta(days)
    return ds.isoformat()[:10].replace('-', '')


def dagtz_execution_date(ti):
    """get the TaskInstance execution date (in DAG timezone) in pendulum obj

    Args:
        ti (TaskInstance): the TaskInstance object

    Returns:
        pendulum obj: execution_date in pendulum object (in DAG tz)
    """
    execution_date_pdl = pendulum.instance(ti.execution_date)
    dagtz_execution_date_pdl = execution_date_pdl.in_timezone(ti.task.dag.timezone)
    return dagtz_execution_date_pdl


def dagtz_next_execution_date(ti):
    """get the TaskInstance next execution date (in DAG timezone) in pendulum obj

    Args:
        ti (TaskInstance): the TaskInstance object

    Returns:
        pendulum obj: next execution_date in pendulum object (in DAG tz)
    """

    # For manually triggered dagruns that aren't run on a schedule, next/previous
    # schedule dates don't make sense, and should be set to execution date for
    # consistency with how execution_date is set for manually triggered tasks, i.e.
    # triggered_date == execution_date.
    dag_run = _get_dag_run(ti)
    if dag_run and dag_run.external_trigger:
        next_execution_date = ti.execution_date
    else:
        next_execution_date = ti.task.dag.following_schedule(ti.execution_date)

    next_execution_date_pdl = pendulum.instance(next_execution_date)
    dagtz_next_execution_date_pdl = next_execution_date_pdl.in_timezone(ti.task.dag.timezone)
    return dagtz_next_execution_date_pdl


def dagtz_next_ds(ti):
    """get the TaskInstance next execution date (in DAG timezone) in YYYY-MM-DD string
    """
    dagtz_next_execution_date_pdl = dagtz_next_execution_date(ti)
    return dagtz_next_execution_date_pdl.strftime('%Y-%m-%d')


def dagtz_next_ds_nodash(ti):
    """get the TaskInstance next execution date (in DAG timezone) in YYYYMMDD string
    """
    dagtz_next_ds_str = dagtz_next_ds(ti)
    return dagtz_next_ds_str.replace('-', '')


def dagtz_prev_execution_date(ti):
    """get the TaskInstance previous execution date (in DAG timezone) in pendulum obj

    Args:
        ti (TaskInstance): the TaskInstance object

    Returns:
        pendulum obj: previous execution_date in pendulum object (in DAG tz)
    """

    # For manually triggered dagruns that aren't run on a schedule, next/previous
    # schedule dates don't make sense, and should be set to execution date for
    # consistency with how execution_date is set for manually triggered tasks, i.e.
    # triggered_date == execution_date.
    dag_run = _get_dag_run(ti)
    if dag_run and dag_run.external_trigger:
        prev_execution_date = ti.execution_date
    else:
        prev_execution_date = ti.task.dag.previous_schedule(ti.execution_date)

    prev_execution_date_pdl = pendulum.instance(prev_execution_date)
    dagtz_prev_execution_date_pdl = prev_execution_date_pdl.in_timezone(ti.task.dag.timezone)
    return dagtz_prev_execution_date_pdl


def dagtz_prev_ds(ti):
    """get the TaskInstance prev execution date (in DAG timezone) in YYYY-MM-DD string
    """
    dagtz_prev_execution_date_pdl = dagtz_prev_execution_date(ti)
    return dagtz_prev_execution_date_pdl.strftime('%Y-%m-%d')


def dagtz_prev_ds_nodash(ti):
    """get the TaskInstance prev execution date (in DAG timezone) in YYYYMMDD string
    """
    dagtz_prev_ds_str = dagtz_prev_ds(ti)
    return dagtz_prev_ds_str.replace('-', '')


# Defining the plugin class
class AirflowTestPlugin(AirflowPlugin):
    name = "custom_macros"
    macros = [dagtz_execution_date, ds_add_no_dash,
              dagtz_next_execution_date, dagtz_next_ds, dagtz_next_ds_nodash,
              dagtz_prev_execution_date, dagtz_prev_ds, dagtz_prev_ds_nodash]

是的,将UDF宏扩展到每个需要它的地方肯定会起作用,但随后您会多次重复同一段宏代码。我想人们可以依赖Python工具,在运行时格式化字符串而不是使用UDF宏:
bash_command='echo“{next_execution_date}”。格式化(next_execution_date=next_execution_date),
,但它不会那么干净。@mksios我试图让我的解释更清楚,但是,如果在任务运行之前调用
render\u templates
,则可以完全执行您想要的操作。默认情况下,将UDM放入命令中;第二,手动填写UDM中的模板变量。我明白了,这很酷。。。我所能想到的双重膨胀的唯一不良副作用是:(1)需要避开参数的任何其他部分,以防止在不需要双重膨胀的部分发生双重膨胀;(2) 必须在每种运算符类型中执行此操作。然而,它似乎离实现最终目标最近。。。是时候向气流发出拉动请求了@好极了!如果有帮助,请投票/接受答案:)在第一个代码段更新最后一行中,请
返回super(NextExecutionDateAwareBashOperator,self);但是,我可以在python中使用
super()3@G埃罗,你能看看我的问题吗?如果你能回答的话,你会帮助我的。感谢@z1k的关注,感谢您提供完整的代码。如果您将Python的许多行精简为一个最小的示例来说明您的观点,那就更好了。未来的读者和我将为此感谢@z1k您如何在操作符或DAG中使用自定义宏?你能举个例子吗
BashOperator(
    task_id='bash_op',
    bash_command='echo "{{ dag.following_schedule(execution_date) }}"',
    dag=dag,
)
from airflow.plugins_manager import AirflowPlugin
from datetime import datetime, timedelta
from airflow.utils.db import provide_session
from airflow.models import DagRun
import pendulum


@provide_session
def _get_dag_run(ti, session=None):
    """Get DagRun obj of the TaskInstance ti

    Args:
        ti (TYPE): the TaskInstance object
        session (None, optional): Not in use

    Returns:
        DagRun obj: the DagRun obj of the TaskInstance ti
    """
    task = ti.task
    dag_run = None
    if hasattr(task, 'dag'):
        dag_run = (
            session.query(DagRun)
            .filter_by(
                dag_id=task.dag.dag_id,
                execution_date=ti.execution_date)
            .first()
        )
        session.expunge_all()
        session.commit()
    return dag_run


def ds_add_no_dash(ds, days):
    """
    Add or subtract days from a YYYYMMDD
    :param ds: anchor date in ``YYYYMMDD`` format to add to
    :type ds: str
    :param days: number of days to add to the ds, you can use negative values
    :type days: int
    >>> ds_add('20150101', 5)
    '20150106'
    >>> ds_add('20150106', -5)
    '20150101'
    """

    ds = datetime.strptime(ds, '%Y%m%d')
    if days:
        ds = ds + timedelta(days)
    return ds.isoformat()[:10].replace('-', '')


def dagtz_execution_date(ti):
    """get the TaskInstance execution date (in DAG timezone) in pendulum obj

    Args:
        ti (TaskInstance): the TaskInstance object

    Returns:
        pendulum obj: execution_date in pendulum object (in DAG tz)
    """
    execution_date_pdl = pendulum.instance(ti.execution_date)
    dagtz_execution_date_pdl = execution_date_pdl.in_timezone(ti.task.dag.timezone)
    return dagtz_execution_date_pdl


def dagtz_next_execution_date(ti):
    """get the TaskInstance next execution date (in DAG timezone) in pendulum obj

    Args:
        ti (TaskInstance): the TaskInstance object

    Returns:
        pendulum obj: next execution_date in pendulum object (in DAG tz)
    """

    # For manually triggered dagruns that aren't run on a schedule, next/previous
    # schedule dates don't make sense, and should be set to execution date for
    # consistency with how execution_date is set for manually triggered tasks, i.e.
    # triggered_date == execution_date.
    dag_run = _get_dag_run(ti)
    if dag_run and dag_run.external_trigger:
        next_execution_date = ti.execution_date
    else:
        next_execution_date = ti.task.dag.following_schedule(ti.execution_date)

    next_execution_date_pdl = pendulum.instance(next_execution_date)
    dagtz_next_execution_date_pdl = next_execution_date_pdl.in_timezone(ti.task.dag.timezone)
    return dagtz_next_execution_date_pdl


def dagtz_next_ds(ti):
    """get the TaskInstance next execution date (in DAG timezone) in YYYY-MM-DD string
    """
    dagtz_next_execution_date_pdl = dagtz_next_execution_date(ti)
    return dagtz_next_execution_date_pdl.strftime('%Y-%m-%d')


def dagtz_next_ds_nodash(ti):
    """get the TaskInstance next execution date (in DAG timezone) in YYYYMMDD string
    """
    dagtz_next_ds_str = dagtz_next_ds(ti)
    return dagtz_next_ds_str.replace('-', '')


def dagtz_prev_execution_date(ti):
    """get the TaskInstance previous execution date (in DAG timezone) in pendulum obj

    Args:
        ti (TaskInstance): the TaskInstance object

    Returns:
        pendulum obj: previous execution_date in pendulum object (in DAG tz)
    """

    # For manually triggered dagruns that aren't run on a schedule, next/previous
    # schedule dates don't make sense, and should be set to execution date for
    # consistency with how execution_date is set for manually triggered tasks, i.e.
    # triggered_date == execution_date.
    dag_run = _get_dag_run(ti)
    if dag_run and dag_run.external_trigger:
        prev_execution_date = ti.execution_date
    else:
        prev_execution_date = ti.task.dag.previous_schedule(ti.execution_date)

    prev_execution_date_pdl = pendulum.instance(prev_execution_date)
    dagtz_prev_execution_date_pdl = prev_execution_date_pdl.in_timezone(ti.task.dag.timezone)
    return dagtz_prev_execution_date_pdl


def dagtz_prev_ds(ti):
    """get the TaskInstance prev execution date (in DAG timezone) in YYYY-MM-DD string
    """
    dagtz_prev_execution_date_pdl = dagtz_prev_execution_date(ti)
    return dagtz_prev_execution_date_pdl.strftime('%Y-%m-%d')


def dagtz_prev_ds_nodash(ti):
    """get the TaskInstance prev execution date (in DAG timezone) in YYYYMMDD string
    """
    dagtz_prev_ds_str = dagtz_prev_ds(ti)
    return dagtz_prev_ds_str.replace('-', '')


# Defining the plugin class
class AirflowTestPlugin(AirflowPlugin):
    name = "custom_macros"
    macros = [dagtz_execution_date, ds_add_no_dash,
              dagtz_next_execution_date, dagtz_next_ds, dagtz_next_ds_nodash,
              dagtz_prev_execution_date, dagtz_prev_ds, dagtz_prev_ds_nodash]