Triggering a sub-DAG in Airflow


Edited: I have edited this question taking @tobi6's input into account.

I copied the SubDagOperator from the Airflow source code.

Source code:

I modified a few things in the execute method. The changes are meant to trigger the sub-DAG and then wait for it to finish executing. The trigger works fine, but the tasks are never executed (the DAG run is in the running/green state while its tasks stay in the no-status/white state).

Please see my changes below:

from airflow.exceptions import AirflowException
from airflow.models import BaseOperator, Pool
from airflow.utils.decorators import apply_defaults
from airflow.utils.db import provide_session
from airflow.utils.state import State
from airflow.executors import GetDefaultExecutor
from time import sleep
import logging

from datetime import datetime


class SubDagOperator(BaseOperator):

    template_fields = tuple()
    ui_color = '#555'
    ui_fgcolor = '#fff'

    @provide_session
    @apply_defaults
    def __init__(
            self,
            subdag,
            executor=GetDefaultExecutor(),
            *args, **kwargs):
        """
        Yo dawg. This runs a sub dag. By convention, a sub dag's dag_id
        should be prefixed by its parent and a dot. As in `parent.child`.

        :param subdag: the DAG object to run as a subdag of the current DAG.
        :type subdag: airflow.DAG
        :param dag: the parent DAG
        :type dag: airflow.DAG
        """
        import airflow.models
        dag = kwargs.get('dag') or airflow.models._CONTEXT_MANAGER_DAG
        if not dag:
            raise AirflowException('Please pass in the `dag` param or call '
                                   'within a DAG context manager')
        session = kwargs.pop('session')
        super(SubDagOperator, self).__init__(*args, **kwargs)

        # validate subdag name
        if dag.dag_id + '.' + kwargs['task_id'] != subdag.dag_id:
            raise AirflowException(
                "The subdag's dag_id should have the form "
                "'{{parent_dag_id}}.{{this_task_id}}'. Expected "
                "'{d}.{t}'; received '{rcvd}'.".format(
                    d=dag.dag_id, t=kwargs['task_id'], rcvd=subdag.dag_id))

        # validate that subdag operator and subdag tasks don't have a
        # pool conflict
        if self.pool:
            conflicts = [t for t in subdag.tasks if t.pool == self.pool]
            if conflicts:
                # only query for pool conflicts if one may exist
                pool = (
                    session
                    .query(Pool)
                    .filter(Pool.slots == 1)
                    .filter(Pool.pool == self.pool)
                    .first()
                )
                if pool and any(t.pool == self.pool for t in subdag.tasks):
                    raise AirflowException(
                        'SubDagOperator {sd} and subdag task{plural} {t} both '
                        'use pool {p}, but the pool only has 1 slot. The '
                        'subdag tasks will never run.'.format(
                            sd=self.task_id,
                            plural=len(conflicts) > 1,
                            t=', '.join(t.task_id for t in conflicts),
                            p=self.pool
                        )
                    )

        self.subdag = subdag
        self.executor = executor

    def execute(self, context):
        # Instead of running the sub-DAG in-process, manually create a DagRun
        # for it, reusing the parent run's conf and execution date.
        dag_run = self.subdag.create_dagrun(
            conf=context['dag_run'].conf,
            state=State.RUNNING,
            execution_date=context['execution_date'],
            run_id='trig__' + str(datetime.utcnow()),
            external_trigger=True
        )

        # Poll until the sub-DAG run reaches a terminal state.
        while dag_run.get_state() not in (State.FAILED, State.SUCCESS):
            sleep(10)
The code below shows how I am using it:

from airflow import DAG
from operators.sd_operator import SubDagOperator  # My SubDag Operator
from airflow.operators.python_operator import PythonOperator

import logging
from datetime import datetime

default_args = {
        'owner': 'airflow',
        'depends_on_past': False,
        'start_date': datetime(2017, 7, 17),
        'email': ['airflow@example.com'],
        'email_on_failure': False,
        'email_on_retry': False,
    }


def print_dag_details(**kwargs):
    logging.info(str(kwargs['dag_run'].conf))


with DAG('example_dag', schedule_interval=None, catchup=False, default_args=default_args) as dag:
    # sub_dag_func builds and returns the sub-DAG; its definition is not shown here
    task_1 = SubDagOperator(
        subdag=sub_dag_func('example_dag', 'sub_dag_1'),
        task_id='sub_dag_1'
    )

    task_2 = SubDagOperator(
        subdag=sub_dag_func('example_dag', 'sub_dag_2'),
        task_id='sub_dag_2',
    )

    print_kwargs = PythonOperator(
        task_id='print_kwargs',
        python_callable=print_dag_details,
        provide_context=True
    )

    print_kwargs >> task_1 >> task_2 

Any information you can provide would be helpful. Thanks in advance.

It is a bit difficult to understand your question without more context.

"I copied the subdag operator and modified a few things in the execute method."

  • Where was this copied from?
"The trigger works just fine…"

  • What does that look like?
A few things I noticed in the code:

  • It might help to add the named parameters to the function call of sub_dag_func, e.g.
    sub_dag_func(subdag='parent_dag'…)

  • In the binary shift definitions used to set upstream/downstream relationships,
    there are tasks which I cannot find defined in the DAG (df_job_1, df_job_2).
    This is possibly connected to the sub-DAGs (haven't looked at those yet).

  • The names of the sub-DAGs seem inconsistent with the comment in the code,
    which says that by convention a sub-DAG's dag_id should be prefixed by its
    parent and a dot, yet they are sub_dag_1 and sub_dag_2 (a hypothetical
    sketch of a matching sub_dag_func follows below).
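
For illustration only: the question does not show sub_dag_func, so the sketch below is a guess at what a version following the parent.child naming convention could look like. The parameter names, the DummyOperator placeholder and the start_date are assumptions, not the asker's actual code.

from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator
from datetime import datetime


def sub_dag_func(parent_dag_id, child_task_id):
    # The copied SubDagOperator validates that the sub-DAG's dag_id is
    # exactly '<parent_dag_id>.<child_task_id>', so build it that way.
    sub_dag = DAG(
        dag_id='{}.{}'.format(parent_dag_id, child_task_id),
        schedule_interval=None,
        start_date=datetime(2017, 7, 17),
    )

    # Placeholder task; the real sub-DAG tasks are not shown in the question.
    DummyOperator(task_id='placeholder', dag=sub_dag)

    return sub_dag

Called as sub_dag_func('example_dag', 'sub_dag_1'), this returns a DAG whose dag_id is 'example_dag.sub_dag_1', which is what the validation in the copied operator's __init__ expects.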


I copied the SubDagOperator code from the Airflow source. The operator creates a DagRun object, which sets the DAG's state to running with the new trigger information (I can see the running state with the new trigger information in the Airflow UI). However, the tasks inside that DAG run are never picked up by the scheduler/workers, so they get no state and are never executed.
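
For comparison, and as far as I remember the Airflow 1.x source, the stock SubDagOperator's execute does not just create a DagRun: it runs the sub-DAG in-process as a backfill over the single execution date, using the operator's own executor. The sketch below (a drop-in for the execute method shown above) is from memory and may not match your exact Airflow version.

    def execute(self, context):
        # Approximate stock behaviour: back-fill the sub-DAG for this
        # execution_date with the configured executor, so the sub-DAG's
        # tasks are executed by this task itself rather than being left
        # for the scheduler to pick up.
        ed = context['execution_date']
        self.subdag.run(
            start_date=ed, end_date=ed,
            donot_pickle=True,
            executor=self.executor)

If that is still how your version behaves, it would explain the empty/white tasks: a DagRun created by the modified operator relies on the scheduler to queue the sub-DAG's tasks, whereas the stock operator executes them directly inside execute.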