Python: how can I use xcom_push=True and auto_remove=True at the same time with DockerOperator?

When running DockerOperator with xcom_push=True, xcom_all=True and auto_remove=True, the task raises an error as if the container had been removed before its STDOUT could be read.

Example

Take the following DAG:

from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.docker_operator import DockerOperator
from airflow.operators.python_operator import PythonOperator


# Default (but overridable) arguments for Operator instantiation
default_args = {
    'owner': 'Satan',
    'depends_on_past': False,
    'start_date': datetime(2019, 11, 28),
    'retry_delay': timedelta(seconds=2),
}


# DAG definition


def createDockerOperatorTask(xcom_all, auto_remove, id_suffix):
    return DockerOperator(
        # Default args
        task_id="docker_operator" + id_suffix,
        image='centos:latest',
        container_name="container" + id_suffix,
        api_version='auto',
        command="echo 'FALSE';",
        docker_url="unix://var/run/docker.sock",
        network_mode="bridge",
        xcom_push=True,
        xcom_all=xcom_all,
        auto_remove=auto_remove,
    )


# Use dag as python context so all tasks are "automagically" linked (in no specific order) to it
with DAG('docker_operator_xcom', default_args=default_args, schedule_interval=timedelta(days=1)) as dag:
    t1 = createDockerOperatorTask(xcom_all=True, auto_remove=True, id_suffix="_1")

    t2 = createDockerOperatorTask(xcom_all=True, auto_remove=False, id_suffix="_2")

    t3 = createDockerOperatorTask(xcom_all=False, auto_remove=True, id_suffix="_3")


    # Set tasks precedence
    dag >> t1
    dag >> t2
    dag >> t3
If we run it, the first task fails and the other two succeed. However, the only one that runs "correctly" is docker_operator_2, because it correctly sets the XCom value, while docker_operator_3 does not. This gives me the feeling that it "tries" to read STDOUT and, when it can't, doesn't fail (as it should, the way docker_operator_1 does).

Run state of each task:

Task logs with xcom_push=True, xcom_all=True and auto_remove=True:

Task logs with xcom_push=True, xcom_all=False and auto_remove=True:
*** Log file does not exist: /usr/local/airflow/logs/docker_operator_xcom/docker_operator_3/2019-12-04T20:24:21.180209+00:00/1.log
*** Fetching from: http://5df603088df3:8793/log/docker_operator_xcom/docker_operator_3/2019-12-04T20:24:21.180209+00:00/1.log

[2019-12-04 20:24:24,992] {{taskinstance.py:630}} INFO - Dependencies all met for <TaskInstance: docker_operator_xcom.docker_operator_3 2019-12-04T20:24:21.180209+00:00 [queued]>
[2019-12-04 20:24:25,031] {{taskinstance.py:630}} INFO - Dependencies all met for <TaskInstance: docker_operator_xcom.docker_operator_3 2019-12-04T20:24:21.180209+00:00 [queued]>
[2019-12-04 20:24:25,032] {{taskinstance.py:841}} INFO - 
--------------------------------------------------------------------------------
[2019-12-04 20:24:25,032] {{taskinstance.py:842}} INFO - Starting attempt 1 of 1
[2019-12-04 20:24:25,032] {{taskinstance.py:843}} INFO - 
--------------------------------------------------------------------------------
[2019-12-04 20:24:25,054] {{taskinstance.py:862}} INFO - Executing <Task(DockerOperator): docker_operator_3> on 2019-12-04T20:24:21.180209+00:00
[2019-12-04 20:24:25,055] {{base_task_runner.py:133}} INFO - Running: ['airflow', 'run', 'docker_operator_xcom', 'docker_operator_3', '2019-12-04T20:24:21.180209+00:00', '--job_id', '73', '--pool', 'default_pool', '--raw', '-sd', 'DAGS_FOLDER/qm_operators/exp_5_prueba.py', '--cfg_path', '/tmp/tmp94dzo8w7']
[2019-12-04 20:24:26,219] {{base_task_runner.py:115}} INFO - Job 73: Subtask docker_operator_3 [2019-12-04 20:24:26,219] {{settings.py:252}} INFO - settings.configure_orm(): Using pool settings. pool_size=5, max_overflow=10, pool_recycle=1800, pid=1039
[2019-12-04 20:24:26,294] {{base_task_runner.py:115}} INFO - Job 73: Subtask docker_operator_3 /usr/local/lib/python3.7/site-packages/psycopg2/__init__.py:144: UserWarning: The psycopg2 wheel package will be renamed from release 2.8; in order to keep installing from binary please use "pip install psycopg2-binary" instead. For details see: <http://initd.org/psycopg/docs/install.html#binary-install-from-pypi>.
[2019-12-04 20:24:26,294] {{base_task_runner.py:115}} INFO - Job 73: Subtask docker_operator_3   """)
[2019-12-04 20:24:27,549] {{base_task_runner.py:115}} INFO - Job 73: Subtask docker_operator_3 [2019-12-04 20:24:27,548] {{__init__.py:51}} INFO - Using executor CeleryExecutor
[2019-12-04 20:24:27,549] {{base_task_runner.py:115}} INFO - Job 73: Subtask docker_operator_3 [2019-12-04 20:24:27,549] {{dagbag.py:92}} INFO - Filling up the DagBag from /usr/local/airflow/dags/qm_operators/exp_5_prueba.py
[2019-12-04 20:24:27,722] {{base_task_runner.py:115}} INFO - Job 73: Subtask docker_operator_3 [2019-12-04 20:24:27,721] {{cli.py:545}} INFO - Running <TaskInstance: docker_operator_xcom.docker_operator_3 2019-12-04T20:24:21.180209+00:00 [running]> on host 5df603088df3
[2019-12-04 20:24:27,754] {{docker_operator.py:201}} INFO - Starting docker container from image centos:latest
[2019-12-04 20:24:28,329] {{logging_mixin.py:112}} INFO - Attachs:  []
[2019-12-04 20:24:29,979] {{logging_mixin.py:112}} INFO - [2019-12-04 20:24:29,979] {{local_task_job.py:124}} WARNING - Time since last heartbeat(0.01 s) < heartrate(5.0 s), sleeping for 4.989138 s
[2019-12-04 20:24:34,974] {{logging_mixin.py:112}} INFO - [2019-12-04 20:24:34,974] {{local_task_job.py:103}} INFO - Task exited with return code 0
XCom of docker_operator_2:

XCom of docker_operator_3:

Workaround

Even though setting auto_remove=False, as in docker_operator_2, makes the task succeed and set its XCom correctly, the container is never removed, and future runs of the DAG will fail because the container from the old run conflicts with the container of the new run.

A workaround is to add a downstream task that removes the container, but it isn't "clean".

Is there a way to run DockerOperator with both xcom_push=True and auto_remove=True?

Reading the DockerOperator source, I don't think so. It calls the Docker API client's wait and then logs.

However, the Docker documentation for auto_remove states:

    enable auto-removal of the container on daemon side when the container's process exits

So as soon as the operator's call to wait completes, the container is removed, and you will not be able to retrieve its logs.
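The race can be illustrated with a small self-contained simulation (plain Python, not the real Docker SDK; the class and method names here are illustrative stand-ins for the daemon and the docker-py client): wait() lets the process exit, the daemon auto-removes the container, and the subsequent logs() call finds nothing, which mirrors the wait-then-logs order the operator uses.

```python
class ContainerGone(Exception):
    """Stands in for docker.errors.NotFound."""


class FakeDockerClient:
    """Toy model of the daemon: auto_remove deletes the container when it exits."""

    def __init__(self):
        self._containers = {}

    def run(self, cid, stdout, auto_remove):
        self._containers[cid] = {"stdout": stdout, "auto_remove": auto_remove}

    def wait(self, cid):
        # The process exits here; the daemon removes the container if asked to.
        if self._containers[cid]["auto_remove"]:
            del self._containers[cid]
        return 0

    def logs(self, cid):
        if cid not in self._containers:
            raise ContainerGone(cid)
        return self._containers[cid]["stdout"]


def run_like_docker_operator(client, cid, auto_remove):
    """Same call order as the operator: run, wait, then read logs for XCom."""
    client.run(cid, stdout="FALSE", auto_remove=auto_remove)
    client.wait(cid)         # returns once the container's process has exited
    return client.logs(cid)  # the XCom value would be taken from here


client = FakeDockerClient()
print(run_like_docker_operator(client, "c2", auto_remove=False))  # FALSE
try:
    run_like_docker_operator(client, "c1", auto_remove=True)
except ContainerGone:
    print("logs unavailable: container already removed")
```

With auto_remove=False the logs survive the wait and can be pushed to XCom; with auto_remove=True they are gone by the time the operator asks for them.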

You could subclass DockerOperator and remove the container in post_execute. Something like this:

class XComDockerOperator(DockerOperator):
    def post_execute(self, context, result=None):
        if self.cli is not None:
            self.log.info('Removing Docker container')
            self.cli.remove_container(self.container['Id'])
        super().post_execute(context, result)

Comments:

"I assume 'Is there a way to run DockerOperator with both xcom_push=True and auto_remove=False?' means 'Is there a way to run DockerOperator with both xcom_push=True and auto_remove=True?'?"

"You are right, fixed. The issue I was worried about... What approach would you recommend to 'manually clean up' the created containers?"

"@Alechan Personally, I would create my own subclass of DockerOperator that xcom_pushes the created container's id under a specific key, and then use a PythonOperator that xcom_pulls this id to manually remove the container."
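The handoff suggested in that last comment can be sketched without Airflow or a Docker daemon (the dict standing in for XCom, the key name, and the function names are all illustrative, not Airflow APIs): the operator-side code records the container id under a known key, and a downstream callable pulls it and removes the container.

```python
# Toy stand-ins: `xcom` plays the role of Airflow's XCom table,
# `containers` the daemon's container state.
xcom = {}
containers = {"container_2": {"status": "exited"}}

CONTAINER_ID_KEY = "docker_container_id"  # illustrative XCom key


def run_and_push_id(task_id, container_id):
    """What a DockerOperator subclass could do: xcom_push the container id."""
    xcom[(task_id, CONTAINER_ID_KEY)] = container_id


def cleanup(upstream_task_id):
    """What the downstream PythonOperator callable could do: pull the id, remove."""
    cid = xcom[(upstream_task_id, CONTAINER_ID_KEY)]
    containers.pop(cid, None)  # stands in for client.remove_container(cid)
    return cid


run_and_push_id("docker_operator_2", "container_2")
removed = cleanup("docker_operator_2")
print(removed, "still present:", "container_2" in containers)
```

In a real DAG the two pieces would be the subclassed operator and a PythonOperator wired downstream of it, with XCom carrying the id between them.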
Task logs with xcom_push=True, xcom_all=True and auto_remove=False (docker_operator_2):

*** Log file does not exist: /usr/local/airflow/logs/docker_operator_xcom/docker_operator_2/2019-12-04T20:24:21.180209+00:00/1.log
*** Fetching from: http://5df603088df3:8793/log/docker_operator_xcom/docker_operator_2/2019-12-04T20:24:21.180209+00:00/1.log

[2019-12-04 20:24:24,794] {{taskinstance.py:630}} INFO - Dependencies all met for <TaskInstance: docker_operator_xcom.docker_operator_2 2019-12-04T20:24:21.180209+00:00 [queued]>
[2019-12-04 20:24:24,829] {{taskinstance.py:630}} INFO - Dependencies all met for <TaskInstance: docker_operator_xcom.docker_operator_2 2019-12-04T20:24:21.180209+00:00 [queued]>
[2019-12-04 20:24:24,829] {{taskinstance.py:841}} INFO - 
--------------------------------------------------------------------------------
[2019-12-04 20:24:24,829] {{taskinstance.py:842}} INFO - Starting attempt 1 of 1
[2019-12-04 20:24:24,829] {{taskinstance.py:843}} INFO - 
--------------------------------------------------------------------------------
[2019-12-04 20:24:24,842] {{taskinstance.py:862}} INFO - Executing <Task(DockerOperator): docker_operator_2> on 2019-12-04T20:24:21.180209+00:00
[2019-12-04 20:24:24,843] {{base_task_runner.py:133}} INFO - Running: ['airflow', 'run', 'docker_operator_xcom', 'docker_operator_2', '2019-12-04T20:24:21.180209+00:00', '--job_id', '71', '--pool', 'default_pool', '--raw', '-sd', 'DAGS_FOLDER/qm_operators/exp_5_prueba.py', '--cfg_path', '/tmp/tmpeq9uc4kw']
[2019-12-04 20:24:26,174] {{base_task_runner.py:115}} INFO - Job 71: Subtask docker_operator_2 [2019-12-04 20:24:26,173] {{settings.py:252}} INFO - settings.configure_orm(): Using pool settings. pool_size=5, max_overflow=10, pool_recycle=1800, pid=1035
[2019-12-04 20:24:26,226] {{base_task_runner.py:115}} INFO - Job 71: Subtask docker_operator_2 /usr/local/lib/python3.7/site-packages/psycopg2/__init__.py:144: UserWarning: The psycopg2 wheel package will be renamed from release 2.8; in order to keep installing from binary please use "pip install psycopg2-binary" instead. For details see: <http://initd.org/psycopg/docs/install.html#binary-install-from-pypi>.
[2019-12-04 20:24:26,226] {{base_task_runner.py:115}} INFO - Job 71: Subtask docker_operator_2   """)
[2019-12-04 20:24:27,685] {{base_task_runner.py:115}} INFO - Job 71: Subtask docker_operator_2 [2019-12-04 20:24:27,678] {{__init__.py:51}} INFO - Using executor CeleryExecutor
[2019-12-04 20:24:27,685] {{base_task_runner.py:115}} INFO - Job 71: Subtask docker_operator_2 [2019-12-04 20:24:27,678] {{dagbag.py:92}} INFO - Filling up the DagBag from /usr/local/airflow/dags/qm_operators/exp_5_prueba.py
[2019-12-04 20:24:27,973] {{base_task_runner.py:115}} INFO - Job 71: Subtask docker_operator_2 [2019-12-04 20:24:27,971] {{cli.py:545}} INFO - Running <TaskInstance: docker_operator_xcom.docker_operator_2 2019-12-04T20:24:21.180209+00:00 [running]> on host 5df603088df3
[2019-12-04 20:24:28,017] {{docker_operator.py:201}} INFO - Starting docker container from image centos:latest
[2019-12-04 20:24:28,643] {{logging_mixin.py:112}} INFO - Attachs:  []
[2019-12-04 20:24:29,783] {{logging_mixin.py:112}} INFO - [2019-12-04 20:24:29,782] {{local_task_job.py:124}} WARNING - Time since last heartbeat(0.01 s) < heartrate(5.0 s), sleeping for 4.989846 s
[2019-12-04 20:24:34,780] {{logging_mixin.py:112}} INFO - [2019-12-04 20:24:34,779] {{local_task_job.py:103}} INFO - Task exited with return code 0