Airflow: prevent Airflow from backfilling jobs
We are on the latest Airflow, v1.10.7. We have set catchup in airflow.cfg, catchup=False in every DAG, and catchup=False in the default args, but Airflow still backfills and runs the jobs. We would really like to use Airflow, but this is a showstopper for us. Also, why does a DAG run automatically the first time it is created, even with catchup=False? Thanks for any help.

Sample dag:
from airflow import DAG
from utils.general import Schedule, default_args, ABCComputeOperator
from airflow.operators.http_operator import SimpleHttpOperator
from datetime import datetime, timedelta

default_args = {
    'owner': 'abc',
    'depends_on_past': False,
    'start_date': datetime(2020, 1, 1),
    'email': ['abc@abc.com'],
    'email_on_failure': True,
    'email_on_retry': False,
    'retries': 0
}
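Note: in Airflow 1.10.x, default_args is applied to operators rather than to the DAG object, so a catchup key in default_args is most likely ignored; catchup only takes effect as a DAG constructor argument, as the DAG_py block below already does. A minimal sketch of the distinction (the DAG id 'catchup_demo' is made up for illustration):

from datetime import datetime
from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator

# 'catchup' in default_args is only handed to operators, which do not use it,
# so it has no effect on scheduling.
default_args = {'owner': 'abc', 'start_date': datetime(2020, 1, 1), 'catchup': False}

# catchup has to be set on the DAG itself to suppress backfilled runs.
dag = DAG(
    'catchup_demo',
    default_args=default_args,
    schedule_interval='@daily',
    catchup=False,
)

DummyOperator(task_id='noop', dag=dag)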
DAG_py:
with DAG(
    "abc_dag",
    catchup=False,
    default_args=default_args,
    schedule_interval=Schedule.DAILY_0800.value) as dag:

    t = SimpleHttpOperator(
        http_conn_id='core',
        task_id='send_abc_alerts',
        dag=dag,
        method='GET',
        data={'Url': 'abc'},
        endpoint='abc/callapi',
        log_response=True,
        response_check=lambda r: True
    )
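For what it's worth, even with catchup=False the 1.10 scheduler still creates one DAG run for the most recent completed schedule interval when a DAG is first switched on; it only skips the backfill from start_date onward. That would explain the run that appears right after the DAG is created. A small probe sketch to see which execution date that single run would land on, assuming '0 8 * * *' as the cron behind Schedule.DAILY_0800.value and a made-up DAG id:

from datetime import datetime
from airflow import DAG
from airflow.utils import timezone

# Stand-in DAG for inspection only; the id and cron expression are assumptions.
dag = DAG(
    'abc_dag_probe',
    start_date=datetime(2020, 1, 1),
    schedule_interval='0 8 * * *',
    catchup=False,
)

now = timezone.utcnow()
last_tick = dag.previous_schedule(now)
# With catchup=False the scheduler creates at most one run, normally for the
# interval that ends at last_tick, so its execution_date is the tick before it.
print("most recent schedule tick:", last_tick)
print("likely execution_date of the single auto-created run:", dag.previous_schedule(last_tick))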
Docker-compose.yml:
version: '3.7'
services:
    postgres:
        image: postgres:9.6
        environment:
            - POSTGRES_USER=airflow
            - POSTGRES_PASSWORD=abc
            - POSTGRES_DB=airflow
        logging:
            options:
                max-size: 10m
                max-file: "3"

    webserver:
        image: puckel/docker-airflow:1.10.7
        restart: always
        depends_on:
            - postgres
        environment:
            - LOAD_EX=n
            - EXECUTOR=Local
            - AIRFLOW__CORE__DEFAULT_TIMEZONE=America/New_York
            - AIRFLOW__SCHEDULER__CATCHUP_BY_DEFAULT=False
            - AIRFLOW__WEBSERVER__BASE_URL=http://1.1.1.1:8080
            - AIRFLOW__SMTP__SMTP_STARTTLS=False
            - AIRFLOW__SMTP__SMTP_PORT=587
            - AIRFLOW__SMTP__SMTP_HOST=0.0.0.0
        logging:
            options:
                max-size: 10m
                max-file: "3"
        volumes:
            - /home/ec2-user/airflow/dags:/usr/local/airflow/dags
            # - ./plugins:/usr/local/airflow/plugins
        ports:
            - "8080:8080"
            - "587:587"
        command: webserver
        healthcheck:
            test: ["CMD-SHELL", "[ -f /usr/local/airflow/airflow-webserver.pid ]"]
            interval: 30s
            timeout: 30s
            retries: 3
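One thing worth ruling out is whether AIRFLOW__SCHEDULER__CATCHUP_BY_DEFAULT=False is actually reaching the process that creates the runs. Environment variables of the form AIRFLOW__SECTION__KEY override airflow.cfg, so a quick check run inside the container (for example via docker-compose exec webserver python) might look like this sketch:

# Run inside the Airflow container; prints the value the scheduler actually uses.
from airflow.configuration import conf

print(conf.getboolean('scheduler', 'catchup_by_default'))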
Can you show the relevant parts of your DAG definition and configuration files? Without those it is hard to tell where the problem is. Please include the docker-compose and the dag.py code.

How are you testing this? Are you clearing previous DAG runs and checking whether they run again, or are you creating a new DAG and waiting a few minutes to see what happens? I am not 100% sure, but I think that if Airflow has already created task instances in the database for previous runs, it will run them again when you clear them. Try creating a new DAG with a start date one hour in the past that runs every 5 minutes. When you activate the DAG, it should not create 12 runs; it should create a single run within 5 minutes.
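A sketch of the kind of test DAG described in that last comment (the DAG id and cron are made up); with catchup really off it should produce a single run within about five minutes of being unpaused rather than a dozen backfilled ones:

from datetime import timedelta
from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator
from airflow.utils import timezone

# Hypothetical test DAG: start_date roughly an hour in the past, runs every 5 minutes.
# A fixed recent timestamp would work just as well and is generally preferable to a
# moving start_date; the dynamic value here just mirrors the suggestion above.
with DAG(
    'catchup_test',
    start_date=timezone.utcnow() - timedelta(hours=1),
    schedule_interval='*/5 * * * *',
    catchup=False,
) as dag:
    DummyOperator(task_id='noop')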