Airflow scheduler does not execute scheduled jobs and produces no logs

Environment:

  • RHEL 8.2 x86_64
  • Airflow 1.10.10
  • PostgreSQL 12.3
  • Python 3.6

Setup:

  • Airflow runs as user "svc_etl", which has permissions on the Airflow home folder, the DAG folder and the log folder via user and group
  • DAG folder located on a Windows Samba share (linked folder)
  • Task log folder located on a Windows Samba share (a quick write check for both shares is sketched after this list)
  • Postgres and Airflow run as services (systemctl) on the same server (VM)
  • Home folder on that same service server
This setup is due to budget constraints.
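
Since both the DAG folder and the task log folder live on a Windows Samba share, a first sanity check is whether the svc_etl account can actually create and remove files there. A minimal sketch (paths taken from the configuration below; run it as svc_etl, e.g. with sudo -u svc_etl python3 check_share_access.py):

# check_share_access.py - verify that the Airflow user can write to the
# Samba-mounted DAG and log folders referenced in airflow.cfg.
import os
import tempfile

FOLDERS = [
    "/ABTEILUNG/DWH/airflow/dags",
    "/ABTEILUNG/DWH/airflow/logs",
]

for folder in FOLDERS:
    print("---", folder)
    print("readable:", os.access(folder, os.R_OK))
    print("writable:", os.access(folder, os.W_OK))
    try:
        # Create and remove a throw-away file, roughly the way a task log would be created.
        fd, path = tempfile.mkstemp(dir=folder, prefix="airflow_write_test_")
        os.close(fd)
        os.remove(path)
        print("write test: OK")
    except OSError as exc:
        print("write test FAILED:", exc)
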
Problem: Schedules on the DAGs do not execute and end up in a failed state without any logging. In some cases, using the "Clear" option in the DAG's tree view leads to a successful execution of the first task; any further use of Clear then leads to execution of the subsequent tasks. These tasks can be whole modules or individual functions. No errors show up in the Airflow scheduler log, and the tasks themselves produce no logs.
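
One way to separate "the task code itself fails" from "the scheduler/executor never launches the task" is to run a single task in the foreground with the Airflow 1.10 CLI; airflow test bypasses the scheduler and executor and writes the task log to stdout, so errors become visible even when no log file is created. A rough sketch (DAG id, task id and execution date are placeholders for one of the failing tasks):

# run_single_task.py - run one task in the foreground, bypassing scheduler and executor.
# "airflow test" in Airflow 1.10 executes the task locally and logs to stdout.
import subprocess

DAG_ID = "my_dag"            # placeholder: one of the failing DAGs
TASK_ID = "my_first_task"    # placeholder: the first task of that DAG
EXEC_DATE = "2020-07-01"     # placeholder: execution date of a failed run

result = subprocess.run(
    ["airflow", "test", DAG_ID, TASK_ID, EXEC_DATE],
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,
    universal_newlines=True,  # Python 3.6 equivalent of text=True
)
print(result.stdout)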

Steps taken:

  • Stopped the Airflow scheduler, deleted all DAGs and their logs from the folders and from the database. Restarted the Airflow scheduler, copied the DAGs back into the folder and waited. The DAGs show up in the GUI and in the database. Scheduled tasks still end up as failed in the GUI and database. Clearing that failure leads to a successful execution of the first task; after further clears, the first and subsequent tasks succeed (and write task logs).
  • Manual triggering can lead to the same behavior (depending on the DAG definition).
  • The DAGs run successfully on a different Airflow test server that only executes sequentially. The log and DAG folders of the two servers are in different locations and separate from each other (a check that the production server can actually parse the DAG files from the share is sketched after this list).
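
Because the test server reads its DAGs from a different location, it is also worth verifying that the production scheduler can parse the DAG files from the Samba share at all; import errors or parse timeouts there would keep tasks from ever producing logs. A minimal sketch using Airflow's own DagBag, run as svc_etl on the production server:

# check_dagbag.py - parse the DAG folder the same way the scheduler does.
# Any entry in import_errors points at a file the scheduler cannot load.
from airflow.models import DagBag

dagbag = DagBag(dag_folder="/ABTEILUNG/DWH/airflow/dags", include_examples=False)

print("DAGs found:", len(dagbag.dags))
for dag_id in sorted(dagbag.dags):
    print("   ", dag_id)

if dagbag.import_errors:
    print("Import errors:")
    for path, error in dagbag.import_errors.items():
        print("   ", path, "->", error)
else:
    print("No import errors.")
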
Question:

  • How can I find out which part of Airflow/Postgres produces this behavior? (a metadata-database query that helps narrow this down is sketched below)
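
To narrow down which side misbehaves, it helps to look directly at what the scheduler records in the Postgres metadata database: the task_instance table shows whether failed tasks ever received a start date, hostname or PID, and the job table shows whether the scheduler and its LocalExecutor workers are heartbeating. A rough sketch using psycopg2 with the connection values from sql_alchemy_conn below (password omitted):

# inspect_metadata_db.py - inspect recent task instances and scheduler/worker heartbeats.
# Connection values taken from sql_alchemy_conn in airflow.cfg; supply the password.
import psycopg2

conn = psycopg2.connect(
    host="localhost", port=5436, dbname="airflow_db",
    user="svc_etl", password="<password>",
)
with conn, conn.cursor() as cur:
    cur.execute("""
        SELECT dag_id, task_id, execution_date, state, start_date, end_date, hostname, pid
        FROM task_instance
        ORDER BY execution_date DESC
        LIMIT 20
    """)
    print("Recent task instances (failed rows with no start_date/pid likely never started):")
    for row in cur.fetchall():
        print(row)

    cur.execute("""
        SELECT id, job_type, state, hostname, latest_heartbeat
        FROM job
        ORDER BY latest_heartbeat DESC NULLS LAST
        LIMIT 10
    """)
    print("Recent scheduler/executor jobs:")
    for row in cur.fetchall():
        print(row)
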
Airflow configuration:

[core]
# The folder where your airflow pipelines live, most likely a
# subfolder in a code repository
# This path must be absolute
dags_folder = /ABTEILUNG/DWH/airflow/dags

# The folder where airflow should store its log files
# This path must be absolute
# > writing logs to the GIT main folder is a very bad idea and needs to change soon!!!
base_log_folder = /ABTEILUNG/DWH/airflow/logs

# Airflow can store logs remotely in AWS S3, Google Cloud Storage or Elastic Search.
# Users must supply an Airflow connection id that provides access to the storage
# location. If remote_logging is set to true, see UPDATING.md for additional
# configuration requirements.
remote_logging = False
remote_log_conn_id =
remote_base_log_folder =
encrypt_s3_logs = False

# Logging level
logging_level = INFO
fab_logging_level = WARN

# Logging class
# Specify the class that will specify the logging configuration
# This class has to be on the python classpath
# logging_config_class = my.path.default_local_settings.LOGGING_CONFIG
logging_config_class =

# Log format
# Colour the logs when the controlling terminal is a TTY.
colored_console_log = True
colored_log_format = [%%(blue)s%%(asctime)s%%(reset)s] {%%(blue)s%%(filename)s:%%(reset)s%%(lineno)d} %%(log_color)s%%(levelname)s%%(reset)s - %%(log_color)s%%(message)s%%(reset)s
colored_formatter_class = airflow.utils.log.colored_log.CustomTTYColoredFormatter

log_format = [%%(asctime)s] {%%(filename)s:%%(lineno)d} %%(levelname)s - %%(message)s
simple_log_format = %%(asctime)s %%(levelname)s - %%(message)s

# Log filename format
log_filename_template = {{ ti.dag_id }}/{{ ti.task_id }}/{{ ts }}/{{ try_number }}.log
log_processor_filename_template = {{ filename }}.log
dag_processor_manager_log_location = /home/airflow/logs/dag_processor_manager/dag_processor_manager.log

# Hostname by providing a path to a callable, which will resolve the hostname
# The format is "package:function". For example,
# default value "socket:getfqdn" means that result from getfqdn() of "socket" package will be used as hostname
# No argument should be required in the function specified.
# If using IP address as hostname is preferred, use value "airflow.utils.net:get_host_ip_address"
hostname_callable = socket:getfqdn

# Default timezone in case supplied date times are naive
# can be utc (default), system, or any IANA timezone string (e.g. Europe/Amsterdam)
default_timezone = Europe/Berlin

# The executor class that airflow should use. Choices include
# SequentialExecutor, LocalExecutor, CeleryExecutor, DaskExecutor, KubernetesExecutor
#Default:
#executor = SequentialExecutor
executor = LocalExecutor

# The SqlAlchemy connection string to the metadata database.
# SqlAlchemy supports many different database engine, more information
# their website
#sql_alchemy_conn = sqlite:////home/airflow/airflow.db
#! this needs to change to MySQL or Postgres database
sql_alchemy_conn = postgresql+psycopg2://localhost:5436/airflow_db?user=svc_etl&password=<some-hard-password>

# The encoding for the databases
sql_engine_encoding = utf-8

# If SqlAlchemy should pool database connections.
sql_alchemy_pool_enabled = True

# The SqlAlchemy pool size is the maximum number of database connections
# in the pool. 0 indicates no limit.
sql_alchemy_pool_size = 5

# The maximum overflow size of the pool.
# When the number of checked-out connections reaches the size set in pool_size,
# additional connections will be returned up to this limit.
# When those additional connections are returned to the pool, they are disconnected and discarded.
# It follows then that the total number of simultaneous connections the pool will allow is pool_size + max_overflow,
# and the total number of "sleeping" connections the pool will allow is pool_size.
# max_overflow can be set to -1 to indicate no overflow limit;
# no limit will be placed on the total number of concurrent connections. Defaults to 10.
sql_alchemy_max_overflow = 10

# The SqlAlchemy pool recycle is the number of seconds a connection
# can be idle in the pool before it is invalidated. This config does
# not apply to sqlite. If the number of DB connections is ever exceeded,
# a lower config value will allow the system to recover faster.
sql_alchemy_pool_recycle = 1800

# Check connection at the start of each connection pool checkout.
# Typically, this is a simple statement like “SELECT 1”.
# More information here: https://docs.sqlalchemy.org/en/13/core/pooling.html#disconnect-handling-pessimistic
sql_alchemy_pool_pre_ping = True

# The schema to use for the metadata database
# SqlAlchemy supports databases with the concept of multiple schemas.
sql_alchemy_schema =

# The amount of parallelism as a setting to the executor. This defines
# the max number of task instances that should run simultaneously
# on this airflow installation
# > actually this parameter is more like max_active_tasks per Airflow state database across all processing servers
# > Default original: parallelism = 32
parallelism = 6

# The number of task instances allowed to run concurrently by the schedule
# > again more like max_active_tasks_for_worker (process)
# > Default original: dag_concurrency = 16
dag_concurrency = 4

# Are DAGs paused by default at creation
dags_are_paused_at_creation = True

# The maximum number of active DAG runs per DAG
# > Default original: max_active_runs_per_dag = 16
max_active_runs_per_dag = 4

# Whether to load the examples that ship with Airflow. It's good to
# get started, but you probably want to set this to False in a production
# environment
load_examples = False

# Where your Airflow plugins are stored
plugins_folder = /home/airflow/plugins

# Secret key to save connection passwords in the db
fernet_key = 
#! this needs to be set to enable obfuscated password shown in GUI

# Whether to disable pickling dags
donot_pickle = False

# How long before timing out a python file import
dagbag_import_timeout = 30

# How long before timing out a DagFileProcessor, which processes a dag file
dag_file_processor_timeout = 120

# The class to use for running task instances in a subprocess
task_runner = StandardTaskRunner

# If set, tasks without a `run_as_user` argument will be run with this user
# Can be used to de-elevate a sudo user running Airflow when executing tasks
default_impersonation =

# What security module to use (for example kerberos):
security =

# If set to False enables some unsecure features like Charts and Ad Hoc Queries.
# In 2.0 will default to True.
secure_mode = False

# Turn unit test mode on (overwrites many configuration options with test
# values at runtime)
unit_test_mode = False

# Name of handler to read task instance logs.
# Default to use task handler.
task_log_reader = task

# Whether to enable pickling for xcom (note that this is insecure and allows for
# RCE exploits). This will be deprecated in Airflow 2.0 (be forced to False).
enable_xcom_pickling = True

# When a task is killed forcefully, this is the amount of time in seconds that
# it has to cleanup after it is sent a SIGTERM, before it is SIGKILLED
killed_task_cleanup_time = 60

# Whether to override params with dag_run.conf. If you pass some key-value pairs through `airflow backfill -c` or
# `airflow trigger_dag -c`, the key-value pairs will override the existing ones in params.
dag_run_conf_overrides_params = False

# Worker initialisation check to validate Metadata Database connection
worker_precheck = False

# When discovering DAGs, ignore any files that don't contain the strings `DAG` and `airflow`.
dag_discovery_safe_mode = True

# The number of retries each task is going to have by default. Can be overridden at dag or task level.
default_task_retries = 0

[cli]
# In what way should the cli access the API. The LocalClient will use the
# database directly, while the json_client will use the api running on the
# webserver
api_client = airflow.api.client.local_client

# If you set web_server_url_prefix, do NOT forget to append it here, ex:
# endpoint_url = http://localhost:8080/myroot
# So api will look like: http://localhost:8080/myroot/api/experimental/...
endpoint_url = http://localhost:8080

[api]
# How to authenticate users of the API
auth_backend = airflow.api.auth.backend.default

[lineage]
# what lineage backend to use
backend =

[atlas]
sasl_enabled = False
host =
port = 21000
username =
password =

[operators]
# The default owner assigned to each new operator, unless
# provided explicitly or passed via `default_args`
default_owner = airflow
default_cpus = 1
default_ram = 512
default_disk = 512
default_gpus = 0

[hive]
...
[webserver]
# The base url of your website as airflow cannot guess what domain or
# cname you are using. This is used in automated emails that
# airflow sends to point links to the right web server
base_url = http://localhost:8080

# The ip specified when starting the web server
web_server_host = 0.0.0.0

# The port on which to run the web server
web_server_port = 8080
...
# Number of seconds the webserver waits before killing gunicorn master that doesn't respond
#Default:web_server_master_timeout = 120
web_server_master_timeout = 300

# Number of seconds the gunicorn webserver waits before timing out on a worker
#Default:web_server_worker_timeout = 120
web_server_worker_timeout = 300

# Number of workers to refresh at a time. When set to 0, worker refresh is
# disabled. When nonzero, airflow periodically refreshes webserver workers by
# bringing up new ones and killing old ones.
worker_refresh_batch_size = 1

# Number of seconds to wait before refreshing a batch of workers.
#Default:worker_refresh_interval = 30
worker_refresh_interval = 60

# Secret key used to run your flask app
secret_key = temporary_key

# Number of workers to run the Gunicorn web server
#Default:workers = 4
workers = 2

# The worker class gunicorn should use. Choices include
# sync (default), eventlet, gevent
worker_class = sync

# Log files for the gunicorn webserver. '-' means log to stderr.
#Default:access_logfile = -
#Default:error_logfile = -
access_logile = -
error_logfile = /home/airflow/gunicorn.err
...
# Default DAG view.  Valid values are:
# tree, graph, duration, gantt, landing_times
dag_default_view = tree

# Default DAG orientation. Valid values are:
# LR (Left->Right), TB (Top->Bottom), RL (Right->Left), BT (Bottom->Top)
dag_orientation = LR

# Puts the webserver in demonstration mode; blurs the names of Operators for
# privacy.
demo_mode = False

# The amount of time (in secs) webserver will wait for initial handshake
# while fetching logs from other worker machine
#Default:log_fetch_timeout_sec = 5
log_fetch_timeout_sec = 30

# By default, the webserver shows paused DAGs. Flip this to hide paused
# DAGs by default
hide_paused_dags_by_default = False

# Consistent page size across all listing views in the UI
page_size = 100
...
# Define the color of navigation bar
#Default:navbar_color = #007A87
navbar_color = #1C33C7

# Default dagrun to show in UI
default_dag_run_display_number = 25
...
# Default setting for wrap toggle on DAG code and TI log views.
default_wrap = False
...
[email]
email_backend = airflow.utils.email.send_email_smtp

[smtp]
# If you want airflow to send emails on retries, failure, and you want to use
# the airflow.utils.email.send_email_smtp function, you have to configure an
# smtp server here
smtp_host = <valid-SMTP-server-connection>
smtp_starttls = True
smtp_ssl = False
# Uncomment and set the user/pass settings if you want to use SMTP AUTH
# smtp_user = 
# smtp_password = 
smtp_port = 25
smtp_mail_from = <valid-domain-address@domain>

[sentry]
...
[celery]
...
[dask]
...
[scheduler]
# Task instances listen for external kill signal (when you clear tasks
# from the CLI or the UI), this defines the frequency at which they should
# listen (in seconds).
job_heartbeat_sec = 5

# The scheduler constantly tries to trigger new tasks (look at the
# scheduler section in the docs for more information). This defines
# how often the scheduler should run (in seconds).
# Default original: scheduler_heartbeat_sec = 5
scheduler_heartbeat_sec = 20

# after how much time should the scheduler terminate in seconds
# -1 indicates to run continuously (see also num_runs)
run_duration = -1

# The number of times to try to schedule each DAG file
# -1 indicates unlimited number
num_runs = -1

# The number of seconds to wait between consecutive DAG file processing
# Default original: processor_poll_interval = 1
processor_poll_interval = 10

# after how much time (seconds) a new DAGs should be picked up from the filesystem
# Default original: min_file_process_interval = 0
min_file_process_interval = 10

# How often (in seconds) to scan the DAGs directory for new files. Default to 5 minutes.
dag_dir_list_interval = 300

# How often should stats be printed to the logs
print_stats_interval = 30

# If the last scheduler heartbeat happened more than scheduler_health_check_threshold ago (in seconds),
# scheduler is considered unhealthy.
# This is used by the health check in the "/health" endpoint
scheduler_health_check_threshold = 30

child_process_log_directory = /home/airflow/logs/scheduler
...
# Turn off scheduler catchup by setting this to False.
# Default behavior is unchanged and
# Command Line Backfills still work, but the scheduler
# will not do scheduler catchup if this is False,
# however it can be set on a per DAG basis in the
# DAG definition (catchup)
catchup_by_default = True

# This changes the batch size of queries in the scheduling main loop.
# If this is too high, SQL query performance may be impacted by one
# or more of the following:
#  - reversion to full table scan
#  - complexity of query predicate
#  - excessive locking
#
# Additionally, you may hit the maximum allowable query length for your db.
#
# Set this to 0 for no limit (not advised)
max_tis_per_query = 512

# Statsd (https://github.com/etsy/statsd) integration settings
...
# The scheduler can run multiple threads in parallel to schedule dags.
# This defines how many threads will run.
max_threads = 2

authenticate = False

# Turn off scheduler use of cron intervals by setting this to False.
# DAGs submitted manually in the web UI or with trigger_dag will still run.
use_job_schedule = True

[ldap]
...
[kerberos]
ccache = /tmp/airflow_krb5_ccache
# gets augmented with fqdn
principal = airflow
reinit_frequency = 3600
kinit_path = kinit
keytab = airflow.keytab
...
[elasticsearch]
...
[kubernetes]
...
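
One more thing that seems worth checking with this configuration: per-try task logs are written below base_log_folder using log_filename_template, i.e. into nested directories whose names are derived from the execution timestamp. The sketch below builds such a path and tries to create it on the Samba-mounted log folder; the timestamp component contains characters such as ':' that Windows/SMB file systems may reject, and a failure here would match tasks failing without ever producing a log.

# check_log_path.py - try to create a per-try task log path on the Samba log share,
# mirroring base_log_folder + log_filename_template from the configuration above.
# Run as the svc_etl user; it cleans up after itself.
import os
import shutil
from datetime import datetime, timezone

BASE_LOG_FOLDER = "/ABTEILUNG/DWH/airflow/logs"

# Example values standing in for {{ ti.dag_id }}/{{ ti.task_id }}/{{ ts }}/{{ try_number }}.log
dag_id = "log_path_test"
task_id = "dummy_task"
ts = datetime(2020, 7, 1, tzinfo=timezone.utc).isoformat()  # '2020-07-01T00:00:00+00:00', contains ':'
try_number = 1

log_dir = os.path.join(BASE_LOG_FOLDER, dag_id, task_id, ts)
log_file = os.path.join(log_dir, "{}.log".format(try_number))

try:
    os.makedirs(log_dir, exist_ok=True)
    with open(log_file, "w") as handle:
        handle.write("write test\n")
    print("OK - created", log_file)
except OSError as exc:
    # A failure here (e.g. ':' not allowed on the SMB/CIFS mount) would explain
    # tasks ending up failed with no log file at all.
    print("FAILED -", exc)
finally:
    shutil.rmtree(os.path.join(BASE_LOG_FOLDER, dag_id), ignore_errors=True)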