Python Airflow 1.9 - can't get logs to write to S3

I am running Airflow 1.9 on Kubernetes in AWS. I would like the logs to go to S3, since the Airflow containers themselves are not long-lived.

I have read the various threads and documents describing this process, but I still cannot get it to work. First, a test that proves to me that the S3 configuration and permissions are valid. This was run on one of our worker instances.

Using Airflow to write a file to S3:

airflow@airflow-worker-847c66d478-lbcn2:~$ id
uid=1000(airflow) gid=1000(airflow) groups=1000(airflow)
airflow@airflow-worker-847c66d478-lbcn2:~$ env |grep s3
AIRFLOW__CONN__S3_LOGS=s3://vevo-dev-us-east-1-services-airflow/logs/
AIRFLOW__CORE__REMOTE_LOG_CONN_ID=s3_logs
AIRFLOW__CORE__REMOTE_BASE_LOG_FOLDER=s3://vevo-dev-us-east-1-services-airflow/logs/
airflow@airflow-worker-847c66d478-lbcn2:~$ python
Python 3.6.4 (default, Dec 21 2017, 01:37:56)
[GCC 4.9.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import airflow
>>> s3 = airflow.hooks.S3Hook('s3_logs')
/usr/local/lib/python3.6/site-packages/airflow/utils/helpers.py:351: DeprecationWarning: Importing S3Hook directly from <module 'airflow.hooks' from '/usr/local/lib/python3.6/site-packages/airflow/hooks/__init__.py'> has been deprecated. Please import from '<module 'airflow.hooks' from '/usr/local/lib/python3.6/site-packages/airflow/hooks/__init__.py'>.[operator_module]' instead. Support for direct imports will be dropped entirely in Airflow 2.0.
  DeprecationWarning)
>>> s3.load_string('put this in s3 file', airflow.conf.get('core', 'remote_base_log_folder') + "/airflow-test")
[2018-02-23 18:43:58,437] {{base_hook.py:80}} INFO - Using connection to: vevo-dev-us-east-1-services-airflow
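As an extra sanity check (not part of the original session; check_for_key and its handling of a full s3:// URL as the key are assumptions about this version of S3Hook), one could confirm from the same REPL that the test object was actually written:

>>> # hypothetical follow-up: verify the object written by load_string exists
>>> s3.check_for_key(airflow.conf.get('core', 'remote_base_log_folder') + "/airflow-test")
True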
So the Airflow S3 connection seems fine; the problem is just that Airflow jobs do not use S3 for logging. Here are my settings, in which I believe something is either wrong or missing.

The environment variables on the running worker/scheduler/master instances are:

airflow@airflow-worker-847c66d478-lbcn2:~$ env |grep -i s3
AIRFLOW__CONN__S3_LOGS=s3://vevo-dev-us-east-1-services-airflow/logs/
AIRFLOW__CORE__REMOTE_LOG_CONN_ID=s3_logs
AIRFLOW__CORE__REMOTE_BASE_LOG_FOLDER=s3://vevo-dev-us-east-1-services-airflow/logs/
S3_BUCKET=vevo-dev-us-east-1-services-airflow
This shows that the s3_logs connection exists in Airflow:

airflow@airflow-worker-847c66d478-lbcn2:~$ airflow connections -l|grep s3
│ 's3_logs'              │ 's3'                    │ 'vevo-dev-
us-...vices-airflow' │ None   │ False          │ False                │ None                           │
I put this file (airflow_local_settings.py) in place in my Docker image. You can see an example on one of our workers:

airflow@airflow-worker-847c66d478-lbcn2:~$ ls -al /usr/local/airflow/config/
total 32
drwxr-xr-x. 2 root    root    4096 Feb 23 00:39 .
drwxr-xr-x. 1 airflow airflow 4096 Feb 23 00:53 ..
-rw-r--r--. 1 root    root    4471 Feb 23 00:25 airflow_local_settings.py
-rw-r--r--. 1 root    root       0 Feb 16 21:35 __init__.py
We have edited that file to define the REMOTE_BASE_LOG_FOLDER variable. Here is the diff between our version and the upstream version:

index 899e815..897d2fd 100644
--- a/var/tmp/file
+++ b/config/airflow_local_settings.py
@@ -35,7 +35,8 @@ PROCESSOR_FILENAME_TEMPLATE = '{{ filename }}.log'
 # Storage bucket url for remote logging
 # s3 buckets should start with "s3://"
 # gcs buckets should start with "gs://"
-REMOTE_BASE_LOG_FOLDER = ''
+REMOTE_BASE_LOG_FOLDER = conf.get('core', 'remote_base_log_folder')
+

 DEFAULT_LOGGING_CONFIG = {
     'version': 1,
Here you can see that the setting is correct on one of our workers:

>>> import airflow
>>> airflow.conf.get('core', 'remote_base_log_folder')
's3://vevo-dev-us-east-1-services-airflow/logs/'
Based on the fact that remote_base_log_folder starts with 's3' and remote_logging is True:

>>> airflow.conf.get('core', 'remote_logging')
'True'
I would expect that block to evaluate to true and send the logs to S3.
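For reference, the block being referred to (at the bottom of the copy of airflow_local_settings.py shown in the diff above) is, as I understand it, a conditional along these lines. This is a paraphrased sketch only, not the exact upstream code; names such as REMOTE_HANDLERS are assumptions, so check your own copy of the file:

# Paraphrased sketch of the remote-logging switch in airflow_local_settings.py
# (illustrative only -- verify against the actual file in your image)
REMOTE_LOGGING = conf.get('core', 'remote_logging')  # returns the string 'True' here, which is truthy

if REMOTE_LOGGING and REMOTE_BASE_LOG_FOLDER.startswith('s3://'):
    # swap the file-based task handlers for the S3-backed ones
    DEFAULT_LOGGING_CONFIG['handlers'].update(REMOTE_HANDLERS['s3'])
elif REMOTE_LOGGING and REMOTE_BASE_LOG_FOLDER.startswith('gs://'):
    DEFAULT_LOGGING_CONFIG['handlers'].update(REMOTE_HANDLERS['gcs'])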

Would someone who has S3 logging working on 1.9 please point out what I am missing? I would like to submit a PR to the upstream project to update the documentation, since this seems to be a very common problem and, as far as I can tell, the upstream docs either do not work or are frequently misinterpreted.


Thanks! G.

Yes, I also had a hard time setting this up based on the documentation alone. I had to dig through Airflow's code to figure it out. There are quite a few things you may not have done.

Things to check:
1. Make sure you have the log_config.py file and that it is in the correct directory: ./config/log_config.py. Also make sure you did not forget the __init__.py file in that directory.
2. Make sure you defined the s3.task handler and set its formatter to airflow.task.
3. Make sure you set the handlers of the airflow.task and airflow.task_runner loggers to s3.task.

Here is a log_config.py file that works for me:

# -*- coding: utf-8 -*-
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import os

from airflow import configuration as conf

# TO DO: Logging format and level should be configured
# in this file instead of from airflow.cfg. Currently
# there are other log format and level configurations in
# settings.py and cli.py. Please see AIRFLOW-1455.

LOG_LEVEL = conf.get('core', 'LOGGING_LEVEL').upper()
LOG_FORMAT = conf.get('core', 'log_format')

BASE_LOG_FOLDER = conf.get('core', 'BASE_LOG_FOLDER')
PROCESSOR_LOG_FOLDER = conf.get('scheduler', 'child_process_log_directory')

FILENAME_TEMPLATE = '{{ ti.dag_id }}/{{ ti.task_id }}/{{ ts }}/{{ try_number }}.log'
PROCESSOR_FILENAME_TEMPLATE = '{{ filename }}.log'

S3_LOG_FOLDER = 's3://your_path_to_airflow_logs'

LOGGING_CONFIG = {
    'version': 1,
    'disable_existing_loggers': False,
    'formatters': {
        'airflow.task': {
            'format': LOG_FORMAT,
        },
        'airflow.processor': {
            'format': LOG_FORMAT,
        },
    },
    'handlers': {
        'console': {
            'class': 'logging.StreamHandler',
            'formatter': 'airflow.task',
            'stream': 'ext://sys.stdout'
        },
        'file.task': {
            'class': 'airflow.utils.log.file_task_handler.FileTaskHandler',
            'formatter': 'airflow.task',
            'base_log_folder': os.path.expanduser(BASE_LOG_FOLDER),
            'filename_template': FILENAME_TEMPLATE,
        },
        'file.processor': {
            'class': 'airflow.utils.log.file_processor_handler.FileProcessorHandler',
            'formatter': 'airflow.processor',
            'base_log_folder': os.path.expanduser(PROCESSOR_LOG_FOLDER),
            'filename_template': PROCESSOR_FILENAME_TEMPLATE,
        },
        # When using s3 or gcs, provide a customized LOGGING_CONFIG
        # in airflow_local_settings within your PYTHONPATH, see UPDATING.md
        # for details
        's3.task': {
            'class': 'airflow.utils.log.s3_task_handler.S3TaskHandler',
            'formatter': 'airflow.task',
            'base_log_folder': os.path.expanduser(BASE_LOG_FOLDER),
            's3_log_folder': S3_LOG_FOLDER,
            'filename_template': FILENAME_TEMPLATE,
        },
        # 'gcs.task': {
        #     'class': 'airflow.utils.log.gcs_task_handler.GCSTaskHandler',
        #     'formatter': 'airflow.task',
        #     'base_log_folder': os.path.expanduser(BASE_LOG_FOLDER),
        #     'gcs_log_folder': GCS_LOG_FOLDER,
        #     'filename_template': FILENAME_TEMPLATE,
        # },
    },
    'loggers': {
        '': {
            'handlers': ['console'],
            'level': LOG_LEVEL
        },
        'airflow': {
            'handlers': ['console'],
            'level': LOG_LEVEL,
            'propagate': False,
        },
        'airflow.processor': {
            'handlers': ['file.processor'],
            'level': LOG_LEVEL,
            'propagate': True,
        },
        'airflow.task': {
            'handlers': ['s3.task'],
            'level': LOG_LEVEL,
            'propagate': False,
        },
        'airflow.task_runner': {
            'handlers': ['s3.task'],
            'level': LOG_LEVEL,
            'propagate': True,
        },
    }
}
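
One thing that is easy to miss: Airflow also has to be told to load this config and to read task logs through the s3.task handler. A sketch of those settings in the same environment-variable style as below (option names are as I recall them from the 1.9 upgrade notes, so treat them as assumptions to verify; remote_log_conn_id and remote_base_log_folder are covered by the variables shown further down):

  AIRFLOW__CORE__LOGGING_CONFIG_CLASS: log_config.LOGGING_CONFIG
  AIRFLOW__CORE__TASK_LOG_READER: s3.task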

When deploying to k8s, I also had to add the remote logging config for the workers. So this alone was not enough:

  AIRFLOW__CORE__REMOTE_LOGGING: True
  AIRFLOW__CORE__REMOTE_LOG_CONN_ID: s3_logs
  AIRFLOW__CORE__REMOTE_BASE_LOG_FOLDER: 's3://my-log-bucket/logs'
I also had to pass these vars to the workers:

  AIRFLOW__KUBERNETES_ENVIRONMENT_VARIABLES__AIRFLOW__CORE__REMOTE_LOGGING: True
  AIRFLOW__KUBERNETES_ENVIRONMENT_VARIABLES__AIRFLOW__CORE__REMOTE_LOG_CONN_ID: s3_logs
  AIRFLOW__KUBERNETES_ENVIRONMENT_VARIABLES__AIRFLOW__CORE__REMOTE_BASE_LOG_FOLDER: 's3://my-log-bucket/logs'

Could you include your definition of the s3.task handler? I don't know what it is or where it should go.

Hi, where should /config/ go? I put it in $AIRFLOW_HOME and it still shows ImportError: Unable to load custom logging from config.log_config.LOGGING_CONFIG due to no module named 'config'. I even tried adding $AIRFLOW_HOME to $PYTHONPATH.
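Regarding that ImportError, a sketch of one layout consistent with item 1 of the checklist above (the paths and logging_config_class values here are illustrative assumptions, not a confirmed fix): Python raises "no module named 'config'" when the directory containing the first package of the dotted path is not on sys.path.

# assumed layout (illustrative)
$AIRFLOW_HOME/config/__init__.py
$AIRFLOW_HOME/config/log_config.py

# if logging_config_class = config.log_config.LOGGING_CONFIG, the parent of config/ must be importable:
export PYTHONPATH="$PYTHONPATH:$AIRFLOW_HOME"

# if logging_config_class = log_config.LOGGING_CONFIG instead, put the config directory itself on the path:
export PYTHONPATH="$PYTHONPATH:$AIRFLOW_HOME/config"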