Airflow task fails because it cannot read the log file


Composer is failing a task because it cannot read the log file; it complains about an incorrect encoding.

Here is the log shown in the UI:

*** Unable to read remote log from gs://bucket/logs/campaign_exceptions_0_0_1/merge_campaign_exceptions/2019-08-03T10:00:00+00:00/1.log
*** 'ascii' codec can't decode byte 0xc2 in position 6986: ordinal not in range(128)

*** Log file does not exist: /home/airflow/gcs/logs/campaign_exceptions_0_0_1/merge_campaign_exceptions/2019-08-03T10:00:00+00:00/1.log
*** Fetching from: http://airflow-worker-68dc66c9db-x945n:8793/log/campaign_exceptions_0_0_1/merge_campaign_exceptions/2019-08-03T10:00:00+00:00/1.log
*** Failed to fetch log file from worker. HTTPConnectionPool(host='airflow-worker-68dc66c9db-x945n', port=8793): Max retries exceeded with url: /log/campaign_exceptions_0_0_1/merge_campaign_exceptions/2019-08-03T10:00:00+00:00/1.log (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f1c9ff19d10>: Failed to establish a new connection: [Errno -2] Name or service not known',))
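The second line of the UI error above is a standard Python UnicodeDecodeError message. A minimal sketch, assuming the log reader decoded the file's bytes with the ASCII codec (the actual Airflow/Composer code path is not shown in the question): byte 0xc2 is the lead byte of a two-byte UTF-8 sequence (for example, U+00A0 no-break space encodes as C2 A0), so ASCII decoding fails on it while UTF-8 decoding succeeds. The sample text is made up for illustration.

```python
# Made-up log text containing a no-break space (U+00A0), which UTF-8 encodes as b"\xc2\xa0".
data = "INFO - value:\u00a0done".encode("utf-8")


def try_decode(raw: bytes, codec: str) -> str:
    """Decode raw bytes with the given codec, returning the text or the error message."""
    try:
        return raw.decode(codec)
    except UnicodeDecodeError as exc:
        return f"error: {exc}"


ascii_result = try_decode(data, "ascii")  # fails on the 0xc2 byte
utf8_result = try_decode(data, "utf-8")   # decodes cleanly
print(ascii_result)
print(utf8_result)
```

The ASCII attempt produces the same "'ascii' codec can't decode byte 0xc2 ... ordinal not in range(128)" wording seen in the UI, only with a different position.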
However, I can download the file with gsutil.
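Because the file itself downloads fine, one way to check whether non-ASCII content is the trigger is to scan the local copy for bytes above 0x7F and print their offsets (the UI error points at position 6986). A minimal sketch; the local filename you pass in (for example `1.log`) is an assumption about where the gsutil download was saved.

```python
from typing import List, Tuple


def find_non_ascii(raw: bytes, limit: int = 10) -> List[Tuple[int, int]]:
    """Return up to `limit` (offset, byte value) pairs for bytes outside the ASCII range."""
    hits = []
    for offset, value in enumerate(raw):
        if value > 0x7F:
            hits.append((offset, value))
            if len(hits) >= limit:
                break
    return hits


def report(path: str) -> None:
    """Print each non-ASCII byte with some surrounding context from the file."""
    with open(path, "rb") as fh:  # read raw bytes so nothing gets decoded yet
        raw = fh.read()
    for offset, value in find_non_ascii(raw):
        context = raw[max(0, offset - 20):offset + 20]
        print(f"offset {offset}: 0x{value:02x} context={context!r}")
```

Usage: `report("1.log")` after downloading the log; the context slice usually makes it obvious which log line carries the unexpected character.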

When I look at the file, it seems to have text overlaid on other text.

I can't show the entire file, but it looks like this:

--------------------------------------------------------------------------------
Starting attempt 1 of 1
--------------------------------------------------------------------------------
@-@{"task-id": "merge_campaign_exceptions", "execution-date": "2019-08-03T10:00:00+00:00", "workflow": "__campaign_exceptions_0_0_1"}
[2019-08-04 10:01:23,313] {models.py:1569} INFO - Executing <Task(BigQueryOperator): merge_campaign_exceptions> on 2019-08-03T10:00:00+00:00@-@{"task-id": "merge_campaign_exceptions", "execution-date": "2019-08-03T10:00:00+00:00", "workflow": "__campaign_exceptions_0_0_1"}
[2019-08-04 10:01:23,314] {base_task_runner.py:124} INFO - Running: ['bash', '-c', u'airflow run __campaign_exceptions_0_0_1 merge_campaign_exceptions 2019-08-03T10:00:00+00:00 --job_id 22767 --pool _bq_pool --raw -sd DAGS_FOLDER//-campaign-exceptions.py --cfg_path /tmp/tmpyBIVgT']@-@{"task-id": "merge_campaign_exceptions", "execution-date": "2019-08-03T10:00:00+00:00", "workflow": "__campaign_exceptions_0_0_1"}
[2019-08-04 10:01:24,658] {base_task_runner.py:107} INFO - Job 22767: Subtask merge_campaign_exceptions [2019-08-04 10:01:24,658] {settings.py:176} INFO - setting.configure_orm(): Using pool settings. pool_size=5, pool_recycle=1800@-@{"task-id": "merge_campaign_exceptions", "execution-date": "2019-08-03T10:00:00+00:00", "workflow": "__campaign_exceptions_0_0_1"}

The @-@{} parts seem to sit "on top of" the typical log.
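In the excerpt above, each record ends with `@-@` followed by a JSON object (task-id, execution-date, workflow). A minimal sketch for separating the two when reading a downloaded log, assuming `@-@` is only a separator between the human-readable message and that JSON metadata; this is inferred from the excerpt, not from Composer documentation.

```python
import json
from typing import Optional, Tuple

SEPARATOR = "@-@"  # marker observed in the log excerpt; its exact meaning is an assumption


def split_record(line: str) -> Tuple[str, Optional[dict]]:
    """Split a log line into (message, metadata); metadata is None if there is no valid JSON suffix."""
    message, sep, suffix = line.rpartition(SEPARATOR)
    if not sep:
        return line, None  # no marker on this line
    try:
        return message, json.loads(suffix)
    except json.JSONDecodeError:
        return line, None  # marker present but the suffix is not JSON
```

Splitting on the last occurrence keeps any earlier text intact; lines that start with the marker simply yield an empty message.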

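I had a similar issue viewing the logs in GCP Cloud Composer. It did not seem to prevent the failing DAG task from running. It looks like it was a permissions error between GKE and the storage bucket that holds the log files.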
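One way to test the permission hypothesis is to ask GCS which permissions the current credentials actually hold on the logs bucket. A minimal sketch using the `google-cloud-storage` client library, assuming the library is installed and that Application Default Credentials resolve to the same service account the Composer workers use. The two permissions checked are chosen here for illustration (read and list), not taken from Airflow's log handler, and the bucket name is a placeholder.

```python
from typing import List

# Permissions checked for illustration: reading and listing log objects.
REQUIRED = ["storage.objects.get", "storage.objects.list"]


def missing_permissions(granted: List[str], required: List[str] = REQUIRED) -> List[str]:
    """Return the required permissions that are absent from the granted list."""
    return [p for p in required if p not in granted]


def check_bucket(bucket_name: str) -> List[str]:
    """Ask GCS which REQUIRED permissions the caller holds and return the missing ones."""
    # Imported here so missing_permissions() works without the library installed.
    from google.cloud import storage

    client = storage.Client()  # uses Application Default Credentials
    bucket = client.bucket(bucket_name)
    granted = bucket.test_iam_permissions(REQUIRED)
    return missing_permissions(granted)
```

Usage: `print(check_bucket("your-composer-bucket"))`; an empty list means none of the checked permissions are missing for those credentials.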


You can still view the logs by going into the cluster's storage bucket: in the same directory as the /dags folder you will also see a logs/ folder.
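The remote path in the question's error follows the layout `logs/<dag_id>/<task_id>/<execution_date>/<try_number>.log`. A minimal sketch that rebuilds that object name from its parts and optionally downloads it with `google-cloud-storage`; the layout is inferred from this single example, and the bucket name you pass is a placeholder.

```python
def log_object_name(dag_id: str, task_id: str, execution_date: str, try_number: int) -> str:
    """Build the object name using the layout seen in the question's log path."""
    return f"logs/{dag_id}/{task_id}/{execution_date}/{try_number}.log"


def download_log(bucket_name: str, object_name: str, dest: str) -> None:
    """Download one log object to a local file."""
    # Imported lazily so log_object_name() has no third-party dependency.
    from google.cloud import storage

    client = storage.Client()  # uses Application Default Credentials
    client.bucket(bucket_name).blob(object_name).download_to_filename(dest)
```

Usage: `download_log("your-composer-bucket", log_object_name("campaign_exceptions_0_0_1", "merge_campaign_exceptions", "2019-08-03T10:00:00+00:00", 1), "1.log")`.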

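I ran into the same problem. In my case, the issue was that I had deleted the google_cloud_default connection, which is used to retrieve the logs.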

Check the configuration and find the connection name:

[core]
remote_log_conn_id = google_cloud_default
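To do that check from a script, a minimal sketch that reads `remote_log_conn_id` from `airflow.cfg`-style text with Python's standard `configparser`. Assumptions: you have a copy of the config file to read, and no `AIRFLOW__CORE__REMOTE_LOG_CONN_ID` environment variable overrides it (this sketch does not look at environment variables). Airflow 1.10, as in this answer, keeps the option under `[core]`; Airflow 2 moved it to `[logging]`, so both are checked.

```python
import configparser
from typing import Optional


def remote_log_conn_id(cfg_text: str) -> Optional[str]:
    """Return remote_log_conn_id from airflow.cfg-style text, or None if it is absent."""
    # interpolation=None because airflow.cfg contains literal '%' in log format strings.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    for section in ("core", "logging"):
        if parser.has_option(section, "remote_log_conn_id"):
            return parser.get(section, "remote_log_conn_id")
    return None
```

Usage: `with open("airflow.cfg") as fh: print(remote_log_conn_id(fh.read()))`, then confirm a connection with that ID still exists in Airflow.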

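Then check that the credentials used for that connection name have the correct permissions to access the GCS bucket.

Are you sure the task failed because the log could not be loaded, or could it have failed for some other reason? More likely the task wrote non-ASCII or binary content to its log, which prevents the web UI from displaying it. That by itself should not affect whether the task is able to complete.

Not sure, tbh, and I have no way to check it, but what you say makes sense. Interestingly, the @-@{} still appears in the actual log files in GCS but not in the UI... even for successful tasks. Not sure whether this is a GCP bug or an Airflow bug; I'll dig into their Jira to see whether anyone else has seen this in Airflow (not Composer).

We've since killed and recreated this cluster, since it was already broken... so that's the answer. Thanks a lot!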