Airflow scheduler keeps crashing with a database connection error (Google Composer)

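I have been using Google Composer for a while (Composer-0.5.2-airflow-1.9.0), and I have run into some problems with the Airflow scheduler. The scheduler container sometimes crashes, and it can end up in a locked state where it cannot start any new tasks (database connection error), so I have to re-create the whole Composer environment. This time there is a CrashLoopBackOff and the scheduler pod can no longer restart. The error is very similar to ones I have had before. This is the traceback from Stackdriver: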

Traceback (most recent call last):
  File "/usr/local/bin/airflow", line 27, in <module>
    args.func(args)
  File "/usr/local/lib/python2.7/site-packages/airflow/bin/cli.py", line 826, in scheduler
    job.run()
  File "/usr/local/lib/python2.7/site-packages/airflow/jobs.py", line 198, in run
    self._execute()
  File "/usr/local/lib/python2.7/site-packages/airflow/jobs.py", line 1549, in _execute
    self._execute_helper(processor_manager)
  File "/usr/local/lib/python2.7/site-packages/airflow/jobs.py", line 1594, in _execute_helper
    self.reset_state_for_orphaned_tasks(session=session)
  File "/usr/local/lib/python2.7/site-packages/airflow/utils/db.py", line 50, in wrapper
    result = func(*args, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/airflow/jobs.py", line 266, in reset_state_for_orphaned_tasks
    .filter(or_(*filter_for_tis), TI.state.in_(resettable_states))
  File "/usr/local/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2783, in all
    return list(self)
  File "/usr/local/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2935, in __iter__
    return self._execute_and_instances(context)
  File "/usr/local/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2958, in _execute_and_instances
    result = conn.execute(querycontext.statement, self._params)
  File "/usr/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 948, in execute
    return meth(self, multiparams, params)
  File "/usr/local/lib/python2.7/site-packages/sqlalchemy/sql/elements.py", line 269, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "/usr/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1060, in _execute_clauseelement
    compiled_sql, distilled_params
  File "/usr/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1200, in _execute_context
    context)
  File "/usr/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1413, in _handle_dbapi_exception
    exc_info
  File "/usr/local/lib/python2.7/site-packages/sqlalchemy/util/compat.py", line 203, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb, cause=cause)
  File "/usr/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1193, in _execute_context
    context)
  File "/usr/local/lib/python2.7/site-packages/sqlalchemy/engine/default.py", line 508, in do_execute
    cursor.execute(statement, parameters)
  File "/usr/local/lib/python2.7/site-packages/MySQLdb/cursors.py", line 250, in execute
    self.errorhandler(self, exc, value)
  File "/usr/local/lib/python2.7/site-packages/MySQLdb/connections.py", line 50, in defaulterrorhandler
    raise errorvalue
sqlalchemy.exc.OperationalError: (_mysql_exceptions.OperationalError) (1205, 'Lock wait timeout exceeded; try restarting transaction') [SQL: u'SELECT task_instance.try_number AS task_instance_try_number, task_instance.task_id AS task_instance_task_id, task_instance.dag_id AS task_instance_dag_id, task_instance.execution_date AS task_instance_execution_date, task_instance.start_date AS task_instance_start_date, task_instance.end_date AS task_instance_end_date, task_instance.duration AS task_instance_duration, task_instance.state AS task_instance_state, task_instance.max_tries AS task_instance_max_tries, task_instance.hostname AS task_instance_hostname, task_instance.unixname AS task_instance_unixname, task_instance.job_id AS task_instance_job_id, task_instance.pool AS task_instance_pool, task_instance.queue AS task_instance_queue, task_instance.priority_weight AS task_instance_priority_weight, task_instance.operator AS task_instance_operator, task_instance.queued_dttm AS task_instance_queued_dttm, task_instance.pid AS task_instance_pid \nFROM task_instance \nWHERE (task_instance.dag_id = %s AND task_instance.task_id = %s AND task_instance.execution_date = %s OR task_instance.dag_id = %s AND task_instance.task_id = %s AND task_instance.execution_date = %s OR task_instance.dag_id = %s AND task_instance.task_id = %s AND task_instance.execution_date = %s OR task_instance.dag_id = %s AND task_instance.task_id = %s AND task_instance.execution_date = %s OR task_instance.dag_id = %s AND task_instance.task_id = %s AND task_instance.execution_date = %s OR task_instance.dag_id = %s AND task_instance.task_id = %s AND task_instance.execution_date = %s) AND task_instance.state IN (%s, %s) FOR UPDATE'] [parameters: ('pb_write_event_tables_v2_dev2', 'check_table_chest_progressed', datetime.datetime(2018, 6, 26, 8, 0), 'pb_write_event_tables_v2_dev2', 'check_table_name_changed', datetime.datetime(2018, 6, 26, 8, 0), 'pb_write_event_tables_v2_dev2', 'check_table_registered', 
datetime.datetime(2018, 6, 26, 8, 0), 'pb_write_event_tables_v2_dev2', 'check_table_unit_leveled_up', datetime.datetime(2018, 6, 26, 8, 0), 'pb_write_event_tables_v2_dev2', 'check_table_virtual_currency_earned', datetime.datetime(2018, 6, 26, 8, 0), 'pb_write_event_tables_v2_dev2', 'check_table_virtual_currency_spent', datetime.datetime(2018, 6, 26, 8, 0), u'scheduled', u'queued')] (Background on this error at: http://sqlalche.me/e/e3q8)
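
I don't know much about RDBMS errors at the technical level. However, this is an out-of-the-box Google Composer with the default environment, so I am wondering whether anyone else has had similar problems or has any ideas.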
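
For context on the traceback: 1205 is the code MySQL returns when a statement waits longer than `innodb_lock_wait_timeout` for a row lock held by another transaction, and here it is raised by the scheduler's `SELECT ... FOR UPDATE` on `task_instance`. The following is only a minimal sketch of the "try restarting transaction" advice in the message, not what Airflow or Composer actually does. `run_with_retry`, `mysql_error_code` and `FakeOperationalError` are hypothetical names introduced for illustration, and the sketch assumes the driver exception carries the error code as its first argument, as the `(1205, '...')` tuple in the traceback suggests.

```python
import time

LOCK_WAIT_TIMEOUT = 1205  # MySQL error code shown in the traceback


class FakeOperationalError(Exception):
    """Hypothetical stand-in for a driver OperationalError: args == (code, message)."""


def mysql_error_code(exc):
    """Return the numeric error code carried by the exception, if any."""
    # SQLAlchemy wraps the driver exception and exposes it as `.orig`;
    # fall back to the exception itself when there is no wrapper.
    orig = getattr(exc, "orig", exc)
    args = getattr(orig, "args", ())
    return args[0] if args else None


def run_with_retry(txn, retries=3, delay=0.0):
    """Call txn(); if it fails with a lock wait timeout, run it again.

    Any other error, or a timeout on the last attempt, is re-raised.
    """
    for attempt in range(1, retries + 1):
        try:
            return txn()
        except Exception as exc:
            if mysql_error_code(exc) != LOCK_WAIT_TIMEOUT or attempt == retries:
                raise
            time.sleep(delay)  # give the lock holder time to finish


if __name__ == "__main__":
    state = {"calls": 0}

    def flaky_transaction():
        # Simulates a transaction that hits the lock timeout twice, then succeeds.
        state["calls"] += 1
        if state["calls"] < 3:
            raise FakeOperationalError(
                LOCK_WAIT_TIMEOUT,
                "Lock wait timeout exceeded; try restarting transaction",
            )
        return "committed"

    print(run_with_retry(flaky_transaction))
```

A retry like this only helps when the transaction holding the lock eventually commits or rolls back. It does not identify or remove whatever is holding the lock while the scheduler is in CrashLoopBackOff.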