Airflow worker stuck: Task is in the 'running' state which is not a valid state for execution. The task must be cleared in order to be run

Tags: airflow, airflow-scheduler

The Airflow task runs without any problem and then suddenly gets stuck halfway through; the task instance details show the message above.

I cleared my entire database but still get the same error.

In fact, I only hit this issue for some DAGs, mostly when there are long-running jobs.

I am getting the following error:

[2019-07-03 12:14:56,337] {{models.py:1353}} INFO - Dependencies not met for <TaskInstance: XXXXXX.index_to_es 2019-07-01T13:30:00+00:00 [running]>, dependency 'Task Instance State' FAILED: Task is in the 'running' state which is not a valid state for execution. The task must be cleared in order to be run.
[2019-07-03 12:14:56,341] {{models.py:1353}} INFO - Dependencies not met for <TaskInstance: XXXXXX.index_to_es 2019-07-01T13:30:00+00:00 [running]>, dependency 'Task Instance Not Already Running' FAILED: Task is already running, it started on 2019-07-03 05:58:51.601552+00:00.
[2019-07-03 12:14:56,342] {{logging_mixin.py:95}} INFO - [2019-07-03 12:14:56,342] {{jobs.py:2514}} INFO - Task is not able to be run

Thanks for any help.

I found the problem: it was an infrastructure issue. I was using AWS EFS, and once the throughput limit was reached, burst mode throttled the workers. After switching to Provisioned Throughput mode, the workers were no longer stuck. I got the idea from here.
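For context, here is a rough boto3 sketch (not from the answer above) of how burst-credit exhaustion can be confirmed and how an EFS file system can be switched to provisioned throughput; the file system id and the 64 MiB/s value are placeholder assumptions:

import datetime
import boto3

FILE_SYSTEM_ID = 'fs-12345678'  # placeholder, replace with the real EFS id

# Check the BurstCreditBalance metric: if it keeps falling towards zero,
# the file system is throttled to its baseline rate and workers can stall.
cloudwatch = boto3.client('cloudwatch')
stats = cloudwatch.get_metric_statistics(
    Namespace='AWS/EFS',
    MetricName='BurstCreditBalance',
    Dimensions=[{'Name': 'FileSystemId', 'Value': FILE_SYSTEM_ID}],
    StartTime=datetime.datetime.utcnow() - datetime.timedelta(days=1),
    EndTime=datetime.datetime.utcnow(),
    Period=3600,
    Statistics=['Minimum'],
)
print(sorted(stats['Datapoints'], key=lambda d: d['Timestamp']))

# Switch from bursting to provisioned throughput (the throughput value is an
# illustrative guess; size it for the actual workload).
efs = boto3.client('efs')
efs.update_file_system(
    FileSystemId=FILE_SYSTEM_ID,
    ThroughputMode='provisioned',
    ProvisionedThroughputInMibps=64.0,
)

The same change can also be made from the EFS console.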

I also run into this error when the task log says "Task is not able to be run". The task is then retried, and the retry succeeds, but the original task, which runs as a KubernetesPodOperator, keeps running. Somehow Airflow forgets about this task and the pod that is still running.
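As an aside, here is a minimal sketch of a KubernetesPodOperator task using the Airflow 1.10 contrib import path, with is_delete_operator_pod enabled so the pod is removed once the operator finishes or fails; the task id, image and namespace are placeholders, and the answer above does not claim this setting fixes the lost-pod problem.

from airflow.contrib.operators.kubernetes_pod_operator import KubernetesPodOperator

example_pod_task = KubernetesPodOperator(
    task_id='example_pod_task',        # hypothetical task, not from the original DAG
    name='example-pod',
    namespace='default',
    image='python:3.7-slim',
    cmds=['python', '-c'],
    arguments=['print("hello from the pod")'],
    get_logs=True,                     # stream pod logs back into the Airflow task log
    is_delete_operator_pod=True,       # delete the pod once the operator finishes or fails
    dag=dag)                           # assumes a DAG object named `dag` is in scope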
For reference, here is the DAG from the question that produces the stuck index_to_es task; the upper-case names are constants defined elsewhere in the project, and EsDownloadAndIndexOperator is a custom operator.

from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python_operator import PythonOperator
from airflow.contrib.operators.bigquery_operator import BigQueryOperator
from airflow.contrib.operators.bigquery_to_gcs import BigQueryToCloudStorageOperator

default_args = {
    'owner': 'datascience',
    'depends_on_past': True,
    'start_date': datetime(2019, 6, 12),
    'email': ['datascience@mycompany.com'],
    'email_on_failure': True,
    'email_on_retry': True,
    'retries': 3,
    'retry_delay': timedelta(minutes=5),
    # 'queue': 'nill',
    # 'pool': 'backfill',
    # 'priority_weight': 10,
    # 'end_date': datetime(2016, 1, 1),
}
def get_index_date(**kwargs):
    tomorrow = kwargs.get('templates_dict').get('tomorrow')
    return str(tomorrow).replace('-', '.')
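# Example: get_index_date(templates_dict={'tomorrow': '2019-07-02'}) returns '2019.07.02',
# the dotted date format commonly used for daily Elasticsearch indices.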

"""
Create the DAG and specify its features
"""
dag = DAG(
    DAG_NAME,
    schedule_interval="0 9 * * *",
    catchup=True,
    default_args=default_args,
    template_searchpath='/efs/sql')

create_table = BigQueryOperator(
    dag=dag,
    task_id='create_temp_table_from_query',
    sql='daily_demand.sql',
    use_legacy_sql=False,
    destination_dataset_table=TEMP_TABLE,
    bigquery_conn_id=CONNECTION_ID,
    create_disposition='CREATE_IF_NEEDED',
    write_disposition='WRITE_TRUNCATE'
)

"""Task to zip and export to GCS"""
export_to_storage = BigQueryToCloudStorageOperator(
    task_id='export_to_GCS',
    source_project_dataset_table=TEMP_TABLE,
    destination_cloud_storage_uris=[CLOUD_STORAGE_URI],
    export_format='NEWLINE_DELIMITED_JSON',
    compression='GZIP',
    bigquery_conn_id=CONNECTION_ID,
    dag=dag)
"""Task to get the tomorrow execution date formatted for indexing"""
get_index_date = PythonOperator(
    task_id='get_index_date',
    python_callable=get_index_date,
    templates_dict={'tomorrow':"{{ tomorrow_ds }}"},
    provide_context=True,
    dag=dag
)
"""Task to download zipped files and bulkindex to elasticsearch"""
es_indexing = EsDownloadAndIndexOperator(
    task_id="index_to_es",
    object=OBJECT,
    es_url=ES_URI,
    local_path=LOCAL_FILE,
    gcs_conn_id=CONNECTION_ID,
    bucket=GCS_BUCKET_ID,
    es_index_type='demand_shopper',
    es_bulk_batch=5000,
    es_index_name=INDEX,
    es_request_timeout=300,
    dag=dag)


"""Define the chronology of tasks in DAG"""
create_table >> export_to_storage >> get_index_date >> es_indexing
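
Finally, instead of wiping the entire metadata database, the task instances stuck in 'running' can be cleared for just one DAG and date range. A minimal sketch, assuming Airflow 1.10.x and a machine that can see both the DAG files and the metadata DB; the DAG id and dates are placeholders:

from datetime import datetime
from airflow.models import DagBag

dagbag = DagBag()                      # parse the dags_folder
dag = dagbag.get_dag('my_dag_id')      # placeholder DAG id
dag.clear(
    start_date=datetime(2019, 7, 1),   # execution-date window to clear
    end_date=datetime(2019, 7, 2),
    only_running=True,                 # only touch task instances stuck in 'running'
)

The same can be done with the airflow clear CLI command or the Clear button on the task instance in the web UI.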