Google Cloud Platform: how do I access a variable in Airflow's default args?


I am new to Airflow and Python and have a question. I have a method, read_props, that reads a properties file from a bucket. After reading it, I want to use those values in default args, but I get a "name 'CLUSTER_NAME' is not defined" error. Please check the code below.

# Importing Modules
from airflow import DAG

from airflow.contrib.operators import dataproc_operator
from datetime import datetime, timedelta
from airflow.operators.python_operator import PythonOperator
from google.cloud import storage

keys = {}
global CLUSTER_NAME

def read_props():
    blobs = iterate_bucket()
    for blob in blobs:
        print("blob name : ", blob.name)
        if "cluster_config.properties" in blob.name:
            downloaded_data = blob.download_as_string()
            downloaded_data.decode('ascii')
            split_data = downloaded_data.split()
            print("split_data : ", split_data)
            for line in split_data:
                print("line : ", line)
                if b'=' in line:
                    name, value = line.split(b'=', 1)
                    keys[(name.strip())] = (value.strip())
                    print("keys inside :", keys)
    print("keys :", keys)
    print("keys get :", keys.get(b'cluster_name').decode('ascii'))
    CLUSTER_NAME = keys.get(b'cluster_name').decode('ascii')
    print("CLUSTER_NAME : ", CLUSTER_NAME)
    #task_instance.xcom_push(key="CLUSTER_NAME", value=keys.get(b'cluster_name').decode('ascii'))


def iterate_bucket():
    bucket_name = 'bucket-name'
    storage_client = storage.Client.from_service_account_json(
        '/home/airflow/gcs/data/private-key.json')
    bucket = storage_client.get_bucket(bucket_name)

    blobs = bucket.list_blobs()
    return blobs


# In default args we are using values from the read_props method, which will be helpful for Dataproc cluster creation
default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': datetime(2019, 1, 1),
    'email': ['airflow@example.com'],
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
    'cluster_name': CLUSTER_NAME,
    # 'project_id': project_id,
    # 'zone': zone,
    # 'num_workers': num_workers,
    # 'worker_machine_type': worker_machine_type,
    # 'master_machine_type': master_machine_type,    
}

# Instantiate a DAG

dag = DAG(dag_id='create_cluster', default_args=default_args, schedule_interval=timedelta(days=1))

t1 = PythonOperator(task_id='Raw1', python_callable=read_props, dag=dag)
# creating a dataproc cluster using DataprocClusterCreateOperator
create_dataproc_cluster = dataproc_operator.DataprocClusterCreateOperator(task_id='create_dataproc_cluster', dag=dag)



t1 >> create_dataproc_cluster
In the default args I get the error: name 'CLUSTER_NAME' is not defined.

Please see the properties file below:

cluster_name=cluster-name
project_id=project-name
zone=zone-name
num_workers=*
worker_machine_type=**-*****-*
master_machine_type=**-*****-*

You have never assigned a value to `CLUSTER_NAME` at module level, so using it in `default_args` raises the error. What you have done is assign it locally inside `read_props`, and that method is only called after the DAG object has been created. Please clarify what you intend this code to do.
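If the intent is simply to have the cluster name available when `default_args` is built, one option is to parse the properties at DAG-parse time (module level), before `default_args` is defined. A minimal sketch, with the GCS download stubbed out as raw bytes (in the real DAG file, `raw` would come from `blob.download_as_string()`), and `parse_properties` being a hypothetical helper name:

```python
def parse_properties(raw: bytes) -> dict:
    """Turn b'key=value' lines into a str -> str dict."""
    config = {}
    for line in raw.split():
        if b'=' in line:
            name, value = line.split(b'=', 1)
            config[name.strip().decode('ascii')] = value.strip().decode('ascii')
    return config

# In the real DAG file you would download the blob here, at module import
# time, so the value exists before default_args is defined:
#   raw = blob.download_as_string()
raw = b"cluster_name=my-cluster\nproject_id=my-project\n"

config = parse_properties(raw)
CLUSTER_NAME = config['cluster_name']  # defined at parse time, not inside a task

default_args = {
    'owner': 'airflow',
    'cluster_name': CLUSTER_NAME,  # no NameError: the name now exists
}
```

Note the trade-off: code at module level runs every time the scheduler parses the DAG file, so a bucket read there adds latency to every parse. If the value is only needed at task run time, pushing it via XCom (as the commented-out `xcom_push` line suggests) and pulling it in the downstream operator is the alternative.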