Google Cloud Platform: how to access a variable in Airflow default args?
I am new to Airflow and Python and have a few doubts. I have a method read_props that reads a properties file from a bucket. After reading it, I want to use those same values in the default args, but I get the error: name 'CLUSTER_NAME' is not defined. Please check the code below.
# Importing Modules
from airflow import DAG
from airflow.contrib.operators import dataproc_operator
from datetime import datetime, timedelta
from airflow.operators.python_operator import PythonOperator
from google.cloud import storage

keys = {}
global CLUSTER_NAME

def read_props():
    blobs = iterate_bucket()
    for blob in blobs:
        print("blob name : ", blob.name)
        if "cluster_config.properties" in blob.name:
            downloaded_data = blob.download_as_string()
            downloaded_data.decode('ascii')
            split_data = downloaded_data.split()
            print("split_data : ", split_data)
            for line in split_data:
                print("line : ", line)
                if b'=' in line:
                    name, value = line.split(b'=', 1)
                    keys[(name.strip())] = (value.strip())
                    print("keys inside :", keys)
            print("keys :", keys)
            print("keys get :", keys.get(b'cluster_name').decode('ascii'))
            CLUSTER_NAME = keys.get(b'cluster_name').decode('ascii')
            print("CLUSTER_NAME : ", CLUSTER_NAME)
            # task_instance.xcom_push(key="CLUSTER_NAME", value=keys.get(b'cluster_name').decode('ascii'))

def iterate_bucket():
    bucket_name = 'bucket-name'
    storage_client = storage.Client.from_service_account_json(
        '/home/airflow/gcs/data/private-key.json')
    bucket = storage_client.get_bucket(bucket_name)
    blobs = bucket.list_blobs()
    return blobs
# In default args we use values read by the method above, which should help with Dataproc cluster creation
default_args = {
'owner': 'airflow',
'depends_on_past': False,
'start_date': datetime(2019, 1, 1),
'email': ['airflow@example.com'],
'email_on_failure': False,
'email_on_retry': False,
'retries': 1,
'retry_delay': timedelta(minutes=5),
'cluster_name': CLUSTER_NAME,
# 'project_id': project_id,
# 'zone': zone,
# 'num_workers': num_workers,
# 'worker_machine_type': worker_machine_type,
# 'master_machine_type': master_machine_type,
}
# Instantiate a DAG
dag = DAG(dag_id='create_cluster', default_args=default_args, schedule_interval=timedelta(days=1))
t1 = PythonOperator(task_id='Raw1', python_callable=read_props, dag=dag)

# creating a dataproc cluster using DataprocClusterCreateOperator
create_dataproc_cluster = dataproc_operator.DataprocClusterCreateOperator(
    task_id='create_dataproc_cluster', dag=dag)

t1 >> create_dataproc_cluster
In the default args I am getting the error: name 'CLUSTER_NAME' is not defined.
Please see the properties file below:
cluster_name=cluster-name
project_id=project-name
zone=zone-name
num_workers=*
worker_machine_type=**-*****-*
master_machine_type=**-*****-*
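As an aside, key/value lines in this format can be parsed more simply by decoding the downloaded bytes once up front rather than splitting on byte strings. A minimal sketch (the property names mirror the file above; `parse_properties` is an illustrative helper, not part of the original DAG):

```python
def parse_properties(text):
    """Parse simple key=value lines into a dict, skipping blanks and comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if '=' in line and not line.startswith('#'):
            name, value = line.split('=', 1)
            props[name.strip()] = value.strip()
    return props

# Stand-in for blob.download_as_string(), which returns bytes.
downloaded = b"cluster_name=my-cluster\nproject_id=my-project\n"
props = parse_properties(downloaded.decode('ascii'))
print(props['cluster_name'])  # my-cluster
```

Decoding first avoids the `b'='` byte-string comparisons and the later `.decode('ascii')` on each value.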
You have not assigned a value to CLUSTER_NAME at module level, so referencing it in default_args raises the error. What you did is assign it a local value inside read_props, and that method is only called after the DAG object has been created. Please clarify what you intend this code to do.
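A minimal sketch of the fix described above: compute the value at module (parse) time, before default_args is built, instead of inside a task callable. Here `load_cluster_name` is a hypothetical helper, and a hard-coded byte string stands in for the GCS download in your `read_props`:

```python
from datetime import datetime, timedelta

def load_cluster_name():
    # In the real DAG this would download cluster_config.properties from GCS;
    # here a hard-coded string stands in for the downloaded bytes.
    downloaded = b"cluster_name=cluster-name\nproject_id=project-name\n"
    for line in downloaded.decode('ascii').splitlines():
        if '=' in line:
            name, value = line.split('=', 1)
            if name.strip() == 'cluster_name':
                return value.strip()
    raise ValueError("cluster_name not found in properties file")

# Runs at import time, so the name is already bound when
# the default_args dict below is evaluated.
CLUSTER_NAME = load_cluster_name()

default_args = {
    'owner': 'airflow',
    'start_date': datetime(2019, 1, 1),
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
    'cluster_name': CLUSTER_NAME,
}
```

Note that code at DAG-file top level runs every time the scheduler parses the file, so a network read here adds latency to every parse cycle; that is the trade-off for having the value available in default_args.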