Accessing the params argument in a custom operator in Apache Airflow

I want to pass a list of values (or any value at all) as an argument to a custom operator, modify those values inside the operator, and then access them in a SQL template via the {{ params }} macro.

Current setup

Below are the relevant parts of my setup, somewhat contrived for clarity.
DAG definition:
from airflow import DAG
from datetime import timedelta, datetime
from acme.operators.dwh_operators import ProcessDimensionOperator

default_args = {
    'owner': 'airflow',
    'start_date': datetime(2019, 2, 27),
    'provide_context': True,
    'depends_on_past': True
}

dag = DAG(
    'etl',
    schedule_interval=None,
    dagrun_timeout=timedelta(minutes=60),
    template_searchpath=tmpl_search_path,
    default_args=default_args,
    max_active_runs=1)

process_product_dim = ProcessDimensionOperator(
    task_id='process_product_dim',
    mysql_conn_id='mysql_dwh',
    sql='process_dimension.sql',
    database='dwh',
    col_names=[
        'id',
        'name',
        'category',
        'price',
        'available',
        'country',
    ],
    t_name='products',
    dag=dag)
Operator definition:
from airflow.hooks.mysql_hook import MySqlHook
from airflow.models import BaseOperator
from airflow.utils.decorators import apply_defaults


class ProcessDimensionOperator(BaseOperator):
    template_fields = (
        'sql',
        'parameters')
    template_ext = ('.sql',)

    @apply_defaults
    def __init__(
            self,
            sql,
            t_name,
            col_names,
            database,
            mysql_conn_id='mysql_default',
            parameters=None,
            *args, **kwargs):
        super(ProcessDimensionOperator, self).__init__(*args, **kwargs)
        self.sql = sql
        self.t_name = t_name
        self.col_names = col_names
        self.database = database
        self.mysql_conn_id = mysql_conn_id
        self.parameters = parameters

    def execute(self, context):
        hook = MySqlHook(mysql_conn_id=self.mysql_conn_id)
        self.params['col_names'] = self.col_names
        self.params['t_name'] = self.t_name
        self.params['match_statement'] = self.construct_match_statement(self.col_names)
        hook.run(sql=self.sql)

    def construct_match_statement(self, cols):
        map_list = map(lambda x: f'and t.{x} = s.{x}', cols[1:])
        return ' '.join(map_list)
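For reference, here is a small standalone sketch of what construct_match_statement produces for the col_names used above (plain Python, no Airflow required):

```python
# Standalone version of the join-condition builder from the operator above.
# It skips the first column (the join key) and chains the rest with "and".
def construct_match_statement(cols):
    return ' '.join(f'and t.{c} = s.{c}' for c in cols[1:])

cols = ['id', 'name', 'category', 'price', 'available', 'country']
print(construct_match_statement(cols))
# and t.name = s.name and t.category = s.category and t.price = s.price
# and t.available = s.available and t.country = s.country
```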
process_dimension.sql
create table if not exists staging.{{ params.t_name }};
select
*
from
source.{{ params.t_name }} as source
join
target.{{ params.t_name }} as target
on source.id = target.id {{ params.match_statement }}
But this throws an error because {{ params.t_name }} and {{ params.match_statement }} render as null.
What I have tried

- Setting params and t_name in the task definition and keeping the map/join logic in the SQL template. This works, but I would like to keep that logic out of the template if possible.
- Passing params={...} through super(ProcessDimensionOperator, self).__init__(params=params, *args, **kwargs).
- Passing the values as parameters={...} and templating them with %(x)s placeholders via the hook.run() method, but this causes problems because the driver wraps the values in quotes, which breaks various SQL statements.
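The quoting problem in the last bullet is inherent to DB-API parameter binding: bound parameters are escaped as value literals, so they cannot stand in for identifiers such as table or column names. A minimal illustration using sqlite3 from the standard library (not MySQL, but the binding behavior is the same DB-API concept behind MySqlHook's %(x)s placeholders):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('create table products (id integer, name text)')

# Parameter binding is for *values*: the driver quotes/escapes them safely.
conn.execute('insert into products values (?, ?)', (1, 'widget'))

# It cannot supply identifiers such as table or column names: a bound
# parameter is treated as a quoted literal, not an identifier, so the
# statement fails to prepare.
try:
    conn.execute('select id from ?', ('products',))
except sqlite3.OperationalError as e:
    print('binding an identifier fails:', e)
```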
I am new to Python and Airflow, so I may well have missed something obvious. Any help is much appreciated, thanks.

Same here. I just spent hours (days?) tracking down the cause of the problem (thank god for IPython.embed and logging). As of 1.10.3, this is caused by TaskInstance.render_templates(), which, after rendering any of the template_fields or template_ext, does not update the Jinja context, only the task attribute. See:

So you simply need to use {{ task.params.whatever }} instead of {{ params.whatever }} in your .sql template files.

In fact, if the Jinja context were continually updated, you would have to worry about the ordering and dependencies of templates; that would amount to nested/recursive rendering, and it could also hurt performance.
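To make the fix concrete, here is a sketch that simulates the behavior described above with plain Jinja2 (the templating engine Airflow uses). FakeTask and the hard-coded params dict are stand-ins for illustration only: the top-level params entry in the render context is stale, while the task attribute reflects the mutations made in execute(), so the template reads through task.params:

```python
from jinja2 import Template

# Stand-in for the task object Airflow exposes to templates; its params
# dict reflects the values set in execute() (hypothetical example values).
class FakeTask:
    params = {'t_name': 'products', 'match_statement': 'and t.name = s.name'}

sql = (
    "select * from source.{{ task.params.t_name }} as s "
    "join target.{{ task.params.t_name }} as t "
    "on s.id = t.id {{ task.params.match_statement }}"
)

# Rendering against the task attribute picks up the updated values.
print(Template(sql).render(task=FakeTask()))
```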
Also, I would not recommend using parameters (as opposed to params), because they appear to be passed as arguments to the database cursor, which means you cannot pass numbers/integers, column or table names, or plain SQL fragments (e.g. where, having, limit, ...).

I wanted to pass params to super(). That caused the same error? Did you modify params before passing them to super()?