Python: Accessing the params argument in a custom operator in Apache Airflow

I would like to pass a list of values (or any value, really) as an argument to a custom operator, modify the values inside the operator, and then access them in the SQL template via the
{{ params }}
macro.

Current setup: below are the relevant parts of my setup, slightly contrived for clarity.

DAG definition:

from airflow import DAG
from datetime import timedelta, datetime
from acme.operators.dwh_operators import ProcessDimensionOperator

# Defined elsewhere in the real DAG file; a placeholder so the snippet is complete.
tmpl_search_path = '/path/to/sql/templates'

default_args = {
    'owner': 'airflow',
    'start_date': datetime(2019, 2, 27),
    'provide_context': True,
    'depends_on_past': True
}

dag = DAG(
    'etl',
    schedule_interval=None,
    dagrun_timeout=timedelta(minutes=60),
    template_searchpath=tmpl_search_path,
    default_args=default_args,
    max_active_runs=1)

process_product_dim = ProcessDimensionOperator(
    task_id='process_product_dim',
    mysql_conn_id='mysql_dwh',
    sql='process_dimension.sql',
    database='dwh',
    col_names=[
        'id',
        'name',
        'category',
        'price',
        'available',
        'country',
    ],
    t_name='products',
    dag=dag)
Operator definition:

from airflow.hooks.mysql_hook import MySqlHook
from airflow.models import BaseOperator
from airflow.utils.decorators import apply_defaults

class ProcessDimensionOperator(BaseOperator):
    template_fields = (
        'sql',
        'parameters')
    template_ext = ('.sql',)

    @apply_defaults
    def __init__(
            self,
            sql,
            t_name,
            col_names,
            database,
            parameters=None,
            mysql_conn_id='mysql_default',
            *args, **kwargs):
        super(ProcessDimensionOperator, self).__init__(*args, **kwargs)
        self.sql = sql
        self.t_name = t_name
        self.col_names = col_names
        self.database = database
        self.mysql_conn_id = mysql_conn_id
        self.parameters = parameters

    def execute(self, context):
        hook = MySqlHook(mysql_conn_id=self.mysql_conn_id)

        self.params['col_names'] = self.col_names
        self.params['t_name'] = self.t_name
        self.params['match_statement'] = self.construct_match_statement(self.col_names)

        hook.run(sql=self.sql)

    def construct_match_statement(self, cols):
        map_list = map(lambda x: f'and t.{x} = s.{x}', cols[1:])

        return ' '.join(map_list)
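For reference, the construct_match_statement helper above just joins equality predicates for every column after the first (which is assumed to be the join key). A standalone sketch of the same logic, outside the operator:

```python
# Standalone copy of the construct_match_statement logic above,
# so its output can be inspected without an Airflow environment.
def construct_match_statement(cols):
    # Skip the first column (the join key) and emit one
    # "and t.<col> = s.<col>" fragment per remaining column.
    return ' '.join(f'and t.{x} = s.{x}' for x in cols[1:])

print(construct_match_statement(['id', 'name', 'category']))
# and t.name = s.name and t.category = s.category
```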
process_dimension.sql

create table if not exists staging.{{ params.t_name }};

select
    *
from
    source.{{ params.t_name }} as source
join
    target.{{ params.t_name }} as target
    on source.id = target.id {{ params.match_statement }}
But this throws an error, because
{{ params.t_name }}
and
{{ params.match_statement }}
render as null.

What I've tried
  • Setting
    t_name
    and
    col_names
    in the task definition's
    params
    argument and keeping the map/join logic in the SQL template. This works, but I would like to keep the logic out of the template if possible.
  • Passing
    params={xxx}
    to
    super(ProcessDimensionOperator, self).__init__(params=params, *args, **kwargs)
  • Passing the values as
    parameters={xxx}
    into the
    hook.run()
    method and templating them with
    %(x)s
    , but this causes problems because it wraps the values in quotes, which breaks various SQL statements.

I'm new to Python and Airflow, so I may be missing something obvious. Any help is much appreciated, thanks!

Same here. I just spent hours (days?) figuring out the cause of the problem (god bless IPython.embed and logging). As of 1.10.3, this is caused by TaskInstance.render_templates(), which, after rendering any of the template_fields or template_ext, does not update the Jinja context, only the task attributes.

So you can simply use

{{ task.params.whatever }}

instead of

{{ params.whatever }}

in your .sql template files.
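The mechanism can be sketched without Airflow at all (FakeTask below is a hypothetical stand-in, not Airflow's actual context-building code): the context's params entry is a snapshot taken before execute() runs, while task is the live object, so later mutations are only visible through the latter.

```python
class FakeTask:
    """Hypothetical stand-in for an Airflow task object."""
    def __init__(self):
        self.params = {}

task = FakeTask()
# The Jinja context is assembled once, before execute() runs:
# 'params' is a snapshot, 'task' is the live object.
context = {'params': dict(task.params), 'task': task}

# A later mutation (as in the operator's execute()) ...
task.params['t_name'] = 'products'

# ... is invisible through the snapshot, but visible via the task:
print(context['params'].get('t_name'))   # None
print(context['task'].params['t_name'])  # products
```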

As a matter of fact, if the Jinja context were continuously updated, you would have to pay attention to the ordering and interdependencies of templates. It would amount to a kind of nested/recursive rendering, which could also hurt performance.


Also, I would not recommend using "parameters" (as opposed to "params"), because they appear to be passed as arguments to the database cursor, which means you cannot pass numbers/integers, column or table names, or plain SQL fragments (e.g. where, having, limit, ...).
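The quoting problem with cursor parameters is easy to reproduce with any DB-API driver; here is a sketch using the stdlib sqlite3 module (MySQL's %(x)s binding behaves the same way in this respect): bound parameters are escaped as values, so they work for data but not for identifiers or SQL fragments.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('create table products (id integer, name text)')

# Binding a *value* works: the driver escapes/quotes it safely.
conn.execute('insert into products values (1, ?)', ('widget',))

# Binding an *identifier* does not: the table name would be quoted
# as a string literal, so the statement is rejected as invalid SQL.
try:
    conn.execute('select * from ?', ('products',))
    bound_identifier_ok = True
except sqlite3.OperationalError:
    bound_identifier_ok = False

print(bound_identifier_ok)  # False
```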

I would like to pass
params
to
super().__init__(). Does that cause the same error? Did you modify the
params
before passing them to
super()?