SQL: Snowflake function with an array argument fails with an "Unsupported subquery type" error

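I have a transaction table in which I need to group similar records. For the class column, which may differ within a group, I need to pick the top value from a lookup table (the class table) among the values that occur in the similar records. The lookup table is ordered by priority.

select * from class;

select * from t_data;

When I write a query like the one below, it works fine:

select min_value(array_construct('OMEGA','GAMMA','BETA'));

When I use it in the actual query, it fails with "SQL compilation error: Unsupported subquery type cannot be evaluated".

I expect output like the following from the 8 records above: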

select C_ID,P_ID,D_ID,S_ID,array_construct(class_array) ca from (
        select C_ID,P_ID,D_ID,S_ID,arrayagg(class) class_array
        from t_data 
        group by C_ID,P_ID,D_ID,S_ID
    );

Output

C_ID    P_ID    D_ID    S_ID    CLASS_ARRAY
1101111 1404    564     1404    ["OMEGA", "GAMMA", "BETA"]
1101111 1404    599     1425    ["ALPHA", "GAMMA"]
1101111 1404    564     1425    ["ALPHA", "GAMMA", "OMEGA"]

When I apply the min_value function to the class_array above, it should return a single value per row, based on the priority in the lookup table:

C_ID    P_ID    D_ID    S_ID    CLASS_ARRAY
1101111 1404    564     1404    BETA
1101111 1404    599     1425    ALPHA
1101111 1404    564     1425    ALPHA

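Please suggest why the function works for hard-coded values but fails when the array is constructed in the query and passed as an argument.

Snowflake has some limitations when it comes to supporting SQL statements that contain certain SELECT patterns in column definitions. There are a couple of ways to rewrite the query above to get the desired result:

1) Find the minimum ID, then join back to the class table: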

with T as (  
  select C_ID, P_ID, D_ID, S_ID, min(class.id) minclassid
  from t_data join class
     on class.name = t_data.class
  group by C_ID,P_ID,D_ID,S_ID
)
select C_ID, P_ID, D_ID, S_ID, class.name
from T join CLASS on minclassid = class.id;

2) Or use a window function to get the first class name in the group, ordered by ID:

select distinct C_ID, P_ID, D_ID, S_ID, 
   first_value(class.name) over 
     (partition by C_ID, P_ID, D_ID, S_ID order by class.id) name
from t_data join class
on class.name = t_data.class;

This can also be done with QUALIFY, which lets you filter after the select stage without the filter logic showing up in the result:

with class as (
    select * from values
      (2, 'BETA'),
      (6, 'OMEGA'),
      (5, 'SIGMA'),
      (1, 'ALPHA'),
      (3, 'GAMMA'),
      (4, 'DELTA')  
      v(id, name)
), t_data as (
    select * from values
      (1101111, 1404, 564, 1404, 'BETA'),
      (1101111, 1404, 599, 1425, 'ALPHA'),
      (1101111, 1404, 564, 1404, 'OMEGA'),
      (1101111, 1404, 564, 1425, 'ALPHA'),
      (1101111, 1404, 564, 1404, 'GAMMA'),
      (1101111, 1404, 564, 1425, 'GAMMA'),
      (1101111, 1404, 599, 1425, 'GAMMA'),
      (1101111, 1404, 564, 1425, 'OMEGA')
      v(C_ID, P_ID, D_ID, S_ID, CLASS)
)
select c_id, p_id, d_id, s_id, d.class
from t_data d
join class c on d.class = c.name
qualify row_number() over (partition by c_id, p_id, d_id, s_id order by c.id) = 1;

gives:

C_ID    P_ID    D_ID    S_ID    CLASS
1101111 1404    564     1404    BETA
1101111 1404    564     1425    ALPHA
1101111 1404    599     1425    ALPHA

This is the same as the more explicit/verbose form:

select c_id, p_id, d_id, s_id, class from (
    select c_id, p_id, d_id, s_id, d.class
        ,row_number() over (partition by c_id, p_id, d_id, s_id order by c.id) as rn
    from t_data d
    join class c on d.class = c.name
)
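where rn = 1;

This is also equivalent to Stuart's DISTINCT query above.

If you really want to do the ordering through the array, you can sort the array with WITHIN GROUP (ORDER BY ..) and then pick the first element. The first_value or QUALIFY approach should be faster, but if you need to keep the array for some other reason, this may help: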

select C_ID, P_ID, D_ID, S_ID, class_array[0] ca from (
    select C_ID, P_ID, D_ID, S_ID, arrayagg(class) within group (order by class.id) class_array
    from t_data
    join class on t_data.class = class.name
    group by C_ID,P_ID,D_ID,S_ID
);
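
Another option in the same spirit, if all you need is the highest-priority class per group, is an aggregate that returns one column's value from the row where another column is smallest. The sketch below is not taken from the answers above. It assumes the MIN_BY aggregate is available in your Snowflake account and reuses the sample CTEs from the QUALIFY example.

```sql
-- Sketch only: assumes Snowflake's MIN_BY aggregate is available.
with class as (
    select * from values
      (2, 'BETA'),
      (6, 'OMEGA'),
      (5, 'SIGMA'),
      (1, 'ALPHA'),
      (3, 'GAMMA'),
      (4, 'DELTA')
      v(id, name)
), t_data as (
    select * from values
      (1101111, 1404, 564, 1404, 'BETA'),
      (1101111, 1404, 599, 1425, 'ALPHA'),
      (1101111, 1404, 564, 1404, 'OMEGA'),
      (1101111, 1404, 564, 1425, 'ALPHA'),
      (1101111, 1404, 564, 1404, 'GAMMA'),
      (1101111, 1404, 564, 1425, 'GAMMA'),
      (1101111, 1404, 599, 1425, 'GAMMA'),
      (1101111, 1404, 564, 1425, 'OMEGA')
      v(C_ID, P_ID, D_ID, S_ID, CLASS)
)
select d.c_id, d.p_id, d.d_id, d.s_id,
       -- class name taken from the joined row with the smallest class id in each group
       min_by(c.name, c.id) as class
from t_data d
join class c on d.class = c.name
group by d.c_id, d.p_id, d.d_id, d.s_id;
```

Because the ids in the class table are unique, each group has exactly one row with the smallest id, so this should pick the same class per group as the QUALIFY version.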

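Thanks for the suggestion, Stuart. One thing I apparently did not state explicitly is that, from the list of values in the array created for each row, I need to pick the least one in the lookup (the lookup table is ordered by priority). I have updated the post.

Thanks Simeon!! The QUALIFY option works great. I need to pick the least one in the lookup for each row's list of values (the lookup table is ordered by priority). qualify does give the correct result.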