SQL: uncommon elements across arrays in BigQuery

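Suppose you have a column of arrays like the one below. I am trying to group rows based on the count of uncommon elements. Once the number of distinct uncommon elements reaches 5, the row goes into the next group. In the example below, the first three rows form group 1, because their uncommon elements are ['3', '4', '6', '7'], a length of 4. If the next row were added to the group, the distinct uncommon elements would become ['1', '3', '4', '5', '6', '7'], which exceeds the limit of 5 distinct uncommon elements.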

with arr as (
  select 1 as ord, ['1','2','3','4'] as ar
  union all
  select 2, ['1','2','3']
  union all
  select 3, ['1','2','6','7']
  union all
  select 4, ['2','4','5','7']
  union all
  select 5, ['string1','5','6','7','8']
)

select * from arr
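
For reference, the grouping rule described above can be sketched outside BigQuery. The following is a minimal plain-Python sketch, not a BigQuery solution. It assumes that an element counts as "uncommon" when it does not appear in every array of the current group, that rows are visited in `ord` order, and that a row opens a new group only when adding it would push the distinct uncommon count above the limit. The function name `assign_groups` and the `limit` parameter are hypothetical names chosen for illustration.

```python
def assign_groups(rows, limit=5):
    """Assign a group number to each (ord, array) row, visiting rows in ord order.

    Assumption: an element is "uncommon" within a group when it does not
    appear in every array of that group, and a row opens a new group when
    adding it would push the distinct uncommon count above `limit`.
    """
    result = []
    group = 1
    seen = set()    # every element seen in the current group
    shared = None   # elements present in every array of the current group
    for ord_, ar in sorted(rows, key=lambda r: r[0]):
        elems = set(ar)
        if shared is None:
            # First row overall: it starts group 1 with no uncommon elements.
            seen, shared = set(elems), set(elems)
        else:
            new_seen, new_shared = seen | elems, shared & elems
            if len(new_seen - new_shared) > limit:
                # Adding this row would exceed the limit: start a new group.
                group += 1
                seen, shared = set(elems), set(elems)
            else:
                seen, shared = new_seen, new_shared
        result.append((ord_, group, sorted(seen - shared)))
    return result


rows = [
    (1, ['1', '2', '3', '4']),
    (2, ['1', '2', '3']),
    (3, ['1', '2', '6', '7']),
    (4, ['2', '4', '5', '7']),
    (5, ['string1', '5', '6', '7', '8']),
]

for ord_, group, uncommon in assign_groups(rows):
    print(ord_, group, uncommon)
```

Under these assumptions, rows 1 to 3 land in group 1 with uncommon elements ['3', '4', '6', '7'], and row 4 starts group 2, which matches the example in the question. Row 5 brings group 2 to exactly 5 uncommon elements, so it stays in group 2 here; if the intended rule is "reaches 5" rather than "exceeds 5", changing `>` to `>=` would move it to a third group.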
I am looking for output like the following:

Here is the code I have written so far. It is certainly missing a large piece, but I am adding it in case it is useful.

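with arr as (
  select 1 as ord, ['1','2','3','4'] as ar, 1 as subclass
  union all
  select 2, ['1','2','3'], 1
  union all
  select 3, ['1','2','6','7'], 1
  union all
  select 4, ['2','4','5','7'], 1
  union all
  select 5, ['string1','5','6','7','8'], 1
)

-- For each row, collect the arrays of all rows up to and including it.
-- The window is ordered by ord so that the history is deterministic.
, history_t as (
  select a.*,
    array_agg(struct(ar)) over (
      partition by subclass
      order by ord
      rows between unbounded preceding and current row
    ) as history
  from arr a
)

-- One output row per (row, historical array) pair,
-- plus the number of arrays in that row's history.
, tem2 as (
  select a.* except(history, ar),
    (select count(1) from unnest(history) as col) as array_cnt,
    b.ar as unnest1
  from history_t a, unnest(history) b
)

-- One output row per element of each historical array.
, tem3 as (
  select a.* except(unnest1), sku_lst
  from tem2 a, unnest(unnest1) sku_lst
)

-- How often each element occurs across the historical arrays.
, all_sku_freq as (
  select ord, array_cnt, sku_lst, subclass, count(*) as sku_freq
  from tem3
  group by 1, 2, 3, 4
)

-- An element is uncommon when its frequency differs from the number of arrays.
, uncommon_sku_cnt as (
  select ord, subclass, count(sku_lst) as uncommon_sku_count
  from all_sku_freq
  where sku_freq <> array_cnt
  group by 1, 2
)

-- Running total of the per-row uncommon counts (not yet a group assignment).
, rolling_uncomm_sku_cnt as (
  select a.*,
    sum(uncommon_sku_count) over (
      partition by subclass
      order by ord asc
      range between unbounded preceding and current row
    ) as roll_uncomm_sku_cnt
  from uncommon_sku_cnt a
)

select a.* from rolling_uncomm_sku_cnt a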

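I have a number of questions here. 1) You say "the next row…": what defines the order of these rows, so that you can tell which row is the next one?

I have added an order column. Does that look better?

2) Are the rows in a group consecutive rows, or can they be any rows in the table?

OK. Thanks for answering my questions.

I see what you mean. Thanks for taking a look.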