SQL BigQuery: uncommon elements in arrays
Suppose you have a column of arrays like the one below. I am trying to group rows based on the count of uncommon elements. Once the number of distinct uncommon elements reaches 5, the row goes into the next group. In the example below, the first three rows form group 1, because their uncommon elements are ['3', '4', '6', '7'], which has length 4. If the next row were added to the group, the array of distinct uncommon elements would become ['1', '3', '4', '5', '6', '7'], which exceeds the limit of 5 distinct uncommon elements.
with arr as (
select 1 ord, ['1','2','3','4'] as ar
union all
select 2, ['1','2','3']
union all
select 3,['1','2','6','7']
union all
select 4,['2','4','5','7']
union all
select 5, ['string1','5','6','7','8']
)
select * from arr
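I am looking for output like the following
So far I have written the code below, but I am surely missing a big piece. I am adding it just in case it is useful.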
with arr as (
select 1 ord, ['1','2','3','4'] as ar,1 subclass
union all
select 2, ['1','2','3'],1
union all
select 3,['1','2','6','7'],1
union all
select 4,['2','4','5','7'],1
union all
select 5, ['string1','5','6','7','8'],1
)
, history_t as (
select a.* ,
ARRAY_AGG(struct(ar)) OVER (PARTITION BY SUBCLASS ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as history
from arr a )
, tem2 as (
select a.* except(history,ar),
(SELECT COUNT(1) FROM UNNEST(history) AS col ) AS array_cnt
,b.ar unnest1
from history_t a
,unnest(history) b
)
, tem3 as (
select a.* except(unnest1),sku_lst
from tem2 a , unnest(unnest1) sku_lst
)
, all_sku_freq as (
select
ord, array_cnt , sku_lst , subclass,count(*) sku_freq
from tem3
group by 1,2,3,4 )
, uncommon_sku_cnt as (
select ord, subclass, count( sku_lst) uncommon_sku_count from all_sku_freq where sku_freq <> array_cnt group by 1,2 )
,rolling_uncomm_sku_cnt as (
select a.*, sum(uncommon_sku_count) over(partition by subclass order by ord asc range between unbounded preceding and current row ) roll_uncomm_sku_cnt
from uncommon_sku_cnt a
)
select a.* from rolling_uncomm_sku_cnt a
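I have a lot of questions here. 1. You say "the next row…": what defines the order of these rows, so that you can tell which row is the next one?
Added an order column. Does it look better now?
2. Are the rows in a group consecutive rows, or can they be any rows in the table?
OK. Thanks for answering my questions.
I see what you mean, thanks for taking a look.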
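To pin down the grouping rule described in the question before translating it into BigQuery SQL, here is a minimal Python sketch of one possible reading. It is not the asker's method and not a BigQuery solution. It rests on several assumptions: "uncommon" means elements that appear in at least one row of the group but not in every row; groups consist of consecutive rows in `ord` order; and a row starts a new group when adding it would push the distinct uncommon count above a hypothetical `max_uncommon` parameter. The question says both "reaches 5" and "exceeds the limit of 5", so the threshold is left adjustable. The function name `assign_groups` is made up for illustration.

```python
# Sample rows from the question: (ord, ar)
rows = [
    (1, ['1', '2', '3', '4']),
    (2, ['1', '2', '3']),
    (3, ['1', '2', '6', '7']),
    (4, ['2', '4', '5', '7']),
    (5, ['string1', '5', '6', '7', '8']),
]


def assign_groups(rows, max_uncommon=5):
    """Greedy in-order grouping (assumed reading of the question).

    Returns a list of (ord, group_id, uncommon_elements) tuples, where
    uncommon_elements is the sorted list of elements seen in the group
    so far that are not present in every row of the group.
    """
    result = []
    group_id = 0
    seen = set()    # union of all elements in the current group
    common = set()  # elements present in every row of the current group
    for ord_, ar in sorted(rows, key=lambda r: r[0]):
        items = set(ar)
        new_seen = seen | items
        new_common = common & items
        # Start a new group on the first row, or when adding this row
        # would push the distinct uncommon count above the limit.
        if group_id == 0 or len(new_seen - new_common) > max_uncommon:
            group_id += 1
            seen, common = items, items
        else:
            seen, common = new_seen, new_common
        result.append((ord_, group_id, sorted(seen - common)))
    return result


for row in assign_groups(rows):
    print(row)
```

Under these assumptions, rows 1 to 3 land in group 1 with uncommon elements ['3', '4', '6', '7'], and row 4 starts group 2, matching the example in the question. Whether row 5 stays in group 2 depends on the threshold: with `max_uncommon=5` it stays, with `max_uncommon=4` it starts a third group. Because each row's group depends on the running state of the previous rows, a SQL version would likely need a recursive CTE, scripting, or a UDF rather than a single window function.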