PostgreSQL (Redshift): max value for a specific column
I'm working on Redshift and I have a table like this:
userid oid version number_of_objects
1 ab 1 10
1 ab 2 20
1 ab 3 17
1 ab 4 16
1 ab 5 14
1 cd 1 5
1 cd 2 6
1 cd 3 9
1 cd 4 12
2 ef 1 4
2 ef 2 3
2 gh 1 16
2 gh 2 12
2 gh 3 21
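I want to select the max version number for each oid from this table, and also get the userid and number_of_objects of that row.
When I tried the following, I unfortunately got the whole table back: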
SELECT MAX(version), oid, userid, number_of_objects
FROM table
GROUP BY oid, userid, number_of_objects
LIMIT 10;
But the result I actually want is:
userid oid MAX(version) number_of_objects
1 ab 5 14
1 cd 4 12
2 ef 2 3
2 gh 3 21
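For some reason DISTINCT ON doesn't work either; it says:
SELECT DISTINCT ON is not supported
Do you have any idea?
UPDATE: In the meantime I came up with this workaround, but I feel it's not the smartest solution. It's also very slow. But at least it works. Just in case: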
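SELECT * FROM table,
  -- one row per oid/userid with its highest version
  (SELECT MAX(version) as maxversion, oid, userid
   FROM table
   GROUP BY oid, userid
  ) as maxtable
WHERE table.oid = maxtable.oid
  AND table.userid = maxtable.userid
  -- the subquery exposes the maximum as maxversion, not version
  AND table.version = maxtable.maxversion
LIMIT 100;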
Do you have any better solution?
If Redshift does have window functions, you might try this:
SELECT *
FROM (
   select oid,
          userid,
          version,
          max(version) over (partition by oid, userid) as max_version
   from the_table
) t
where version = max_version;
I would expect that to be faster than the self join with a GROUP BY.
Another option would be to use the row_number() function:
SELECT *
FROM (
   select oid,
          userid,
          version,
          row_number() over (partition by oid, userid order by version desc) as rn
   from the_table
) t
where rn = 1;
Which one to use is more a matter of personal taste. Performance-wise I wouldn't expect a difference.
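As a quick check of the two window-function queries above, here is a minimal sketch that runs them against the sample data from the question. It assumes Python's built-in sqlite3 module as a local stand-in for Redshift (chosen only so the example is self-contained; SQLite needs version 3.25 or later for window functions) and adds number_of_objects to the select list, since the question asks for it. The helper name latest_versions is hypothetical.

```python
import sqlite3

# Sample rows from the question: (userid, oid, version, number_of_objects)
ROWS = [
    (1, "ab", 1, 10), (1, "ab", 2, 20), (1, "ab", 3, 17), (1, "ab", 4, 16),
    (1, "ab", 5, 14), (1, "cd", 1, 5), (1, "cd", 2, 6), (1, "cd", 3, 9),
    (1, "cd", 4, 12), (2, "ef", 1, 4), (2, "ef", 2, 3), (2, "gh", 1, 16),
    (2, "gh", 2, 12), (2, "gh", 3, 21),
]

# Variant 1: compare each row's version with the maximum of its partition.
MAX_OVER = """
SELECT userid, oid, version, number_of_objects
FROM (
    SELECT userid, oid, version, number_of_objects,
           max(version) OVER (PARTITION BY oid, userid) AS max_version
    FROM the_table
) t
WHERE version = max_version
ORDER BY userid, oid
"""

# Variant 2: number the rows of each partition from newest to oldest.
ROW_NUMBER = """
SELECT userid, oid, version, number_of_objects
FROM (
    SELECT userid, oid, version, number_of_objects,
           row_number() OVER (PARTITION BY oid, userid
                              ORDER BY version DESC) AS rn
    FROM the_table
) t
WHERE rn = 1
ORDER BY userid, oid
"""


def latest_versions(query):
    """Load the sample rows into an in-memory database and run `query`."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE the_table "
            "(userid INTEGER, oid TEXT, version INTEGER, number_of_objects INTEGER)"
        )
        conn.executemany("INSERT INTO the_table VALUES (?, ?, ?, ?)", ROWS)
        return conn.execute(query).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in latest_versions(ROW_NUMBER):
        print(row)
```

On this data both variants return the four rows listed as the desired result in the question. One difference to keep in mind: if two rows tie on the highest version within a partition, the max() variant returns both of them, while row_number() returns exactly one.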
select distinct
first_value(userid) over(
partition by oid
order by version desc
rows between unbounded preceding and unbounded following
) as userid
, oid
, first_value(version) over(
partition by oid
order by version desc
rows between unbounded preceding and unbounded following
) as max_version
, first_value(number_of_objects) over(
partition by oid
order by version desc
rows between unbounded preceding and unbounded following
) as number_of_objects
from table
order by oid;
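If version is nullable, don't forget to specify NULLS LAST in the ordering.
Long story short: horses for courses.
The author's approach should be faster on smaller tables and when pulling samples of data, but the window approaches are more consistent in performance and are faster on the whole table.
Below are some EXPLAIN results from a table with 17 columns, 184,121,798 rows and 12,809,740 unique ids (each id has 14 versions on average, but can have up to 40).
Quick summary: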
Tomi's approach: cost=5983958.76..67801689853856.94 (6*10^6 for the first row, 7*10^13 for the whole table)
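@a_horse_with_no_name: cost=1000027117538.39..1000031720583.59 (10^12 for any query)
@Merlin: about the same as the approach above
Original approach
So the costs for the first row and for all rows are 5983958.76 (6*10^6) and 67801689853856.94 (7*10^13), respectively.
a_horse_with_no_name
The two solutions provided by @a_horse_with_no_name have almost identical plans, so I'll paste only one of them.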
explain
SELECT *
FROM (
select *,
row_number() over (partition by id order by version desc) as rn
from table
)
where rn = 1;
which gives the row_number() plan listed at the end of this page.
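Merlin's approach
The solution provided by @Merlin seems incomplete, as it doesn't return all the values of the latest version, but its performance is similar to the second option.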
explain
select distinct
id
, first_value(version) over(
partition by id
order by version desc
rows between unbounded preceding and unbounded following
) as max_version
, first_value(additional_col) over(
partition by id
order by version desc
rows between unbounded preceding and unbounded following
) as additional_col
from table t;
which gives the first_value() plan listed at the end of this page.
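Just use DISTINCT ON (oid) and ORDER BY oid, version DESC.
I would expect Tomi's SQL to work (it does in Oracle DBMS). Is this a PostgreSQL bug/limitation?
@KlasLindbäck: yes, it "works" in Postgres as well (as in: "it executes"). The problem is that the GROUP BY returns multiple rows per oid/userid, whereas Tomi wants only one row for each oid/userid combination.
Yes, it says "ERROR: SELECT DISTINCT ON is not supported [SQL State=0A000]"; I think that's because I'm using Redshift.
@TomiMester: does Redshift support window functions?
Thanks! Sorry, but for me this was slower than my workaround (on the real table it took more than 20 minutes, so I skipped running it), but thanks for the effort.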
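The point made in the comments about GROUP BY can be checked on the sample data with a small sketch. It assumes Python's built-in sqlite3 module as a local stand-in for Redshift and a hypothetical table name versions (the question's placeholder name table is a reserved word); count_groups is a hypothetical helper. Because number_of_objects is part of the GROUP BY list in the question's query, every sample row ends up in its own group, so nothing is collapsed.

```python
import sqlite3

# Sample rows from the question: (userid, oid, version, number_of_objects)
ROWS = [
    (1, "ab", 1, 10), (1, "ab", 2, 20), (1, "ab", 3, 17), (1, "ab", 4, 16),
    (1, "ab", 5, 14), (1, "cd", 1, 5), (1, "cd", 2, 6), (1, "cd", 3, 9),
    (1, "cd", 4, 12), (2, "ef", 1, 4), (2, "ef", 2, 3), (2, "gh", 1, 16),
    (2, "gh", 2, 12), (2, "gh", 3, 21),
]


def count_groups(group_by_columns):
    """Count the rows returned by SELECT max(version) ... GROUP BY <columns>.

    `group_by_columns` is pasted into the SQL text, so only pass trusted,
    hard-coded column lists as done below.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE versions "
            "(userid INTEGER, oid TEXT, version INTEGER, number_of_objects INTEGER)"
        )
        conn.executemany("INSERT INTO versions VALUES (?, ?, ?, ?)", ROWS)
        query = (
            "SELECT count(*) FROM ("
            f"SELECT max(version) FROM versions GROUP BY {group_by_columns}"
            ") AS g"
        )
        return conn.execute(query).fetchone()[0]
    finally:
        conn.close()


if __name__ == "__main__":
    # Grouping as in the question: one group per row of the sample data.
    print(count_groups("oid, userid, number_of_objects"))
    # Grouping by the key only: one group per oid/userid combination.
    print(count_groups("oid, userid"))
```

Grouping by oid and userid alone gives one row per combination, but number_of_objects can then no longer be selected directly, which is why the answers turn to a join or a window function.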
EXPLAIN output for the row_number() query (@a_horse_with_no_name):
  Filter: (rn = 1)
  -> XN Window (cost=1000027117538.39..1000029419060.99 rows=184121808 width=44)
       Partition: id
       Order: version
       -> XN Sort (cost=1000027117538.39..1000027577842.91 rows=184121808 width=44)
            Sort Key: id, version
            -> XN Seq Scan on table (cost=0.00..1841218.08 rows=184121808 width=44)
EXPLAIN output for the first_value() query (@Merlin):
XN Unique (cost=1000027117538.39..1000032180888.11 rows=184121808 width=84)
  -> XN Window (cost=1000027117538.39..1000030799974.55 rows=184121808 width=84)
       Partition: id
       Order: version
       -> XN Sort (cost=1000027117538.39..1000027577842.91 rows=184121808 width=84)
            Sort Key: id, version
            -> XN Seq Scan on table (cost=0.00..1841218.08 rows=184121808 width=84)