PostgreSQL (Redshift): maximum value of a specific column

sql amazon-redshift

I'm working on Redshift and have a table like this:

userid  oid version number_of_objects
1       ab  1       10
1       ab  2       20
1       ab  3       17
1       ab  4       16
1       ab  5       14
1       cd  1       5
1       cd  2       6
1       cd  3       9
1       cd  4       12
2       ef  1       4
2       ef  2       3
2       gh  1       16
2       gh  2       12
2       gh  3       21

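From this table I would like to select the max version number for every oid, together with the userid and the number_of_objects of that row.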

When I tried this, unfortunately I got the whole table back:

SELECT MAX(version), oid, userid, number_of_objects
FROM table
GROUP BY oid, userid, number_of_objects
LIMIT 10;

But the result I actually want is:

userid  oid MAX(version)    number_of_objects
1       ab  5               14
1       cd  4               12
2       ef  2               3
2       gh  3               21

Somehow DISTINCT ON doesn't work either. It says:

SELECT DISTINCT ON is not supported

Any ideas?

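UPDATE: In the meantime I came up with this workaround, but I don't feel it's the smartest solution. It's also very slow. But at least it works. Just in case: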

SELECT * FROM table,
   (SELECT MAX(version) as maxversion, oid, userid
    FROM table
    GROUP BY oid, userid
    ) as maxtable
    WHERE  table.oid = maxtable.oid
   AND table.userid = maxtable.userid
   AND table.version = maxtable.maxversion
LIMIT 100;

Do you have a better solution?

If Redshift does have window functions, you could try something like this:

SELECT * 
FROM (
  select oid, 
         userid, 
         version,
         max(version) over (partition by oid, userid) as max_version
  from the_table
) t
where version = max_version;

I would expect this to be faster than the self join with a GROUP BY.

Another option is to use the row_number() function:

SELECT * 
FROM (
  select oid, 
         userid, 
         version,
         row_number() over (partition by oid, userid order by version desc) as rn
  from the_table
) t
where rn = 1;

Which one to use is more a matter of personal taste. Performance-wise I wouldn't expect a difference.

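As a minimal, self-contained sketch of the row_number() approach on the question's sample data: the table name object_versions is a placeholder I made up, and number_of_objects is added to the select list (it is not in the queries above) so that the output lines up with the desired result in the question.

```sql
-- Hypothetical scratch table; the columns follow the sample in the question.
CREATE TEMP TABLE object_versions (
    userid            INT,
    oid               VARCHAR(8),
    version           INT,
    number_of_objects INT
);

INSERT INTO object_versions VALUES
    (1, 'ab', 1, 10), (1, 'ab', 2, 20), (1, 'ab', 3, 17),
    (1, 'ab', 4, 16), (1, 'ab', 5, 14),
    (1, 'cd', 1, 5),  (1, 'cd', 2, 6),  (1, 'cd', 3, 9), (1, 'cd', 4, 12),
    (2, 'ef', 1, 4),  (2, 'ef', 2, 3),
    (2, 'gh', 1, 16), (2, 'gh', 2, 12), (2, 'gh', 3, 21);

-- Number the rows of each (oid, userid) group from the highest version down,
-- then keep only the first row of each group.
SELECT userid, oid, version AS max_version, number_of_objects
FROM (
    SELECT userid, oid, version, number_of_objects,
           ROW_NUMBER() OVER (PARTITION BY oid, userid
                              ORDER BY version DESC) AS rn
    FROM object_versions
) t
WHERE rn = 1
ORDER BY userid, oid;
```

On these rows the query should return one row per oid/userid pair: (1, ab, 5, 14), (1, cd, 4, 12), (2, ef, 2, 3) and (2, gh, 3, 21), which is the result asked for in the question.
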
select      distinct
            first_value(userid) over(
                  partition by oid 
                  order by version desc
                  rows between unbounded preceding and unbounded following
                  ) as userid
            , oid
            , first_value(version) over(
                  partition by oid
                  order by version desc
                  rows between unbounded preceding and unbounded following
                  ) as max_version
            , first_value(number_of_objects) over(
                  partition by oid
                  order by version desc
                  rows between unbounded preceding and unbounded following
                  ) as number_of_objects

from        table
order by    oid;



If version is nullable, don't forget to specify NULLS LAST in the ORDER BY.
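
To illustrate the point about nullable versions, here is a small sketch. The table name versions_with_nulls and its rows are made up for the example. In both PostgreSQL and Redshift a descending sort puts NULLs first by default, so without NULLS LAST the row with the NULL version would be picked as the "latest" one.

```sql
-- Hypothetical scratch table with one NULL version.
CREATE TEMP TABLE versions_with_nulls (
    userid            INT,
    oid               VARCHAR(8),
    version           INT,
    number_of_objects INT
);

INSERT INTO versions_with_nulls VALUES
    (1, 'ab', 1,    10),
    (1, 'ab', 2,    20),
    (1, 'ab', NULL, 99);

-- NULLS LAST pushes the NULL version to the end of the descending sort,
-- so rn = 1 is the row with the highest non-NULL version.
SELECT userid, oid, version AS max_version, number_of_objects
FROM (
    SELECT userid, oid, version, number_of_objects,
           ROW_NUMBER() OVER (PARTITION BY oid
                              ORDER BY version DESC NULLS LAST) AS rn
    FROM versions_with_nulls
) t
WHERE rn = 1;
```

With NULLS LAST this should return the row (1, ab, 2, 20). The same clause can be added to the ORDER BY of each first_value() window in the query above.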

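Long story short: horses for courses.

The author's approach should be faster on smaller tables and when pulling sample data, but the window approach is more consistent in performance and faster on the whole table.

Below are some EXPLAIN results I got on a table with 17 columns, 184,121,798 rows and 12,809,740 unique ids (each id has 14 versions on average, but can have up to 40).

Quick summary: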

Tomi's approach: cost=5983958.76..67801689853856.94 (6*10^6 for the first row, 7*10^13 for the whole table)

@a_horse_with_no_name: cost=1000027117538.39..1000031720583.59 (10^12 for any query)

@Merlin: about the same as the approach above

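Original approach

The costs for the first row and for all rows are therefore 5983958.76 (6*10^6) and 67801689853856.94 (7*10^13), respectively.

a_horse_with_no_name's approach

The two solutions provided by @a_horse_with_no_name have almost exactly the same plan, so I will paste only one of them.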

explain
SELECT * 
FROM (
  select *,
         row_number() over (partition by id order by version desc) as rn
  from table
)
where rn = 1;

gives the plan listed at the end of this page.

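Merlin's approach

The solution provided by @Merlin seems incomplete, since it does not return all values for the latest version, but it performs similarly to the second option.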

explain
select      distinct
              id
            , first_value(version) over(
                  partition by id
                  order by version desc
                  rows between unbounded preceding and unbounded following
                  ) as max_version
            , first_value(additional_col) over(
                  partition by id
                  order by version desc
                  rows between unbounded preceding and unbounded following
                  ) as additional_col

from        table t;

gives the plan listed at the end of this page.

Just use DISTINCT ON (oid) together with ORDER BY oid, version DESC.
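
For reference, a sketch of what this suggestion looks like in stock PostgreSQL. As the question notes, Redshift rejects it with "SELECT DISTINCT ON is not supported", so it is not an option there. The inline sample rows are a subset of the question's data and only serve to make the example self-contained.

```sql
-- PostgreSQL only: DISTINCT ON keeps the first row of each oid group
-- in the order given by ORDER BY, i.e. the row with the highest version.
WITH the_table (userid, oid, version, number_of_objects) AS (
    VALUES (1, 'ab', 4, 16),
           (1, 'ab', 5, 14),
           (2, 'ef', 1, 4),
           (2, 'ef', 2, 3)
)
SELECT DISTINCT ON (oid)
       userid, oid, version AS max_version, number_of_objects
FROM the_table
ORDER BY oid, version DESC;
```

On these rows it should return (1, ab, 5, 14) and (2, ef, 2, 3). The leading ORDER BY expressions have to match the DISTINCT ON expressions, which is why oid comes first in the sort.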

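I would expect Tomi's SQL to work (it does in Oracle DBMS). Is this a PostgreSQL bug/limitation?

@KlasLindbäck: yes, it also "works" in Postgres (as in: "it executes"). The problem is that the GROUP BY returns multiple rows per oid/userid, whereas Tomi wants only one row per oid/userid combination.

Yes, it says "ERROR: SELECT DISTINCT ON is not supported [SQL State=0A000]". I guess that's because I'm using Redshift.

@TomiMester: does Redshift support window functions?

Thanks! Sorry, but for me this is slower than my workaround (on the real table it took more than 20 minutes, so I aborted the run). Thanks for the effort, though.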

Plan for the row_number() query (@a_horse_with_no_name):

  Filter: (rn = 1)
  ->  XN Window  (cost=1000027117538.39..1000029419060.99 rows=184121808 width=44)
        Partition: id
        Order: version
        ->  XN Sort  (cost=1000027117538.39..1000027577842.91 rows=184121808 width=44)
              Sort Key: id, version
              ->  XN Seq Scan on table  (cost=0.00..1841218.08 rows=184121808 width=44)

Plan for the first_value() query (@Merlin):

XN Unique  (cost=1000027117538.39..1000032180888.11 rows=184121808 width=84)
  ->  XN Window  (cost=1000027117538.39..1000030799974.55 rows=184121808 width=84)
        Partition: id
        Order: version
        ->  XN Sort  (cost=1000027117538.39..1000027577842.91 rows=184121808 width=84)
              Sort Key: id, version
              ->  XN Seq Scan on table  (cost=0.00..1841218.08 rows=184121808 width=84)