PostgreSQL (Redshift): maximum value for a specific column

I'm working on Redshift, and I have a table like this:

userid  oid version number_of_objects
1       ab  1       10
1       ab  2       20
1       ab  3       17
1       ab  4       16
1       ab  5       14
1       cd  1       5
1       cd  2       6
1       cd  3       9
1       cd  4       12
2       ef  1       4
2       ef  2       3
2       gh  1       16
2       gh  2       12
2       gh  3       21
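From this table I want to select the maximum version number for each oid, together with the userid and the number_of_objects of that row.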

When I tried the following, I unfortunately got the whole table back:

SELECT MAX(version), oid, userid, number_of_objects
FROM table
GROUP BY oid, userid, number_of_objects
LIMIT 10;
But the result I actually want is:

userid  oid MAX(version)    number_of_objects
1       ab  5               14
1       cd  4               12
2       ef  2               3
2       gh  3               21
For some reason DISTINCT ON doesn't work either. It says:

SELECT DISTINCT ON is not supported

Any ideas?


UPDATE: In the meantime I came up with this solution, but I don't feel it's the smartest one. It's also quite slow. But at least it works. Just in case:

SELECT * FROM table,
   (SELECT MAX(version) as maxversion, oid, userid
    FROM table
    GROUP BY oid, userid
    ) as maxtable
    WHERE  table.oid = maxtable.oid
   AND table.userid = maxtable.userid
   AND table.version = maxtable.maxversion
LIMIT 100;

Do you have a better solution?
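To see why the first query returns every row while the workaround returns one row per group, here is a minimal sketch that runs both against the sample data. It makes some assumptions: SQLite (through Python's sqlite3 module) stands in for Redshift, the table is named the_table as a placeholder, and the join compares against the subquery's maxversion alias.

```python
import sqlite3

# Sample rows from the question: (userid, oid, version, number_of_objects).
ROWS = [
    (1, "ab", 1, 10), (1, "ab", 2, 20), (1, "ab", 3, 17), (1, "ab", 4, 16),
    (1, "ab", 5, 14), (1, "cd", 1, 5), (1, "cd", 2, 6), (1, "cd", 3, 9),
    (1, "cd", 4, 12), (2, "ef", 1, 4), (2, "ef", 2, 3), (2, "gh", 1, 16),
    (2, "gh", 2, 12), (2, "gh", 3, 21),
]

# Assumption: in-memory SQLite stands in for Redshift; "the_table" is a placeholder name.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE the_table (userid INT, oid TEXT, version INT, number_of_objects INT)"
)
conn.executemany("INSERT INTO the_table VALUES (?, ?, ?, ?)", ROWS)

# Grouping by number_of_objects as well makes every row its own group,
# so MAX(version) is simply that row's version.
all_groups = conn.execute(
    """
    SELECT MAX(version), oid, userid, number_of_objects
    FROM the_table
    GROUP BY oid, userid, number_of_objects
    """
).fetchall()

# Grouping by (oid, userid) only, then joining back to fetch the other columns.
latest = conn.execute(
    """
    SELECT t.userid, t.oid, t.version, t.number_of_objects
    FROM the_table AS t
    JOIN (
        SELECT oid, userid, MAX(version) AS maxversion
        FROM the_table
        GROUP BY oid, userid
    ) AS m
      ON t.oid = m.oid AND t.userid = m.userid AND t.version = m.maxversion
    ORDER BY t.oid
    """
).fetchall()

print(len(all_groups))  # 14: one group per input row
for row in latest:
    print(row)  # one row per (oid, userid) pair
```

On the sample data the join version yields the four rows listed as the desired result above.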

If Redshift does have window functions, you could try this:

SELECT *
FROM (
  select oid,
         userid,
         version,
         number_of_objects,
         -- highest version within each (oid, userid) group
         max(version) over (partition by oid, userid) as max_version
  from the_table
) t
where version = max_version;
I would expect this to be faster than the self-join with GROUP BY.

Another option is to use the row_number() function:

SELECT *
FROM (
  select oid,
         userid,
         version,
         number_of_objects,
         -- rn = 1 marks the row with the highest version in each group
         row_number() over (partition by oid, userid order by version desc) as rn
  from the_table
) t
where rn = 1;
Which one you use is mostly a matter of personal taste. Performance-wise I wouldn't expect a difference.
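As a rough check that the two window-function variants agree on the sample data, the sketch below runs both and compares the results. The assumptions are the same kind as before: SQLite 3.25 or newer (via Python's sqlite3) stands in for Redshift, and the_table is a placeholder name; number_of_objects is selected explicitly because the question asks for it.

```python
import sqlite3

# Sample rows from the question: (userid, oid, version, number_of_objects).
ROWS = [
    (1, "ab", 1, 10), (1, "ab", 2, 20), (1, "ab", 3, 17), (1, "ab", 4, 16),
    (1, "ab", 5, 14), (1, "cd", 1, 5), (1, "cd", 2, 6), (1, "cd", 3, 9),
    (1, "cd", 4, 12), (2, "ef", 1, 4), (2, "ef", 2, 3), (2, "gh", 1, 16),
    (2, "gh", 2, 12), (2, "gh", 3, 21),
]

# Assumption: in-memory SQLite (3.25+ for window functions) stands in for Redshift.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE the_table (userid INT, oid TEXT, version INT, number_of_objects INT)"
)
conn.executemany("INSERT INTO the_table VALUES (?, ?, ?, ?)", ROWS)

# Variant 1: compare each row's version with the maximum of its partition.
MAX_OVER = """
SELECT userid, oid, version, number_of_objects
FROM (
    SELECT userid, oid, version, number_of_objects,
           MAX(version) OVER (PARTITION BY oid, userid) AS max_version
    FROM the_table
) AS t
WHERE version = max_version
ORDER BY oid
"""

# Variant 2: number the rows per partition, newest first, and keep the first one.
ROW_NUMBER = """
SELECT userid, oid, version, number_of_objects
FROM (
    SELECT userid, oid, version, number_of_objects,
           ROW_NUMBER() OVER (PARTITION BY oid, userid ORDER BY version DESC) AS rn
    FROM the_table
) AS t
WHERE rn = 1
ORDER BY oid
"""

by_max = conn.execute(MAX_OVER).fetchall()
by_rn = conn.execute(ROW_NUMBER).fetchall()
print(by_max == by_rn)  # True
for row in by_rn:
    print(row)
```

One behavioural difference to keep in mind: if two rows in a group shared the same highest version, the max() variant would return both of them, while row_number() would keep exactly one.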

select      distinct
            first_value(userid) over(
                  partition by oid 
                  order by version desc
                  rows between unbounded preceding and unbounded following
                  ) as userid
            , oid
            , first_value(version) over(
                  partition by oid
                  order by version desc
                  rows between unbounded preceding and unbounded following
                  ) as max_version
            , first_value(number_of_objects) over(
                  partition by oid
                  order by version desc
                  rows between unbounded preceding and unbounded following
                  ) as number_of_objects

from        table
order by    oid;


If version is nullable, don't forget to add NULLS LAST to the ORDER BY.
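The effect of NULLS LAST can be illustrated with a small sketch. Assumptions: SQLite 3.30 or newer (via Python's sqlite3) stands in for Redshift, the table and its three rows are made up for the illustration, and because SQLite's own default NULL ordering differs from Postgres and Redshift (where NULLs sort first in a DESC ordering), that default is mimicked here with an explicit NULLS FIRST.

```python
import sqlite3

# Assumption: in-memory SQLite (3.30+ for NULLS FIRST/LAST) stands in for Redshift.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE the_table (oid TEXT, version INT, number_of_objects INT)")
# Hypothetical data: one oid with two real versions and one row whose version is NULL.
conn.executemany(
    "INSERT INTO the_table VALUES (?, ?, ?)",
    [("ab", 1, 10), ("ab", 2, 20), ("ab", None, 99)],
)


def latest(nulls_clause):
    """Run the first_value() query with the given NULL ordering clause."""
    window = f"""(PARTITION BY oid ORDER BY version DESC {nulls_clause}
                  ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)"""
    sql = f"""
        SELECT DISTINCT oid,
               FIRST_VALUE(version) OVER {window} AS max_version,
               FIRST_VALUE(number_of_objects) OVER {window} AS number_of_objects
        FROM the_table
    """
    return conn.execute(sql).fetchall()


# NULLS FIRST mimics the Postgres/Redshift default for DESC: the NULL row wins.
nulls_first = latest("NULLS FIRST")
# NULLS LAST pushes the NULL row to the end, so the real maximum wins.
nulls_last = latest("NULLS LAST")

print(nulls_first)  # [('ab', None, 99)]
print(nulls_last)   # [('ab', 2, 20)]
```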

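TL;DR: horses for courses.

The asker's approach should be faster on smaller tables and when pulling sample data, but the window approach is more consistent in performance and faster across the whole table.

Below are some EXPLAIN results I got on a table with 17 columns, 184,121,798 rows and 12,809,740 unique ids (each id has 14 versions on average, but can have up to 40).

Quick summary:

Tomi's approach: cost=5983958.76..678016898538556.94 (6*10^6 for the first row, 7*10^13 for the whole table)

@a_horse_with_no_name: cost=1000027117538.39..1000031720583.59 (10^12 for any query)

@Merlin: about the same as the approach above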

Original approach

So the costs for the first row and for all rows are 5983958.76 (6*10^6) and 67801689853856.94 (7*10^13), respectively.

a_horse_with_no_name

The two solutions provided by @a_horse_with_no_name have almost identical plans, so I'll paste only one of them.

explain
SELECT * 
FROM (
  select *,
         row_number() over (partition by id order by version desc) as rn
  from table
)
where rn = 1;
The resulting plan is listed at the end of this page.

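Merlin's approach

The solution provided by @Merlin seems incomplete, since it doesn't return all values for the latest version, but its performance is similar to the second option.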

explain
select      distinct
              id
            , first_value(version) over(
                  partition by id
                  order by version desc
                  rows between unbounded preceding and unbounded following
                  ) as max_version
            , first_value(additional_col) over(
                  partition by id
                  order by version desc
                  rows between unbounded preceding and unbounded following
                  ) as additional_col

from        table t;
The resulting plan is listed at the end of this page.


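Just use DISTINCT ON (oid) with ORDER BY oid, version DESC.

I would expect Tomi's SQL to work (it works in Oracle DBMS). Is this a bug/limitation of PostgreSQL?

@KlasLindbäck: yes, it "works" in Postgres as well (as in: "it executes"). The problem is that the GROUP BY returns multiple rows per oid/userid, whereas Tomi wants only one row per oid/userid combination.

Yes, it says "ERROR: SELECT DISTINCT ON is not supported [SQL State=0A000]". I guess that's because I'm using Redshift.

@TomiMester: does Redshift support window functions?

Thanks! Sorry, but for me this is slower than my workaround (on the real table it takes more than 20 minutes, so I skipped the run). Thanks for the effort anyway.
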
Plan for the row_number() query:

  Filter: (rn = 1)
  ->  XN Window  (cost=1000027117538.39..1000029419060.99 rows=184121808 width=44)
        Partition: id
        Order: version
        ->  XN Sort  (cost=1000027117538.39..1000027577842.91 rows=184121808 width=44)
              Sort Key: id, version
              ->  XN Seq Scan on table  (cost=0.00..1841218.08 rows=184121808 width=44)

Plan for the first_value() query:

XN Unique  (cost=1000027117538.39..1000032180888.11 rows=184121808 width=84)
  ->  XN Window  (cost=1000027117538.39..1000030799974.55 rows=184121808 width=84)
        Partition: id
        Order: version
        ->  XN Sort  (cost=1000027117538.39..1000027577842.91 rows=184121808 width=84)
              Sort Key: id, version
              ->  XN Seq Scan on table  (cost=0.00..1841218.08 rows=184121808 width=84)