PostgreSQL: a more efficient way to count values in individually grouped columns, replacing dozens of joins
I have a table (in PostgreSQL) with multiple columns (n1 through n10), and every row holds a single number in each column (for simplicity, the numbers in the example below are 1, 2 and 3). The table has thousands of rows. An excerpt of the data:
+----+----+----+------+-----+
| n1 | n2 | n3 | n... | n10 |
+----+----+----+------+-----+
| 3 | 2 | 1 | 1 | 1 |
| 2 | 1 | 2 | 2 | 3 |
| 1 | 1 | 2 | 3 | 1 |
| 2 | 3 | 1 | 1 | 2 |
| 3 | 2 | 1 | 1 | 2 |
| 1 | 3 | 1 | 3 | 3 |
| 2 | 3 | 1 | 3 | 3 |
| 1 | 1 | 3 | 3 | 1 |
| 3 | 2 | 3 | 1 | 2 |
| 2 | 1 | 2 | 2 | 1 |
+----+----+----+------+-----+
What I want to do is count each individual number in every column, so that the result table looks like this:
+--------+----------+----------+----------+------------+-----------+
| number | n1_count | n2_count | n3_count | n..._count | n10_count |
+--------+----------+----------+----------+------------+-----------+
| 1 | 3 | 4 | 5 | 4 | 4 |
| 2 | 4 | 3 | 3 | 2 | 3 |
| 3 | 3 | 3 | 2 | 4 | 3 |
+--------+----------+----------+----------+------------+-----------+
I managed to achieve this by using multiple left joins:
SELECT number, n1_count, n2_count, n3_count, n..._count, n10_count FROM
(VALUES (1), (2), (3)) AS t (number)
LEFT JOIN
(SELECT n1, COUNT(n1) AS n1_count FROM table GROUP BY n1) AS n1_ ON n1 = number
LEFT JOIN
(SELECT n2, COUNT(n2) AS n2_count FROM table GROUP BY n2) AS n2_ ON n2 = number
LEFT JOIN
(SELECT n3, COUNT(n3) AS n3_count FROM table GROUP BY n3) AS n3_ ON n3 = number
LEFT JOIN
(SELECT n..., COUNT(n...) AS n..._count FROM table GROUP BY n...) AS n..._ ON n... = number
LEFT JOIN
(SELECT n10, COUNT(n10) AS n10_count FROM table GROUP BY n10) AS n10_ ON n10 = number;
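But the final query (10 left joins) looks huge and complicated, so I wonder whether the same result can be achieved in a more elegant and efficient way. Please point me to my options.

You can use filtered aggregation and a single join that uses an array: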
select t.number,
count(*) filter (where tt.n1 = t.number) as n1_count,
count(*) filter (where tt.n2 = t.number) as n2_count,
count(*) filter (where tt.n3 = t.number) as n3_count,
count(*) filter (where tt.n4 = t.number) as n4_count
from the_table tt
join (values (1),(2),(3) ) as t(number) on t.number = any(array[tt.n1,tt.n2,tt.n3,tt.n4])
group by t.number;
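To try the filtered aggregation above without touching a real table, here is a minimal self-contained sketch. It assumes only the n1, n2 and n3 values from the excerpt in the question; the CTE name sample_rows is hypothetical and stands in for the_table. If the approach is right, the result should reproduce the n1_count, n2_count and n3_count columns of the expected table in the question.

```sql
-- Sample data: the n1, n2, n3 values of the ten rows shown in the question.
-- sample_rows is a hypothetical name used only for this illustration.
with sample_rows (n1, n2, n3) as (
    values (3, 2, 1),
           (2, 1, 2),
           (1, 1, 2),
           (2, 3, 1),
           (3, 2, 1),
           (1, 3, 1),
           (2, 3, 1),
           (1, 1, 3),
           (3, 2, 3),
           (2, 1, 2)
)
select t.number,
       -- each filtered count only looks at rows where that column matches
       count(*) filter (where tt.n1 = t.number) as n1_count,
       count(*) filter (where tt.n2 = t.number) as n2_count,
       count(*) filter (where tt.n3 = t.number) as n3_count
from sample_rows tt
-- pair every row with each number that appears in any of its columns
join (values (1), (2), (3)) as t(number)
  on t.number = any (array[tt.n1, tt.n2, tt.n3])
group by t.number
order by t.number;
```

Note that a number from the values list that never occurs in any column would not appear in the output at all, because the inner join drops it; the left-join version in the question would return that number with NULL counts instead.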
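Thanks for the solution, it works; I did not know this approach. Unfortunately, the query takes 42 ms to complete, while my initial query takes 14 ms. So although your solution looks more concise, it is not as efficient.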
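One way to see where the time goes in either query is to look at the execution plan. The following is only a sketch, assuming the table is called the_table and has the columns n1 to n4 as in the answer; it makes no claim about which plan will be faster on your data.

```sql
-- EXPLAIN with ANALYZE actually runs the query and reports real timings
-- per plan node; BUFFERS adds information about page reads.
explain (analyze, buffers)
select t.number,
       count(*) filter (where tt.n1 = t.number) as n1_count,
       count(*) filter (where tt.n2 = t.number) as n2_count,
       count(*) filter (where tt.n3 = t.number) as n3_count,
       count(*) filter (where tt.n4 = t.number) as n4_count
from the_table tt
join (values (1), (2), (3)) as t(number)
  on t.number = any (array[tt.n1, tt.n2, tt.n3, tt.n4])
group by t.number;
```

Running the same command on the left-join query and comparing the two plans, ideally over several runs so that caching effects even out, gives a more reliable picture than a single timing.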