SQL: How can I optimize this query (or is there a better approach)?


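I recently asked a question and several people gave me a working solution, but I forgot to mention that my tables have millions of rows: the item table has about 10 million rows and the other tables about 1 million each. They probably assumed I was working with a small dataset like the sample I provided.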

Here is the SQL:

WITH a AS (
  SELECT item.id, string_agg(prefered_store.store::varchar, ',') wishlist_stores
  FROM item, list_wishlist, wishlist, prefered_store
  WHERE item.list=list_wishlist.list
    AND list_wishlist.wishlist=wishlist.id
    AND wishlist.prefered_stores=prefered_store.id
  GROUP BY item.id
), b AS (
  SELECT item.id, 
    string_agg(
      prefered_store.store::varchar || ',' || prefered_store.comment,
      ' ; ') item_stores_comments
    FROM item, prefered_store
    WHERE item.prefered_stores=prefered_store.id
    GROUP BY item.id
)
SELECT a.id,item_stores_comments,wishlist_stores 
FROM a,b
WHERE a.id=b.id
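Although it does exactly what I need, it is very slow. Limited to one row it takes about 10 minutes, and limited to 10 rows about 15 minutes. I am still waiting to see how long a thousand rows will take; it has been almost an hour. Admittedly my desktop is not the fastest (a Pentium 4 with 1.5 GB of RAM), but this still does not feel right.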

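I have indexed all the fields used in the WHERE clauses and created primary keys where needed. Beyond that, is there anything else I can do to make this query run faster?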

PostgreSQL 9.2

DDL:

A simple diagram with only the relevant tables and fields:

EXPLAIN ANALYZE:

Merge Join  (cost=23342752.95..12971604557.95 rows=863210883998 width=68) (actual time=1182616.544..1251542.167 rows=13139337 loops=1)
  Merge Cond: (a.id = b.id)
  CTE a
    ->  GroupAggregate  (cost=8477658.65..8992463.86 rows=13139337 width=8) (actual time=252170.500..307061.559 rows=13139337 loops=1)
          ->  Sort  (cost=8477658.65..8547771.35 rows=28045080 width=8) (actual time=252170.391..282495.516 rows=14870222 loops=1)
                Sort Key: public.item.id
                Sort Method: external merge  Disk: 261528kB
                ->  Merge Join  (cost=3010452.34..3474579.76 rows=28045080 width=8) (actual time=138444.102..210768.838 rows=14870222 loops=1)
                      Merge Cond: (list_wishlist.list = public.item.list)
                      ->  Sort  (cost=689954.53..695268.01 rows=2125390 width=8) (actual time=30482.447..55193.049 rows=1286901 loops=1)
                            Sort Key: list_wishlist.list
                            Sort Method: external merge  Disk: 22624kB
                            ->  Hash Join  (cost=66643.55..408462.52 rows=2125390 width=8) (actual time=10417.244..26147.517 rows=1286901 loops=1)
                                  Hash Cond: (wishlist.prefered_stores = public.prefered_store.id)
                                  ->  Hash Join  (cost=38565.70..96225.43 rows=1226863 width=8) (actual time=8188.097..19815.024 rows=1226863 loops=1)
                                        Hash Cond: (list_wishlist.wishlist = wishlist.id)
                                        ->  Seq Scan on list_wishlist  (cost=0.00..22266.63 rows=1226863 width=8) (actual time=12.786..7467.442 rows=1226863 loops=1)
                                        ->  Hash  (cost=20352.20..20352.20 rows=1110120 width=8) (actual time=7314.531..7314.531 rows=1110087 loops=1)
                                              Buckets: 4096  Batches: 64  Memory Usage: 689kB
                                              ->  Seq Scan on wishlist  (cost=0.00..20352.20 rows=1110120 width=8) (actual time=7.621..6572.731 rows=1110087 loops=1)
                                  ->  Hash  (cost=14027.49..14027.49 rows=856349 width=8) (actual time=2159.339..2159.339 rows=856349 loops=1)
                                        Buckets: 4096  Batches: 64  Memory Usage: 536kB
                                        ->  Seq Scan on prefered_store  (cost=0.00..14027.49 rows=856349 width=8) (actual time=0.071..1602.971 rows=856349 loops=1)
                      ->  Materialize  (cost=2320484.45..2386181.13 rows=13139337 width=8) (actual time=107961.603..149020.809 rows=14870219 loops=1)
                            ->  Sort  (cost=2320484.45..2353332.79 rows=13139337 width=8) (actual time=107961.575..145971.848 rows=13139337 loops=1)
                                  Sort Key: public.item.list
                                  Sort Method: external merge  Disk: 231088kB
                                  ->  Seq Scan on item  (cost=0.00..228006.37 rows=13139337 width=8) (actual time=27.636..47661.750 rows=13139337 loops=1)
  CTE b
    ->  GroupAggregate  (cost=7166704.38..7843349.46 rows=13139337 width=12) (actual time=524258.000..794537.585 rows=13139337 loops=1)
          ->  Sort  (cost=7166704.38..7223638.09 rows=22773483 width=12) (actual time=524257.908..755765.703 rows=13858612 loops=1)
                Sort Key: public.item.id
                Sort Method: external merge  Disk: 297912kB
                ->  Merge Join  (cost=2448353.26..2826901.79 rows=22773483 width=12) (actual time=201205.036..425873.108 rows=13858612 loops=1)
                      Merge Cond: (public.prefered_store.id = public.item.prefered_stores)
                      ->  Sort  (cost=127685.43..129826.31 rows=856349 width=12) (actual time=4545.447..12507.054 rows=856346 loops=1)
                            Sort Key: public.prefered_store.id
                            Sort Method: external merge  Disk: 18408kB
                            ->  Seq Scan on prefered_store  (cost=0.00..14027.49 rows=856349 width=12) (actual time=0.060..2707.353 rows=856349 loops=1)
                      ->  Materialize  (cost=2320484.45..2386181.13 rows=13139337 width=8) (actual time=196659.554..406944.706 rows=13858611 loops=1)
                            ->  Sort  (cost=2320484.45..2353332.79 rows=13139337 width=8) (actual time=196659.535..396917.629 rows=13139337 loops=1)
                                  Sort Key: public.item.prefered_stores
                                  Sort Method: external merge  Disk: 231096kB
                                  ->  Seq Scan on item  (cost=0.00..228006.37 rows=13139337 width=8) (actual time=0.032..54885.583 rows=13139337 loops=1)
  ->  Sort  (cost=3253469.82..3286318.16 rows=13139337 width=36) (actual time=344329.838..353118.692 rows=13139337 loops=1)
        Sort Key: a.id
        Sort Method: external sort  Disk: 259792kB
        ->  CTE Scan on a  (cost=0.00..262786.74 rows=13139337 width=36) (actual time=252170.512..320132.738 rows=13139337 loops=1)
  ->  Materialize  (cost=3253469.82..3319166.50 rows=13139337 width=36) (actual time=838286.670..888495.578 rows=13139337 loops=1)
        ->  Sort  (cost=3253469.82..3286318.16 rows=13139337 width=36) (actual time=838286.652..886198.912 rows=13139337 loops=1)
              Sort Key: b.id
              Sort Method: external sort  Disk: 385320kB
              ->  CTE Scan on b  (cost=0.00..262786.74 rows=13139337 width=36) (actual time=524258.017..811717.462 rows=13139337 loops=1)
Total runtime: 1253101.865 ms

Why are you using WITH? Why not use UNION? If there are no NULL foreign keys, you could use LEFT JOIN, but this is only a quick comparison.


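If you post a diagram, I can redo the SQL :D

Building columns of comma-separated values in SQL is almost always the wrong approach. It is better to return rows of data and let the application code handle the display formatting.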

Test it with the string_agg function and the GROUP BY clauses removed:

with a as (
  select item.id, 
         prefered_store.store wishlist_stores
  from item
  inner join list_wishlist on item.list=list_wishlist.list
  inner join wishlist on list_wishlist.wishlist=wishlist.id
  inner join prefered_store on wishlist.prefered_stores=prefered_store.id
), b as (
  select item.id, 
         prefered_store.store,
         prefered_store.comment item_stores_comments
    from item
    inner join prefered_store on item.prefered_stores=prefered_store.id
)
select * from a
inner join b on a.id = b.id
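
If the query returns plain rows like this, the comma-separated strings can be built on the application side. The following is only a minimal sketch: the function name `format_items`, the row layout `(item_id, wishlist_store, store, comment)`, and the sample data are assumptions made for illustration, not part of the original schema or answer. Because joining `a` and `b` repeats each item store once per wishlist store, the sketch collapses repeated values, which also means it would drop genuine duplicates that `string_agg` would keep.

```python
from collections import defaultdict


def format_items(rows):
    """Group (item_id, wishlist_store, store, comment) rows by item_id.

    Returns {item_id: (item_stores_comments, wishlist_stores)}, using the
    same separators as the string_agg calls in the question.
    """
    # Plain dicts are used as insertion-ordered sets (Python 3.7+).
    wishlist = defaultdict(dict)   # item_id -> wishlist stores seen so far
    comments = defaultdict(dict)   # item_id -> (store, comment) pairs seen so far
    for item_id, wishlist_store, store, comment in rows:
        wishlist[item_id][wishlist_store] = None
        comments[item_id][(store, comment)] = None
    return {
        item_id: (
            " ; ".join(f"{s},{c}" for s, c in comments[item_id]),
            ",".join(str(s) for s in wishlist[item_id]),
        )
        for item_id in wishlist
    }


# Hypothetical sample rows, not taken from the real tables.
sample = [
    (1, 10, 7, "cheap"),
    (1, 11, 7, "cheap"),
    (2, 10, 8, "far away"),
]
result = format_items(sample)
```

In a real application the rows would come from a database cursor rather than a list literal, and they could be streamed so that the full result never has to sit in memory at once.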
The SQL Fiddle you posted is not of much use. It has no primary keys, no secondary indexes, and not enough rows to avoid sequential scans.

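Note that in PostgreSQL a CTE acts as an optimization fence. Whatever predicates you add to the outer part of the query, PostgreSQL will still evaluate the WITH clause in full. So to optimize, you need to get rid of the CTEs.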

That is not easy, because you want the store table denormalized.

Try this query:

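SELECT i.id item,
       (
         SELECT string_agg(store::text || ',' || comment, ' ; ')
         FROM prefered_store
         WHERE id = i.prefered_stores
       ) item_stores_comments,
       string_agg(ps.store::text, ',') whishlist_stores
FROM item i
JOIN list_wishlist lw ON lw.list = i.list
JOIN wishlist w ON w.id = lw.wishlist
JOIN prefered_store ps ON ps.id = w.prefered_stores
GROUP BY i.id;

But I would suggest reviewing your schema design.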

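You can't analyze SQL by eye; you have to use measurement tools. Start by editing your question and pasting the output of EXPLAIN ANALYZE.

1) You have no usable keys at all on the prefered_store table. 2) The product {prefered_store * item} is repeated; it could also be folded into a single common table expression. As it stands, you are effectively doing two quadratic full-table scans. The resulting hash tables are too big to fit in memory and spill to disk.

Would you mind pasting the SQL DDL for these tables into the question? The store and comment tables are missing.

You can't simply replace a common table expression (WITH) with UNION. They are two completely different things.

I agree with you, but he seems to be using WITH to imitate a CROSS JOIN.

@ofko, could you please post the output of EXPLAIN (ANALYZE, BUFFERS) for this query against your schema?

When I tried it without a LIMIT, it did not finish. I am short on memory; I think I should split it into batches of a million rows or so. Or do you have any other suggestions?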
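
The idea in the last comment, processing the data in batches of about a million rows, could be sketched roughly as follows. This is an illustration under assumptions rather than a tested solution: `id_ranges` and `BATCH_SQL` are hypothetical names, the `%(low)s` / `%(high)s` placeholders assume a driver that uses named pyformat parameters (psycopg2 does), and whether each batch is actually fast depends on the planner using an index on `item.id`.

```python
def id_ranges(min_id, max_id, batch_size):
    """Yield inclusive (low, high) id ranges that cover min_id..max_id."""
    low = min_id
    while low <= max_id:
        high = min(low + batch_size - 1, max_id)
        yield low, high
        low = high + 1


# One batch of the wishlist join, restricted to a range of item ids.
# Table and column names follow the question; the range filter is the
# only addition.
BATCH_SQL = """
SELECT i.id, ps.store
FROM item i
JOIN list_wishlist lw ON lw.list = i.list
JOIN wishlist w ON w.id = lw.wishlist
JOIN prefered_store ps ON ps.id = w.prefered_stores
WHERE i.id BETWEEN %(low)s AND %(high)s
"""

# Example: ids 1..2,500,000 in batches of one million.
ranges = list(id_ranges(1, 2_500_000, 1_000_000))
```

Each `(low, high)` pair would be passed as the parameters of one execution of `BATCH_SQL`, so that every sort and hash works on a slice of `item` small enough to stay closer to the available memory.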