Postgres query is slow when joining large tables

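I have a query that performs quite slowly. I believe the problem is that I am joining several large tables, but I would still expect better performance. The query and its EXPLAIN ANALYZE output are below: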

SELECT
    "m_advertsnapshot"."id",
    "m_advertsnapshot"."created",
    "m_advertsnapshot"."modified",
    "m_advertsnapshot"."snapshot_timestamp",
    "m_advertsnapshot"."source_name",
    COUNT(CASE m_advert.widget_listing_id IS NULL and m_advert.height IS NULL WHEN True THEN 1 ELSE null END) AS "adh_count_with_no_wl_and_missing_height",
    COUNT(CASE m_advert.widget_listing_id IS NULL and m_advert.height IS NOT NULL and m_advert.colour_id IS NOT NULL and m_advert.ctype IS NOT NULL WHEN True THEN 1 ELSE null END) AS "adh_count_with_no_wl_and_has_height_plate_ctype",
    COUNT(CASE m_advert.widget_listing_id IS NULL and m_advert.height IS NULL and m_advert.colour_id is NULL and m_advert.ctype is NULL  WHEN True THEN 1 ELSE null END) AS "adh_count_with_no_wl_and_missing_height_and_missing_plate_c268",
    COUNT("m_adverthistory"."id") AS "adh_count",
    COUNT(CASE m_advert.widget_listing_id IS NULL and m_advert.height IS NULL and m_advert.colour_id is NULL WHEN True THEN 1 ELSE null END) AS "adh_count_with_no_wl_and_missing_height_and_missing_plate",
    COUNT("m_advert"."widget_listing_id") AS "adh_count_with_wl"
FROM "m_advertsnapshot"
    LEFT OUTER JOIN "m_adverthistory" ON ("m_advertsnapshot"."id" = "m_adverthistory"."advert_snapshot_id")
    LEFT OUTER JOIN "m_advert" ON ("m_adverthistory"."advert_id" = "m_advert"."id")
GROUP BY
    "m_advertsnapshot"."id",
    "m_advertsnapshot"."created",
    "m_advertsnapshot"."modified",
    "m_advertsnapshot"."snapshot_timestamp",
    "m_advertsnapshot"."source_name"
ORDER BY
    "m_advertsnapshot"."snapshot_timestamp" DESC



"Sort  (cost=796180.41..796180.90 rows=196 width=72) (actual time=18051.504..18051.519 rows=196 loops=1)"
"  Sort Key: m_advertsnapshot.snapshot_timestamp"
"  Sort Method: quicksort  Memory: 60kB"
"  ->  HashAggregate  (cost=796170.99..796172.95 rows=196 width=72) (actual time=18051.330..18051.396 rows=196 loops=1)"
"        ->  Hash Right Join  (cost=227052.68..622950.33 rows=6298933 width=72) (actual time=2082.551..12166.226 rows=6298933 loops=1)"
"              Hash Cond: (m_adverthistory.advert_snapshot_id = m_advertsnapshot.id)"
"              ->  Hash Left Join  (cost=227045.27..536332.59 rows=6298933 width=24) (actual time=2082.483..9971.996 rows=6298933 loops=1)"
"                    Hash Cond: (m_adverthistory.advert_id = m_advert.id)"
"                    ->  Seq Scan on m_adverthistory  (cost=0.00..121858.33 rows=6298933 width=12) (actual time=0.003..1644.060 rows=6298933 loops=1)"
"                    ->  Hash  (cost=202575.12..202575.12 rows=1332812 width=20) (actual time=2080.897..2080.897 rows=1332812 loops=1)"
"                          Buckets: 2048  Batches: 128  Memory Usage: 525kB"
"                          ->  Seq Scan on m_advert  (cost=0.00..202575.12 rows=1332812 width=20) (actual time=0.007..1564.220 rows=1332812 loops=1)"
"              ->  Hash  (cost=4.96..4.96 rows=196 width=52) (actual time=0.062..0.062 rows=196 loops=1)"
"                    Buckets: 1024  Batches: 1  Memory Usage: 17kB"
"                    ->  Seq Scan on m_advertsnapshot  (cost=0.00..4.96 rows=196 width=52) (actual time=0.004..0.030 rows=196 loops=1)"
"Total runtime: 18051.730 ms"

The query takes 18 seconds on Postgres 9.2. The table sizes are as follows:

m_advertsnapshot - 196 rows
m_adverthistory - 6,298,933 rows
m_advert - 1,332,812 rows

DDLs (listed at the end of this post):

Any ideas on how to improve the performance of this query would be much appreciated.

Thanks

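The schema looks reasonable (for this query you don't actually need the indexes, and some of the indexes are already covered by the FK constraints).
The join table does not need a surrogate key (but it does no harm).
The real reason the query is slow is that it needs all rows from all tables to compute the aggregates. If you need 100% of the data, indexes won't help much.
Adding an extra constraint (e.g. snapshot_timestamp >= some date) may lead to a different plan that uses the indexes.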
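As a rough sketch of the last point, the query could be restricted to recent snapshots and re-run under EXPLAIN ANALYZE to see whether the plan changes. The cutoff date and the short table aliases are assumptions made for illustration, and only two of the original counts are kept for brevity. Whether the planner actually switches to index scans depends on how many snapshots survive the filter.

```sql
-- Hypothetical cutoff date; pick one that matches the snapshots you need.
EXPLAIN ANALYZE
SELECT
    s.id,
    s.snapshot_timestamp,
    s.source_name,
    COUNT(h.id) AS adh_count,
    COUNT(a.widget_listing_id) AS adh_count_with_wl
FROM m_advertsnapshot AS s
    LEFT OUTER JOIN m_adverthistory AS h ON s.id = h.advert_snapshot_id
    LEFT OUTER JOIN m_advert AS a ON h.advert_id = a.id
-- Filtering on the small driving table limits which history rows are needed.
WHERE s.snapshot_timestamp >= TIMESTAMPTZ '2013-01-01 00:00:00+00'
GROUP BY s.id, s.snapshot_timestamp, s.source_name
ORDER BY s.snapshot_timestamp DESC;
```

If the filter keeps most of the 196 snapshots, the plan will probably still read the large tables in full, as described above.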

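To me, id (or advert_id) should at least be part of the primary key of {m_advert, m_advertsnapshot}. Does your schema have any primary or foreign keys? Please add the DDL.

The joins are on primary/foreign keys. These were generated by Django (though I don't think that makes a difference).

You may get an improvement by increasing work_mem for this query, which gives its hashes and sorts more room. Try SET work_mem = '50MB' before running the query and see whether the plan or the performance changes. Do not set this in postgresql.conf, where it would become the default for every session.

Have you tried changing the two indexes on the history table so that each covers both fields, advert_id and advert_snapshot_id? Having two two-column indexes, (advert_id, advert_snapshot_id) and (advert_snapshot_id, advert_id), may help, because the second key can then be picked up from the index itself.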
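A minimal sketch of the work_mem suggestion, applied to the current session only. The 50MB value comes from the comment above; the shortened query and the table aliases are assumptions for illustration. In the resulting plan, the thing to compare is the Hash node over m_advert, which reports "Batches: 128" in the original output.

```sql
-- Affects only this session; other connections keep the server default.
SET work_mem = '50MB';

EXPLAIN ANALYZE
SELECT
    s.id,
    s.snapshot_timestamp,
    COUNT(h.id) AS adh_count,
    COUNT(a.widget_listing_id) AS adh_count_with_wl
FROM m_advertsnapshot AS s
    LEFT OUTER JOIN m_adverthistory AS h ON s.id = h.advert_snapshot_id
    LEFT OUTER JOIN m_advert AS a ON h.advert_id = a.id
GROUP BY s.id, s.snapshot_timestamp
ORDER BY s.snapshot_timestamp DESC;

-- Return to the server default once the comparison is done.
RESET work_mem;
```

A lower batch count means less of the hash table is spilled to temporary files, which is where a larger work_mem would be expected to help.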

-- m_advertsnapshot

CREATE TABLE m_advertsnapshot
(
  id serial NOT NULL,
  snapshot_timestamp timestamp with time zone NOT NULL,
  source_name character varying(50),
  CONSTRAINT m_advertsnapshot_pkey PRIMARY KEY (id),
  CONSTRAINT m_advertsnapshot_source_name_6a9a437077520191_uniq UNIQUE (source_name, snapshot_timestamp)
)
WITH (
  OIDS=FALSE
);

CREATE INDEX m_advertsnapshot_snapshot_timestamp
  ON m_advertsnapshot
  USING btree
  (snapshot_timestamp);

-- m_adverthistory

CREATE TABLE m_adverthistory
(
  id serial NOT NULL,
  advert_id integer NOT NULL,
  advert_snapshot_id integer NOT NULL,
  observed_timestamp timestamp with time zone NOT NULL,
  CONSTRAINT m_adverthistory_pkey PRIMARY KEY (id),
  CONSTRAINT advert_id_refs_id_30735d9eef85241c FOREIGN KEY (advert_id)
      REFERENCES m_advert (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION DEFERRABLE INITIALLY DEFERRED,
  CONSTRAINT advert_snapshot_id_refs_id_55d3986f4f270624 FOREIGN KEY (advert_snapshot_id)
      REFERENCES m_advertsnapshot (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION DEFERRABLE INITIALLY DEFERRED,
  CONSTRAINT m_adverthistory_advert_id_13fa0dae39e78983_uniq UNIQUE (advert_id, advert_snapshot_id)
)
WITH (
  OIDS=FALSE
);

CREATE INDEX m_adverthistory_advert_id
  ON m_adverthistory
  USING btree
  (advert_id);

CREATE INDEX m_adverthistory_advert_snapshot_id
  ON m_adverthistory
  USING btree
  (advert_snapshot_id);
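The two-column indexes suggested in the comments could be tried roughly as follows. Note that the UNIQUE (advert_id, advert_snapshot_id) constraint in the table definition above is already backed by a btree index in that column order, so only the reverse order would be new. The index name is hypothetical, and this is a sketch rather than a guaranteed improvement: when the query needs every row, the planner may still prefer sequential scans.

```sql
-- Hypothetical index name; the column order is the reverse of the existing
-- unique constraint, so a lookup by snapshot can also return advert_id.
CREATE INDEX m_adverthistory_snapshot_id_advert_id
  ON m_adverthistory
  USING btree
  (advert_snapshot_id, advert_id);

-- Refresh planner statistics and the visibility map so that an index-only
-- scan (available from PostgreSQL 9.2) can be considered.
VACUUM ANALYZE m_adverthistory;
```

If the new index is used, the single-column index on advert_snapshot_id becomes largely redundant, since the two-column index can serve the same lookups.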

-- m_advert

CREATE TABLE m_advert
(
  id serial NOT NULL,
  widget_listing_id integer,
  height integer,
  ctype integer,
  colour_id integer,
  CONSTRAINT m_advert_pkey PRIMARY KEY (id),
  CONSTRAINT "colour_id_refs_id_1e4e2dac0183b419" FOREIGN KEY (colour_id)
      REFERENCES colour ("id") MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION DEFERRABLE INITIALLY DEFERRED,
  CONSTRAINT widget_listing_id_refs_id_5a7e62d0d4f48013 FOREIGN KEY (widget_listing_id)
      REFERENCES m_widgetlisting (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION DEFERRABLE INITIALLY DEFERRED
)
WITH (
  OIDS=FALSE
);

CREATE INDEX m_advert_advert_seller_id
  ON m_advert
  USING btree
  (advert_seller_id);

CREATE INDEX m_advert_colour_id
  ON m_advert
  USING btree
  (colour_id);

CREATE INDEX m_advert_widget_listing_id
  ON m_advert
  USING btree
  (widget_listing_id);