PostgreSQL: proper indexes for sorting and joining

I have a simple schema and query, but with certain parameters I keep getting terrible performance.

Schema:

CREATE TABLE locations (
  id integer NOT NULL,
  barcode_id integer NOT NULL
);

CREATE TABLE barcodes (
  id integer NOT NULL,
  value citext NOT NULL
);

ALTER TABLE ONLY locations ADD CONSTRAINT locations_pkey PRIMARY KEY (id);
ALTER TABLE ONLY barcodes ADD CONSTRAINT barcodes_pkey PRIMARY KEY (id);
ALTER TABLE ONLY locations ADD CONSTRAINT fk_locations_barcodes FOREIGN KEY (barcode_id) REFERENCES barcodes(id);

CREATE INDEX index_barcodes_on_value ON barcodes (value);
CREATE INDEX index_locations_on_barcode_id ON locations (barcode_id);

Query:

EXPLAIN ANALYZE
SELECT *
FROM locations
JOIN barcodes ON locations.barcode_id = barcodes.id
ORDER BY barcodes.value ASC
LIMIT 50;

Analysis:

Limit  (cost=0.71..3564.01 rows=50 width=34) (actual time=0.043..683.025 rows=50 loops=1)
  ->  Nested Loop  (cost=0.71..4090955.00 rows=57404 width=34) (actual time=0.043..683.017 rows=50 loops=1)
        ->  Index Scan using index_barcodes_on_value on barcodes  (cost=0.42..26865.99 rows=496422 width=15) (actual time=0.023..218.775 rows=372138 loops=1)
        ->  Index Scan using index_locations_on_barcode_id on locations  (cost=0.29..5.32 rows=287 width=8) (actual time=0.001..0.001 rows=0 loops=372138)
              Index Cond: (barcode_id = barcodes.id)
Planning time: 0.167 ms
Execution time: 683.078 ms

For the number of entries in my schema (500,000 barcodes and 60,000 locations), 500+ ms makes no sense. Is there anything I can do to improve performance?

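Note:

Even stranger, the execution time depends on the data. While drafting this question I tried to include seeded random data, but the seeded data seems to perform just fine: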

Seed:

INSERT INTO barcodes (id, value) SELECT seed.id, gen_random_uuid() FROM generate_series(1,500000) AS seed(id);
INSERT INTO locations (id, barcode_id) SELECT seed.id, (RANDOM() * 500000)  FROM generate_series(1,60000) AS seed(id);

Analysis:

Limit  (cost=0.71..3602.63 rows=50 width=86) (actual time=0.089..1.123 rows=50 loops=1)
  ->  Nested Loop  (cost=0.71..4330662.42 rows=60116 width=86) (actual time=0.088..1.115 rows=50 loops=1)
        ->  Index Scan using index_barcodes_on_value on barcodes  (cost=0.42..44972.42 rows=500000 width=41) (actual time=0.006..0.319 rows=376 loops=1)
        ->  Index Scan using index_locations_on_barcode_id on locations  (cost=0.29..5.56 rows=301 width=8) (actual time=0.002..0.002 rows=0 loops=376)
              Index Cond: (barcode_id = barcodes.id)
Planning time: 0.213 ms
Execution time: 1.152 ms

Edit:

Table analysis:

ANALYZE VERBOSE barcodes;
INFO:  analyzing "public.barcodes"
INFO:  "barcodes": scanned 2760 of 2760 pages, containing 496157 live rows and 0 dead rows; 30000 rows in sample, 496157 estimated total rows
ANALYZE
Time: 62.937 ms

ANALYZE VERBOSE locations;
INFO:  analyzing "public.locations"
INFO:  "locations": scanned 254 of 254 pages, containing 57394 live rows and 0 dead rows; 30000 rows in sample, 57394 estimated total rows
ANALYZE
Time: 21.447 ms

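The problem is that the barcodes with low values have no matches in locations, and PostgreSQL has no way of knowing that. So it plans to fetch the barcodes in the correct output order via the index and then join in rows from locations until it has found 50 of them, and that turns out much worse than it expected: in the slow plan the index scan on barcodes reads 372138 rows before 50 matches turn up, versus 376 rows with the seeded data.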
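
To see how far into the sort order the matches actually start, a query along these lines may help. It is only a rough diagnostic sketch against the schema from the question: it finds the value of the 50th barcode that has at least one location and counts how many barcodes sort at or before it. Because one barcode can have several locations, the real query can stop earlier, so treat the result as an upper bound rather than an exact figure.

```sql
-- Value of the 50th barcode (in sort order) that has at least one location.
WITH fiftieth AS (
  SELECT b.value
  FROM barcodes AS b
  WHERE EXISTS (SELECT 1 FROM locations AS l WHERE l.barcode_id = b.id)
  ORDER BY b.value
  LIMIT 1 OFFSET 49
)
-- Number of barcodes the ordered index scan would have to walk past
-- to reach that value (upper bound for the nested loop's outer rows).
SELECT count(*) AS barcodes_up_to_50th_match
FROM barcodes AS b
JOIN fiftieth AS f ON b.value <= f.value;
```

If this number is in the hundreds of thousands, as the slow plan suggests, the data is skewed in exactly the way the planner cannot see.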

I would ANALYZE the tables and

DROP INDEX index_barcodes_on_value;

That will keep PostgreSQL from choosing that plan.
I don't know which plan PostgreSQL will choose instead.
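
One way to try this without committing to it is to drop the index inside a transaction and roll back afterwards, since DROP INDEX is transactional in PostgreSQL. This is a sketch for a test or staging database, not a recommendation for a busy production system: the DROP INDEX holds an exclusive lock on barcodes until the ROLLBACK.

```sql
-- Refresh planner statistics first.
ANALYZE barcodes;
ANALYZE locations;

BEGIN;

-- Temporarily remove the index so the planner cannot pick the ordered scan.
DROP INDEX index_barcodes_on_value;

-- Same query as in the question; compare the plan and timing.
EXPLAIN ANALYZE
SELECT *
FROM locations
JOIN barcodes ON locations.barcode_id = barcodes.id
ORDER BY barcodes.value ASC
LIMIT 50;

-- Undo the drop; the index is back exactly as before.
ROLLBACK;
```

If the alternative plan is not faster on the real data, nothing has been lost.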
For a nested loop, the following index might help:
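Cluster both tables by barcode_id.

@Jasen Reading up on "cluster", I have never used it. I tried running CLUSTER barcodes USING barcodes_pkey and clustering locations, but it did not seem to help. Should I be doing it differently?

@Jasen Also, it sounds like "cluster" is not a permanent command and may need to be rerun?

A silly question, if I may: did you run ANALYZE on both tables before running the query?

Heh, that is exactly what I was going to answer, so I am upvoting. Optimizing this will be tricky, because dropping the index will require a large, slow sort unless the number of matching rows in both tables is very small. Materializing the "value" column in the "locations" table, along with an index on it, would make the query very fast, but it denormalizes the table and makes it fatter, so it should only be considered if this is a very important query.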
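
The denormalization mentioned in the last comment could look roughly like the sketch below. The column name barcode_value and the index name index_locations_on_barcode_value are made up for illustration, and keeping the copy in sync when barcodes.value changes (with a trigger or in application code) is not shown. Whether the planner actually uses the new index should be checked with EXPLAIN ANALYZE on the real data.

```sql
-- Hypothetical column holding a copy of barcodes.value.
ALTER TABLE locations ADD COLUMN barcode_value citext;

-- Backfill the copy from the referenced barcode.
UPDATE locations AS l
SET barcode_value = b.value
FROM barcodes AS b
WHERE b.id = l.barcode_id;

ALTER TABLE locations ALTER COLUMN barcode_value SET NOT NULL;

-- Hypothetical index that provides the sort order on the small table.
CREATE INDEX index_locations_on_barcode_value ON locations (barcode_value);

ANALYZE locations;

-- Sort on the copied column so only rows that exist in locations are walked.
EXPLAIN ANALYZE
SELECT *
FROM locations
JOIN barcodes ON locations.barcode_id = barcodes.id
ORDER BY locations.barcode_value ASC
LIMIT 50;
```

The idea is that an ordered scan over locations never visits a barcode without a match, so the 50 rows should come from the first 50 index entries.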