SQL join 1-many只查询父行,不重复

SQL join 1-many只查询父行,不重复,sql,postgresql,count,indexing,pattern-matching,Sql,Postgresql,Count,Indexing,Pattern Matching,我有两个表-发票和发票项目。这种关系是一对多。我的应用程序允许使用查询中的发票项目字段查询发票。只退回发票,不退回任何项目 例如,我希望获得所有具有项目的发票,其名称包含ac,不区分大小写。 输出是分页的,因此我执行一次查询以获得满足条件的发票的数量,然后执行另一次查询以获得发票的相应页面 表格大小如下: 发票-65000项记录 发票项目-3281518条记录 条款-5项 代表-5项 Shippvia-5项 每个发票最多链接到100个发票项目 我的问题是,我无法找出查询的最佳索引: 模式:

我有两个表-
发票
发票项目
。这种关系是一对多。我的应用程序允许使用查询中的发票项目字段查询发票。只退回发票,不退回任何项目

例如,我希望获得所有具有项目的发票,其名称包含
ac
,不区分大小写。 输出是分页的,因此我执行一次查询以获得满足条件的发票的数量,然后执行另一次查询以获得发票的相应页面

表格大小如下:

  • 发票-65000项记录
  • 发票项目-3281518条记录
  • 条款-5项
  • 代表-5项
  • Shippvia-5项
每个发票最多链接到100个发票项目

我的问题是,我无法找出查询的最佳索引:

模式:

CREATE TABLE invoiceitems
(
  id serial NOT NULL,
  invoice_id integer NOT NULL,
  name text NOT NULL,
  ...
  CONSTRAINT invoiceitems_pkey PRIMARY KEY (id),
  CONSTRAINT invoiceitems_invoice_id_fkey FOREIGN KEY (invoice_id)
      REFERENCES invoices (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
);

CREATE INDEX idx_lower_name
  ON invoiceitems
  USING btree
  (lower(name) COLLATE pg_catalog."default" text_pattern_ops);

CREATE TABLE invoices
(
  id serial NOT NULL,
  term_id integer,
  rep_id integer NOT NULL,
  ship_via_id integer,
  ...
  CONSTRAINT invoices_pkey PRIMARY KEY (id),
  CONSTRAINT invoices_rep_id_fkey FOREIGN KEY (rep_id)
      REFERENCES reps (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT invoices_ship_via_id_fkey FOREIGN KEY (ship_via_id)
      REFERENCES shipvia (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT invoices_term_id_fkey FOREIGN KEY (term_id)
      REFERENCES terms (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
);
SELECT COUNT(DISTINCT(o.id))
FROM invoices o
JOIN invoiceitems items ON items.invoice_id = o.id
LEFT JOIN terms t ON t.id = o.term_id
LEFT JOIN reps r ON r.id = o.rep_id
LEFT JOIN shipVia s ON s.id = o.ship_via_id WHERE LOWER(items.name) LIKE '%ac%';
CREATE INDEX invoiceitems_name_gin_trgm_idx ON invoiceitems USING gin (name gin_trgm_ops);
EXPLAIN ANALYZE SELECT COUNT(DISTINCT(o.id)) 
FROM invoices o 
JOIN invoiceitems items ON items.invoice_id = o.id 
WHERE items.name ILIKE '%ac%';

"Aggregate  (cost=78961.52..78961.53 rows=1 width=4) (actual time=5205.448..5205.450 rows=1 loops=1)"
"  ->  Nested Loop  (cost=0.00..78960.73 rows=316 width=4) (actual time=0.396..5176.761 rows=6518 loops=1)"
"        ->  Seq Scan on invoiceitems items  (cost=0.00..76885.98 rows=316 width=4) (actual time=0.021..4502.043 rows=6518 loops=1)"
"              Filter: (name ~~* '%ac%'::text)"
"              Rows Removed by Filter: 3275000"
"        ->  Index Only Scan using invoices_pkey on invoices o  (cost=0.00..6.56 rows=1 width=4) (actual time=0.012..0.015 rows=1 loops=6518)"
"              Index Cond: (id = items.invoice_id)"
"              Heap Fetches: 6518"
"Total runtime: 5205.509 ms"
EXPLAIN ANALYZE SELECT COUNT(1)
FROM   invoices o
WHERE EXISTS (
   SELECT 1
   FROM   invoiceitems i 
   WHERE  i.invoice_id = o.id
   AND    i.name ILIKE '%ac%'
   );

"Aggregate  (cost=76920.43..76920.44 rows=1 width=0) (actual time=5713.597..5713.598 rows=1 loops=1)"
"  ->  Nested Loop  (cost=76886.76..76919.64 rows=316 width=0) (actual time=5583.706..5703.801 rows=6518 loops=1)"
"        ->  HashAggregate  (cost=76886.76..76886.82 rows=5 width=4) (actual time=5583.568..5594.977 rows=6518 loops=1)"
"              ->  Seq Scan on invoiceitems i  (cost=0.00..76885.98 rows=316 width=4) (actual time=0.295..5148.801 rows=6518 loops=1)"
"                    Filter: (name ~~* '%ac%'::text)"
"                    Rows Removed by Filter: 3275000"
"        ->  Index Only Scan using invoices_pkey on invoices o  (cost=0.00..6.56 rows=1 width=4) (actual time=0.006..0.008 rows=1 loops=6518)"
"              Index Cond: (id = i.invoice_id)"
"              Heap Fetches: 6518"
"Total runtime: 5713.804 ms"
计数查询:

CREATE TABLE invoiceitems
(
  id serial NOT NULL,
  invoice_id integer NOT NULL,
  name text NOT NULL,
  ...
  CONSTRAINT invoiceitems_pkey PRIMARY KEY (id),
  CONSTRAINT invoiceitems_invoice_id_fkey FOREIGN KEY (invoice_id)
      REFERENCES invoices (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
);

CREATE INDEX idx_lower_name
  ON invoiceitems
  USING btree
  (lower(name) COLLATE pg_catalog."default" text_pattern_ops);

CREATE TABLE invoices
(
  id serial NOT NULL,
  term_id integer,
  rep_id integer NOT NULL,
  ship_via_id integer,
  ...
  CONSTRAINT invoices_pkey PRIMARY KEY (id),
  CONSTRAINT invoices_rep_id_fkey FOREIGN KEY (rep_id)
      REFERENCES reps (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT invoices_ship_via_id_fkey FOREIGN KEY (ship_via_id)
      REFERENCES shipvia (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT invoices_term_id_fkey FOREIGN KEY (term_id)
      REFERENCES terms (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
);
SELECT COUNT(DISTINCT(o.id))
FROM invoices o
JOIN invoiceitems items ON items.invoice_id = o.id
LEFT JOIN terms t ON t.id = o.term_id
LEFT JOIN reps r ON r.id = o.rep_id
LEFT JOIN shipVia s ON s.id = o.ship_via_id WHERE LOWER(items.name) LIKE '%ac%';
CREATE INDEX invoiceitems_name_gin_trgm_idx ON invoiceitems USING gin (name gin_trgm_ops);
EXPLAIN ANALYZE SELECT COUNT(DISTINCT(o.id)) 
FROM invoices o 
JOIN invoiceitems items ON items.invoice_id = o.id 
WHERE items.name ILIKE '%ac%';

"Aggregate  (cost=78961.52..78961.53 rows=1 width=4) (actual time=5205.448..5205.450 rows=1 loops=1)"
"  ->  Nested Loop  (cost=0.00..78960.73 rows=316 width=4) (actual time=0.396..5176.761 rows=6518 loops=1)"
"        ->  Seq Scan on invoiceitems items  (cost=0.00..76885.98 rows=316 width=4) (actual time=0.021..4502.043 rows=6518 loops=1)"
"              Filter: (name ~~* '%ac%'::text)"
"              Rows Removed by Filter: 3275000"
"        ->  Index Only Scan using invoices_pkey on invoices o  (cost=0.00..6.56 rows=1 width=4) (actual time=0.012..0.015 rows=1 loops=6518)"
"              Index Cond: (id = items.invoice_id)"
"              Heap Fetches: 6518"
"Total runtime: 5205.509 ms"
EXPLAIN ANALYZE SELECT COUNT(1)
FROM   invoices o
WHERE EXISTS (
   SELECT 1
   FROM   invoiceitems i 
   WHERE  i.invoice_id = o.id
   AND    i.name ILIKE '%ac%'
   );

"Aggregate  (cost=76920.43..76920.44 rows=1 width=0) (actual time=5713.597..5713.598 rows=1 loops=1)"
"  ->  Nested Loop  (cost=76886.76..76919.64 rows=316 width=0) (actual time=5583.706..5703.801 rows=6518 loops=1)"
"        ->  HashAggregate  (cost=76886.76..76886.82 rows=5 width=4) (actual time=5583.568..5594.977 rows=6518 loops=1)"
"              ->  Seq Scan on invoiceitems i  (cost=0.00..76885.98 rows=316 width=4) (actual time=0.295..5148.801 rows=6518 loops=1)"
"                    Filter: (name ~~* '%ac%'::text)"
"                    Rows Removed by Filter: 3275000"
"        ->  Index Only Scan using invoices_pkey on invoices o  (cost=0.00..6.56 rows=1 width=4) (actual time=0.006..0.008 rows=1 loops=6518)"
"              Index Cond: (id = i.invoice_id)"
"              Heap Fetches: 6518"
"Total runtime: 5713.804 ms"
结果:

CREATE TABLE invoiceitems
(
  id serial NOT NULL,
  invoice_id integer NOT NULL,
  name text NOT NULL,
  ...
  CONSTRAINT invoiceitems_pkey PRIMARY KEY (id),
  CONSTRAINT invoiceitems_invoice_id_fkey FOREIGN KEY (invoice_id)
      REFERENCES invoices (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
);

CREATE INDEX idx_lower_name
  ON invoiceitems
  USING btree
  (lower(name) COLLATE pg_catalog."default" text_pattern_ops);

CREATE TABLE invoices
(
  id serial NOT NULL,
  term_id integer,
  rep_id integer NOT NULL,
  ship_via_id integer,
  ...
  CONSTRAINT invoices_pkey PRIMARY KEY (id),
  CONSTRAINT invoices_rep_id_fkey FOREIGN KEY (rep_id)
      REFERENCES reps (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT invoices_ship_via_id_fkey FOREIGN KEY (ship_via_id)
      REFERENCES shipvia (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT invoices_term_id_fkey FOREIGN KEY (term_id)
      REFERENCES terms (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
);
SELECT COUNT(DISTINCT(o.id))
FROM invoices o
JOIN invoiceitems items ON items.invoice_id = o.id
LEFT JOIN terms t ON t.id = o.term_id
LEFT JOIN reps r ON r.id = o.rep_id
LEFT JOIN shipVia s ON s.id = o.ship_via_id WHERE LOWER(items.name) LIKE '%ac%';
CREATE INDEX invoiceitems_name_gin_trgm_idx ON invoiceitems USING gin (name gin_trgm_ops);
EXPLAIN ANALYZE SELECT COUNT(DISTINCT(o.id)) 
FROM invoices o 
JOIN invoiceitems items ON items.invoice_id = o.id 
WHERE items.name ILIKE '%ac%';

"Aggregate  (cost=78961.52..78961.53 rows=1 width=4) (actual time=5205.448..5205.450 rows=1 loops=1)"
"  ->  Nested Loop  (cost=0.00..78960.73 rows=316 width=4) (actual time=0.396..5176.761 rows=6518 loops=1)"
"        ->  Seq Scan on invoiceitems items  (cost=0.00..76885.98 rows=316 width=4) (actual time=0.021..4502.043 rows=6518 loops=1)"
"              Filter: (name ~~* '%ac%'::text)"
"              Rows Removed by Filter: 3275000"
"        ->  Index Only Scan using invoices_pkey on invoices o  (cost=0.00..6.56 rows=1 width=4) (actual time=0.012..0.015 rows=1 loops=6518)"
"              Index Cond: (id = items.invoice_id)"
"              Heap Fetches: 6518"
"Total runtime: 5205.509 ms"
EXPLAIN ANALYZE SELECT COUNT(1)
FROM   invoices o
WHERE EXISTS (
   SELECT 1
   FROM   invoiceitems i 
   WHERE  i.invoice_id = o.id
   AND    i.name ILIKE '%ac%'
   );

"Aggregate  (cost=76920.43..76920.44 rows=1 width=0) (actual time=5713.597..5713.598 rows=1 loops=1)"
"  ->  Nested Loop  (cost=76886.76..76919.64 rows=316 width=0) (actual time=5583.706..5703.801 rows=6518 loops=1)"
"        ->  HashAggregate  (cost=76886.76..76886.82 rows=5 width=4) (actual time=5583.568..5594.977 rows=6518 loops=1)"
"              ->  Seq Scan on invoiceitems i  (cost=0.00..76885.98 rows=316 width=4) (actual time=0.295..5148.801 rows=6518 loops=1)"
"                    Filter: (name ~~* '%ac%'::text)"
"                    Rows Removed by Filter: 3275000"
"        ->  Index Only Scan using invoices_pkey on invoices o  (cost=0.00..6.56 rows=1 width=4) (actual time=0.006..0.008 rows=1 loops=6518)"
"              Index Cond: (id = i.invoice_id)"
"              Heap Fetches: 6518"
"Total runtime: 5713.804 ms"
6518

查询计划

"Aggregate  (cost=107651.35..107651.36 rows=1 width=4)"
"  ->  Hash Join  (cost=3989.50..106010.59 rows=656304 width=4)"
"        Hash Cond: (items.invoice_id = o.id)"
"        ->  Seq Scan on invoiceitems items  (cost=0.00..85089.77 rows=656304 width=4)"
"              Filter: (lower(name) ~~ '%ac%'::text)"
"        ->  Hash  (cost=2859.00..2859.00 rows=65000 width=16)"
"              ->  Seq Scan on invoices o  (cost=0.00..2859.00 rows=65000 width=16)"
似乎我在
invoiceitems.name
字段上的函数索引根本不起作用。我想这是因为我在寻找名字的一部分,它不是名字的严格前缀。我不确定,但我的发票主键索引似乎在这里也不起作用

我的问题是我可以优化计数查询和/或我的模式以提高性能吗

我必须允许按名称的某些部分进行搜索,这些部分不是严格的前缀,而且我还必须支持不区分大小写的搜索

我返回匹配记录的查询同样错误:

SELECT DISTINCT(o.id), t.terms, r.rep, s.ship_via, ...
FROM invoices o
JOIN invoiceitems items ON items.invoice_id = o.id
LEFT JOIN terms t ON t.id = o.term_id
LEFT JOIN reps r ON r.id = o.rep_id
LEFT JOIN shipVia s ON s.id = o.ship_via_id WHERE LOWER(items.name) LIKE '%ac%' LIMIT 100;
及其计划:

"Limit  (cost=901846.63..901854.13 rows=100 width=627)"
"  ->  Unique  (cost=901846.63..951069.43 rows=656304 width=627)"
"        ->  Sort  (cost=901846.63..903487.39 rows=656304 width=627)"
"              Sort Key: o.id, t.terms, r.rep, s.ship_via, ..."
"              ->  Hash Join  (cost=11509.54..286596.53 rows=656304 width=627)"
"                    Hash Cond: (items.invoice_id = o.id)"
"                    ->  Seq Scan on invoiceitems items  (cost=0.00..85089.77 rows=656304 width=4)"
"                          Filter: (lower(name) ~~ '%ac%'::text)"
"                    ->  Hash  (cost=5491.03..5491.03 rows=65000 width=627)"
"                          ->  Hash Left Join  (cost=113.02..5491.03 rows=65000 width=627)"
"                                Hash Cond: (o.ship_via_id = s.id)"
"                                ->  Hash Left Join  (cost=75.35..4559.61 rows=65000 width=599)"
"                                      Hash Cond: (o.rep_id = r.id)"
"                                      ->  Hash Left Join  (cost=37.67..3628.19 rows=65000 width=571)"
"                                            Hash Cond: (o.term_id = t.id)"
"                                            ->  Seq Scan on invoices o  (cost=0.00..2859.00 rows=65000 width=543)"
"                                            ->  Hash  (cost=22.30..22.30 rows=1230 width=36)"
"                                                  ->  Seq Scan on terms t  (cost=0.00..22.30 rows=1230 width=36)"
"                                      ->  Hash  (cost=22.30..22.30 rows=1230 width=36)"
"                                            ->  Seq Scan on reps r  (cost=0.00..22.30 rows=1230 width=36)"
"                                ->  Hash  (cost=22.30..22.30 rows=1230 width=36)"
"                                      ->  Seq Scan on shipvia s  (cost=0.00..22.30 rows=1230 width=36)"
我只限于PostgreSQL。切换到SQL Server不是一个选项

编辑============================================================================================

我遵循了埃尔文非常翔实的指示,以下是我所拥有的

索引:

CREATE TABLE invoiceitems
(
  id serial NOT NULL,
  invoice_id integer NOT NULL,
  name text NOT NULL,
  ...
  CONSTRAINT invoiceitems_pkey PRIMARY KEY (id),
  CONSTRAINT invoiceitems_invoice_id_fkey FOREIGN KEY (invoice_id)
      REFERENCES invoices (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
);

CREATE INDEX idx_lower_name
  ON invoiceitems
  USING btree
  (lower(name) COLLATE pg_catalog."default" text_pattern_ops);

CREATE TABLE invoices
(
  id serial NOT NULL,
  term_id integer,
  rep_id integer NOT NULL,
  ship_via_id integer,
  ...
  CONSTRAINT invoices_pkey PRIMARY KEY (id),
  CONSTRAINT invoices_rep_id_fkey FOREIGN KEY (rep_id)
      REFERENCES reps (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT invoices_ship_via_id_fkey FOREIGN KEY (ship_via_id)
      REFERENCES shipvia (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT invoices_term_id_fkey FOREIGN KEY (term_id)
      REFERENCES terms (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
);
SELECT COUNT(DISTINCT(o.id))
FROM invoices o
JOIN invoiceitems items ON items.invoice_id = o.id
LEFT JOIN terms t ON t.id = o.term_id
LEFT JOIN reps r ON r.id = o.rep_id
LEFT JOIN shipVia s ON s.id = o.ship_via_id WHERE LOWER(items.name) LIKE '%ac%';
CREATE INDEX invoiceitems_name_gin_trgm_idx ON invoiceitems USING gin (name gin_trgm_ops);
EXPLAIN ANALYZE SELECT COUNT(DISTINCT(o.id)) 
FROM invoices o 
JOIN invoiceitems items ON items.invoice_id = o.id 
WHERE items.name ILIKE '%ac%';

"Aggregate  (cost=78961.52..78961.53 rows=1 width=4) (actual time=5205.448..5205.450 rows=1 loops=1)"
"  ->  Nested Loop  (cost=0.00..78960.73 rows=316 width=4) (actual time=0.396..5176.761 rows=6518 loops=1)"
"        ->  Seq Scan on invoiceitems items  (cost=0.00..76885.98 rows=316 width=4) (actual time=0.021..4502.043 rows=6518 loops=1)"
"              Filter: (name ~~* '%ac%'::text)"
"              Rows Removed by Filter: 3275000"
"        ->  Index Only Scan using invoices_pkey on invoices o  (cost=0.00..6.56 rows=1 width=4) (actual time=0.012..0.015 rows=1 loops=6518)"
"              Index Cond: (id = items.invoice_id)"
"              Heap Fetches: 6518"
"Total runtime: 5205.509 ms"
EXPLAIN ANALYZE SELECT COUNT(1)
FROM   invoices o
WHERE EXISTS (
   SELECT 1
   FROM   invoiceitems i 
   WHERE  i.invoice_id = o.id
   AND    i.name ILIKE '%ac%'
   );

"Aggregate  (cost=76920.43..76920.44 rows=1 width=0) (actual time=5713.597..5713.598 rows=1 loops=1)"
"  ->  Nested Loop  (cost=76886.76..76919.64 rows=316 width=0) (actual time=5583.706..5703.801 rows=6518 loops=1)"
"        ->  HashAggregate  (cost=76886.76..76886.82 rows=5 width=4) (actual time=5583.568..5594.977 rows=6518 loops=1)"
"              ->  Seq Scan on invoiceitems i  (cost=0.00..76885.98 rows=316 width=4) (actual time=0.295..5148.801 rows=6518 loops=1)"
"                    Filter: (name ~~* '%ac%'::text)"
"                    Rows Removed by Filter: 3275000"
"        ->  Index Only Scan using invoices_pkey on invoices o  (cost=0.00..6.56 rows=1 width=4) (actual time=0.006..0.008 rows=1 loops=6518)"
"              Index Cond: (id = i.invoice_id)"
"              Heap Fetches: 6518"
"Total runtime: 5713.804 ms"
包含联接但不包含额外表的计数查询:

CREATE TABLE invoiceitems
(
  id serial NOT NULL,
  invoice_id integer NOT NULL,
  name text NOT NULL,
  ...
  CONSTRAINT invoiceitems_pkey PRIMARY KEY (id),
  CONSTRAINT invoiceitems_invoice_id_fkey FOREIGN KEY (invoice_id)
      REFERENCES invoices (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
);

CREATE INDEX idx_lower_name
  ON invoiceitems
  USING btree
  (lower(name) COLLATE pg_catalog."default" text_pattern_ops);

CREATE TABLE invoices
(
  id serial NOT NULL,
  term_id integer,
  rep_id integer NOT NULL,
  ship_via_id integer,
  ...
  CONSTRAINT invoices_pkey PRIMARY KEY (id),
  CONSTRAINT invoices_rep_id_fkey FOREIGN KEY (rep_id)
      REFERENCES reps (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT invoices_ship_via_id_fkey FOREIGN KEY (ship_via_id)
      REFERENCES shipvia (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT invoices_term_id_fkey FOREIGN KEY (term_id)
      REFERENCES terms (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
);
SELECT COUNT(DISTINCT(o.id))
FROM invoices o
JOIN invoiceitems items ON items.invoice_id = o.id
LEFT JOIN terms t ON t.id = o.term_id
LEFT JOIN reps r ON r.id = o.rep_id
LEFT JOIN shipVia s ON s.id = o.ship_via_id WHERE LOWER(items.name) LIKE '%ac%';
CREATE INDEX invoiceitems_name_gin_trgm_idx ON invoiceitems USING gin (name gin_trgm_ops);
EXPLAIN ANALYZE SELECT COUNT(DISTINCT(o.id)) 
FROM invoices o 
JOIN invoiceitems items ON items.invoice_id = o.id 
WHERE items.name ILIKE '%ac%';

"Aggregate  (cost=78961.52..78961.53 rows=1 width=4) (actual time=5205.448..5205.450 rows=1 loops=1)"
"  ->  Nested Loop  (cost=0.00..78960.73 rows=316 width=4) (actual time=0.396..5176.761 rows=6518 loops=1)"
"        ->  Seq Scan on invoiceitems items  (cost=0.00..76885.98 rows=316 width=4) (actual time=0.021..4502.043 rows=6518 loops=1)"
"              Filter: (name ~~* '%ac%'::text)"
"              Rows Removed by Filter: 3275000"
"        ->  Index Only Scan using invoices_pkey on invoices o  (cost=0.00..6.56 rows=1 width=4) (actual time=0.012..0.015 rows=1 loops=6518)"
"              Index Cond: (id = items.invoice_id)"
"              Heap Fetches: 6518"
"Total runtime: 5205.509 ms"
EXPLAIN ANALYZE SELECT COUNT(1)
FROM   invoices o
WHERE EXISTS (
   SELECT 1
   FROM   invoiceitems i 
   WHERE  i.invoice_id = o.id
   AND    i.name ILIKE '%ac%'
   );

"Aggregate  (cost=76920.43..76920.44 rows=1 width=0) (actual time=5713.597..5713.598 rows=1 loops=1)"
"  ->  Nested Loop  (cost=76886.76..76919.64 rows=316 width=0) (actual time=5583.706..5703.801 rows=6518 loops=1)"
"        ->  HashAggregate  (cost=76886.76..76886.82 rows=5 width=4) (actual time=5583.568..5594.977 rows=6518 loops=1)"
"              ->  Seq Scan on invoiceitems i  (cost=0.00..76885.98 rows=316 width=4) (actual time=0.295..5148.801 rows=6518 loops=1)"
"                    Filter: (name ~~* '%ac%'::text)"
"                    Rows Removed by Filter: 3275000"
"        ->  Index Only Scan using invoices_pkey on invoices o  (cost=0.00..6.56 rows=1 width=4) (actual time=0.006..0.008 rows=1 loops=6518)"
"              Index Cond: (id = i.invoice_id)"
"              Heap Fetches: 6518"
"Total runtime: 5713.804 ms"
带有半联接的计数查询:

CREATE TABLE invoiceitems
(
  id serial NOT NULL,
  invoice_id integer NOT NULL,
  name text NOT NULL,
  ...
  CONSTRAINT invoiceitems_pkey PRIMARY KEY (id),
  CONSTRAINT invoiceitems_invoice_id_fkey FOREIGN KEY (invoice_id)
      REFERENCES invoices (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
);

CREATE INDEX idx_lower_name
  ON invoiceitems
  USING btree
  (lower(name) COLLATE pg_catalog."default" text_pattern_ops);

CREATE TABLE invoices
(
  id serial NOT NULL,
  term_id integer,
  rep_id integer NOT NULL,
  ship_via_id integer,
  ...
  CONSTRAINT invoices_pkey PRIMARY KEY (id),
  CONSTRAINT invoices_rep_id_fkey FOREIGN KEY (rep_id)
      REFERENCES reps (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT invoices_ship_via_id_fkey FOREIGN KEY (ship_via_id)
      REFERENCES shipvia (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT invoices_term_id_fkey FOREIGN KEY (term_id)
      REFERENCES terms (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
);
SELECT COUNT(DISTINCT(o.id))
FROM invoices o
JOIN invoiceitems items ON items.invoice_id = o.id
LEFT JOIN terms t ON t.id = o.term_id
LEFT JOIN reps r ON r.id = o.rep_id
LEFT JOIN shipVia s ON s.id = o.ship_via_id WHERE LOWER(items.name) LIKE '%ac%';
CREATE INDEX invoiceitems_name_gin_trgm_idx ON invoiceitems USING gin (name gin_trgm_ops);
EXPLAIN ANALYZE SELECT COUNT(DISTINCT(o.id)) 
FROM invoices o 
JOIN invoiceitems items ON items.invoice_id = o.id 
WHERE items.name ILIKE '%ac%';

"Aggregate  (cost=78961.52..78961.53 rows=1 width=4) (actual time=5205.448..5205.450 rows=1 loops=1)"
"  ->  Nested Loop  (cost=0.00..78960.73 rows=316 width=4) (actual time=0.396..5176.761 rows=6518 loops=1)"
"        ->  Seq Scan on invoiceitems items  (cost=0.00..76885.98 rows=316 width=4) (actual time=0.021..4502.043 rows=6518 loops=1)"
"              Filter: (name ~~* '%ac%'::text)"
"              Rows Removed by Filter: 3275000"
"        ->  Index Only Scan using invoices_pkey on invoices o  (cost=0.00..6.56 rows=1 width=4) (actual time=0.012..0.015 rows=1 loops=6518)"
"              Index Cond: (id = items.invoice_id)"
"              Heap Fetches: 6518"
"Total runtime: 5205.509 ms"
EXPLAIN ANALYZE SELECT COUNT(1)
FROM   invoices o
WHERE EXISTS (
   SELECT 1
   FROM   invoiceitems i 
   WHERE  i.invoice_id = o.id
   AND    i.name ILIKE '%ac%'
   );

"Aggregate  (cost=76920.43..76920.44 rows=1 width=0) (actual time=5713.597..5713.598 rows=1 loops=1)"
"  ->  Nested Loop  (cost=76886.76..76919.64 rows=316 width=0) (actual time=5583.706..5703.801 rows=6518 loops=1)"
"        ->  HashAggregate  (cost=76886.76..76886.82 rows=5 width=4) (actual time=5583.568..5594.977 rows=6518 loops=1)"
"              ->  Seq Scan on invoiceitems i  (cost=0.00..76885.98 rows=316 width=4) (actual time=0.295..5148.801 rows=6518 loops=1)"
"                    Filter: (name ~~* '%ac%'::text)"
"                    Rows Removed by Filter: 3275000"
"        ->  Index Only Scan using invoices_pkey on invoices o  (cost=0.00..6.56 rows=1 width=4) (actual time=0.006..0.008 rows=1 loops=6518)"
"              Index Cond: (id = i.invoice_id)"
"              Heap Fetches: 6518"
"Total runtime: 5713.804 ms"
半联接似乎没有效果。为什么?

(我认为这无关紧要,但我删除了
lower(invoiceitems.name)
上的原始功能索引)

编辑2=======================================================================================================

我想重点介绍fetchrows查询,并提供更多的上下文

首先,用户可能要求从发票(但不是从发票项)按任意字段对列进行排序

此外,用户还可以提供涉及发票和发票项目字段的筛选语句列表。这些筛选器语句捕获了按字符串或数值进行筛选的语义,例如,筛选器可以是“发票项目名称包含'ac'且发票折扣高于5%”

我很清楚,我不可能对每个字段都编制索引,我可能只需要对最常见的字段编制索引,如发票项目名称和其他一些字段

无论如何,以下是我迄今为止在invoices和InvoicesItems表上的索引:

发票

  • id作为主键
发票项目

  • id作为主键
  • 使用btree(发票id)在发票项上创建索引发票项\u发票id\u idx
  • 使用gin在发票项上创建索引发票项。\u name\u gin\u trgm\u idx(name COLLATE pg\u catalog.“default”gin\u trgm\u ops)
下面是对发票项使用JOIN的fetch rows查询的分析:

explain analyze
SELECT DISTINCT(o.id), t.terms, r.rep, s.ship_via, ...
FROM invoices o
JOIN invoiceitems items ON items.invoice_id = o.id
LEFT JOIN terms t ON t.id = o.term_id
LEFT JOIN reps r ON r.id = o.rep_id
LEFT JOIN shipVia s ON s.id = o.ship_via_id 
WHERE (items.name ILIKE '%df%' AND items.name IS NOT NULL) LIMIT 100;

"Limit  (cost=79100.70..79106.95 rows=100 width=312) (actual time=4637.195..4637.195 rows=0 loops=1)"
"  ->  Unique  (cost=79100.70..79120.45 rows=316 width=312) (actual time=4637.190..4637.190 rows=0 loops=1)"
"        ->  Sort  (cost=79100.70..79101.49 rows=316 width=312) (actual time=4637.186..4637.186 rows=0 loops=1)"
"              Sort Key: o.id, o.customer, o.business_no, o.bill_to_name, o.bill_to_address1, o.bill_to_address2, o.bill_to_postal_code, o.ship_to_name, o.ship_to_address1, o.ship_to_address2, o.ship_to_postal_code, o.purchase_order_no, t.terms, r.rep, ((o.ship_date)::text), s.ship_via, o.delivery, o.hst_percents, o.sub_total, o.total_before_hst, o.total, o.total_discount, o.hst, o.item_count"
"              Sort Method: quicksort  Memory: 25kB"
"              ->  Hash Left Join  (cost=113.02..79087.58 rows=316 width=312) (actual time=4637.179..4637.179 rows=0 loops=1)"
"                    Hash Cond: (o.ship_via_id = s.id)"
"                    ->  Hash Left Join  (cost=75.35..79043.98 rows=316 width=284) (actual time=4637.123..4637.123 rows=0 loops=1)"
"                          Hash Cond: (o.rep_id = r.id)"
"                          ->  Hash Left Join  (cost=37.67..79001.96 rows=316 width=256) (actual time=4637.119..4637.119 rows=0 loops=1)"
"                                Hash Cond: (o.term_id = t.id)"
"                                ->  Nested Loop  (cost=0.00..78960.73 rows=316 width=228) (actual time=4637.115..4637.115 rows=0 loops=1)"
"                                      ->  Seq Scan on invoiceitems items  (cost=0.00..76885.98 rows=316 width=4) (actual time=4637.108..4637.108 rows=0 loops=1)"
"                                            Filter: ((name IS NOT NULL) AND (name ~~* '%df%'::text))"
"                                            Rows Removed by Filter: 3281518"
"                                      ->  Index Scan using invoices_pkey on invoices o  (cost=0.00..6.56 rows=1 width=228) (never executed)"
"                                            Index Cond: (id = items.invoice_id)"
"                                ->  Hash  (cost=22.30..22.30 rows=1230 width=36) (never executed)"
"                                      ->  Seq Scan on terms t  (cost=0.00..22.30 rows=1230 width=36) (never executed)"
"                          ->  Hash  (cost=22.30..22.30 rows=1230 width=36) (never executed)"
"                                ->  Seq Scan on reps r  (cost=0.00..22.30 rows=1230 width=36) (never executed)"
"                    ->  Hash  (cost=22.30..22.30 rows=1230 width=36) (never executed)"
"                          ->  Seq Scan on shipvia s  (cost=0.00..22.30 rows=1230 width=36) (never executed)"
"Total runtime: 4637.731 ms"
explain analyze
SELECT o.id, t.terms, r.rep, s.ship_via, ...
FROM invoices o
LEFT JOIN terms t ON t.id = o.term_id
LEFT JOIN reps r ON r.id = o.rep_id
LEFT JOIN shipVia s ON s.id = o.ship_via_id 
WHERE EXISTS (
   SELECT 1
   FROM   invoiceitems i 
   WHERE  i.invoice_id = o.id
   AND    i.name ILIKE '%df%'
   AND    i.name IS NOT NULL
   ) LIMIT 100;

"Limit  (cost=0.19..43302.88 rows=100 width=610) (actual time=5771.852..5771.852 rows=0 loops=1)"
"  ->  Nested Loop Left Join  (cost=0.19..136836.68 rows=316 width=610) (actual time=5771.848..5771.848 rows=0 loops=1)"
"        ->  Nested Loop Left Join  (cost=0.19..135404.33 rows=316 width=582) (actual time=5771.844..5771.844 rows=0 loops=1)"
"              ->  Nested Loop Left Join  (cost=0.19..134052.55 rows=316 width=554) (actual time=5771.841..5771.841 rows=0 loops=1)"
"                    ->  Merge Semi Join  (cost=0.19..132700.78 rows=316 width=526) (actual time=5771.837..5771.837 rows=0 loops=1)"
"                          Merge Cond: (o.id = i.invoice_id)"
"                          ->  Index Scan using invoices_pkey on invoices o  (cost=0.00..3907.27 rows=65000 width=526) (actual time=0.017..0.017 rows=1 loops=1)"
"                          ->  Index Scan using invoiceitems_invoice_id_idx on invoiceitems i  (cost=0.00..129298.19 rows=316 width=4) (actual time=5771.812..5771.812 rows=0 loops=1)"
"                                Filter: ((name IS NOT NULL) AND (name ~~* '%df%'::text))"
"                                Rows Removed by Filter: 3281518"
"                    ->  Index Scan using terms_pkey on terms t  (cost=0.00..4.27 rows=1 width=36) (never executed)"
"                          Index Cond: (id = o.term_id)"
"              ->  Index Scan using reps_pkey on reps r  (cost=0.00..4.27 rows=1 width=36) (never executed)"
"                    Index Cond: (id = o.rep_id)"
"        ->  Index Scan using shipvia_pkey on shipvia s  (cost=0.00..4.27 rows=1 width=36) (never executed)"
"              Index Cond: (id = o.ship_via_id)"
"Total runtime: 5771.948 ms"
以下是使用WHERE EXISTS而不是针对发票项的联接对fetch rows查询的分析:

explain analyze
SELECT DISTINCT(o.id), t.terms, r.rep, s.ship_via, ...
FROM invoices o
JOIN invoiceitems items ON items.invoice_id = o.id
LEFT JOIN terms t ON t.id = o.term_id
LEFT JOIN reps r ON r.id = o.rep_id
LEFT JOIN shipVia s ON s.id = o.ship_via_id 
WHERE (items.name ILIKE '%df%' AND items.name IS NOT NULL) LIMIT 100;

"Limit  (cost=79100.70..79106.95 rows=100 width=312) (actual time=4637.195..4637.195 rows=0 loops=1)"
"  ->  Unique  (cost=79100.70..79120.45 rows=316 width=312) (actual time=4637.190..4637.190 rows=0 loops=1)"
"        ->  Sort  (cost=79100.70..79101.49 rows=316 width=312) (actual time=4637.186..4637.186 rows=0 loops=1)"
"              Sort Key: o.id, o.customer, o.business_no, o.bill_to_name, o.bill_to_address1, o.bill_to_address2, o.bill_to_postal_code, o.ship_to_name, o.ship_to_address1, o.ship_to_address2, o.ship_to_postal_code, o.purchase_order_no, t.terms, r.rep, ((o.ship_date)::text), s.ship_via, o.delivery, o.hst_percents, o.sub_total, o.total_before_hst, o.total, o.total_discount, o.hst, o.item_count"
"              Sort Method: quicksort  Memory: 25kB"
"              ->  Hash Left Join  (cost=113.02..79087.58 rows=316 width=312) (actual time=4637.179..4637.179 rows=0 loops=1)"
"                    Hash Cond: (o.ship_via_id = s.id)"
"                    ->  Hash Left Join  (cost=75.35..79043.98 rows=316 width=284) (actual time=4637.123..4637.123 rows=0 loops=1)"
"                          Hash Cond: (o.rep_id = r.id)"
"                          ->  Hash Left Join  (cost=37.67..79001.96 rows=316 width=256) (actual time=4637.119..4637.119 rows=0 loops=1)"
"                                Hash Cond: (o.term_id = t.id)"
"                                ->  Nested Loop  (cost=0.00..78960.73 rows=316 width=228) (actual time=4637.115..4637.115 rows=0 loops=1)"
"                                      ->  Seq Scan on invoiceitems items  (cost=0.00..76885.98 rows=316 width=4) (actual time=4637.108..4637.108 rows=0 loops=1)"
"                                            Filter: ((name IS NOT NULL) AND (name ~~* '%df%'::text))"
"                                            Rows Removed by Filter: 3281518"
"                                      ->  Index Scan using invoices_pkey on invoices o  (cost=0.00..6.56 rows=1 width=228) (never executed)"
"                                            Index Cond: (id = items.invoice_id)"
"                                ->  Hash  (cost=22.30..22.30 rows=1230 width=36) (never executed)"
"                                      ->  Seq Scan on terms t  (cost=0.00..22.30 rows=1230 width=36) (never executed)"
"                          ->  Hash  (cost=22.30..22.30 rows=1230 width=36) (never executed)"
"                                ->  Seq Scan on reps r  (cost=0.00..22.30 rows=1230 width=36) (never executed)"
"                    ->  Hash  (cost=22.30..22.30 rows=1230 width=36) (never executed)"
"                          ->  Seq Scan on shipvia s  (cost=0.00..22.30 rows=1230 width=36) (never executed)"
"Total runtime: 4637.731 ms"
explain analyze
SELECT o.id, t.terms, r.rep, s.ship_via, ...
FROM invoices o
LEFT JOIN terms t ON t.id = o.term_id
LEFT JOIN reps r ON r.id = o.rep_id
LEFT JOIN shipVia s ON s.id = o.ship_via_id 
WHERE EXISTS (
   SELECT 1
   FROM   invoiceitems i 
   WHERE  i.invoice_id = o.id
   AND    i.name ILIKE '%df%'
   AND    i.name IS NOT NULL
   ) LIMIT 100;

"Limit  (cost=0.19..43302.88 rows=100 width=610) (actual time=5771.852..5771.852 rows=0 loops=1)"
"  ->  Nested Loop Left Join  (cost=0.19..136836.68 rows=316 width=610) (actual time=5771.848..5771.848 rows=0 loops=1)"
"        ->  Nested Loop Left Join  (cost=0.19..135404.33 rows=316 width=582) (actual time=5771.844..5771.844 rows=0 loops=1)"
"              ->  Nested Loop Left Join  (cost=0.19..134052.55 rows=316 width=554) (actual time=5771.841..5771.841 rows=0 loops=1)"
"                    ->  Merge Semi Join  (cost=0.19..132700.78 rows=316 width=526) (actual time=5771.837..5771.837 rows=0 loops=1)"
"                          Merge Cond: (o.id = i.invoice_id)"
"                          ->  Index Scan using invoices_pkey on invoices o  (cost=0.00..3907.27 rows=65000 width=526) (actual time=0.017..0.017 rows=1 loops=1)"
"                          ->  Index Scan using invoiceitems_invoice_id_idx on invoiceitems i  (cost=0.00..129298.19 rows=316 width=4) (actual time=5771.812..5771.812 rows=0 loops=1)"
"                                Filter: ((name IS NOT NULL) AND (name ~~* '%df%'::text))"
"                                Rows Removed by Filter: 3281518"
"                    ->  Index Scan using terms_pkey on terms t  (cost=0.00..4.27 rows=1 width=36) (never executed)"
"                          Index Cond: (id = o.term_id)"
"              ->  Index Scan using reps_pkey on reps r  (cost=0.00..4.27 rows=1 width=36) (never executed)"
"                    Index Cond: (id = o.rep_id)"
"        ->  Index Scan using shipvia_pkey on shipvia s  (cost=0.00..4.27 rows=1 width=36) (never executed)"
"              Index Cond: (id = o.ship_via_id)"
"Total runtime: 5771.948 ms"
我没有尝试第三个选项,即按不同的发票id对invoiceitems行进行排序,因为这种方法似乎只有在没有给出排序时才可行,而通常相反,排序是存在的。

三联图指数 使用模块提供的三角图索引,该模块为GIN或GiST索引提供运算符类,以支持所有模式,而不仅仅是左锚定模式:

在这篇文章中可以找到模式匹配和索引的概述。
更多关于如何在这个相关答案中使用三角图索引的信息(在许多其他答案中):

例如:

CREATE EXTENSION pg_tgrm;  -- only once per db

CREATE INDEX invoiceitems_name_gist_trgm_idx
ON invoiceitems USING gist (name gist_trgm_ops);
一个
GIN
索引可能会更快,但也更大

根据经验法则,
GIN
索引比
GiST
索引搜索速度更快, 但构建或更新速度较慢;因此,
GIN
更适合静态数据 和
GiST
获取经常更新的数据

这完全取决于你的具体要求

附加树索引 当然,您还需要在
invoiceitems.invoice\u id
上使用普通的btree索引(默认值)

CREATE INDEX invoiceitems_invoice_id_idx ON invoiceitems (invoice_id);
仅用于索引扫描的多列索引 在Postgres 9.2或更高版本中,您可能会从中获得一些额外的好处。GIN索引通常对
integer
列(如
invoice\u id
)没有意义。但为了节省额外的堆查找,将其包含在多列GIN(或GiST)索引中可能需要付出代价。你得测试一下

为此,您需要(或分别)附加模块。杜松子酒的例子:

CREATE EXTENSION btree_gin;

CREATE INDEX invoiceitems_name_gin_trgm_idx
ON invoiceitems USING gin (name gin_trgm_ops, invoice_id);
这将消除对上述btree索引的需要,但无论如何都要确保创建它,以使fk检查速度更快,但对于许多其他情况也是如此

询问 计数 为了

查询以获取发票的计数

。。。省略只会造成伤害的其他表格(如果有):

或者您可以测试这个变量,它作为子选择连接到上述查询。可能更快:

SELECT t.terms, r.rep, s.ship_via, ...
FROM  (
   SELECT DISTINCT invoice_id
   FROM   invoiceitems
   WHERE  name ILIKE '%ac%'
   ORDER  BY invoice_id           -- order by id = cheapest with above index
   LIMIT  100                     -- LIMIT early!
   ) item
JOIN   invoices     o ON o.id = item.invoice_id
LEFT   JOIN terms   t ON t.id = o.term_id
LEFT   JOIN reps    r ON r.id = o.rep_id
LEFT   JOIN shipVia s ON s.id = o.ship_via_id
-- ORDER BY ???
;
此示例通过
发票id
获取前100个(因为您没有提供排序顺序)。这完全取决于细节…

三联图指数 使用wh模块提供的三元索引