Performance: speeding up a Postgres query on millions of rows?


I am using Postgres 9.4. I have a large table. This is its structure:

 processing_date | date                 |
 practice_id     | character varying(6) |
 chemical_id     | character varying(9) |
 items           | bigint               |
 cost            | double precision     |
Indexes:
    "vw_idx_chem_by_practice_chem_id" btree (chemical_id)
    "vw_idx_chem_by_practice_chem_id_vc" btree (chemical_id varchar_pattern_ops)
    "vw_idx_chem_by_practice_joint_id" btree (practice_id, chemical_id)
Now I want to run a LIKE query on the table. This is my query:

EXPLAIN (ANALYSE, BUFFERS) SELECT sum(pr.cost) as actual_cost, 
      sum(pr.items) as items, pr.practice_id as row_id, 
      pc.name as row_name, pr.processing_date as date 
FROM vw_chemical_summary_by_practice pr 
JOIN frontend_practice pc ON pr.practice_id=pc.code 
WHERE (pr.chemical_id LIKE '0401%' ) 
GROUP BY pr.practice_id, pc.code, date 
ORDER BY date, pr.practice_id;
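This is the result of the EXPLAIN:

As you can see, it is slow, partly because it runs a bitmap heap scan over nearly 4 million rows. (The subsequent sort is also slow.)

Is there anything I can do to speed this up?

I am wondering whether I should create a further materialized view, or whether a multicolumn index would help, so that Postgres can read from the index rather than from disk.

Is there any way to make the sort more efficient?

Update: here is the definition of the materialized view: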

    CREATE MATERIALIZED VIEW vw_chemical_summary_by_practice
    AS SELECT processing_date, practice_id, chemical_id, 
    SUM(total_items) AS items, SUM(actual_cost) AS cost
    FROM frontend_prescription
    GROUP BY processing_date, practice_id, chemical_id
And the underlying table:

 id                | integer                 | not null default nextval('frontend_prescription_id_seq'::regclass)
 presentation_code | character varying(15)   | not null
 total_items       | integer                 | not null
 actual_cost       | double precision        | not null
 processing_date   | date                    | not null
 practice_id       | character varying(6)    | not null
Indexes:
    "frontend_prescription_pkey" PRIMARY KEY, btree (id)
    "frontend_prescription_528f368c" btree (processing_date)
    "frontend_prescription_6ea07fe3" btree (practice_id)
    "frontend_prescription_idx_code" btree (presentation_code varchar_pattern_ops)
    "frontend_prescription_idx_date_and_code" btree (processing_date, presentation_code)
Update 2: in case it is not clear, I need the total spending and items for all chemical IDs starting with '0401', by practice and by month.
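On the multicolumn-index idea raised in the question, here is a minimal sketch, assuming the goal is an index-only scan (available since Postgres 9.2). The index name vw_idx_chem_covering is hypothetical. Whether the planner actually chooses the index depends on the visibility map being up to date, so treat this as something to test rather than a guaranteed fix.

```sql
-- Hypothetical covering index (name is an assumption, not from the question).
-- The leading column supports the LIKE '0401%' prefix match, using the same
-- varchar_pattern_ops operator class as the existing index; the remaining
-- columns are the ones the query reads from the materialized view.
CREATE INDEX vw_idx_chem_covering
    ON vw_chemical_summary_by_practice
       (chemical_id varchar_pattern_ops, practice_id, processing_date, cost, items);

-- Index-only scans rely on the visibility map, which VACUUM maintains.
VACUUM ANALYZE vw_chemical_summary_by_practice;
```

Re-running the EXPLAIN (ANALYSE, BUFFERS) query afterwards shows whether the bitmap heap scan has been replaced. The VACUUM step would likely need repeating after each REFRESH MATERIALIZED VIEW.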

-- assuming this is your original table:
CREATE TABLE practice_chemical_old
    ( processing_date date NOT NULL
    , practice_id     character varying(6) NOT NULL
    , chemical_id     character varying(9) NOT NULL
    , items           bigint NOT NULL
    , cost            double precision
    );

-- create these three new tables to decompose it into
CREATE TABLE practice
    ( practice_id SERIAL NOT NULL PRIMARY KEY
    , practice_name character varying(6) UNIQUE
    );
CREATE TABLE chemical
    ( chemical_id SERIAL NOT NULL PRIMARY KEY
    , chemical_name character varying(9) UNIQUE
    );

CREATE TABLE practice_chemical_new
    ( practice_id INTEGER NOT NULL REFERENCES practice (practice_id)
    , chemical_id INTEGER NOT NULL REFERENCES chemical (chemical_id)
    , processing_date date NOT NULL
    , items bigint NOT NULL default 0
    , cost double precision
            -- Not sure if processing_date should be part of the key, too
    , PRIMARY KEY (practice_id, chemical_id)
    );

CREATE UNIQUE INDEX ON practice_chemical_new(chemical_id, practice_id);

INSERT INTO practice(practice_name)
SELECT DISTINCT practice_id FROM practice_chemical_old;

INSERT INTO chemical(chemical_name)
SELECT DISTINCT chemical_id FROM practice_chemical_old;

-- now populate the new tables from the old ones ...
INSERT INTO practice_chemical_new(practice_id, chemical_id, processing_date,items,cost)
SELECT p.practice_id, c.chemical_id, pco.processing_date, pco.items, pco.cost
FROM practice_chemical_old pco
JOIN practice p ON p.practice_name = pco.practice_id
JOIN chemical c ON c.chemical_name = pco.chemical_id
    ;

-- Now,  the original table *could* be represented by the following view (or table, or table expression):
CREATE VIEW practice_chemical_fake AS
SELECT pcn.processing_date AS processing_date
    , p.practice_name AS practice_id
    , c.chemical_name AS chemical_id
    , pcn.items AS items
    , pcn.cost AS cost
FROM practice_chemical_new pcn
JOIN practice p ON p.practice_id = pcn.practice_id
JOIN chemical c ON c.chemical_id = pcn.chemical_id
    ;

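Note: from the original question it is not clear whether there can be multiple {practice, chemical} instances (with different processing dates). You may need to change the definition of the PK slightly.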
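If a {practice, chemical} pair can occur on several processing dates, the key change mentioned in the note could look like the sketch below. This is one possible variant of the practice_chemical_new definition, not a tested migration; it would have to be in place before the INSERT that populates the table, since that INSERT would otherwise fail on duplicate keys.

```sql
-- Sketch: variant of practice_chemical_new with processing_date in the key.
CREATE TABLE practice_chemical_new
    ( practice_id INTEGER NOT NULL REFERENCES practice (practice_id)
    , chemical_id INTEGER NOT NULL REFERENCES chemical (chemical_id)
    , processing_date date NOT NULL
    , items bigint NOT NULL DEFAULT 0
    , cost double precision
    , PRIMARY KEY (practice_id, chemical_id, processing_date)
    );

-- The reverse-order index must include the date too in order to stay unique.
CREATE UNIQUE INDEX ON practice_chemical_new (chemical_id, practice_id, processing_date);
```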

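Comments:

create index idx_test on vw_chemical_summary (substr(practice_id::text, 1, 4)); and then filter on substr(practice_id::text, 1, 4) = '0401'. If practice_id contains hierarchical values, it would be best to create an index on the first few characters of its values. The index is much smaller.

Can you add the view definition?

@Horia added, thanks.

@nagylzs would that help? I think it is the lookup of values across 4 million rows that is slow, not the index query.

Is there a reason to put a numeric value in a text field? WHERE (pr.chemical_id >= '0401' AND pr.chemical_id < '0402') might help, in the text case.

Yes, there can be multiple instances of {practice, chemical} with different dates.

Thanks. I am not quite sure why it would help. The query will still fetch any rows whose chemical ID starts with 0401. In this case, that is still 4 million entries. So does this mean we would run a 4-million-row query on the new table rather than on the old one? Which could be just as slow?

No, there are only 2,500 chemicals to search. With integer ids and indexed FKs, the join is actually cheap. Also: the memory and disk footprint (number of pages) is smaller. BTW: why don't you just try it? It is easy to keep the old and new versions side by side.

I will try. What query would I run on it to get the total spending by practice and by month for chemicals starting with 0401? I guess it is SELECT * FROM practice_chemical_new WHERE chemical_id LIKE '0401%'. Wouldn't that need to search as many rows as before?

That is why I created the "fake view", which simulates the "old table". It is not about rows, it is about (disk) pages.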
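The query asked about in the last comments could be sketched as follows. This is an assumption about how the decomposed schema would be queried, not something the answer spells out, and it assumes the primary key allows several processing dates per {practice, chemical} pair. The join to frontend_practice for the practice name is left out for brevity.

```sql
-- Total cost and items by practice and processing date
-- for chemicals whose code starts with '0401'.
SELECT sum(pcn.cost)       AS actual_cost
     , sum(pcn.items)      AS items
     , p.practice_name     AS row_id   -- the original varchar practice code
     , pcn.processing_date AS date
FROM chemical c
JOIN practice_chemical_new pcn ON pcn.chemical_id = c.chemical_id
JOIN practice p ON p.practice_id = pcn.practice_id
WHERE c.chemical_name LIKE '0401%'     -- prefix match runs on the small chemical table
GROUP BY p.practice_name, pcn.processing_date
ORDER BY pcn.processing_date, p.practice_name;
```

The LIKE filter touches only the small chemical table, and the matching rows are then reached through integer keys. Whether this is actually faster on the real data is best checked with EXPLAIN (ANALYSE, BUFFERS).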