Speeding up a Postgres query over millions of rows?
I am using Postgres 9.4. I have a large table. This is the structure of my table:
processing_date | date |
practice_id | character varying(6) |
chemical_id | character varying(9) |
items | bigint |
cost | double precision |
Indexes:
"vw_idx_chem_by_practice_chem_id" btree (chemical_id)
"vw_idx_chem_by_practice_chem_id_vc" btree (chemical_id varchar_pattern_ops)
"vw_idx_chem_by_practice_joint_id" btree (practice_id, chemical_id)
Now I want to run a LIKE query on the table. This is my query:
EXPLAIN (ANALYSE, BUFFERS) SELECT sum(pr.cost) as actual_cost,
sum(pr.items) as items, pr.practice_id as row_id,
pc.name as row_name, pr.processing_date as date
FROM vw_chemical_summary_by_practice pr
JOIN frontend_practice pc ON pr.practice_id=pc.code
WHERE (pr.chemical_id LIKE '0401%' )
GROUP BY pr.practice_id, pc.code, date
ORDER BY date, pr.practice_id;
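Here is the EXPLAIN output:
As you can see, it is slow, partly because it runs a bitmap heap scan over nearly 4 million rows. (The subsequent sort is also slow.)
Is there anything I can do to speed this up?
I am wondering whether I should create a further materialized view, or whether a multicolumn index would help, so that Postgres can read from the index rather than from disk.
Is there any way to make the sort more efficient?
UPDATE: here is the definition of the materialized view: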
CREATE MATERIALIZED VIEW vw_chemical_summary_by_practice
AS SELECT processing_date, practice_id, chemical_id,
SUM(total_items) AS items, SUM(actual_cost) AS cost
FROM frontend_prescription
GROUP BY processing_date, practice_id, chemical_id
And the underlying table:
id | integer | not null default nextval('frontend_prescription_id_seq'::regclass)
presentation_code | character varying(15) | not null
total_items | integer | not null
actual_cost | double precision | not null
processing_date | date | not null
practice_id | character varying(6) | not null
Indexes:
"frontend_prescription_pkey" PRIMARY KEY, btree (id)
"frontend_prescription_528f368c" btree (processing_date)
"frontend_prescription_6ea07fe3" btree (practice_id)
"frontend_prescription_idx_code" btree (presentation_code varchar_pattern_ops)
"frontend_prescription_idx_date_and_code" btree (processing_date, presentation_code)
UPDATE 2: in case it is unclear, I need the total spending and items for all chemical IDs starting with '0401', by practice and by month.
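On the multicolumn-index idea raised in the question, a rough sketch follows, assuming the materialized view and column names shown above. The index name is made up for illustration. Whether the planner actually chooses an index-only scan depends on the visibility map being current and on its cost estimates, so this is something to test with EXPLAIN rather than a guaranteed improvement.

```sql
-- Hypothetical index name; the columns are those of the materialized view above.
-- chemical_id leads, with varchar_pattern_ops so that LIKE '0401%' can use it.
-- The remaining columns are included only so that the query can be answered
-- from the index without visiting the heap.
CREATE INDEX vw_idx_chem_covering
    ON vw_chemical_summary_by_practice
       (chemical_id varchar_pattern_ops, practice_id, processing_date, cost, items);

-- Index-only scans rely on the visibility map, so vacuum the view
-- after creating or refreshing it.
VACUUM ANALYZE vw_chemical_summary_by_practice;
```

The index is wide, so it trades disk space and refresh time for fewer heap reads. The query still has to aggregate every matching row.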
-- assuming this is your original table:
CREATE TABLE practice_chemical_old
( processing_date date NOT NULL
, practice_id character varying(6) NOT NULL
, chemical_id character varying(9) NOT NULL
, items bigint NOT NULL
, cost double precision
);
-- create these three new tables to decompose it into
CREATE TABLE practice
( practice_id SERIAL NOT NULL PRIMARY KEY
, practice_name character varying(6) UNIQUE
);
CREATE TABLE chemical
( chemical_id SERIAL NOT NULL PRIMARY KEY
, chemical_name character varying(9) UNIQUE
);
CREATE TABLE practice_chemical_new
( practice_id INTEGER NOT NULL REFERENCES practice (practice_id)
, chemical_id INTEGER NOT NULL REFERENCES chemical (chemical_id)
, processing_date date NOT NULL
, items bigint NOT NULL default 0
, cost double precision
-- Not sure if processing_date should be part of the key, too
, PRIMARY KEY (practice_id, chemical_id)
);
CREATE UNIQUE INDEX ON practice_chemical_new(chemical_id, practice_id);
INSERT INTO practice(practice_name)
SELECT DISTINCT practice_id FROM practice_chemical_old;
INSERT INTO chemical(chemical_name)
SELECT DISTINCT chemical_id FROM practice_chemical_old;
-- now populate the new tables from the old ones ...
INSERT INTO practice_chemical_new(practice_id, chemical_id, processing_date,items,cost)
SELECT p.practice_id, c.chemical_id, pco.processing_date, pco.items, pco.cost
FROM practice_chemical_old pco
JOIN practice p ON p.practice_name = pco.practice_id
JOIN chemical c ON c.chemical_name = pco.chemical_id
;
-- Now, the original table *could* be represented by the following view (or table, or table expression):
CREATE VIEW practice_chemical_fake AS
SELECT pcn.processing_date AS processing_date
, p.practice_name AS practice_id
, c.chemical_name AS chemical_id
, pcn.items AS items
, pcn.cost AS cost
FROM practice_chemical_new pcn
JOIN practice p ON p.practice_id = pcn.practice_id
JOIN chemical c ON c.chemical_id = pcn.chemical_id
;
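Because the view above exposes the same columns as the original table, the aggregation from the question could be pointed at it more or less unchanged. This is only a sketch: it leaves out the join to frontend_practice that the original query used for the practice name, and it has not been run against real data.

```sql
-- Same aggregation as in the question, reading through the view defined above.
-- The LIKE filter applies to chemical.chemical_name inside the view, so only
-- the small chemical lookup table has to be searched by prefix.
SELECT sum(pr.cost)   AS actual_cost,
       sum(pr.items)  AS items,
       pr.practice_id AS row_id,
       pr.processing_date
FROM practice_chemical_fake pr
WHERE pr.chemical_id LIKE '0401%'
GROUP BY pr.practice_id, pr.processing_date
ORDER BY pr.processing_date, pr.practice_id;
```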
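Note: from the original question it is not clear whether there can be multiple {practice, chemical} instances (with different processing dates). You may need to change the definition of the PK slightly.

create index idx_test on vw_chemical_summary (substr(practice_id::text from 1 to 4)); and then use substr(practice_id::text from 1 to 4) = '0401'. If practice_id contains hierarchical values, it is better to create an index on the first few characters of its values. The index is much smaller.

Can you add the view definition?

@Horia added, thanks

@nagylzs would that help? I think the slow part is looking up the values in 4 million rows, not the index query.

Is there a reason for putting a numeric value in a text field? WHERE (pr.chemical_id >= '0401' AND pr.chemical_id < '0402') might help, in the text case.

Yes, there can be multiple instances of {practice, chemical} with different dates.

Thanks. I am not quite sure why it would help. The query will still fetch any rows where chemical_id starts with 0401. In this case that is still 4 million entries. So does that mean we would be running a 4-million-row query on the new table instead of the old one? Which might be just as slow?

No, there are only 2,500 chemicals to search through. With integer ids and indexed FKs, the joins are actually cheap. Also: the memory and disk footprint (number of pages) is smaller. BTW: why don't you just try it? It is easy to keep the old and new versions side by side.

I will try it. What query would I run on it to get the total spending by practice and by month for chemicals starting with 0401? I guess it is SELECT * FROM practice_chemical_new WHERE chemical_id LIKE '0401%' - won't that need to search as many rows as before?

That is why I created the "fake view", which mimics the "old table". It is not about rows, it is about (disk) pages.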
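Two of the suggestions from the comments, the range predicate and the prefix expression index, could be sketched as follows. The index name is hypothetical. The expression is applied to chemical_id here, on the assumption that this is the column being filtered, although the comment itself mentions practice_id. The sketch also assumes chemical_id holds plain alphanumeric codes, so that the range and the LIKE pattern select the same rows. Neither variant changes how many rows match, so any gain would come from the index access alone.

```sql
-- Range form of the prefix test. It can use the plain btree index on
-- chemical_id and does not depend on varchar_pattern_ops.
SELECT sum(cost) AS actual_cost, sum(items) AS items,
       practice_id, processing_date
FROM vw_chemical_summary_by_practice
WHERE chemical_id >= '0401' AND chemical_id < '0402'
GROUP BY practice_id, processing_date
ORDER BY processing_date, practice_id;

-- Expression index on the first four characters (hypothetical name).
CREATE INDEX vw_idx_chem_prefix4
    ON vw_chemical_summary_by_practice (substr(chemical_id, 1, 4));

-- The query has to use the same expression for the index to be considered.
SELECT sum(cost) AS actual_cost, sum(items) AS items,
       practice_id, processing_date
FROM vw_chemical_summary_by_practice
WHERE substr(chemical_id, 1, 4) = '0401'
GROUP BY practice_id, processing_date
ORDER BY processing_date, practice_id;
```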