Optimizing a window query in PostgreSQL
I have a products table with about 17,000,000 records:

CREATE TABLE vendor_prices (
    id serial PRIMARY KEY,
    vendor integer NOT NULL,
    sku character varying(25) NOT NULL,
    category_name character varying(100) NOT NULL,
    price numeric(8,5) NOT NULL,
    effective_date timestamp without time zone,
    expiration_date timestamp without time zone DEFAULT (now() + '1 year'::interval)
);

My problem is that the query takes far too long to execute, especially for vendors with a large number of records. The vendor in this query, WHERE vendor = 516, has about 3 million rows, of which only about 80K are not redundant. How can I improve this query?

Here is the output of EXPLAIN ANALYZE:
Aggregate (cost=987648.74..987648.75 rows=1 width=0) (actual time=38220.825..38220.825 rows=1 loops=1)
  -> Subquery Scan on d (cost=862040.12..983596.85 rows=1620756 width=0) (actual time=31758.342..38211.262 rows=84245 loops=1)
        Filter: (NOT d.del)
        Rows Removed by Filter: 3094780
        -> WindowAgg (cost=862040.12..951181.72 rows=3241513 width=25) (actual time=31758.220..37929.024 rows=3179025 loops=1)
              -> Sort (cost=862040.12..870143.90 rows=3241513 width=25) (actual time=31758.196..34952.249 rows=3179025 loops=1)
                    Sort Key: vendor_prices.sku, vendor_prices.effective_date, vendor_prices.id
                    Sort Method: external merge Disk: 123448kB
                    -> Bitmap Heap Scan on vendor_prices (cost=60790.16..356386.08 rows=3241513 width=25) (actual time=350.911..1512.974 rows=3179025 loops=1)
                          Recheck Cond: (vendor = 516)
                          Heap Blocks: exact=47546
                          -> Bitmap Index Scan on idx_vendor_number (cost=0.00..59979.79 rows=3241513 width=0) (actual time=336.936..336.936 rows=3179025 loops=1)
                                Index Cond: (vendor = 516)
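Note: I have a multicolumn index, as @Erwin suggested in his answer:
- A multicolumn index on (vendor, sku, effective_date, id) would be perfect for this, in this particular order.
But the planner is using idx_vendor_number, which, as you can see in the EXPLAIN ANALYZE output, is on the vendor column only.

Comments:

"external merge Disk: 123448kB" is your biggest problem. Try increasing work_mem until this is done in memory.

@a_horse_with_no_name: Thanks. Increasing work_mem to 512MB shaved about a dozen seconds off the total by doing the merge in memory. That helps, but what we really need is an improvement of at least an order of magnitude.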
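The work_mem suggestion from the comments could be tried along the following lines. This is only a sketch: it assumes you want the higher limit for a single transaction rather than server-wide, and the 512MB figure is simply the value mentioned in the comment. The query under test is not shown in the question, so it appears only as a placeholder comment.

```sql
-- Sketch: raise work_mem for one transaction only.
-- 512MB is the value reported in the comment above, not a recommendation.
BEGIN;

SET LOCAL work_mem = '512MB';   -- reverts automatically at COMMIT or ROLLBACK
SHOW work_mem;                  -- confirm the setting is in effect

-- Run EXPLAIN (ANALYZE, BUFFERS) on the query under test here, then look at
-- the "Sort Method" line of the Sort node: an in-memory sort is reported as
-- "quicksort  Memory: ..." rather than "external merge  Disk: ...".

COMMIT;
```

Using SET LOCAL keeps the change scoped to the transaction, which avoids giving every other session on the server the same large per-sort memory allowance.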
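For the multicolumn index mentioned in the note, a definition with the quoted column order might look like the sketch below. The index name idx_vendor_sku_date_id is hypothetical, and since the asker reports already having such an index, the CREATE INDEX statement is shown only to make the column order concrete. The second statement lists the indexes that actually exist on the table, which is one way to confirm that the index and its column order are what you expect.

```sql
-- Hypothetical index name; the column order follows the quoted suggestion:
-- (vendor, sku, effective_date, id).
CREATE INDEX idx_vendor_sku_date_id
    ON vendor_prices (vendor, sku, effective_date, id);

-- List the indexes defined on the table, with their full definitions.
SELECT indexname, indexdef
FROM   pg_indexes
WHERE  tablename = 'vendor_prices';

-- Refresh planner statistics so cost estimates reflect the current data.
ANALYZE vendor_prices;
```

Whether the planner then prefers this index over idx_vendor_number depends on its cost estimates; the plan above shows only that it chose the single-column index for this run.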