Query optimization for a timestamp range and GROUP BY in PostgreSQL




I want to query a table with the following structure:

               Table "public.company_geo_table"
       Column       |  Type  | Collation | Nullable | Default 
--------------------+--------+-----------+----------+---------
 geoname_id         | bigint |           |          | 
 date               | text   |           |          | 
 cik                | text   |           |          | 
 count              | bigint |           |          | 
 country_iso_code   | text   |           |          | 
 subdivision_1_name | text   |           |          | 
 city_name          | text   |           |          | 
Indexes:
    "cik_country_index" btree (cik, country_iso_code)
    "cik_geoname_index" btree (cik, geoname_id)
    "cik_index" btree (cik)
    "date_index" brin (date)

I tried the SQL query below. It has to select one specific cik over a date range and group the results by cik and geoname_id (i.e., by region).

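The EXPLAIN output shows that only cik_index and date_index are used, not cik_geoname_index. Why? Is there a way to optimize this, for example with a new index? Thanks in advance.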

HashAggregate  (cost=117182.79..117521.42 rows=27091 width=47) (actual time=560132.903..560134.229 rows=3552 loops=1)
   Group Key: cik, geoname_id
   ->  Bitmap Heap Scan on company_geo_table  (cost=16467.77..116979.48 rows=27108 width=23) (actual time=6486.232..560114.828 rows=8175 loops=1)
         Recheck Cond: ((date >= '2016-01-01'::text) AND (date <= '2016-01-10'::text) AND (cik = '1288776'::text))
         Rows Removed by Index Recheck: 16621155
         Heap Blocks: lossy=193098
         ->  BitmapAnd  (cost=16467.77..16467.77 rows=27428 width=0) (actual time=6469.640..6469.641 rows=0 loops=1)
               ->  Bitmap Index Scan on date_index  (cost=0.00..244.81 rows=7155101 width=0) (actual time=53.034..53.035 rows=8261120 loops=1)
                     Index Cond: ((date >= '2016-01-01'::text) AND (date <= '2016-01-10'::text))
               ->  Bitmap Index Scan on cik_index  (cost=0.00..16209.15 rows=739278 width=0) (actual time=6370.930..6370.930 rows=676231 loops=1)
                     Index Cond: (cik = '1111111'::text)
 Planning time: 12.909 ms
 Execution time: 560135.432 ms
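A plan like the one above, together with per-node block counts, can be obtained by running the query under EXPLAIN (ANALYZE, BUFFERS). The query text itself did not survive, so the sketch below is inferred from the plan's Recheck Cond and Group Key; the sum("count") select list is an assumption, and '1111111' is the placeholder value used in the question. Because date is stored as text, the range test is a string comparison, which only follows calendar order if every value is a zero-padded YYYY-MM-DD string.

```sql
-- Inferred shape of the question's query; the select list is assumed.
-- ANALYZE actually executes the query; BUFFERS adds block hit/read counts per node.
EXPLAIN (ANALYZE, BUFFERS)
SELECT cik,
       geoname_id,
       sum("count") AS total_count   -- assumed aggregate
FROM company_geo_table
WHERE date >= '2016-01-01'           -- text comparison, not a date comparison
  AND date <= '2016-01-10'
  AND cik = '1111111'
GROUP BY cik, geoname_id;
```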


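The estimates are not good. Perhaps the value '1111111' occurs too often; I am not sure about the impact, but the data type of the cik column (text) also looks wrong. That could be the reason, or part of the reason, for the poor estimate (27108 rows estimated versus 8175 actual) in this node: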

Bitmap Heap Scan on company_geo_table  (cost=16467.77..116979.48 rows=27108 width=23) (actual time=6486.232..560114.828 rows=8175 loops=1)
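One way to check whether the planner's statistics for cik are stale or skewed is to refresh them and look at what pg_stats reports for that column. This is a minimal sketch; raising the per-column statistics target to 1000 is an illustrative value, not something the answer prescribes.

```sql
-- Optional: sample more distinct values for cik (default target is 100; 1000 is illustrative)
ALTER TABLE company_geo_table ALTER COLUMN cik SET STATISTICS 1000;

-- Recollect planner statistics for the table
ANALYZE company_geo_table;

-- What the planner now believes about cik
SELECT n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE schemaname = 'public'
  AND tablename  = 'company_geo_table'
  AND attname    = 'cik';
```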

A composite index on (date, cik) seems likely to help.
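A sketch of that suggestion, assuming a B-tree index (the existing date_index is BRIN). The index name is hypothetical. CONCURRENTLY avoids blocking writes while a large table is indexed, but it cannot be run inside a transaction block.

```sql
-- B-tree on (date, cik); the index name is hypothetical.
-- With date leading, the scan walks every entry in the date range
-- and tests cik inside the index instead of in the heap.
CREATE INDEX CONCURRENTLY company_geo_date_cik_idx
    ON company_geo_table (date, cik);
```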

Your problem seems to be here:

Rows Removed by Index Recheck: 16621155
Heap Blocks: lossy=193098

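Your work_mem setting is too low, so PostgreSQL cannot fit a bitmap with one bit per table row, and it degrades to one bit per 8 kB block (a "lossy" bitmap). This means that during the bitmap heap scan every row in each matching block has to be rechecked, and many false-positive hits have to be removed.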

Try a higher work_mem setting and see whether that improves query performance.
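A minimal way to try this for a single session, without editing postgresql.conf. The 256MB value is an assumption; size it to your memory and concurrency, since each sort or hash operation in each session can use roughly up to work_mem. Compare the Heap Blocks line before and after: exact= counts pages whose individual rows were tracked, while lossy= counts whole pages whose rows all had to be rechecked. The query is the same inferred shape as the question's, with an assumed select list.

```sql
SHOW work_mem;                 -- current value (server default is 4MB)

SET work_mem = '256MB';        -- affects this session only; value is illustrative

EXPLAIN (ANALYZE, BUFFERS)
SELECT cik, geoname_id, sum("count") AS total_count   -- assumed select list
FROM company_geo_table
WHERE date >= '2016-01-01'
  AND date <= '2016-01-10'
  AND cik = '1111111'
GROUP BY cik, geoname_id;

RESET work_mem;                -- back to the server/role default
```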

The ideal index would be:

CREATE INDEX ON company_geo_table (cik, date);
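The reasoning behind putting cik first: with the equality column leading, all entries for one cik sit together in the B-tree, ordered by date, so the ten-day range becomes one contiguous slice of the index. A sketch of creating it and checking that the planner uses it; the index name is hypothetical and the select list is assumed, as above.

```sql
-- Equality column first, range column second; the index name is hypothetical
CREATE INDEX CONCURRENTLY company_geo_cik_date_idx
    ON company_geo_table (cik, date);

-- Expect an Index Scan or Bitmap Index Scan on company_geo_cik_date_idx
-- with both the cik and the date conditions listed under Index Cond
EXPLAIN (ANALYZE, BUFFERS)
SELECT cik, geoname_id, sum("count") AS total_count   -- assumed select list
FROM company_geo_table
WHERE cik = '1111111'
  AND date >= '2016-01-01'
  AND date <= '2016-01-10'
GROUP BY cik, geoname_id;
```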

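An index speeds up searching only when its key columns are compared with input values in the WHERE clause or used in a JOIN condition. In your case, geoname_id is neither part of a JOIN nor compared with an input value, so the second column of cik_geoname_index does nothing for this query. (PostgreSQL could still use that index for the cik condition alone, because cik is its leading column, but the single-column cik_index serves the same purpose and is smaller.)

Thanks for your reply. '1111111' is just an example, not the real value. cik is a unique identifier for each company, so I think text is a reasonable type.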
