PostgreSQL query is slow when the WHERE clause uses a very common value

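I have a table of flights (~3 million rows) and a table of aggregated flights (~15 million rows), and I want the flights that do not exist in aggregated flights.

I'm wondering how to get the best performance out of this query: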

select 
  f.id 
from 
  flights f left join 
  aggregated_flights af on af.flight_id = f.id 
where 
  af.flight_id is null and 
  f.status = 'COMMITED' 
;
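If I omit the status clause, the query is very fast, but with the status clause included it takes 1-2 minutes.

For about 99% of the flights, the value in the status column is 'COMMITED'.

I created a partial index like this: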

create index on flights (id) where status = 'COMMITED';
But this does not seem to have any effect: the query is still slow.

Any suggestions?

(Observed on PostgreSQL 9.4 and 9.6)

Table definition:

app=> \d flights
                                          Table "public.flights"
        Column        |            Type             |                      Modifiers                       
----------------------+-----------------------------+------------------------------------------------------
 id                   | integer                     | not null default nextval('flights_id_seq'::regclass)
 name                 | character varying           | 
 aircraft_id          | integer                     | 
 status               | character varying           | 
 departure_airport_id | integer                     | 
 arrival_airport_id   | integer                     | 
 departure_time       | timestamp without time zone | 
 off_block            | timestamp without time zone | 
 arrival_time         | timestamp without time zone | 
 on_block             | timestamp without time zone | 
 radiation_amount     | numeric(10,6)               | 
 total_day_minutes    | integer                     | 
 total_night_minutes  | integer                     | 
 total_instr_minutes  | integer                     | 
 approach_type_id     | integer                     | 
 note                 | character varying           | 
 created_at           | timestamp without time zone | 
 updated_at           | timestamp without time zone | 
 flight_type_id       | integer                     | 
 owner_id             | integer                     | 
 night_landing        | boolean                     | 
 load_filename        | character varying           | 
 recalc               | boolean                     | 
Indexes:
    "flights_pkey" PRIMARY KEY, btree (id)
    "flights_id_idx" btree (id) WHERE status::text = 'COMMITED'::text
    "index_flights_combined" btree (name, departure_airport_id, off_block)
    "index_flights_on_aircraft_id" btree (aircraft_id)
    "index_flights_on_approach_type_id" btree (approach_type_id)
    "index_flights_on_arrival_airport_id" btree (arrival_airport_id)
    "index_flights_on_created_at" btree (created_at)
    "index_flights_on_departure_airport_id" btree (departure_airport_id)
    "index_flights_on_flight_type_id" btree (flight_type_id)
    "index_flights_on_off_block" btree (off_block)
    "index_flights_on_on_block" btree (on_block)
    "index_flights_on_owner_id" btree (owner_id)
Autovacuum:

app=> show autovacuum;
 autovacuum 
------------
 on
(1 row)
Analyze:

app=> analyze verbose flights;
INFO:  analyzing "public.flights"
INFO:  "flights": scanned 30000 of 80606 pages, containing 1161009 live rows and 0 dead rows; 30000 rows in sample, 3122535 estimated total rows
ANALYZE
Explain output:

app=> explain (analyze, buffers) select f.id from flights f left join aggregated_flights af on af.flight_id = f.id where af.flight_id is null and f.status = 'COMMITED' limit 100;

Limit  (cost=7.25..68.59 rows=100 width=4) (actual time=58744.490..58744.604 rows=100 loops=1)
  Buffers: shared hit=367361 read=248982
  ->  Merge Anti Join  (cost=7.25..1829081.46 rows=2981880 width=4) (actual time=58744.489..58744.586 rows=100 loops=1)
        Merge Cond: (f.id = af.flight_id)
        Buffers: shared hit=367361 read=248982
        ->  Index Scan using flights_id_idx on flights f  (cost=0.43..743949.15 rows=3106090 width=4) (actual time=0.066..24170.693 rows=3106983 loops=1)
              Buffers: shared hit=316162 read=85698
        ->  Index Only Scan using index_aggregated_flights_on_flight_id_and_flight_relation_id on aggregated_flights af  (cost=0.56..886207.11 rows=15357503 width=4) (actual time=0.014..31282.777 rows=15360252 loops=1)
              Heap Fetches: 0
              Buffers: shared hit=51199 read=163284
Planning time: 246.341 ms
Execution time: 58744.695 ms
Update: I added an index on the aggregated_flights table on just flight_id. This did speed up the query, but I still think 10 seconds is a bit much.

app=> explain (analyze, buffers) select f.id from flights f left join aggregated_flights af on af.flight_id = f.id where af.flight_id is null and f.status = 'COMMITED' limit 1000;
                                                                                          QUERY PLAN                                                                                           
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=3.83..453.80 rows=1000 width=4) (actual time=9986.052..9986.508 rows=470 loops=1)
   Buffers: shared hit=365265 read=126777
   ->  Merge Anti Join  (cost=3.83..1341784.78 rows=2981880 width=4) (actual time=9986.050..9986.437 rows=470 loops=1)
         Merge Cond: (f.id = af.flight_id)
         Buffers: shared hit=365265 read=126777
         ->  Index Scan using flights_id_idx on flights f  (cost=0.43..743949.15 rows=3106090 width=4) (actual time=0.935..3891.800 rows=3107353 loops=1)
               Buffers: shared hit=317084 read=84797
         ->  Index Only Scan using aggregated_flights_flight_id_idx on aggregated_flights af  (cost=0.43..398876.22 rows=15360252 width=4) (actual time=0.023..3270.955 rows=15360252 loops=1)
               Heap Fetches: 0
               Buffers: shared hit=48181 read=41980
 Planning time: 53.676 ms
 Execution time: 9986.603 ms
(12 rows)

With an index on each of the two join columns, one of them partial to match the WHERE condition, the query is about as fast as it can get.
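
As a sketch, the two indexes referred to here could be created as follows. The first statement is the partial index from the question; the second is an assumption about how the index mentioned in the update was created (the default name PostgreSQL would generate for it, aggregated_flights_flight_id_idx, matches the name shown in the second plan).

```sql
-- Partial index on the join column of flights, restricted to the
-- rows the query asks for (statement taken from the question).
CREATE INDEX ON flights (id) WHERE status = 'COMMITED';

-- Single-column index on the join column of aggregated_flights
-- (assumed form of the index described in the update).
CREATE INDEX ON aggregated_flights (flight_id);
```

Both sides of the merge anti join can then be read in join-key order straight from an index, without a separate sort step.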

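The only further improvement on the PostgreSQL side would be an index-only scan on the partial index. For that you should use PostgreSQL 9.6 or later, where index-only scans on partial indexes are supported. In the plans above, flights_id_idx is read with a plain Index Scan, so the scan still visits the table heap.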
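
One way to check whether the index-only scan is actually chosen is sketched below. It assumes PostgreSQL 9.6 or later and reuses the table and index names from the question. Index-only scans depend on the visibility map, which VACUUM keeps up to date, so the sketch vacuums first.

```sql
-- Refresh the visibility map and the planner statistics for flights.
VACUUM (ANALYZE) flights;

-- Re-run the plan from the question.
EXPLAIN (ANALYZE, BUFFERS)
SELECT f.id
FROM flights f
LEFT JOIN aggregated_flights af ON af.flight_id = f.id
WHERE af.flight_id IS NULL
  AND f.status = 'COMMITED';
-- In the output, look for "Index Only Scan using flights_id_idx"
-- on flights and a low "Heap Fetches" count.
```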


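Beyond that, the best optimization is to give the machine enough RAM to cache the whole database (or at least the relevant indexes), so that data does not have to be read from disk. You can also load a table or index into the cache ahead of time.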
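
The tool the answer had in mind for preloading is not preserved in this copy. A minimal sketch, assuming the pg_prewarm extension that ships with PostgreSQL 9.4 and later, and using the index names that appear in the second plan:

```sql
-- pg_prewarm is a contrib extension; installing it requires
-- sufficient privileges on the database.
CREATE EXTENSION IF NOT EXISTS pg_prewarm;

-- Load both join indexes into shared buffers. Each call returns
-- the number of blocks that were prewarmed.
SELECT pg_prewarm('flights_id_idx');
SELECT pg_prewarm('aggregated_flights_flight_id_idx');
```

This only helps if shared_buffers is large enough to hold the indexes, and the effect is lost once the pages are evicted or the server restarts.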

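Can you provide the EXPLAIN ANALYZE output for both queries?

@Eelke Yes, of course. Added it to the question.

That is a plain EXPLAIN output; EXPLAIN (ANALYZE, BUFFERS) would be more helpful.

@a_horse_with_no_name Ah, OK, I wasn't aware of that. Added as requested.

Did you swap the queries and the explain results? Because right now it looks like the query with the status clause is much faster in both cases. (Assuming I'm reading the execution times correctly.)

Maybe I made a mistake when updating the question. It should be correct now. I also checked the autovacuum setting and ran ANALYZE. Is it clear now?

That doesn't look right. I don't see the condition f.status='committed' in the execution plan.

You are right, but that is the output I get. It does use the index flights_id_idx, which is the partial index I created, so I guess the condition is being applied.

I just noticed that the join may be using a "wrong" index, one named index_aggregated_flights_on_flight_id_and_flight_relation_id. I will look into it.

Update: I created an index on just flight_id on aggregated_flights. The query now uses it, and the query time has dropped to ~10 seconds. Still a bit much, I think...?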