PostgreSQL在按索引过滤方面非常慢_Postgresql

PostgreSQL在按索引过滤方面非常慢

postgresql

PostgreSQL在按索引过滤方面非常慢,postgresql,Postgresql,我有一个日志表，其中有60M条记录，并试图用索引对UserId字段执行测试查询。不幸的是，它非常慢，扫描大约需要13秒。我用8GB RAM、4个CPU和SSD在DO上托管数据库表结构 create table "Log" ( "Id" integer generated by default as identity constraint "PK_Log" primary key, "Result"

我有一个日志表，其中有60M条记录，并试图用索引对UserId字段执行测试查询。不幸的是，它非常慢，扫描大约需要13秒。我用8GB RAM、4个CPU和SSD在DO上托管数据库

表结构

create table "Log"
(
    "Id"                    integer generated by default as identity
        constraint "PK_Log"
            primary key,
    "Result"                boolean   not null,
    "DateStamp"             timestamp not null,
    "ConversionTime"        integer   not null,
    "ConversionCost"        integer   not null,
    "Converter"             text,
    "SourceFileFormat"      text,
    "DestinationFileFormat" text,
    "Agent"                 text,
    "AgentVersion"          text,
    "Os"                    text,
    "OsVersion"             text,
    "Ip"                    text,
    "Source"                text,
    "Error"                 text,
    "UserId"                integer   not null
        constraint "FK_Log_User_UserId"
            references "User"
            on delete cascade,
    "TokenId"               text
);

alter table "Log"
    owner to doadmin;

create index "IX_Log_ConversionCost"
    on "Log" ("ConversionCost");

create index "IX_Log_DateStamp"
    on "Log" ("DateStamp");

create index "IX_Log_Result"
    on "Log" ("Result");

create index "IX_Log_UserId"
    on "Log" ("UserId");

create index "IX_Log_UserId-all"
    on "Log" ("UserId", "DateStamp", "Result", "ConversionCost");

实际的测试SQL查询花费的时间比应该的要多

set random_page_cost=1;
set seq_page_cost=4;

EXPLAIN (buffers, analyse )
SELECT
    "UserId"
FROM "Log"
WHERE "UserId" = 7841

GROUP BY 1;

执行计划

Group  (cost=1000.57..1917660.13 rows=626 width=4) (actual time=2.404..13845.150 rows=1 loops=1)
  Group Key: "UserId"
  Buffers: shared hit=6038 read=857126
  ->  Gather  (cost=1000.57..1917657.00 rows=1252 width=4) (actual time=2.389..13895.478 rows=3 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        Buffers: shared hit=6038 read=857126
        ->  Group  (cost=0.56..1916531.80 rows=626 width=4) (actual time=1.134..13841.871 rows=1 loops=3)
              Group Key: "UserId"
              Buffers: shared hit=6038 read=857126
              ->  Parallel Index Only Scan using "IX_Log_UserId" on "Log"  (cost=0.56..1913347.27 rows=1273813 width=4) (actual time=1.132..13721.591 rows=1064916 loops=3)
                    Index Cond: ("UserId" = 7841)
                    Heap Fetches: 3194749
                    Buffers: shared hit=6038 read=857126
Planning Time: 0.097 ms
Execution Time: 13895.539 ms

Group  (cost=1595170.54..1595171.02 rows=2 width=1) (actual time=29233.895..29233.899 rows=2 loops=1)
  Group Key: "Result"
  Buffers: shared hit=72330 read=362500
  ->  Gather Merge  (cost=1595170.54..1595171.01 rows=4 width=1) (actual time=29233.894..29286.971 rows=6 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        Buffers: shared hit=199587 read=1075489
        ->  Sort  (cost=1594170.52..1594170.53 rows=2 width=1) (actual time=29227.651..29227.651 rows=2 loops=3)
              Sort Key: "Result"
              Sort Method: quicksort  Memory: 25kB
              Worker 0:  Sort Method: quicksort  Memory: 25kB
              Worker 1:  Sort Method: quicksort  Memory: 25kB
              Buffers: shared hit=199587 read=1075489
              ->  Partial HashAggregate  (cost=1594170.49..1594170.51 rows=2 width=1) (actual time=29227.616..29227.617 rows=2 loops=3)
                    Group Key: "Result"
                    Buffers: shared hit=199571 read=1075489
                    ->  Parallel Seq Scan on "Log"  (cost=0.00..1590985.96 rows=1273813 width=1) (actual time=0.220..28997.691 rows=1064916 loops=3)
                          Filter: ("UserId" = 7841)
                          Rows Removed by Filter: 19154346
                          Buffers: shared hit=199571 read=1075489
Planning Time: 3.910 ms
Execution Time: 29287.100 ms

这与解释时间有关

set random_page_cost=1;
set seq_page_cost=1;

explain (analyze, buffers, timing)
SELECT
    "UserId"
FROM "Log"
WHERE "UserId" = 7841

GROUP BY 1;

Group  (cost=33076.49..1327372.02 rows=626 width=4) (actual time=769.610..14062.469 rows=1 loops=1)
  Group Key: "UserId"
  Buffers: shared hit=3541 read=853584
  ->  Gather  (cost=33076.49..1327368.89 rows=1252 width=4) (actual time=769.167..14139.210 rows=3 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        Buffers: shared hit=3541 read=853584
        ->  Group  (cost=32076.49..1326243.69 rows=626 width=4) (actual time=759.835..14043.414 rows=1 loops=3)
              Group Key: "UserId"
              Buffers: shared hit=3541 read=853584
              ->  Parallel Bitmap Heap Scan on "Log"  (cost=32076.49..1323059.16 rows=1273813 width=4) (actual time=759.611..13614.695 rows=1064916 loops=3)
                    Recheck Cond: ("UserId" = 7841)
                    Heap Blocks: exact=284018
                    Buffers: shared hit=3541 read=853584
                    ->  Bitmap Index Scan on "IX_Log_UserId"  (cost=0.00..31312.20 rows=3057152 width=0) (actual time=454.977..454.977 rows=3194749 loops=1)
                          Index Cond: ("UserId" = 7841)
                          Buffers: shared hit=3541 read=5192
Planning Time: 0.140 ms
Execution Time: 14141.648 ms

另一个真正的查询，结果也一样慢

set random_page_cost=1;
set seq_page_cost=1;

explain (analyze, buffers, timing)
SELECT
    "Result"
FROM "Log"
WHERE "UserId" = 7841

GROUP BY 1;

执行计划

Group  (cost=1000.57..1917660.13 rows=626 width=4) (actual time=2.404..13845.150 rows=1 loops=1)
  Group Key: "UserId"
  Buffers: shared hit=6038 read=857126
  ->  Gather  (cost=1000.57..1917657.00 rows=1252 width=4) (actual time=2.389..13895.478 rows=3 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        Buffers: shared hit=6038 read=857126
        ->  Group  (cost=0.56..1916531.80 rows=626 width=4) (actual time=1.134..13841.871 rows=1 loops=3)
              Group Key: "UserId"
              Buffers: shared hit=6038 read=857126
              ->  Parallel Index Only Scan using "IX_Log_UserId" on "Log"  (cost=0.56..1913347.27 rows=1273813 width=4) (actual time=1.132..13721.591 rows=1064916 loops=3)
                    Index Cond: ("UserId" = 7841)
                    Heap Fetches: 3194749
                    Buffers: shared hit=6038 read=857126
Planning Time: 0.097 ms
Execution Time: 13895.539 ms

Group  (cost=1595170.54..1595171.02 rows=2 width=1) (actual time=29233.895..29233.899 rows=2 loops=1)
  Group Key: "Result"
  Buffers: shared hit=72330 read=362500
  ->  Gather Merge  (cost=1595170.54..1595171.01 rows=4 width=1) (actual time=29233.894..29286.971 rows=6 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        Buffers: shared hit=199587 read=1075489
        ->  Sort  (cost=1594170.52..1594170.53 rows=2 width=1) (actual time=29227.651..29227.651 rows=2 loops=3)
              Sort Key: "Result"
              Sort Method: quicksort  Memory: 25kB
              Worker 0:  Sort Method: quicksort  Memory: 25kB
              Worker 1:  Sort Method: quicksort  Memory: 25kB
              Buffers: shared hit=199587 read=1075489
              ->  Partial HashAggregate  (cost=1594170.49..1594170.51 rows=2 width=1) (actual time=29227.616..29227.617 rows=2 loops=3)
                    Group Key: "Result"
                    Buffers: shared hit=199571 read=1075489
                    ->  Parallel Seq Scan on "Log"  (cost=0.00..1590985.96 rows=1273813 width=1) (actual time=0.220..28997.691 rows=1064916 loops=3)
                          Filter: ("UserId" = 7841)
                          Rows Removed by Filter: 19154346
                          Buffers: shared hit=199571 read=1075489
Planning Time: 3.910 ms
Execution Time: 29287.100 ms

为什么设置seq_page_cost=4？让它比随机访问更昂贵是没有意义的。您至少应该将两个值设置为相同的值。从单列索引读取863164块1064916行似乎过高。在那张桌子上运行真空吸尘器会改变什么吗？@a_horse_，名称见seq_page_，成本为1，并使两个值的切割时间从13到10相同。还是太长了。真空满了，不要做任何改进。你能添加新的计划吗？也许启用track\u io\u timing，然后运行explain analyze、buffers、timing？查询本身在一开始就没有什么意义。如果您想使用select exists select*from log（其中userid=7841）查找该用户ID是否存在，则可能要快得多，然后向我们展示真正的查询。就目前情况而言，这确实不是一个有效的测试，因为现有的解决方案会快得多。为什么设置seq_page_cost=4？让它比随机访问更昂贵是没有意义的。您至少应该将两个值设置为相同的值。从单列索引读取863164块1064916行似乎过高。在那张桌子上运行真空吸尘器会改变什么吗？@a_horse_，名称见seq_page_，成本为1，并使两个值的切割时间从13到10相同。还是太长了。真空满了，不要做任何改进。你能添加新的计划吗？也许启用track\u io\u timing，然后运行explain analyze、buffers、timing？查询本身在一开始就没有什么意义。如果您想使用select exists select*from log（其中userid=7841）查找该用户ID是否存在，则可能要快得多，然后向我们展示真正的查询。就目前而言，这实际上不是一个有效的测试，因为现有的解决方案会快得多