PostgreSQL sequential scan slow performance on 500 million rows


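I am trying to select all rows of a table containing about 500 million rows with PostgreSQL 11.

On a VM with 32 CPU cores and 256GB of RAM, and an SSD with read/write speeds of up to 200MB/s, this takes about 15 minutes. That is far longer than I expected after seeing people select 1 million rows in ~1s, although they do not sort the rows.

Queries on this table will mostly consist of a SELECT over 80% to 100% of the table, filtered on datetime, with the rows ordered by datetime.

Here is the description of the table: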

postgres=# \d+ ohlcv;
                                              Table "public.ohlcv"
  Column  |            Type             | Collation | Nullable | Default | Storage | Stats target | Description
----------+-----------------------------+-----------+----------+---------+---------+--------------+-------------
 datetime | timestamp without time zone |           | not null |         | plain   |              |
 open     | real                        |           | not null |         | plain   |              |
 high     | real                        |           | not null |         | plain   |              |
 low      | real                        |           | not null |         | plain   |              |
 close    | real                        |           | not null |         | plain   |              |
 volume   | integer                     |           | not null |         | plain   |              |
Indexes:
    "brin_datetime" brin (datetime)
All rows were added at once, and the BRIN index was added afterwards.

Here is the query, which seems to use 8 CPUs rather than the 32 available:

postgres=# explain analyze
postgres-# select * from ohlcv order by datetime;
                                                                 QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
 Gather Merge  (cost=20712603.78..96175039.28 rows=610230784 width=28) (actual time=175360.971..721544.003 rows=610230801 loops=1)
   Workers Planned: 8
   Workers Launched: 8
   ->  Sort  (cost=20711603.64..20902300.76 rows=76278848 width=28) (actual time=125461.665..170299.327 rows=67803422 loops=9)
         Sort Key: datetime
         Sort Method: external merge  Disk: 2429104kB
         Worker 0:  Sort Method: external merge  Disk: 2404680kB
         Worker 1:  Sort Method: external merge  Disk: 2406280kB
         Worker 2:  Sort Method: external merge  Disk: 2656672kB
         Worker 3:  Sort Method: external merge  Disk: 2635904kB
         Worker 4:  Sort Method: external merge  Disk: 2637600kB
         Worker 5:  Sort Method: external merge  Disk: 2643400kB
         Worker 6:  Sort Method: external merge  Disk: 2437272kB
         Worker 7:  Sort Method: external merge  Disk: 2439272kB
         ->  Parallel Seq Scan on ohlcv  (cost=0.00..5249780.48 rows=76278848 width=28) (actual time=0.049..42506.065 rows=67803422 loops=9)
 Planning Time: 0.566 ms
 Execution Time: 1059414.396 ms
(17 rows)
Here is the postgres configuration:

 check_function_bodies                  | on                                       | Check function bodies during CREATE FUNCTION.
 checkpoint_completion_target           | 0.5                                      | Time spent flushing dirty buffers during checkpoint, as fraction of checkpoint interval.
 checkpoint_flush_after                 | 256kB                                    | Number of pages after which previously performed writes are flushed to disk.
 checkpoint_timeout                     | 5min                                     | Sets the maximum time between automatic WAL checkpoints.
 checkpoint_warning                     | 30s                                      | Enables warnings if checkpoint segments are filled more frequently than this.
 client_encoding                        | UTF8                                     | Sets the client's character set encoding.
 client_min_messages                    | notice                                   | Sets the message levels that are sent to the client.
 cluster_name                           |                                          | Sets the name of the cluster, which is included in the process title.
 commit_delay                           | 0                                        | Sets the delay in microseconds between transaction commit and flushing WAL to disk.
 commit_siblings                        | 5                                        | Sets the minimum concurrent open transactions before performing commit_delay.
 config_file                            | /var/lib/postgresql/data/postgresql.conf | Sets the server's main configuration file.
 constraint_exclusion                   | partition                                | Enables the planner to use constraints to optimize queries.
 cpu_index_tuple_cost                   | 0.005                                    | Sets the planner's estimate of the cost of processing each index entry during an index scan.
 cpu_operator_cost                      | 0.0025                                   | Sets the planner's estimate of the cost of processing each operator or function call.
 cpu_tuple_cost                         | 0.01                                     | Sets the planner's estimate of the cost of processing each tuple (row).
 cursor_tuple_fraction                  | 0.1                                      | Sets the planner's estimate of the fraction of a cursor's rows that will be retrieved.
 data_checksums                         | off                                      | Shows whether data checksums are turned on for this cluster.
 data_directory                         | /var/lib/postgresql/data                 | Sets the server's data directory.
 data_directory_mode                    | 0700                                     | Mode of the data directory.
 data_sync_retry                        | off                                      | Whether to continue running after a failure to sync data files.
 DateStyle                              | ISO, MDY                                 | Sets the display format for date and time values.
 db_user_namespace                      | off                                      | Enables per-database user names.
 deadlock_timeout                       | 1s                                       | Sets the time to wait on a lock before checking for deadlock.
 debug_assertions                       | off                                      | Shows whether the running server has assertion checks enabled.
 debug_pretty_print                     | on                                       | Indents parse and plan tree displays.
 debug_print_parse                      | off                                      | Logs each query's parse tree.
 debug_print_plan                       | off                                      | Logs each query's execution plan.
 debug_print_rewritten                  | off                                      | Logs each query's rewritten parse tree.
 default_statistics_target              | 100                                      | Sets the default statistics target.
 default_tablespace                     |                                          | Sets the default tablespace to create tables and indexes in.
 default_text_search_config             | pg_catalog.english                       | Sets default text search configuration.
 default_transaction_deferrable         | off                                      | Sets the default deferrable status of new transactions.
 default_transaction_isolation          | read committed                           | Sets the transaction isolation level of each new transaction.
 default_transaction_read_only          | off                                      | Sets the default read-only status of new transactions.
 default_with_oids                      | off                                      | Create new tables with OIDs by default.
 dynamic_library_path                   | $libdir                                  | Sets the path for dynamically loadable modules.
 dynamic_shared_memory_type             | posix                                    | Selects the dynamic shared memory implementation used.
 effective_cache_size                   | 4GB                                      | Sets the planner's assumption about the total size of the data caches.
 effective_io_concurrency               | 1                                        | Number of simultaneous requests that can be handled efficiently by the disk subsystem.
 enable_bitmapscan                      | on                                       | Enables the planner's use of bitmap-scan plans.
 enable_gathermerge                     | on                                       | Enables the planner's use of gather merge plans.
 enable_hashagg                         | on                                       | Enables the planner's use of hashed aggregation plans.
 enable_hashjoin                        | on                                       | Enables the planner's use of hash join plans.
 enable_indexonlyscan                   | on                                       | Enables the planner's use of index-only-scan plans.
 enable_indexscan                       | on                                       | Enables the planner's use of index-scan plans.
 enable_material                        | on                                       | Enables the planner's use of materialization.
 enable_mergejoin                       | on                                       | Enables the planner's use of merge join plans.
 enable_nestloop                        | on                                       | Enables the planner's use of nested-loop join plans.
 enable_parallel_append                 | on                                       | Enables the planner's use of parallel append plans.
 enable_parallel_hash                   | on                                       | Enables the planner's use of parallel hash plans.
 enable_partition_pruning               | on                                       | Enable plan-time and run-time partition pruning.
 enable_partitionwise_aggregate         | off                                      | Enables partitionwise aggregation and grouping.
 enable_partitionwise_join              | off                                      | Enables partitionwise join.
 enable_seqscan                         | on                                       | Enables the planner's use of sequential-scan plans.
 enable_sort                            | on                                       | Enables the planner's use of explicit sort steps.
 enable_tidscan                         | on                                       | Enables the planner's use of TID scan plans.
 escape_string_warning                  | on                                       | Warn about backslash escapes in ordinary string literals.
 event_source                           | PostgreSQL                               | Sets the application name used to identify PostgreSQL messages in the event log.
 exit_on_error                          | off                                      | Terminate session on any error.
 external_pid_file                      |                                          | Writes the postmaster PID to the specified file.
 extra_float_digits                     | 0                                        | Sets the number of digits displayed for floating-point values.
 force_parallel_mode                    | off                                      | Forces use of parallel query facilities.
 from_collapse_limit                    | 8                                        | Sets the FROM-list size beyond which subqueries are not collapsed.
 fsync                                  | on                                       | Forces synchronization of updates to disk.
 full_page_writes                       | on                                       | Writes full pages to WAL when first modified after a checkpoint.
 geqo                                   | on                                       | Enables genetic query optimization.
 geqo_effort                            | 5                                        | GEQO: effort is used to set the default for other GEQO parameters.
 geqo_generations                       | 0                                        | GEQO: number of iterations of the algorithm.
 geqo_pool_size                         | 0                                        | GEQO: number of individuals in the population.
 geqo_seed                              | 0                                        | GEQO: seed for random path selection.
 geqo_selection_bias                    | 2                                        | GEQO: selective pressure within the population.
 geqo_threshold                         | 12                                       | Sets the threshold of FROM items beyond which GEQO is used.
 gin_fuzzy_search_limit                 | 0                                        | Sets the maximum allowed result for exact search by GIN.
 gin_pending_list_limit                 | 4MB                                      | Sets the maximum size of the pending list for GIN index.
 hba_file                               | /var/lib/postgresql/data/pg_hba.conf     | Sets the server's "hba" configuration file.
 hot_standby                            | on                                       | Allows connections and queries during recovery.
 hot_standby_feedback                   | off                                      | Allows feedback from a hot standby to the primary that will avoid query conflicts.
 huge_pages                             | try                                      | Use of huge pages on Linux or Windows.
 ident_file                             | /var/lib/postgresql/data/pg_ident.conf   | Sets the server's "ident" configuration file.
 idle_in_transaction_session_timeout    | 0                                        | Sets the maximum allowed duration of any idling transaction.
 ignore_checksum_failure                | off                                      | Continues processing after a checksum failure.
 ignore_system_indexes                  | off                                      | Disables reading from system indexes.
 integer_datetimes                      | on                                       | Datetimes are integer based.
 IntervalStyle                          | postgres                                 | Sets the display format for interval values.
 jit                                    | off                                      | Allow JIT compilation.
 jit_above_cost                         | 100000                                   | Perform JIT compilation if query is more expensive.
 jit_debugging_support                  | off                                      | Register JIT compiled function with debugger.
 jit_dump_bitcode                       | off                                      | Write out LLVM bitcode to facilitate JIT debugging.
 jit_expressions                        | on                                       | Allow JIT compilation of expressions.
 jit_inline_above_cost                  | 500000                                   | Perform JIT inlining if query is more expensive.
 jit_optimize_above_cost                | 500000                                   | Optimize JITed functions if query is more expensive.
 jit_profiling_support                  | off                                      | Register JIT compiled function with perf profiler.
 jit_provider                           | llvmjit                                  | JIT provider to use.
 jit_tuple_deforming                    | on                                       | Allow JIT compilation of tuple deforming.
 join_collapse_limit                    | 8                                        | Sets the FROM-list size beyond which JOIN constructs are not flattened.
 krb_caseins_users                      | off                                      | Sets whether Kerberos and GSSAPI user names should be treated as case-insensitive.
 krb_server_keyfile                     | FILE:/etc/postgresql-common/krb5.keytab  | Sets the location of the Kerberos server key file.
 lc_collate                             | en_US.utf8                               | Shows the collation order locale.
 lc_ctype                               | en_US.utf8                               | Shows the character classification and case conversion locale.
 lc_messages                            | en_US.utf8                               | Sets the language in which messages are displayed.
 lc_monetary                            | en_US.utf8                               | Sets the locale for formatting monetary amounts.
 lc_numeric                             | en_US.utf8                               | Sets the locale for formatting numbers.
 lc_time                                | en_US.utf8                               | Sets the locale for formatting date and time values.
 listen_addresses                       | *                                        | Sets the host name or IP address(es) to listen to.
 lo_compat_privileges                   | off                                      | Enables backward compatibility mode for privilege checks on large objects.
 local_preload_libraries                |                                          | Lists unprivileged shared libraries to preload into each backend.
 lock_timeout                           | 0                                        | Sets the maximum allowed duration of any wait for a lock.
 log_autovacuum_min_duration            | -1                                       | Sets the minimum execution time above which autovacuum actions will be logged.
 log_checkpoints                        | off                                      | Logs each checkpoint.
 log_connections                        | off                                      | Logs each successful connection.
 log_destination                        | stderr                                   | Sets the destination for server log output.
 log_directory                          | log                                      | Sets the destination directory for log files.
 log_disconnections                     | off                                      | Logs end of a session, including duration.
 log_duration                           | off                                      | Logs the duration of each completed SQL statement.
 log_error_verbosity                    | default                                  | Sets the verbosity of logged messages.
 log_executor_stats                     | off                                      | Writes executor performance statistics to the server log.
 log_file_mode                          | 0600                                     | Sets the file permissions for log files.
 log_filename                           | postgresql-%Y-%m-%d_%H%M%S.log           | Sets the file name pattern for log files.
 log_hostname                           | off                                      | Logs the host name in the connection logs.
 log_line_prefix                        | %m [%p]                                  | Controls information prefixed to each log line.
 log_lock_waits                         | off                                      | Logs long lock waits.
 log_min_duration_statement             | -1                                       | Sets the minimum execution time above which statements will be logged.
 log_min_error_statement                | error                                    | Causes all statements generating error at or above this level to be logged.
 log_min_messages                       | warning                                  | Sets the message levels that are logged.
 log_parser_stats                       | off                                      | Writes parser performance statistics to the server log.
 log_planner_stats                      | off                                      | Writes planner performance statistics to the server log.
 log_replication_commands               | off                                      | Logs each replication command.
 log_rotation_age                       | 1d                                       | Automatic log file rotation will occur after N minutes.
 log_rotation_size                      | 10MB                                     | Automatic log file rotation will occur after N kilobytes.
 log_statement                          | none                                     | Sets the type of statements logged.
 log_statement_stats                    | off                                      | Writes cumulative performance statistics to the server log.
 log_temp_files                         | -1                                       | Log the use of temporary files larger than this number of kilobytes.
 log_timezone                           | UTC                                      | Sets the time zone to use in log messages.
 log_truncate_on_rotation               | off                                      | Truncate existing log files of same name during log rotation.
 logging_collector                      | off                                      | Start a subprocess to capture stderr output and/or csvlogs into log files.
 maintenance_work_mem                   | 64MB                                     | Sets the maximum memory to be used for maintenance operations.
 max_connections                        | 100                                      | Sets the maximum number of concurrent connections.
 max_files_per_process                  | 1000                                     | Sets the maximum number of simultaneously open files for each server process.
 max_function_args                      | 100                                      | Shows the maximum number of function arguments.
 max_identifier_length                  | 63                                       | Shows the maximum identifier length.
 max_index_keys                         | 32                                       | Shows the maximum number of index keys.
 max_locks_per_transaction              | 64                                       | Sets the maximum number of locks per transaction.
 max_logical_replication_workers        | 4                                        | Maximum number of logical replication worker processes.
 max_parallel_maintenance_workers       | 2                                        | Sets the maximum number of parallel processes per maintenance operation.
 max_parallel_workers                   | 32                                       | Sets the maximum number of parallel workers that can be active at one time.
 max_parallel_workers_per_gather        | 32                                       | Sets the maximum number of parallel processes per executor node.
 max_pred_locks_per_page                | 2                                        | Sets the maximum number of predicate-locked tuples per page.
 max_pred_locks_per_relation            | -2                                       | Sets the maximum number of predicate-locked pages and tuples per relation.
 max_pred_locks_per_transaction         | 64                                       | Sets the maximum number of predicate locks per transaction.
 max_prepared_transactions              | 0                                        | Sets the maximum number of simultaneously prepared transactions.
 max_replication_slots                  | 10                                       | Sets the maximum number of simultaneously defined replication slots.
 max_stack_depth                        | 2MB                                      | Sets the maximum stack depth, in kilobytes.
 max_standby_archive_delay              | 30s                                      | Sets the maximum delay before canceling queries when a hot standby server is processing archived WAL data.
 max_standby_streaming_delay            | 30s                                      | Sets the maximum delay before canceling queries when a hot standby server is processing streamed WAL data.
 max_sync_workers_per_subscription      | 2                                        | Maximum number of table synchronization workers per subscription.
 max_wal_senders                        | 10                                       | Sets the maximum number of simultaneously running WAL sender processes.
 max_wal_size                           | 1GB                                      | Sets the WAL size that triggers a checkpoint.
 max_worker_processes                   | 32                                       | Maximum number of concurrent worker processes.
 min_parallel_index_scan_size           | 512kB                                    | Sets the minimum amount of index data for a parallel scan.
 min_parallel_table_scan_size           | 8MB                                      | Sets the minimum amount of table data for a parallel scan.
 min_wal_size                           | 80MB                                     | Sets the minimum size to shrink the WAL to.
 old_snapshot_threshold                 | -1                                       | Time before a snapshot is too old to read pages changed after the snapshot was taken.
 operator_precedence_warning            | off                                      | Emit a warning for constructs that changed meaning since PostgreSQL 9.4.
 parallel_leader_participation          | on                                       | Controls whether Gather and Gather Merge also run subplans.
 parallel_setup_cost                    | 1000                                     | Sets the planner's estimate of the cost of starting up worker processes for parallel query.
 parallel_tuple_cost                    | 0.1                                      | Sets the planner's estimate of the cost of passing each tuple (row) from worker to master backend.
 password_encryption                    | md5                                      | Encrypt passwords.
 port                                   | 5432                                     | Sets the TCP port the server listens on.
 post_auth_delay                        | 0                                        | Waits N seconds on connection startup after authentication.
 pre_auth_delay                         | 0                                        | Waits N seconds on connection startup before authentication.
 quote_all_identifiers                  | off                                      | When generating SQL fragments, quote all identifiers.
 random_page_cost                       | 4                                        | Sets the planner's estimate of the cost of a nonsequentially fetched disk page.
 restart_after_crash                    | on                                       | Reinitialize server after backend crash.
 row_security                           | on                                       | Enable row security.
 search_path                            | "$user", public                          | Sets the schema search order for names that are not schema-qualified.
 segment_size                           | 1GB                                      | Shows the number of pages per disk file.
 seq_page_cost                          | 1   
 work_mem                               | 4MB                                      | Sets the maximum memory to be used for query workspaces.


Is it possible to bring the execution time down to a few minutes or less, or is this the expected execution time?

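Only a few things will help with this query:

  • The actual scan does not seem to be the problem (it took 42 seconds), but if the table could be kept in RAM, it might be faster.

  • Your main problem is the sort, which PostgreSQL already parallelizes.

    There are a few things you can tune:

    • Increase work_mem as much as possible; that will make the sort faster.

    • Increase max_worker_processes (this requires a restart), max_parallel_workers and max_parallel_workers_per_gather, so that more cores can be used for the query.

      PostgreSQL has internal logic to calculate the maximum number of parallel workers it is prepared to use for a table. It will consider as many parallel workers as

      log3(table size / min_parallel_table_scan_size)

      You can force it to use more processes than that with:

      ALTER TABLE ohlcv SET (parallel_workers = 20);

      But max_parallel_workers is still the upper limit.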
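
The worker-count rule above can be sketched in a few lines of Python. This is an illustration rather than PostgreSQL's own code: it assumes the planner starts with one worker once the table reaches min_parallel_table_scan_size and adds one worker each time the table size triples, which is the step-wise form of the log3 formula. The function name planned_workers and the sample page counts are hypothetical; 1024 pages corresponds to the 8MB min_parallel_table_scan_size from the configuration, assuming 8kB blocks.

```python
def planned_workers(table_pages, min_scan_pages=1024, max_per_gather=32):
    """Approximate the number of parallel workers considered for a scan.

    Assumption: one worker at min_scan_pages, plus one more each time the
    table size triples; the result is capped by max_per_gather.
    """
    if table_pages < min_scan_pages:
        return 0  # table too small for a parallel scan
    workers = 1
    threshold = max(min_scan_pages, 1)
    while table_pages >= threshold * 3:
        workers += 1
        threshold *= 3
    return min(workers, max_per_gather)


if __name__ == "__main__":
    # Hypothetical table sizes, expressed in 8kB pages.
    for pages in (1024, 3072, 1_000_000, 4_500_000):
        print(pages, planned_workers(pages))
```

Under these assumptions, a table of a few tens of gigabytes lands at 8 workers, which would be consistent with "Workers Planned: 8" in the plan even though max_parallel_workers_per_gather is 32. That is why the parallel_workers storage parameter is needed to go higher.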


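If there are no deletes and updates on the table, and the data are inserted in sort order, you can get away with simply omitting the ORDER BY clause, provided that you set synchronize_seqscans = off.

What is work_mem set to? It looks like it could use an increase to avoid those external sorts.

You could try increasing effective_io_concurrency; maybe it will then use more parallel threads. What type of disk does the system have?

work_mem is set to 4MB, and the machine has an SSD with about 200MB/s read/write. I have updated the post.

Based on the EXPLAIN ANALYZE output, it looks like 3GB of work_mem should keep everything from spilling to disk. If you also raise effective_io_concurrency, you should be able to get away with a lower work_mem. I will add that many of your other configuration settings seem off. If everything is on SSD, seq_page_cost and random_page_cost should be closer together. Similarly, effective_cache_size should be set to at least half of RAM (assuming nothing else is running on the machine, etc.).

max_worker_processes and max_parallel_workers_per_gather are already set to 32. I guess the question is why Postgres uses only 8 workers instead of all the available ones?

I have added something about that to the answer. Use ALTER TABLE.

Using ALTER does allow postgres to use more cores for the first few minutes, but most of the time it still uses 1 CPU at 100%, and overall performance is similar (~15 min). I also raised work_mem to 8GB. Is Postgres (or a relational DB in general) really suited to this kind of request, which always returns millions of rows in the same order? Is there a way to ensure that the rows of this table are stored sorted and contiguously on disk for better performance? I thought CLUSTER would do that, but the sort still takes as much time as without CLUSTER.

If you run this query often, I wonder why. Why not fetch only the latest rows? You would be better off with a database that supports "index-organized tables" or "clustered indexes", or whatever they choose to call that feature.

Are the rows inserted in sort order? Are they ever modified?

They are inserted in sort order and never updated; new records are appended. We need all or most of the rows because the logic that processes the data changes every time, since this is a research project. We cannot just select the latest rows and update some statistics. Regarding clustered indexes, isn't that what the CLUSTER command does?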
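
The 3GB work_mem estimate from the comments can be checked against the Disk figures reported by EXPLAIN ANALYZE. Here is a minimal Python sketch using the nine spill sizes copied from the plan in the question. The variable names are illustrative, and it rests on the assumption that an in-memory sort needs at least as much memory as its on-disk spill (in practice it usually needs more).

```python
# Spill sizes in kB, copied from the "Sort Method: external merge  Disk:"
# lines of the plan in the question (leader first, then workers 0 to 7).
spill_kb = [
    2429104, 2404680, 2406280, 2656672, 2635904,
    2637600, 2643400, 2437272, 2439272,
]

KB_PER_GB = 1024 * 1024

largest_gb = max(spill_kb) / KB_PER_GB
total_gb = sum(spill_kb) / KB_PER_GB

# work_mem applies per sort operation and per process, so the memory that
# must be available is roughly (number of processes) * work_mem.
processes = len(spill_kb)
work_mem_gb = 3
budget_gb = processes * work_mem_gb

print(f"largest spill: {largest_gb:.2f} GB")
print(f"total spilled: {total_gb:.2f} GB")
print(f"memory budget at work_mem={work_mem_gb}GB: {budget_gb} GB")
```

The largest single spill is about 2.5GB and the nine sorts spill about 21.6GB in total, so a 3GB work_mem across nine processes means budgeting roughly 27GB, which is small next to the 256GB of RAM on this machine. Forcing more workers with parallel_workers shrinks each worker's share of the rows but multiplies the number of processes that can each claim work_mem.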
work\u mem
调整为8GB。postgres(或一般的relationnal db)真的适合这种总是以相同顺序返回数百万行的请求吗?有没有办法确保此表的行以排序和连续的方式存储在磁盘上,以获得更好的性能?我认为集群可以做到这一点,但排序仍然需要和没有集群一样多的时间。如果您经常运行此查询,我想知道为什么。为什么不只获取最新的行?您最好使用一个支持“索引组织表”或“集群索引”的数据库,或者它们选择如何调用该功能。行是否按排序顺序插入?它们曾经被修改过吗?它们是按排序顺序插入的,永远不会更新,新记录将被追加。我们需要所有/大部分行,因为处理这些数据的逻辑每次都在变化,因为这是一个研究项目,我们不能只选择最新的行并更新一些统计数据。关于聚集索引,这不是CLUSTER命令所做的吗?