SQL: Postgres large table select optimization

sql postgresql

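I have to move a database for a licensed software product to an external database server. The DB must be Postgres, and I cannot change the select query in the application (I cannot change the source code).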

The table (it must be a single table) contains about 6.5M rows and has unique values in its main column (prefix).

All requests are reads (no inserts/updates/deletes): about 200K selects per day, with a peak of 15 TPS.

The select query is:

SELECT prefix, changeprefix, deletelast, outgroup, tariff FROM table 
WHERE '00436641997142' LIKE prefix 
AND company = 0  and ((current_time between timefrom and timeto) or (timefrom is null and timeto is null)) and (strpos("Day", cast(to_char(now(), 'ID') as varchar)) > 0  or "Day" is null )  
ORDER BY position('%' in prefix) ASC, char_length(prefix) DESC 
LIMIT 1;
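Because the constant is on the left and the column on the right, every row's prefix acts as a LIKE pattern. The following Python sketch emulates what this query does on a few in-memory rows, to make the semantics concrete. The Row class and the sample data are hypothetical, and the sketch assumes prefixes use only the LIKE wildcards % and _ with no escape characters.

```python
import re
from dataclasses import dataclass
from datetime import datetime, time
from typing import List, Optional


@dataclass
class Row:
    # Hypothetical in-memory stand-in for one table row (other columns omitted).
    prefix: str
    company: int = 0
    timefrom: Optional[time] = None
    timeto: Optional[time] = None
    day: Optional[str] = None  # ISO weekday digits, e.g. "12345" = Monday to Friday


def like(value: str, pattern: str) -> bool:
    """SQL LIKE without escapes: % matches any run of characters, _ exactly one."""
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch) for ch in pattern
    )
    return re.fullmatch(regex, value, re.DOTALL) is not None


def position(sub: str, s: str) -> int:
    """SQL position(sub in s): 1-based index of the first occurrence, 0 if absent."""
    return s.find(sub) + 1


def lookup(rows: List[Row], number: str, now: datetime) -> Optional[Row]:
    t = now.time()
    iso_day = str(now.isoweekday())  # same digit as to_char(now(), 'ID')
    matches = [
        r for r in rows
        if like(number, r.prefix)  # '<number>' LIKE prefix
        and r.company == 0
        and ((r.timefrom is not None and r.timeto is not None
              and r.timefrom <= t <= r.timeto)
             or (r.timefrom is None and r.timeto is None))
        and (r.day is None or iso_day in r.day)  # strpos("Day", ...) > 0 OR "Day" IS NULL
    ]
    # ORDER BY position('%' in prefix) ASC, char_length(prefix) DESC LIMIT 1
    matches.sort(key=lambda r: (position("%", r.prefix), -len(r.prefix)))
    return matches[0] if matches else None


rows = [Row("004366%"), Row("0043664%"), Row("0043%", company=1)]
print(lookup(rows, "00436641997142", datetime(2017, 5, 15, 10, 0)).prefix)  # prints 004366%
```

Taken literally, the ORDER BY ranks an exact prefix (position 0) first and then the wildcard prefix whose % appears earliest, so 004366% wins over 0043664% here; the sketch simply reproduces that ordering.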

EXPLAIN ANALYZE gives the following:

Limit  (cost=406433.75..406433.75 rows=1 width=113) (actual time=1721.360..1721.361 rows=1 loops=1)
  ->  Sort  (cost=406433.75..406436.72 rows=1188 width=113) (actual time=1721.358..1721.358 rows=1 loops=1)
        Sort Key: ("position"((prefix)::text, '%'::text)), (char_length(prefix)) DESC
        Sort Method: quicksort  Memory: 25kB
        ->  Seq Scan on table  (cost=0.00..406427.81 rows=1188 width=113) (actual time=1621.159..1721.345 rows=1 loops=1)
              Filter: ((company = 0) AND ('00381691997142'::text ~~ (prefix)::text) AND ((strpos(("Day")::text, (to_char(now(), 'ID'::text))::text) > 0) OR ("Day" IS NULL)) AND (((('now'::cstring)::time with time zone >= (timefrom)::time with time zone) AN (...)
              Rows Removed by Filter: 6417130
Planning time: 0.165 ms
Execution time: 1721.404 ms

The slowest part of the query is:

 SELECT prefix, changeprefix, deletelast, outgroup, tariff FROM table 
 WHERE '00436641997142' LIKE prefix

which takes 1.6 s (testing only this part of the query).

EXPLAIN ANALYZE of this part of the query on its own:

Seq Scan on table  (cost=0.00..181819.07 rows=32086 width=113) (actual time=1488.359..1580.607 rows=1 loops=1)
  Filter: ('004366491997142'::text ~~ (prefix)::text)
  Rows Removed by Filter: 6417130
Planning time: 0.061 ms
Execution time: 1580.637 ms

About the data itself: the first few digits (the first 5) of the "prefix" column are the same; the remaining digits differ and are unique.

The Postgres version is 9.5. I changed the following Postgres settings:

random_page_cost = 40
effective_cache_size = 4GB
shared_buffers = 4GB
work_mem = 1GB

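I tried several index types (unique, gin, gist, hash), but in all cases the index is not used (as the EXPLAIN above shows) and the speed stays the same. I also ran the following, with no noticeable improvement: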

VACUUM VERBOSE ANALYZE table;

Please suggest database settings and/or an index configuration that would speed up this query.

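The current hardware is an i5 with an SSD and 16GB RAM on Win7, but I have the option to buy more powerful hardware. As far as I know, when reads dominate (no inserts/updates), faster CPU cores matter more than the number of cores or disk speed. Please confirm.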
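A rough capacity check with the figures from the question (about 1.72 s per execution from EXPLAIN ANALYZE, about 200K selects per day, 15 TPS at peak) helps frame the hardware question. This sketch assumes each execution is a single-threaded, CPU-bound sequential scan and ignores caching and contention, so the results are order-of-magnitude only.

```python
import math

EXEC_SECONDS = 1.721       # "Execution time: 1721.404 ms" from EXPLAIN ANALYZE
SELECTS_PER_DAY = 200_000
PEAK_TPS = 15

avg_tps = SELECTS_PER_DAY / 86_400           # average arrival rate (requests/s)
avg_busy_cores = avg_tps * EXEC_SECONDS      # Little's law: concurrency = rate x time
peak_busy_cores = PEAK_TPS * EXEC_SECONDS

print(f"average: {avg_tps:.2f} TPS -> ~{avg_busy_cores:.1f} cores busy")
print(f"peak:    {PEAK_TPS} TPS -> ~{peak_busy_cores:.1f} cores busy")
print(f"cores needed at peak (rounded up): {math.ceil(peak_busy_cores)}")
```

Under these assumptions the seq-scan plan needs roughly 26 cores busy at peak just to keep up; a faster core shortens each scan proportionally but does not close a gap of that size, whereas an index lookup changes the per-query cost itself.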

Addendum 1: after adding 9 indexes, the indexes are still not used.

Addendum 2: 1) I found the reason the index is not used: the word order in the LIKE part of the query. If the query is:

SELECT prefix, changeprefix, deletelast, outgroup, tariff FROM table WHERE prefix like '00436641997142%'
AND company = 0  and 
((current_time between timefrom and timeto) or (timefrom is null and timeto is null)) and (strpos("Day", cast(to_char(now(), 'ID') as varchar)) > 0  or "Day" is null )
 ORDER BY position('%' in prefix) ASC, char_length(prefix) DESC LIMIT 1

it uses the index.

Note the difference:

... WHERE '00436641997142%' like prefix ...

The query that uses the index correctly:

... WHERE prefix like '00436641997142%' ...

Since I cannot change the query itself, is there any way to overcome this? I can change the data and the Postgres settings, but not the query.

2) Also, in order to use parallel seq scan, I installed Postgres 9.6. With it, parallel scan is used only when the last part of the query is removed. So this query:

SELECT prefix, changeprefix, deletelast, outgroup, tariff FROM table WHERE '00436641997142' LIKE prefix 
AND company = 0  and 
((current_time between timefrom and timeto) or (timefrom is null and timeto is null))
 ORDER BY position('%' in prefix) ASC, char_length(prefix) DESC LIMIT 1

uses parallel mode.

Any idea how to force the original query (which I cannot change) to use parallel seq scan?

You should change the index by adding the appropriate operator class, as the PostgreSQL documentation describes:

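The operator classes text_pattern_ops, varchar_pattern_ops, and bpchar_pattern_ops support B-tree indexes on the types text, varchar, and char respectively. The difference from the default operator classes is that the values are compared strictly character by character rather than according to the locale-specific collation rules. This makes these operator classes suitable for use by queries involving pattern matching expressions (LIKE or POSIX regular expressions) when the database does not use the standard "C" locale. As an example, you might index a varchar column like this: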

CREATE INDEX test_index ON test_table (col varchar_pattern_ops);
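To see why such an index helps only the `column LIKE 'const%'` direction, here is a Python sketch of the underlying idea: with character-by-character ordering, a left-anchored pattern such as 'abc%' becomes the range prefix >= 'abc' AND prefix < 'abd', which two binary searches on a sorted list can answer, much as a B-tree range scan does. In the question's `'const' LIKE prefix` form the pattern is the column value itself, so there is no single range to seek to. The helper names are hypothetical, and the sketch assumes non-empty ASCII-digit prefixes.

```python
from bisect import bisect_left
from typing import List, Tuple


def prefix_range(p: str) -> Tuple[str, str]:
    """Half-open range [p, upper) containing every string that starts with p.
    p must be non-empty; its last character is bumped by one code point."""
    upper = p[:-1] + chr(ord(p[-1]) + 1)
    return p, upper


def left_anchored_like(sorted_values: List[str], p: str) -> List[str]:
    """Values matching `value LIKE p || '%'`, found with two binary searches."""
    lo, hi = prefix_range(p)
    return sorted_values[bisect_left(sorted_values, lo):bisect_left(sorted_values, hi)]


values = sorted(["0043", "00436641997142", "004366419971429", "0044", "00436"])
print(left_anchored_like(values, "00436641997142"))
```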
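It is hard to build an index for a query like string LIKE pattern, because the wildcards (% and _) can appear anywhere.
I can suggest one high-risk solution:
Slightly redesign the table to make it indexable. Add two more fixed-width columns, prefix_low and prefix_high (for example char(32), or whatever length the task requires). Also add a smallint column for the prefix length. Fill them with the lowest and highest values that match the prefix, and with the prefix length. For example: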
select rpad(rtrim('00436641997142%','%'), 32, '0') AS prefix_low, rpad(rtrim('00436641997142%','%'), 32, '9') AS prefix_high, length(rtrim('00436641997142%','%')) AS prefix_length;

            prefix_low            |           prefix_high            | prefix_length
----------------------------------+----------------------------------+---------------
 00436641997142000000000000000000 | 00436641997142999999999999999999 |            14
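The same encoding can be prototyped in Python before altering the table. This sketch mirrors the rpad/rtrim expression above for width 32 and, as an assumption of this sketch, pads the dialled number to the same width before the BETWEEN test: compared unpadded and character by character, '00436641997142' sorts below its own zero-padded prefix_low and would miss the exact-length match. Like the answer, it handles only a trailing % and ignores _; the function names are hypothetical.

```python
from typing import List, Optional, Tuple

WIDTH = 32  # same width as char(32) above; prefixes are assumed to be <= 32 chars


def encode_prefix(prefix: str) -> Tuple[str, str, int]:
    """Mirror of rpad(rtrim(p,'%'),32,'0'), rpad(rtrim(p,'%'),32,'9'), length(rtrim(p,'%'))."""
    core = prefix.rstrip("%")
    return core.ljust(WIDTH, "0"), core.ljust(WIDTH, "9"), len(core)


def best_prefix(prefixes: List[str], number: str) -> Optional[str]:
    """Longest prefix whose [prefix_low, prefix_high] range contains the number."""
    padded = number.ljust(WIDTH, "0")  # assumption: pad the number like prefix_low
    hits = []
    for p in prefixes:
        low, high, length = encode_prefix(p)
        if low <= padded <= high:  # BETWEEN prefix_low AND prefix_high
            hits.append((length, p))
    # ORDER BY prefix_length DESC LIMIT 1
    return max(hits)[1] if hits else None


print(encode_prefix("00436641997142%"))
print(best_prefix(["0043%", "0043664%", "00436641997142%", "0044%"], "00436641997142"))
```

The loop is a linear scan for illustration only; in the database the BETWEEN test is what the (prefix_low, prefix_high) index is meant to serve.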


Create an index on these values:
CREATE INDEX table_prefix_low_high_idx ON table (prefix_low, prefix_high);


Check the modified query against the table:
SELECT prefix, changeprefix, deletelast, outgroup, tariff 
FROM table 
WHERE rpad('00436641997142', 32, '0')::char(32) BETWEEN prefix_low AND prefix_high
  AND company = 0  
  AND ((current_time between timefrom and timeto) or (timefrom is null and timeto is null)) and (strpos("Day", cast(to_char(now(), 'ID') as varchar)) > 0  or "Day" is null )
ORDER BY prefix_length DESC 
LIMIT 1

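Check how it works with the index and try to optimize it: add or remove an index on prefix_length, add prefix_length to the BETWEEN index, and so on.
Now you need to rewrite the queries sent to the database. Install PgBouncer and …; it allows you to rewrite queries on the fly with simple Python code, as in this example: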
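import re

# Matches the application's lookup query; tune the pattern to the exact SQL text it sends.
Q1 = re.compile(
    r"^SELECT prefix, changeprefix, deletelast, outgroup, tariff FROM table\s+WHERE '(?P<id>\d+)%?' LIKE prefix\s+"
    r"(?P<filters>AND .*?)\s*ORDER BY position\('%' in prefix\) ASC, char_length\(prefix\) DESC\s+LIMIT 1;?$",
    re.IGNORECASE | re.DOTALL)

def rewrite_query(username, query):
    m = Q1.match(query.strip())
    if not m:
        return query  # nothing to do with other queries
    # Use the indexable prefix_low/prefix_high columns; the id is digits only, so it is safe to inline.
    return ("SELECT prefix, changeprefix, deletelast, outgroup, tariff FROM table "
            f"WHERE rpad('{m['id']}', 32, '0')::char(32) BETWEEN prefix_low AND prefix_high "
            f"{m['filters']} ORDER BY prefix_length DESC LIMIT 1")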

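Run pgBouncer and connect it to the database. Try issuing different queries the way the application does and check how they are rewritten. Because you are working with raw text, you will have to adjust the regexp so that it matches all the required queries and rewrites them correctly.
Once the proxy is ready and debugged, reconnect the application to pgBouncer.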
Pros:

No changes to the application
No changes to the basic structure of the database

Cons:

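Extra maintenance: you need triggers to keep all the new columns up to date
An extra tool to support
The rewriting uses regexps, so it is tightly coupled to the specific queries the application issues. You need to run it for a while and develop reliable rewrite rules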

Further development:
hijack the parsed query tree inside pgsql itself
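If I understand your problem correctly, a proxy server that rewrites the queries could be the solution.
Here is an example.
Then you could change "LIKE" to "=" in your query, and it would run much faster.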
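Replacing LIKE with = returns the same rows only when the stored prefixes contain no wildcards. If, as an assumption not confirmed in the question, every wildcard is a single trailing %, a rewriting proxy could instead look up the finite set of prefixes that can possibly match a given number: the number itself plus each of its leading substrings followed by %. This Python sketch builds that set (the function name is hypothetical); the list could then be sent as prefix = ANY(...), which a plain B-tree index on prefix can serve, with the original ORDER BY applied to the few rows returned.

```python
from fnmatch import fnmatchcase
from typing import List


def candidate_prefixes(number: str) -> List[str]:
    """Every stored prefix that can satisfy `number LIKE prefix`,
    assuming % appears at most once, only at the end, and _ is never used."""
    exact = [number]  # a prefix with no wildcard must equal the number
    wildcard = [number[:k] + "%" for k in range(len(number) + 1)]  # '%', '0%', ..., number + '%'
    return exact + wildcard


number = "00436641997142"
cands = candidate_prefixes(number)
# Sanity check with fnmatch, translating LIKE's % to fnmatch's * (safe for digit strings).
assert all(fnmatchcase(number, p.replace("%", "*")) for p in cands)
print(len(cands))  # 16
```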
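Why do you use LIKE instead of =?
I can't change it; it is in the source code, which I have no access to. The query with "=" instead of "LIKE" is 3 times faster, but I can't change it.
random_page_cost = 40? Why so high? On an SSD...
The default is 4; I increased it to see whether it made any difference. Basically the response time is the same as with 4 or 8.
How many rows does it return? Try optimizing that part of the query.
I don't know how to optimize a condition like const LIKE column (without changing the query/data structure/application logic). I just added CREATE INDEX test_index ON erm_table (prefix bpchar_pattern_ops); because the prefix field is of type char, but the query speed is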