Postgres uses the wrong index
I have a query:
EXPLAIN ANALYZE
SELECT CAST(DATE(associationtime) AS text) AS date ,
cast(SUM(extract(epoch FROM disassociationtime) - extract(epoch FROM associationtime)) AS bigint) AS sessionduration,
cast(SUM(tx) AS bigint) AS tx,
cast(SUM(rx) AS bigint) AS rx,
cast(SUM(dataRetries) AS bigint) AS DATA,
cast(SUM(rtsRetries) AS bigint) AS rts,
count(*)
FROM SESSION
WHERE ssid_id=42
AND ap_id=1731
AND DATE(associationtime)>=DATE('Tue Nov 04 00:00:00 MSK 2014')
AND DATE(associationtime)<=DATE('Thu Nov 20 00:00:00 MSK 2014')
GROUP BY(DATE(associationtime))
ORDER BY DATE(associationtime);
As you can see, the query filters on three columns: ssid_id, ap_id and associationtime. I have an index for that:
ssid_pkey | btree | {id}
ap_pkey | btree | {id}
testingshit_pkey | btree | {one,two,three}
session_date_ssid_idx | btree | {ssid_id,date(associationtime),"date_trunc('hour'::text, associationtime)"}
session_pkey | btree | {associationtime,disassociationtime,sessionduration,clientip,clientmac,devicename,tx,rx,protocol,snr,rssi,dataretries,rtsretries}
session_main_idx | btree | {ssid_id,ap_id,associationtime,disassociationtime,sessionduration,clientip,clientmac,devicename,tx,rx,protocol,snr,rssi,dataretries,rtsretries}
session_date_idx | btree | {date(associationtime),"date_trunc('hour'::text, associationtime)"}
session_date_apid_idx | btree | {ap_id,date(associationtime),"date_trunc('hour'::text, associationtime)"}
session_date_ssid_apid_idx | btree | {ssid_id,ap_id,date(associationtime),"date_trunc('hour'::text, associationtime)"}
ap_apname_idx | btree | {apname}
users_pkey | btree | {username}
user_roles_pkey | btree | {user_role_id}
session_lim_values_idx | btree | {date(associationtime)}
It's called session_date_ssid_apid_idx. So why does the query use the wrong index?
session_date_ssid_apid_idx:
Column | Type | Definition
------------+-----------------------------+-------------------------------------------
ssid_id | integer | ssid_id
ap_id | integer | ap_id
date | date | date(associationtime)
date_trunc | timestamp without time zone | date_trunc('hour'::text, associationtime)
session_lim_values_idx:
Column | Type | Definition
date | date | date(associationtime)
Which index would you create?
UPD: \d session
Column | Type | Modifiers
--------------------+-----------------------------+------------------------------------------------------
id | integer | NOT NULL DEFAULT nextval('session_id_seq'::regclass)
ssid_id | integer | NOT NULL
ap_id | integer | NOT NULL
associationtime | timestamp without time zone | NOT NULL
disassociationtime | timestamp without time zone | NOT NULL
sessionduration | character varying(100) | NOT NULL
clientip | character varying(100) | NOT NULL
clientmac | character varying(100) | NOT NULL
devicename | character varying(100) | NOT NULL
tx | integer | NOT NULL
rx | integer | NOT NULL
protocol | character varying(100) | NOT NULL
snr | integer | NOT NULL
rssi | integer | NOT NULL
dataretries | integer | NOT NULL
rtsretries | integer | NOT NULL
Indexes:
"session_pkey" PRIMARY KEY, btree (associationtime, disassociationtime, sessionduration, clientip, clientmac, devicename, tx, rx, protocol, snr, rssi, dataretries, rtsretries)
"session_date_ap_ssid_idx" btree (ssid_id, ap_id, associationtime)
"session_date_apid_idx" btree (ap_id, date(associationtime), date_trunc('hour'::text, associationtime))
"session_date_idx" btree (date(associationtime), date_trunc('hour'::text, associationtime))
"session_date_ssid_apid_idx" btree (ssid_id, ap_id, associationtime)
"session_date_ssid_idx" btree (ssid_id, date(associationtime), date_trunc('hour'::text, associationtime))
"session_lim_values_idx" btree (date(associationtime))
"session_main_idx" btree (ssid_id, ap_id, associationtime, disassociationtime, sessionduration, clientip, clientmac, devicename, tx, rx, protocol, snr, rssi, dataretries, rtsretries)
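Very common values in the predicates on ssid_id and ap_id can make Postgres choose the smaller index session_lim_values_idx, which has only one column (the date). It picks it over the seemingly better-fitting but bigger index session_date_ssid_apid_idx, which has 4 columns, and then filters the rest.
In your case, about 4% of the rows have ssid_id=42 and ap_id=1731. Normally, that alone shouldn't warrant switching to the smaller index. But several other factors are at play that may tip the scales, basically cost settings and statistics. Details: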
- If you haven't done so yet, adjust your cost settings as advised in …
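- Increase the statistics target for the relevant columns ssid_id and ap_id, then run ANALYZE. The expression date(associationtime) gets its own dedicated row in pg_statistic:
  SELECT * FROM pg_statistic WHERE starelid = 'session_date_ssid_apid_idx'::regclass;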
- Make the index session_date_ssid_apid_idx more attractive (smaller) by removing the 4th column, date_trunc('hour'::text, associationtime).
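- I'd rather use the standard syntax for the cast, cast(associationtime AS date), instead of the function syntax date(associationtime). Not that it matters much; I just know the standard way works properly. In your queries you can use the shorthand syntax associationtime::date, which is compatible with the expression index. In the index definition itself, use the verbose form.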
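The statistics and index suggestions above could look roughly like this in SQL. This is only a sketch. The statistics target of 1000 is an arbitrary example value, not a recommendation. The index definition assumes you want to replace the current session_date_ssid_apid_idx with a 3-column version that uses the verbose cast. Dropping and creating an index locks the table, so run it at a quiet time.

```sql
-- Example value only: raise the per-column statistics target above the default.
ALTER TABLE session ALTER COLUMN ssid_id SET STATISTICS 1000;
ALTER TABLE session ALTER COLUMN ap_id   SET STATISTICS 1000;

-- Rebuild the index without the 4th (date_trunc) column.
-- The verbose cast(... AS date) form needs no extra parentheses in an index
-- definition; a shorthand ::date cast would have to be written as ((associationtime::date)).
DROP INDEX IF EXISTS session_date_ssid_apid_idx;
CREATE INDEX session_date_ssid_apid_idx
    ON session (ssid_id, ap_id, cast(associationtime AS date));

-- Gather fresh statistics, including those for the index expression.
ANALYZE session;

-- Inspect what the planner knows now (pg_stats is readable without superuser rights).
SELECT tablename, attname, n_distinct, most_common_vals, most_common_freqs
FROM   pg_stats
WHERE  (tablename = 'session' AND attname IN ('ssid_id', 'ap_id'))
OR     tablename = 'session_date_ssid_apid_idx';
```

If most_common_freqs shows a large share for 42 or 1731, that supports the explanation above.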
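Use EXPLAIN ANALYZE to test which query plan is actually faster, dropping and recreating only the index you want to test. Then you'll see whether Postgres picked the best plan after all.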
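One way to do that test without permanently losing an index is to rely on DDL being transactional in Postgres. You drop the competing index inside a transaction, look at the plan, then roll back. The sketch below uses a trimmed-down version of the question's query with plain date literals. Be aware that DROP INDEX holds an ACCESS EXCLUSIVE lock on session until the transaction ends.

```sql
BEGIN;

-- Hide the single-column index from the planner for this transaction only.
DROP INDEX session_lim_values_idx;

EXPLAIN ANALYZE
SELECT date(associationtime) AS date, count(*)
FROM   session
WHERE  ssid_id = 42
AND    ap_id = 1731
AND    date(associationtime) >= '2014-11-04'
AND    date(associationtime) <= '2014-11-20'
GROUP  BY 1
ORDER  BY 1;

-- Undo the DROP: session_lim_values_idx is back as if nothing happened.
ROLLBACK;
```

Compare the reported execution time with the plan Postgres picks when both indexes exist.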
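You have quite a few indexes. I'd check whether all of them are actually used and get rid of the rest. Indexes carry maintenance costs, and concentrating on fewer indexes is generally beneficial where possible: they fit in cache more easily and are more likely to be cached already when needed. Weigh costs against benefits.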
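To see which indexes are actually used, one option is the cumulative statistics view pg_stat_user_indexes. A minimal sketch follows. The idx_scan counters accumulate since the last statistics reset, so a low number only means something if the server has been running a representative workload.

```sql
-- Indexes on "session", least-used first, with their on-disk size.
SELECT indexrelname,
       idx_scan,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM   pg_stat_user_indexes
WHERE  relname = 'session'
ORDER  BY idx_scan;
```

Large indexes with idx_scan at or near zero are the first candidates for removal.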
As an aside, I would use:
SUM(extract(epoch FROM disassociationtime
- associationtime)::int) AS sessionduration
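Put together with the standard cast syntax, the whole query might read as follows. This is a sketch, not a drop-in replacement: the date column is returned as type date rather than text, and the explicit bigint casts are dropped because sum() over an integer column already returns bigint in PostgreSQL.

```sql
SELECT cast(associationtime AS date) AS date,
       SUM(extract(epoch FROM disassociationtime - associationtime)::int) AS sessionduration,
       SUM(tx)          AS tx,    -- sum(integer) yields bigint, no cast needed
       SUM(rx)          AS rx,
       SUM(dataretries) AS data,
       SUM(rtsretries)  AS rts,
       count(*)
FROM   session
WHERE  ssid_id = 42
AND    ap_id = 1731
AND    cast(associationtime AS date) BETWEEN '2014-11-04' AND '2014-11-20'
GROUP  BY 1
ORDER  BY 1;
```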
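Is this really the complete output of the execution plan? I'd expect at least one more step that fetches the other columns. BTW: you can drop one of the indexes ssid_pkey or ap_pkey; they are identical. Also, it's better to show the list of indexes from the output of psql's \d command instead of the contents of the system catalogs (or at least use the view pg_indexes).
From what I see so far, the index session_date_ssid_apid_idx should be used. Either something is missing from your question, or something is wrong with your database. I would drop that index (or all of them), run VACUUM FULL ANALYZE session, recreate the index (or all of them) and try again. Or, if you cannot lock the table, use … instead. Or, most of your rows have ssid_id=42 and ap_id=1731, so these predicates are irrelevant for the choice of index, and it's cheaper to use the smaller index and filter the rest.
@ErwinBrandstetter, it seems you're right about ssid_id=42 and ap_id=1731. If I change these to less common values, the new (right) index is chosen.
What do you get for
select count(*) as a, count(ssid_id=42 and ap_id=1731 or NULL) as b from session;
and for
select count(associationtime between '2014-11-04 0:0' and '2014-11-20 0:0' or NULL) as a, count(associationtime between '2014-11-04 0:0' and '2014-11-20 0:0' and ssid_id=42 and ap_id=1731 or NULL) as b from session;
?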
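The count(... or NULL) trick works because count() ignores NULLs, and "false OR NULL" evaluates to NULL. On PostgreSQL 9.4 or later, the aggregate FILTER clause expresses the same thing more directly. The sketch below gets all four numbers from both comment queries in a single scan. The column aliases are made up for illustration.

```sql
SELECT count(*) AS total,
       count(*) FILTER (WHERE ssid_id = 42 AND ap_id = 1731) AS ssid_ap,
       count(*) FILTER (WHERE associationtime BETWEEN '2014-11-04 0:0' AND '2014-11-20 0:0') AS in_range,
       count(*) FILTER (WHERE associationtime BETWEEN '2014-11-04 0:0' AND '2014-11-20 0:0'
                          AND ssid_id = 42 AND ap_id = 1731) AS in_range_ssid_ap
FROM   session;
```

If ssid_ap is a large fraction of total, that explains why the planner considers the ssid_id/ap_id columns in the index not worth much.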
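It's frustrating that in 2020 the database picks a completely wrong index and a terrible plan... We reconsidered MySQL, but we need geospatial support. I set random_page_cost to 1.2, yet it still picks the wrong index for a simple query. Whether it picks the right one depends on how many xyz_id values are in the list: fewer values make Postgres choose the right index, which shouldn't matter. We had to disable sorting system-wide to make it pick the right index. Otherwise it picks the single-column index and then sorts, which costs a lot of time. No matter how many times we run VACUUM ANALYZE, even with default_statistics_target=10000 (supposedly the most precise setting), it doesn't help.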
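Instead of changing planner settings system-wide, they can be scoped to a single transaction with SET LOCAL, which also makes it easy to compare plans. A minimal sketch, reusing the question's table and predicates as a stand-in for the commenter's own query. Note that enable_sort = off only strongly discourages explicit sorts; it does not forbid them.

```sql
BEGIN;

-- Both settings revert automatically at COMMIT or ROLLBACK.
SET LOCAL random_page_cost = 1.2;
SET LOCAL enable_sort = off;

EXPLAIN ANALYZE
SELECT date(associationtime) AS date, count(*)
FROM   session
WHERE  ssid_id = 42
AND    ap_id = 1731
GROUP  BY 1
ORDER  BY 1;

ROLLBACK;
```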