View optimization in PostgreSQL
To optimize database access, I have to aggregate the last "reading" of each table in a single view. However, I noticed that running many individual queries costs far less than using the view, so I wonder whether there is a mistake in my view or whether it can be optimized. Here are some of the tables:
CREATE TABLE hives(
id character(20) NOT NULL,
master character(20) DEFAULT NULL::bpchar,
owner integer,
[...]
CONSTRAINT hives_pkey PRIMARY KEY (id),
CONSTRAINT hives_master_extk FOREIGN KEY (master)
REFERENCES hives (id) MATCH SIMPLE
ON UPDATE CASCADE ON DELETE SET NULL,
CONSTRAINT hives_owner_extk FOREIGN KEY (owner)
REFERENCES users (id) MATCH SIMPLE
ON UPDATE CASCADE ON DELETE CASCADE
);
CREATE TABLE dt_rain(
hive character(20) NOT NULL,
hiveconnection integer,
instant timestamp with time zone NOT NULL,
rain integer,
CONSTRAINT dt_rain_pkey PRIMARY KEY (hive, instant),
CONSTRAINT dt_rain_hive_connections_extk FOREIGN KEY (hiveconnection)
REFERENCES hives_connections (id) MATCH SIMPLE
ON UPDATE CASCADE ON DELETE SET NULL,
CONSTRAINT dt_rain_hive_extk FOREIGN KEY (hive)
REFERENCES hives (id) MATCH SIMPLE
ON UPDATE CASCADE ON DELETE CASCADE
);
CREATE TABLE dt_temperature
(
hive character(20) NOT NULL,
hiveconnection integer,
instant timestamp with time zone NOT NULL,
internal integer,
external integer,
CONSTRAINT dt_temperature_pkey PRIMARY KEY (hive, instant),
CONSTRAINT dt_temperature_hive_connections_extk FOREIGN KEY (hiveconnection)
REFERENCES hives_connections (id) MATCH SIMPLE
ON UPDATE CASCADE ON DELETE SET NULL,
CONSTRAINT dt_temperature_hive_extk FOREIGN KEY (hive)
REFERENCES hives (id) MATCH SIMPLE
ON UPDATE CASCADE ON DELETE CASCADE
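);
Each data table holds the full history of readings, is very large, and shares the same layout: a hive key referencing the hives table, the instant, and the data.
I only want the last value from each table, so here is the view: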
CREATE OR REPLACE VIEW dt_last AS
SELECT id AS
hive,
b.instant AS inout_instant, "input", "output", timeout,
c.instant AS temperature_instant, "internal", "external",
d.instant AS weight_instant, weight,
e.instant AS rain_instant, rain,
f.instant AS voltage_instant, operational, panel, cell,
g.instant AS gps_instant, latitude, longitude, altitude
FROM hives
LEFT OUTER JOIN (
SELECT hive, instant, "input", "output", timeout FROM dt_inout_summary x
WHERE x.instant = (
SELECT MAX(x1.instant) FROM dt_inout_summary x1 WHERE x1.hive = x.hive
)
) b ON (id = b.hive)
LEFT OUTER JOIN (
SELECT hive, instant, "internal", "external" FROM dt_temperature x
WHERE x.instant = (
SELECT MAX(x1.instant) FROM dt_temperature x1 WHERE x1.hive = x.hive
)
) c ON (id = c.hive)
LEFT OUTER JOIN (
SELECT hive, instant, weight FROM dt_weight x
WHERE x.instant = (
SELECT MAX(x1.instant) FROM dt_weight x1 WHERE x1.hive = x.hive
)
) d ON (id = d.hive)
LEFT OUTER JOIN (
SELECT hive, instant, rain FROM dt_rain x
WHERE x.instant = (
SELECT MAX(x1.instant) FROM dt_rain x1 WHERE x1.hive = x.hive
)
) e ON (id = e.hive)
LEFT OUTER JOIN (
SELECT hive, instant, operational, panel, cell FROM dt_voltage x
WHERE x.instant = (
SELECT MAX(x1.instant) FROM dt_voltage x1 WHERE x1.hive = x.hive
)
) f ON (id = f.hive)
LEFT OUTER JOIN (
SELECT hive, instant, latitude, longitude, altitude FROM dt_gps x
WHERE x.instant = (
SELECT MAX(x1.instant) FROM dt_gps x1 WHERE x1.hive = x.hive
)
) g ON (id = g.hive);
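Selecting each record from this view takes about 1 second, which is far more expensive than running SELECT * FROM ... WHERE hive = ... ORDER BY instant DESC LIMIT 1; six times per hive.
I am confused.
Below is the graphical view from the query analyzer, followed by the EXPLAIN ANALYZE output.
Is there any way to optimize this view?
=== EDIT, following joop's suggestions: MAX replaced with NOT EXISTS, and indexes added on (hive, instant DESC)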
CREATE OR REPLACE VIEW dt_last4 AS
SELECT hives.id AS hive,
b.instant AS inout_instant,
b.input,
b.output,
b.timeout,
c.instant AS temperature_instant,
c.internal,
c.external,
d.instant AS weight_instant,
d.weight,
e.instant AS rain_instant,
e.rain,
f.instant AS voltage_instant,
f.operational,
f.panel,
f.cell,
g.instant AS gps_instant,
g.latitude,
g.longitude,
g.altitude
FROM hives
LEFT JOIN dt_inout_summary b ON b.hive = hives.id AND NOT (EXISTS ( SELECT 1
FROM dt_inout_summary nx
WHERE nx.hive = b.hive AND nx.instant > b.instant))
LEFT JOIN dt_temperature c ON c.hive = hives.id AND NOT (EXISTS ( SELECT 1
FROM dt_temperature nx
WHERE nx.hive = c.hive AND nx.instant > c.instant))
LEFT JOIN dt_weight d ON d.hive = hives.id AND NOT (EXISTS ( SELECT 1
FROM dt_weight nx
WHERE nx.hive = d.hive AND nx.instant > d.instant))
LEFT JOIN dt_rain e ON e.hive = hives.id AND NOT (EXISTS ( SELECT 1
FROM dt_rain nx
WHERE nx.hive = e.hive AND nx.instant > e.instant))
LEFT JOIN dt_voltage f ON f.hive = hives.id AND NOT (EXISTS ( SELECT 1
FROM dt_voltage nx
WHERE nx.hive = f.hive AND nx.instant > f.instant))
LEFT JOIN dt_gps g ON g.hive = hives.id AND NOT (EXISTS ( SELECT 1
FROM dt_gps nx
WHERE nx.hive = g.hive AND nx.instant > g.instant));
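EXPLAIN ANALYZE:
The fourth version is much better. Some sequential scans remain. Without them, this view would be pure state of the art.

The NOT EXISTS (...) construct avoids the MAX aggregate in the subquery. It would benefit from a composite index on dt_inout_summary (hive, instant DESC).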
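As a minimal sketch of the indexes suggested here, one composite index per reading table could look like the following. The index names are hypothetical. Note that dt_rain and dt_temperature already have a primary key on (hive, instant), and PostgreSQL can scan a b-tree index backwards, so the extra DESC index may add little for those tables; the schemas of the other four tables are not shown, so their need for an index is an assumption.

```sql
-- Hypothetical index names; column order (hive, instant DESC) follows the suggestion above.
CREATE INDEX dt_inout_summary_hive_instant_idx ON dt_inout_summary (hive, instant DESC);
CREATE INDEX dt_temperature_hive_instant_idx   ON dt_temperature   (hive, instant DESC);
CREATE INDEX dt_weight_hive_instant_idx        ON dt_weight        (hive, instant DESC);
CREATE INDEX dt_rain_hive_instant_idx          ON dt_rain          (hive, instant DESC);
CREATE INDEX dt_voltage_hive_instant_idx       ON dt_voltage       (hive, instant DESC);
CREATE INDEX dt_gps_hive_instant_idx           ON dt_gps           (hive, instant DESC);

-- Refresh planner statistics so the new indexes are costed with current row counts.
ANALYZE dt_inout_summary;
ANALYZE dt_temperature;
ANALYZE dt_weight;
ANALYZE dt_rain;
ANALYZE dt_voltage;
ANALYZE dt_gps;
```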
BTW: you don't need the subquery; a plain LEFT JOIN does the same thing:
...
FROM hives h
LEFT JOIN dt_inout_summary x ON x.hive = h.id
AND NOT EXISTS(
SELECT 1
FROM dt_inout_summary nx
WHERE nx.hive = x.hive
AND nx.instant > x.instant
)
...
, but then you have to reference the x.yyyy fields in the main query: x.hive, x.instant, x.input, x.output, x.timeout
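Put together for a single reading table, the plain LEFT JOIN form with the x.* columns referenced in the main select list might look like this. It is only a sketch: the column list for dt_inout_summary is taken from the view in the question, and the output aliases mirror that view.

```sql
-- Latest dt_inout_summary row per hive, without a derived table.
SELECT h.id      AS hive,
       x.instant AS inout_instant,
       x.input,
       x.output,
       x.timeout
FROM hives h
LEFT JOIN dt_inout_summary x
       ON x.hive = h.id
      -- keep only the row for which no newer row exists for the same hive
      AND NOT EXISTS (
            SELECT 1
            FROM dt_inout_summary nx
            WHERE nx.hive = x.hive
              AND nx.instant > x.instant
          );
```

Hives without any reading still appear once, with NULLs in the x columns, because the NOT EXISTS test sits in the ON clause rather than in WHERE.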
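UPDATE: the query needs 13 (1 + 2*6) range table entries, which exceeds the default join_collapse_limit of 8. This may cause the optimizer to give up on reordering the joins.
You could try adding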
SET join_collapse_limit = 16;
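before your query. Another approach is to split the subqueries into CTEs, which the optimizer does not break up (before PostgreSQL 12 a CTE is always an optimization fence; from version 12 on, it has to be marked MATERIALIZED to keep that behavior). CTEs may be a bit slower than the corresponding subqueries: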
CREATE OR REPLACE VIEW dt_last4cte AS
WITH cte_b AS (
SELECT *
FROM dt_inout_summary b WHERE NOT EXISTS ( SELECT 1
FROM dt_inout_summary nx
WHERE nx.hive = b.hive AND nx.instant > b.instant)
)
, cte_c AS (
SELECT *
FROM dt_temperature c WHERE NOT EXISTS ( SELECT 1
FROM dt_temperature nx
WHERE nx.hive = c.hive AND nx.instant > c.instant)
)
, cte_d AS (
SELECT *
FROM dt_weight d WHERE NOT EXISTS ( SELECT 1
FROM dt_weight nx WHERE nx.hive = d.hive AND nx.instant > d.instant)
)
, cte_e AS (
SELECT *
FROM dt_rain e WHERE NOT EXISTS ( SELECT 1
FROM dt_rain nx WHERE nx.hive = e.hive AND nx.instant > e.instant)
)
, cte_f AS (
SELECT *
FROM dt_voltage f WHERE NOT EXISTS ( SELECT 1
FROM dt_voltage nx WHERE nx.hive = f.hive AND nx.instant > f.instant)
)
, cte_g AS (
SELECT *
FROM dt_gps g WHERE NOT EXISTS ( SELECT 1
FROM dt_gps nx WHERE nx.hive = g.hive AND nx.instant > g.instant)
)
SELECT h0.id AS hive,
b.instant AS inout_instant,
b.input,
b.output,
b.timeout,
c.instant AS temperature_instant,
c.internal,
c.external,
d.instant AS weight_instant,
d.weight,
e.instant AS rain_instant,
e.rain,
f.instant AS voltage_instant,
f.operational,
f.panel,
f.cell,
g.instant AS gps_instant,
g.latitude,
g.longitude,
g.altitude
FROM hives h0
LEFT JOIN cte_b b ON b.hive = h0.id
LEFT JOIN cte_c c ON c.hive = h0.id
LEFT JOIN cte_d d ON d.hive = h0.id
LEFT JOIN cte_e e ON e.hive = h0.id
LEFT JOIN cte_f f ON f.hive = h0.id
LEFT JOIN cte_g g ON g.hive = h0.id
-- WHERE __additional__conditions__
;
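To see whether the setting or an extra condition actually changes the plan, one could run something like the sketch below. The hive id is a made-up placeholder, and dt_last4 can be swapped for dt_last4cte; SET LOCAL keeps the changed limit inside the transaction.

```sql
BEGIN;

-- Applies only until the end of this transaction.
SET LOCAL join_collapse_limit = 16;

-- Typical usage: the view plus an additional condition on the hive.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM dt_last4
WHERE hive = 'HIVE-ID-PLACEHOLDER';   -- hypothetical hive id

ROLLBACK;
```

Comparing the estimated and actual row counts in this output, with and without the SET LOCAL line, shows whether the planner picked a different join order.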
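If the typical usage of the view adds extra conditions to the final query, the optimizer may choose a more selective plan.

Replace the MAX(...) with the corresponding NOT EXISTS (...) construct. And add an index on dt_inout_summary (hive, instant DESC), and likewise for the other child tables.
Please add the output of EXPLAIN ANALYZE to your question. The graphic may look sexy, but it is relatively useless.
Do you mean SELECT hive, instant, input, output, timeout FROM dt_inout_summary x WHERE x.hive = hive AND NOT EXISTS (SELECT hive, instant, input, output, timeout FROM dt_inout_summary x1 WHERE x.instant ...? But you seem to be right, the image is wrong; the EXPLAIN ANALYZE output of SELECT * FROM dt_last; is just below the image.
In some places the difference between the expected and the observed number of rows is too large. First run VACUUM ANALYZE on all the tables and see whether the plan changes. Also: I cannot explain the hive = x.hive and instant ... conditions; they do not come from the NOT EXISTS (...) construct, so it must be a different query, IMHO.
Applied this solution and updated the question accordingly. Very clean and nice. It is much better now, but there are still some sequential scans and the select still takes a few seconds.
Did you run VACUUM ANALYZE; on all the tables? BTW: sequential scans are not always bad. Sometimes they are even the best one can hope for.
Yes, I did, but it still stays at about 3/4 of a second. Assuming at most 10 hives are shown per page, running separate queries per table and per hive is faster.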
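One way to keep the per-hive ORDER BY instant DESC LIMIT 1 lookup that the asker found faster, while still exposing a single view, is a LEFT JOIN LATERAL. This technique is not part of the thread; it is a sketch that assumes PostgreSQL 9.3 or later. The view name dt_last_lateral is hypothetical, and only the two tables whose schemas are shown above are included; the other four would follow the same pattern.

```sql
-- Sketch only: LATERAL requires PostgreSQL 9.3+; dt_last_lateral is a hypothetical name.
CREATE OR REPLACE VIEW dt_last_lateral AS
SELECT h.id      AS hive,
       c.instant AS temperature_instant,
       c.internal,
       c.external,
       e.instant AS rain_instant,
       e.rain
FROM hives h
-- For each hive, fetch only its newest temperature row.
LEFT JOIN LATERAL (
    SELECT t.instant, t.internal, t.external
    FROM dt_temperature t
    WHERE t.hive = h.id
    ORDER BY t.instant DESC
    LIMIT 1
) c ON true
-- Same lookup for rain.
LEFT JOIN LATERAL (
    SELECT r.instant, r.rain
    FROM dt_rain r
    WHERE r.hive = h.id
    ORDER BY r.instant DESC
    LIMIT 1
) e ON true;
```

Each lateral subquery can be answered from the (hive, instant) primary key index, so selecting ten hives from such a view should cost roughly the same as the individual queries, although that would need to be confirmed with EXPLAIN ANALYZE on the real data.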