Strange Postgres behavior with indexes


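I have a datavalue table with about 200 million rows, with indexes on both site_id and parameter_id. I need to run queries such as "return all sites with data" and "return all parameters with data". The site table has only about 200 rows, and the parameter table only about 100.

The site query is fast and uses the index: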

EXPLAIN ANALYZE
select *
from site
where exists (
      select 1 from datavalue
      where datavalue.site_id = site.id limit 1
);

Seq Scan on site  (cost=0.00..64.47 rows=64 width=113) (actual time=0.046..1.106 rows=89 loops=1)
  Filter: (SubPlan 1)
  Rows Removed by Filter: 39
  SubPlan 1
    ->  Limit  (cost=0.44..0.47 rows=1 width=0) (actual time=0.008..0.008 rows=1 loops=128)
          ->  Index Only Scan using ix_datavalue_site_id on datavalue  (cost=0.44..8142.71 rows=248930 width=0) (actual time=0.008..0.008 rows=1 loops=128)
                Index Cond: (site_id = site.id)
                Heap Fetches: 0
Planning time: 0.361 ms
Execution time: 1.149 ms

The equivalent parameter query is quite slow and does not use the index:

EXPLAIN ANALYZE
select *
from parameter
where exists (
      select 1 from datavalue
      where datavalue.parameter_id = parameter.id limit 1
);

Seq Scan on parameter  (cost=0.00..20.50 rows=15 width=2648) (actual time=2895.972..21331.701 rows=15 loops=1)
  Filter: (SubPlan 1)
  Rows Removed by Filter: 6
  SubPlan 1
    ->  Limit  (cost=0.00..0.34 rows=1 width=0) (actual time=1015.790..1015.790 rows=1 loops=21)
          ->  Seq Scan on datavalue  (cost=0.00..502127.10 rows=1476987 width=0) (actual time=1015.786..1015.786 rows=1 loops=21)
                Filter: (parameter_id = parameter.id)
                Rows Removed by Filter: 7739355
Planning time: 0.123 ms
Execution time: 21331.736 ms

What on earth is going on here? Or, what would be a good way to do this?

Some description of the table:

id BIGINT DEFAULT nextval('datavalue_id_seq'::regclass) NOT NULL,
value DOUBLE PRECISION NOT NULL,
site_id INTEGER NOT NULL,
parameter_id INTEGER NOT NULL,
deployment_id INTEGER,
instrument_id INTEGER,
invalid BOOLEAN,
Indexes:
    "datavalue_pkey" PRIMARY KEY, btree (id)
    "datavalue_datetime_utc_site_id_parameter_id_instrument_id_key" UNIQUE CONSTRAINT, btree (datetime_utc, site_id, parameter_id, instrument_id)
    "ix_datavalue_instrument_id" btree (instrument_id)
    "ix_datavalue_parameter_id" btree (parameter_id)
    "ix_datavalue_site_id" btree (site_id)
    "tmp_idx" btree (site_id, datetime_utc)
Foreign-key constraints:
    "datavalue_instrument_id_fkey" FOREIGN KEY (instrument_id) REFERENCES instrument(id) ON UPDATE CASCADE ON DELETE CASCADE
    "datavalue_parameter_id_fkey" FOREIGN KEY (parameter_id) REFERENCES parameter(id) ON UPDATE CASCADE ON DELETE CASCADE
    "datavalue_site_id_fkey" FOREIGN KEY (site_id) REFERENCES coastal.site(id) ON UPDATE CASCADE ON DELETE CASCADE
    "datavalue_statistic_type_id_fkey"

Edit: here is the count distribution:

select count(parameter_id), parameter_id from datavalue group by parameter_id

88169   14
2889171 8
15805   17
8570    12
4257262 21
3947049 15
1225902 2
4091090 3
103877  10
633764  11
994442  18
49232   20
14935   4
563638  13
2955919 7

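UPDATE: as a_horse_with_no_name mentioned, you can remove the LIMIT 1 and the query will use the index.

Apparently, PostgreSQL wrongly thinks it will touch the whole database if you run the subquery, and ignores the LIMIT 1. (Which turns out to be unnecessary.)

I generated the same distribution on my laptop with: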

create table testtbl (id integer, par_id integer);
insert into testtbl (id, par_id) values (0,0 );
insert into testtbl (id, par_id) select "generate_series", 4 from generate_series((select max(id) from testtbl), (select max(id) from testtbl)+14935);
insert into testtbl (id, par_id) select "generate_series", 12 from generate_series((select max(id) from testtbl), (select max(id) from testtbl)+8570);
insert into testtbl (id, par_id) select "generate_series", 17 from generate_series((select max(id) from testtbl), (select max(id) from testtbl)+15805);
insert into testtbl (id, par_id) select "generate_series", 20 from generate_series((select max(id) from testtbl), (select max(id) from testtbl)+49232);
insert into testtbl (id, par_id) select "generate_series", 14 from generate_series((select max(id) from testtbl), (select max(id) from testtbl)+88169);
insert into testtbl (id, par_id) select "generate_series", 10 from generate_series((select max(id) from testtbl), (select max(id) from testtbl)+103877);
insert into testtbl (id, par_id) select "generate_series", 2 from generate_series((select max(id) from testtbl), (select max(id) from testtbl)+1225902);
insert into testtbl (id, par_id) select "generate_series", 8 from generate_series((select max(id) from testtbl), (select max(id) from testtbl)+2889171);
insert into testtbl (id, par_id) select "generate_series", 7 from generate_series((select max(id) from testtbl), (select max(id) from testtbl)+2955919);
insert into testtbl (id, par_id) select "generate_series", 3 from generate_series((select max(id) from testtbl), (select max(id) from testtbl)+4091090);
insert into testtbl (id, par_id) select "generate_series", 13 from generate_series((select max(id) from testtbl), (select max(id) from testtbl)+563638);
insert into testtbl (id, par_id) select "generate_series", 11 from generate_series((select max(id) from testtbl), (select max(id) from testtbl)+633764);
insert into testtbl (id, par_id) select "generate_series", 18 from generate_series((select max(id) from testtbl), (select max(id) from testtbl)+994442);
insert into testtbl (id, par_id) select "generate_series", 15 from generate_series((select max(id) from testtbl), (select max(id) from testtbl)+3947049);
insert into testtbl (id, par_id) select "generate_series", 21 from generate_series((select max(id) from testtbl), (select max(id) from testtbl)+4257262);
delete from testtbl where id = 0 and par_id = 0;
create index testtbl_paridx on testtbl (par_id);
create table parameter (id integer);
insert into parameter select * from generate_series (1, 28);
analyze testtbl;

Then, if I run the query:

testdb=# explain analyze select * from parameter where exists (select 1 from testtbl where testtbl.par_id = parameter.id limit 1);
                                                           QUERY PLAN                                                            
---------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on parameter  (cost=0.00..643.29 rows=1200 width=4) (actual time=4083.514..54216.575 rows=15 loops=1)
   Filter: (SubPlan 1)
   Rows Removed by Filter: 13
   SubPlan 1
     ->  Limit  (cost=0.00..0.25 rows=1 width=0) (actual time=1936.299..1936.299 rows=1 loops=28)
           ->  Seq Scan on testtbl  (cost=0.00..369619.35 rows=1455927 width=0) (actual time=1936.294..1936.294 rows=1 loops=28)
                 Filter: (par_id = parameter.id)
                 Rows Removed by Filter: 14870626
 Planning time: 0.151 ms
 Execution time: 54216.620 ms
(10 rows)
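
The rows=1455927 estimate in the plan above appears to be roughly the table's row count divided by its 15 distinct par_id values, which is what one would expect when the planner cannot know in advance which parameter.id it will be probing for. As a minimal sketch, assuming the test table has been analyzed as in the setup, the statistics behind that estimate can be inspected through the standard pg_stats view:

```sql
-- Per-column statistics gathered by ANALYZE for testtbl.par_id.
-- n_distinct: estimated number of distinct values.
-- most_common_vals / most_common_freqs: the values the planner
-- knows about, and the fraction of rows each one accounts for.
SELECT n_distinct,
       most_common_vals,
       most_common_freqs
FROM pg_stats
WHERE tablename = 'testtbl'
  AND attname = 'par_id';
```

Parameter ids that do not show up in most_common_vals at all are the ones for which a sequential scan has to read the whole table before giving up.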

With sequential scans disabled:

testdb=# set local enable_seqscan = off;
SET

testdb=# explain analyze select * from parameter where exists (select 1 from testtbl where testtbl.par_id = parameter.id limit 1);
                                                                      QUERY PLAN                                                                       
-------------------------------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on parameter  (cost=10000000000.00..10000001395.02 rows=1200 width=4) (actual time=0.077..0.563 rows=15 loops=1)
   Filter: (SubPlan 1)
   Rows Removed by Filter: 13
   SubPlan 1
     ->  Limit  (cost=0.44..0.57 rows=1 width=0) (actual time=0.019..0.019 rows=1 loops=28)
           ->  Index Only Scan using ix_testtbl_par on testtbl  (cost=0.44..188678.87 rows=1455927 width=0) (actual time=0.018..0.018 rows=1 loops=28)
                 Index Cond: (par_id = parameter.id)
                 Heap Fetches: 15
 Planning time: 0.169 ms
 Execution time: 0.605 ms
(10 rows)
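
Fast, but a bit hackish. You want to use SET LOCAL here, to avoid disabling sequential scans for all queries; SET LOCAL only stays in effect until the end of the current transaction.

Comparing the two plans hints at why the planner picked the sequential scan in the first place: with the LIMIT it prices the sequential scan at 0.00..0.25, below the 0.44..0.57 of the index-only scan, apparently expecting to hit a matching row almost immediately. For the 13 parameters that have no rows at all, it instead ends up reading the whole table.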
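
As a sketch of how SET LOCAL keeps the override contained, the statements can be wrapped in an explicit transaction. This reuses the testtbl and parameter tables from the test setup above; nothing else is assumed.

```sql
BEGIN;

-- Affects only the current transaction; the previous value is
-- restored at COMMIT or ROLLBACK.
SET LOCAL enable_seqscan = off;

EXPLAIN ANALYZE
SELECT *
FROM parameter
WHERE EXISTS (
      SELECT 1 FROM testtbl
      WHERE testtbl.par_id = parameter.id LIMIT 1
);

COMMIT;

-- Back outside the transaction, the setting has its previous value again.
SHOW enable_seqscan;
```

Issued outside a transaction block, SET LOCAL only emits a warning and has no effect, so the BEGIN is not optional here.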

UPDATE: a better option is to drop the LIMIT 1 altogether, as a_horse_with_no_name suggested:

testdb=# explain analyze select * from parameter where exists (select 1 from testtbl where testtbl.par_id = parameter.id );
                                                                  QUERY PLAN                                                                  
----------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop Semi Join  (cost=0.44..1591.08 rows=1200 width=4) (actual time=0.070..0.492 rows=15 loops=1)
   ->  Seq Scan on parameter  (cost=0.00..34.00 rows=2400 width=4) (actual time=0.010..0.018 rows=28 loops=1)
   ->  Index Only Scan using testtbl_paridx on testtbl  (cost=0.44..29379.76 rows=1455923 width=4) (actual time=0.016..0.016 rows=1 loops=28)
         Index Cond: (par_id = parameter.id)
         Heap Fetches: 15
 Planning time: 0.216 ms
 Execution time: 0.532 ms
(7 rows)
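
Applied to the original tables from the question, the same rewrite would presumably look like the sketch below. It uses only the datavalue and parameter names given in the question; the plan it produces on the real data is not shown here and would need to be checked with EXPLAIN ANALYZE.

```sql
-- Same query as in the question, minus the LIMIT 1 inside EXISTS.
-- EXISTS already stops at the first matching row, so the LIMIT
-- adds nothing to the result.
EXPLAIN ANALYZE
SELECT *
FROM parameter
WHERE EXISTS (
      SELECT 1 FROM datavalue
      WHERE datavalue.parameter_id = parameter.id
);
```

Without the LIMIT, the planner is free to turn the subquery into a semi join, as in the test plan above, instead of running it as a separate SubPlan per row.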

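I suspect there are duplicates in the correlated subquery of the second query, so it needs to fetch all rows for datavalue.parameter_id = parameter.id before applying the LIMIT 1. Could you show \d datavalue?

Added some DDL.

Just remove the useless LIMIT in the subselect and Postgres will choose a better plan (based on hruske's test setup).

The subquery after exists is run once for every row.

A common misconception (caused by exposure to MySQL, IIRC): check the query plan. The subquery is an integral part of the query plan.

Wow, thanks for looking into this. Yes, it does feel dirty, but it works...

The test setup is very helpful (+1); it shows that once the useless LIMIT 1 is removed, there is no need to fiddle with enable_seqscan.

The test setup is very helpful, but it leaves out the primary key and foreign key constraints. Also: testtbl.par_id being integer NOT NULL could make a big difference, because as an FK it means the cardinality is restricted to the domain of parameter.id.

Yes, the LIMIT 1 was the offender. I removed it and the index is now used correctly. Thanks a lot, everyone. I just want to say great job on the test setup, you really put a lot of effort into it.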
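
To follow up on the comment about the missing constraints, here is a minimal sketch of how the test setup could be brought closer to the real schema. The constraint names are made up for illustration. A primary key on testtbl.id is deliberately left out: the generate_series ranges in the setup overlap by one id at each boundary, so the ids are not unique as generated.

```sql
-- Hypothetical additions to the test setup; constraint names are invented.
ALTER TABLE parameter
    ADD CONSTRAINT parameter_pkey PRIMARY KEY (id);

-- Match the NOT NULL declaration of datavalue.parameter_id.
ALTER TABLE testtbl
    ALTER COLUMN par_id SET NOT NULL;

-- Mirror the foreign key from the question's table description.
ALTER TABLE testtbl
    ADD CONSTRAINT testtbl_par_id_fkey
    FOREIGN KEY (par_id) REFERENCES parameter (id)
    ON UPDATE CASCADE ON DELETE CASCADE;

-- Refresh planner statistics before re-running the EXPLAIN ANALYZE queries.
ANALYZE parameter;
ANALYZE testtbl;
```

Whether these constraints change the chosen plan is something to measure by re-running the queries above, not something this sketch establishes.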