SQL: Full join with =any where clause doesn't use index (sql, postgresql, indexing, outer-join, postgresql-performance)


Using Postgres 9.3.5, I can't seem to get a full outer join with an `=any` where clause to use the relevant indexes.

A minimal example:

create table t1(i int primary key, j int);
create table t2(i int primary key, j int);

insert into t1 (select x,x from generate_series(1,1000000) x);
insert into t2 (select x,x from generate_series(1,1000000) x);

vacuum analyze;

explain analyze
    select * 
        from t1 full join t2 using(i) 
        where i =any (array[1,2]);

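(In my real query, the array is a parameter of variable length.)

I get the following query plan, which uses sequential scans on both tables: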

 Hash Full Join  (cost=26925.00..66350.00 rows=10000 width=16) (actual time=178.308..1251.221 rows=2 loops=1)
   Hash Cond: (t1.i = t2.i)
   Filter: (COALESCE(t1.i, t2.i) = ANY ('{1,2}'::integer[]))
   Rows Removed by Filter: 999998
   ->  Seq Scan on t1  (cost=0.00..14425.00 rows=1000000 width=8) (actual time=0.011..59.463 rows=1000000 loops=1)
   ->  Hash  (cost=14425.00..14425.00 rows=1000000 width=8) (actual time=178.212..178.212 rows=1000000 loops=1)
         Buckets: 131072  Batches: 1  Memory Usage: 39063kB
         ->  Seq Scan on t2  (cost=0.00..14425.00 rows=1000000 width=8) (actual time=0.012..57.751 rows=1000000 loops=1)
 Total runtime: 1255.734 ms

Things I've tried without success:

Using `i in (1,2)` or `i = 1 or i = 2` instead of `=any`.

Running `set enable_seqscan to f`.

What does use the indexes is emulating the full join with a left join and an anti-join:

explain analyze 
    select * from
        (select i, t1.j, t2.j from t1 left join t2 using(i) 
         union all
         select i, null, j from t2 
             where not exists (select 1 from t1 where t1.i = t2.i)) sub
    where i =any (array[1,2]);


 Append  (cost=0.85..51.61 rows=3 width=12) (actual time=0.007..0.018 rows=2 loops=1)
   ->  Nested Loop Left Join  (cost=0.85..29.79 rows=2 width=12) (actual time=0.007..0.010 rows=2 loops=1)
         ->  Index Scan using t1_pkey on t1  (cost=0.42..12.88 rows=2 width=8) (actual time=0.003..0.005 rows=2 loops=1)
               Index Cond: (i = ANY ('{1,2}'::integer[]))
         ->  Index Scan using t2_pkey on t2  (cost=0.42..8.44 rows=1 width=8) (actual time=0.002..0.002 rows=1 loops=2)
               Index Cond: (t1.i = i)
   ->  Nested Loop Anti Join  (cost=0.85..21.79 rows=1 width=8) (actual time=0.008..0.008 rows=0 loops=1)
         ->  Index Scan using t2_pkey on t2 t2_1  (cost=0.42..12.88 rows=2 width=8) (actual time=0.001..0.002 rows=2 loops=1)
               Index Cond: (i = ANY ('{1,2}'::integer[]))
         ->  Index Only Scan using t1_pkey on t1 t1_1  (cost=0.42..4.44 rows=1 width=4) (actual time=0.002..0.002 rows=1 loops=2)
               Index Cond: (i = t2_1.i)
               Heap Fetches: 0
 Total runtime: 0.065 ms

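However, this approach makes my real query considerably more complicated and adds duplication. Is there a better way to get Postgres to use the indexes?

Pushing the predicate down into the subqueries does the trick: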

EXPLAIN ANALYZE
SELECT * 
FROM      (SELECT * FROM t1 WHERE i = ANY ('{1,2}')) t1
FULL JOIN (SELECT * FROM t2 WHERE i = ANY ('{1,2}')) t2 USING (i);

(Tested with 100k rows; the resulting query plan is shown at the bottom of the page.)

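Evidently the query planner isn't smart enough to conclude that, after the full join, a predicate on the joined column could use the indexes on the underlying tables. This could be improved.

I can't test with Postgres 9.4 right now. It may already have been improved there.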
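To make the planner's difficulty more concrete, here is a rough sketch of what the original query amounts to, judging by the `Filter: (COALESCE(t1.i, t2.i) = ANY (...))` line in the plan from the question. The column aliases `t1_j` and `t2_j` are illustrative and not part of the original query. The merged `USING (i)` column is an expression over both tables rather than a plain column of either one, so neither primary key index matches the predicate directly:

```sql
-- Approximately equivalent to:
--   SELECT * FROM t1 FULL JOIN t2 USING (i) WHERE i = ANY (array[1,2]);
-- The predicate applies to the COALESCE expression, not to t1.i or t2.i.
SELECT COALESCE(t1.i, t2.i) AS i,
       t1.j AS t1_j,          -- illustrative alias
       t2.j AS t2_j           -- illustrative alias
FROM   t1
FULL   JOIN t2 ON t1.i = t2.i
WHERE  COALESCE(t1.i, t2.i) = ANY ('{1,2}'::int[]);
```

Filtering each table before the join, as in the subquery version, gives the same rows here because the join condition is equality on `i`: a row whose `i` is outside the array can only ever pair with rows that are also outside it.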

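As an aside, most clients can't handle multiple columns with the same name in a result (even though Postgres can). Your two instances of `j` would be a problem; you would have to use at least one column alias, which forces you to list the columns explicitly.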
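A minimal sketch of what that could look like, using the tables from the question. The alias names `t1_j` and `t2_j` are only placeholders:

```sql
-- Explicit column list with aliases, so every result column has a unique name.
SELECT i,
       t1.j AS t1_j,   -- placeholder alias
       t2.j AS t2_j    -- placeholder alias
FROM      (SELECT * FROM t1 WHERE i = ANY ('{1,2}'::int[])) t1
FULL JOIN (SELECT * FROM t2 WHERE i = ANY ('{1,2}'::int[])) t2 USING (i);
```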

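Thanks! Unfortunately, in my case the join is part of a view definition, and the application applies the predicate to the view. I don't suppose there is any way to push down the predicate and still use a view?

In that case I would consider a function that takes the array items as parameters, possibly a `VARIADIC` function for convenience.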
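The function suggestion could be sketched roughly as follows. This is an assumption-laden illustration, not code from the thread: the function name `t1_t2_full_join`, the parameter name `_ids`, and the output column names `t1_j` and `t2_j` are all hypothetical.

```sql
-- Hypothetical replacement for the view: the array is a parameter,
-- so the predicate can be applied to each table before the FULL JOIN.
CREATE OR REPLACE FUNCTION t1_t2_full_join(VARIADIC _ids int[])
  RETURNS TABLE (i int, t1_j int, t2_j int)
  LANGUAGE sql STABLE AS
$func$
SELECT i, t1.j, t2.j
FROM      (SELECT * FROM t1 WHERE t1.i = ANY (_ids)) t1
FULL JOIN (SELECT * FROM t2 WHERE t2.i = ANY (_ids)) t2 USING (i);
$func$;

-- Call with a list of values ...
SELECT * FROM t1_t2_full_join(1, 2);

-- ... or pass an existing array with the VARIADIC keyword.
SELECT * FROM t1_t2_full_join(VARIADIC '{1,2}'::int[]);
```

The trade-off is that the application has to call the function instead of filtering a view, which may or may not be acceptable in a given setup.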

Query plan for the `FULL JOIN` of the two filtered subqueries (tested with 100k rows):
Merge Full Join (cost=0.58..25.26 rows=2 width=16) (actual time=0.084..0.100 rows=2 loops=1)
  Merge Cond: (t1.i = t2.i)
    -> Index Scan using t1_pkey on t1 (cost=0.29..12.62 rows=2 width=8) (actual time=0.044..0.048 rows=2 loops=1)
         Index Cond: (i = ANY ('{1,2}'::integer[]))
    -> Index Scan using t2_pkey on t2 (cost=0.29..12.62 rows=2 width=8) (actual time=0.028..0.033 rows=2 loops=1)
         Index Cond: (i = ANY ('{1,2}'::integer[]))
Total runtime: 0.256 ms