Why does SQLite rescan the table when I use DISTINCT?

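I need to get the distinct values of ref and alt. Before I added DISTINCT I had a very efficient query; after adding it, the base table is rescanned. Since I already have a temporary table, shouldn't SQLite simply use that as the data source?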

    sqlite> explain query plan
       ...> select t1.ref, t1.alt from (SELECT * from Sample_szes where str_id = 'STR_832206') as t1;


    selectid|order|from|detail
    1|0|0|SEARCH TABLE vcfBase AS base USING INDEX vcfBase_strid_idx (str_id=?) (~10 rows)
    1|1|1|SEARCH TABLE vcfhomozyg AS hzyg USING INDEX homozyg_strid_idx (str_id=?) (~10 rows)
    2|0|0|SEARCH TABLE vcfBase AS base USING INDEX vcfBase_strid_idx (str_id=?) (~10 rows)
    2|1|1|SEARCH TABLE vcfAlt AS alt USING INDEX vcfAlt_strid_idx (str_id=?) (~2 rows)
    2|2|2|SEARCH TABLE altGT AS gt USING INDEX altGT_strid_idx (str_id=?) (~2 rows)
    0|0|0|COMPOUND SUBQUERIES 1 AND 2 (UNION ALL)

After adding DISTINCT, it rescans the large base table:

    sqlite> explain query plan
       ...> select distinct t1.ref, t1.alt from (SELECT * from Sample_szes where str_id = 'STR_832206') as t1;

    selectid|order|from|detail
    2|0|0|SCAN TABLE vcfBase AS base (~1000000 rows)
    2|1|1|SEARCH TABLE vcfhomozyg AS hzyg USING INDEX homozyg_strid_idx (str_id=?) (~10 rows)
    3|0|0|SCAN TABLE vcfBase AS base (~1000000 rows)
    3|1|1|SEARCH TABLE vcfAlt AS alt USING INDEX vcfAlt_strid_idx (str_id=?) (~2 rows)
    3|2|2|SEARCH TABLE altGT AS gt USING INDEX altGT_strid_idx (str_id=?) (~2 rows)
    1|0|0|COMPOUND SUBQUERIES 2 AND 3 (UNION ALL)
    0|0|0|SCAN SUBQUERY 1 (~1400000 rows)
    0|0|0|USE TEMP B-TREE FOR DISTINCT
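
If the planner will not apply the filter before deduplicating, one possible workaround is to do the filtering explicitly: copy the matching rows into a real temporary table, then run DISTINCT on that small table. The sketch below uses Python's sqlite3 module. The tables homozyg and hetero, the view sample_view, and the helper distinct_ref_alt are hypothetical stand-ins, since the actual definition of Sample_szes is not shown in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-ins for the tables behind Sample_szes.
    CREATE TABLE homozyg(str_id TEXT, ref TEXT, alt TEXT);
    CREATE TABLE hetero(str_id TEXT, ref TEXT, alt TEXT);
    CREATE INDEX homozyg_strid_idx ON homozyg(str_id);
    CREATE INDEX hetero_strid_idx ON hetero(str_id);
    -- A UNION ALL view, loosely mirroring the compound subquery in the plan.
    CREATE VIEW sample_view AS
        SELECT str_id, ref, alt FROM homozyg
        UNION ALL
        SELECT str_id, ref, alt FROM hetero;
    INSERT INTO homozyg VALUES
        ('STR_1', 'A', 'T'), ('STR_1', 'A', 'T'), ('STR_2', 'G', 'C');
    INSERT INTO hetero VALUES
        ('STR_1', 'A', 'G'), ('STR_2', 'G', 'C');
""")


def distinct_ref_alt(conn, str_id):
    """Filter into a temp table first, then deduplicate the small result."""
    conn.execute("DROP TABLE IF EXISTS temp.filtered")
    conn.execute("CREATE TEMP TABLE filtered(ref TEXT, alt TEXT)")
    # The WHERE clause runs here, on its own, without DISTINCT in the way.
    conn.execute(
        "INSERT INTO filtered SELECT ref, alt FROM sample_view WHERE str_id = ?",
        (str_id,),
    )
    # DISTINCT now only has to look at the rows that matched.
    return conn.execute(
        "SELECT DISTINCT ref, alt FROM filtered ORDER BY ref, alt"
    ).fetchall()


pairs = distinct_ref_alt(conn, "STR_1")
print(pairs)
```

This trades one extra write for a predictable plan; whether it is worthwhile depends on how many rows match the filter.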

I think I may have found the answer. On my Mac I have the following version of SQLite, and there the same query uses the indexes:

SQLite version 3.19.3 2017-06-27 16:48:08

    sqlite> explain query plan
       ...> select distinct t1.ref, t1.alt from (SELECT * from Sample_szes where str_id = 'STR_832206') as t1;
    2|0|1|SEARCH TABLE vcfhomozyg AS hzyg USING INDEX homozyg_strid_idx (str_id=?)
    2|1|0|SEARCH TABLE vcfBase AS base USING INDEX vcfBase_strid_idx (str_id=?)
    3|0|1|SEARCH TABLE vcfAlt AS alt USING INDEX vcfAlt_strid_idx (str_id=?)
    3|1|0|SEARCH TABLE vcfBase AS base USING INDEX vcfBase_strid_idx (str_id=?)
    3|2|2|SEARCH TABLE altGT AS gt USING INDEX altGT_strid_idx (str_id=?)
    1|0|0|COMPOUND SUBQUERIES 2 AND 3 (UNION ALL)
    0|0|0|SCAN SUBQUERY 1
    0|0|0|USE TEMP B-TREE FOR DISTINCT
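
Because the plan depends on the SQLite build, it can help to record the library version next to the plan when comparing machines. A minimal sketch in Python follows; the table demo, its index, and the helper explain are made up for illustration and are not part of the original schema.

```python
import sqlite3


def explain(conn, sql, params=()):
    """Return the 'detail' column of EXPLAIN QUERY PLAN as a list of strings."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # 'detail' is the last column in both the old and the new output layout.
    return [row[-1] for row in rows]


conn = sqlite3.connect(":memory:")
# Hypothetical table, only here so there is something to plan against.
conn.execute("CREATE TABLE demo(str_id TEXT, ref TEXT, alt TEXT)")
conn.execute("CREATE INDEX demo_strid_idx ON demo(str_id)")

# Version of the SQLite library actually linked in, not of the Python wrapper.
library_version = conn.execute("SELECT sqlite_version()").fetchone()[0]
print("SQLite library version:", library_version)

details = explain(
    conn,
    "SELECT DISTINCT ref, alt FROM demo WHERE str_id = ?",
    ("STR_832206",),
)
for line in details:
    print(line)
```

Running the same script on both machines gives a like-for-like comparison of version and plan.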

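You should create a composite index on the ref and alt columns; SQLite will then use that index. Otherwise it builds a temporary B-tree (an index), which requires a full scan in order to sort the data being indexed.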
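
As a rough illustration of this advice, the sketch below compares the plan for a DISTINCT query before and after adding a composite index. It uses a plain, hypothetical table named variants; the names variants, variants_ref_alt_idx, and plan are assumptions for the example. In the question, Sample_szes appears to be a compound (UNION ALL) query over several tables, so an index cannot be created on it directly and the effect there may differ.

```python
import sqlite3


def plan(conn, sql):
    """Return the 'detail' strings of EXPLAIN QUERY PLAN for sql."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


conn = sqlite3.connect(":memory:")
# Hypothetical single table; the real schema in the question is more complex.
conn.execute("CREATE TABLE variants(str_id TEXT, ref TEXT, alt TEXT)")

query = "SELECT DISTINCT ref, alt FROM variants"

# Without an index on (ref, alt), SQLite deduplicates with a temporary b-tree.
before = plan(conn, query)

# A composite index delivers rows already ordered by (ref, alt),
# so duplicates are adjacent and no temporary b-tree is needed.
conn.execute("CREATE INDEX variants_ref_alt_idx ON variants(ref, alt)")
after = plan(conn, query)

print("before:", before)
print("after: ", after)
```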

I think the explanation is as follows:

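If a SELECT query contains an ORDER BY, GROUP BY or DISTINCT clause, SQLite may need to use a temporary b-tree structure to sort the output rows. Or, it might use an index. Using an index is almost always much more efficient than performing a sort.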

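If a temporary b-tree is required, a record is added to the EXPLAIN QUERY PLAN output with the "detail" field set to a string value of the form "USE TEMP B-TREE FOR xxx", where xxx is one of "ORDER BY", "GROUP BY" or "DISTINCT". For example: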

sqlite> EXPLAIN QUERY PLAN SELECT c, d FROM t2 ORDER BY c;
QUERY PLAN
|--SCAN TABLE t2
`--USE TEMP B-TREE FOR ORDER BY

In this case, using the temporary b-tree can be avoided by creating an index on t2(c), as follows:

sqlite> CREATE INDEX i4 ON t2(c);
sqlite> EXPLAIN QUERY PLAN SELECT c, d FROM t2 ORDER BY c; 
QUERY PLAN
`--SCAN TABLE t2 USING INDEX i4


Thanks, this was really helpful.