MySQL: looking for a good index

I have this table (simplified).

Size:

select count(*) from completions; -- => 4817574
Now I am trying to run the following query:

select completions.* 
from completions  
where 
  (completed_at is not null) 
  and completions.is_mongo_synced = 0 
  order by completions.id asc limit 10;
It takes 9 minutes.

I see that the index is not being used; explain extended returns the following:

id: 1 
select_type: SIMPLE
table: completions 
type: index 
possible_keys: index_completions_on_completed_at_and_is_mongo_synced_and_id  
key: PRIMARY 
key_len: 4 
ref: NULL  
rows: 20  
filtered: 11616415.00 
Extra: Using where
If I force the index:

select completions.* 
from completions  
force index(index_completions_on_completed_at_and_is_mongo_synced_and_id)
where 
  (completed_at is not null) 
  and completions.is_mongo_synced = 0 
  order by completions.id asc limit 10;
It takes 1.22 s, which is much better. explain extended returns:

id: 1
select_type: SIMPLE
table: completions
type: range
possible_keys: index_completions_on_completed_at_and_is_mongo_synced_and_id
key: index_completions_on_completed_at_and_is_mongo_synced_and_id
key_len: 6
ref: null
rows: 2323334
filtered: 100
Extra: Using index condition; Using filesort
Now, if I narrow the query by completions.id, like:

select completions.* 
from completions  
force index(index_completions_on_completed_at_and_is_mongo_synced_and_id)
where 
  (completed_at is not null) 
  and completions.is_mongo_synced = 0 
  and completions.id > 2000000
  order by completions.id asc limit 10;
It takes 1.31 s, still good. explain extended returns:

id: 1
select_type: SIMPLE
table: completions
type: range
possible_keys: index_completions_on_completed_at_and_is_mongo_synced_and_id
key: index_completions_on_completed_at_and_is_mongo_synced_and_id
key_len: 6
ref: null
rows: 2323407
filtered: 100
Extra: Using index condition; Using filesort
The point is, if I do not force the index for that last query:

select completions.* 
from completions  
where 
  (completed_at is not null) 
  and completions.is_mongo_synced = 0 
  and completions.id > 2000000
  order by completions.id asc limit 10;
It takes 85 ms (note: milliseconds, not seconds). explain extended returns:

id: 1
select_type: SIMPLE
table: completions
type: range
possible_keys: PRIMARY,index_completions_on_completed_at_and_is_mongo_synced_and_id
key: PRIMARY
key_len: 4
ref: null
rows: 2323451
filtered: 100
Extra: Using where
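Not only does this drive me crazy, but the performance of that last query also changes drastically with a small change in the filter value: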

select completions.* 
from completions  
where 
  (completed_at is not null) 
  and completions.is_mongo_synced = 0 
  and completions.id > 1600000
  order by completions.id asc limit 10;
It takes 13 s.

Things I do not understand:

  1. Why is Query A below faster than Query B, when Query B is supposed to use the more specific index?

    Query A:

    select completions.* 
    from completions  
    where 
      (completed_at is not null) 
      and completions.is_mongo_synced = 0 
      and completions.id > 2000000
      order by completions.id asc limit 10;
    
    85ms

    Query B:

    select completions.* 
    from completions  
    force index(index_completions_on_completed_at_and_is_mongo_synced_and_id)
    where 
      (completed_at is not null) 
      and completions.is_mongo_synced = 0 
      and completions.id > 2000000
      order by completions.id asc limit 10;
    
    1.31 s

  2. Why is there such a performance difference between the following queries?

    Query A:

    select completions.* 
    from completions  
    where 
      (completed_at is not null) 
      and completions.is_mongo_synced = 0 
      and completions.id > 2000000
      order by completions.id asc limit 10;
    
    85ms

    Query B:

    select completions.* 
    from completions  
    where 
      (completed_at is not null) 
      and completions.is_mongo_synced = 0 
      and completions.id > 1600000
      order by completions.id asc limit 10;
    
    13s

  3. Why does MySQL not automatically use the index for the following query?

    Index:

    key index_completions_on_completed_at_and_is_mongo_synced_and_id (completed_at,is_mongo_synced,id),
    
    Query:

    select completions.* 
    from completions  
    force index(index_completions_on_completed_at_and_is_mongo_synced_and_id)
    where 
      (completed_at is not null) 
      and completions.is_mongo_synced = 0 
      and completions.id > 2000000
      order by completions.id asc limit 10;
    
    Update: more data requested in the comments

    Number of rows by is_mongo_synced:

     select
         completions.is_mongo_synced,
         count(*)
     from completions
     group by completions.is_mongo_synced;
    
    Result:

    [
      {
        "is_mongo_synced":0,
        "count(*)":2731921
      },
      {
        "is_mongo_synced":1,
        "count(*)":2087869
      }
    ]
    
    Queries without order by:

    select completions.* 
    from completions  
    where 
      (completed_at is not null) 
      and completions.is_mongo_synced = 0 
      and completions.id > 2000000
      limit 10;
    
    544ms

    select completions.* 
    from completions  
    force index(index_completions_on_completed_at_and_is_mongo_synced_and_id)
    where 
      (completed_at is not null) 
      and completions.is_mongo_synced = 0 
      and completions.id > 2000000
      limit 10;
    
    314ms

    Anyway, I do need the ordering, because I am scanning the table batch by batch.

    Your question is rather complicated. But for your first query:

    select completions.* 
    from completions  
    where completed_at is not null and
          completions.is_mongo_synced = 0 
    order by completions.id asc
    limit 10;
    
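    The best index is on (is_mongo_synced, completed_at). There may be other ways to write the query, but in the index you are forcing, the columns are not in the optimal order.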

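    The performance difference in the second query is probably because the data is actually being sorted. A few hundred thousand extra rows can affect the sort time. The dependence on the id value may be the reason the index is not used. If you change the index to (is_mongo_synced, id, completed_at), it is more likely that the index will be used.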

    MySQL has good documentation on composite indexes. You might want to review it.

    Update: after adding the suggested index

    After adding the index:

    KEY `index_completions_on_is_mongo_synced_and_id_and_completed_at` (`is_mongo_synced`,`id`,`completed_at`) USING BTREE,
    
    Then I ran the slow query again:

    select completions.* 
    from completions  
    where 
      (completed_at is not null) 
      and completions.is_mongo_synced = 0 
      order by completions.id asc limit 10;
    
    It takes 156 ms, which is very good.

    Checking explain extended, we see that MySQL uses the correct index:

    id: 1
    select_type: SIMPLE
    table: completions
    type: ref
    possible_keys: index_completions_on_completed_at_and_is_mongo_synced_and_id,index_completions_on_is_mongo_synced_and_id_and_completed_at
    key: index_completions_on_is_mongo_synced_and_id_and_completed_at
    key_len: 2
    ref: const
    rows: 1626322
    filtered: 100
    Extra: Using index condition; Using where
    

    You are trying to force the index

    (completed_at, is_mongo_synced, id)
    
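    This is a B-tree. It must first walk every distinct completed_at value that is not NULL, then for each of them find the entries with the right is_mongo_synced, then collect all the ids and sort them, and finally visit the table to fetch the required rows.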

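    With the primary key, on the other hand (assuming it is the clustered key), it only has to jump to the page where completions.id > 2000000 begins and read consecutive rows until it has collected 10 of them, fetching the next page if they are not all on that one.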

    In the end, both queries may examine a similar number of table pages, but the first one also has to read the entire index and sort it.
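
One way to check the primary-key explanation above is to pin the query to the clustered index and look at the plan. This is only a sketch: FORCE INDEX (PRIMARY) is standard MySQL syntax, but the plan and timings you get depend on your data and server version.

```sql
-- Sketch: force the clustered primary key so the scan starts at id > 2000000
-- and walks rows in id order, stopping once 10 rows pass the filters.
EXPLAIN
SELECT completions.*
FROM completions FORCE INDEX (PRIMARY)
WHERE completed_at IS NOT NULL
  AND is_mongo_synced = 0
  AND id > 2000000
ORDER BY id ASC
LIMIT 10;
```

Under this plan the cost depends on how many rows have to be read before 10 of them pass the filters. That would be consistent with the jump from 85 ms to 13 s when the lower bound moves to 1600000, assuming matching rows are sparse in that id range.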

    If you want to use an index, try

    (is_mongo_synced, id, completed_at)
    

    See the manual.

    Note: I am assuming InnoDB.

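    To build the optimal index:

  • Gather all the "=" columns. Here that is just is_mongo_synced. This makes the lookup land on one contiguous stretch of the index.
  • Add one more thing.

    If you add completed_at next, it will scan all the non-NULL entries and collect the ids for a subsequent sort. The sort (ORDER BY) has a cost, and with INDEX(is_mongo_synced, completed_at, ...) it cannot be avoided.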

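    If you add id instead, there is now a chance of avoiding the sort. But it still has to do the filtering (to skip rows where completed_at is NULL). So INDEX(is_mongo_synced, id, ...) may be good.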
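
For the batch-by-batch scan mentioned in the question, the id-second index lends itself to a "seek past the last id" loop. The following is a minimal sketch, assuming an index on (is_mongo_synced, id, completed_at) exists; the user variable @last_id is a hypothetical placeholder that the application would update after each batch.

```sql
-- Hypothetical batch cursor: start before the first row.
SET @last_id = 0;

-- Fetch the next batch. With (is_mongo_synced, id, completed_at) the rows for
-- is_mongo_synced = 0 are already stored in id order, so the server can read
-- forward from @last_id and stop after 10 matches without a filesort.
SELECT completions.*
FROM completions
WHERE is_mongo_synced = 0
  AND id > @last_id
  AND completed_at IS NOT NULL
ORDER BY id ASC
LIMIT 10;

-- After processing, set @last_id to the largest id returned and repeat.
```

Whether the optimizer actually picks that index should be confirmed with EXPLAIN on your own data.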

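    If you have both indexes, the optimizer is not good at choosing between them, because the choice depends heavily on the distribution of the data and on whether you have a LIMIT. Even you, who know the data, may not be able to tell which index is better.

    I wrote "...". By that I mean you can stop there, or you can add more columns to the index. Adding more columns leads to what is called a "covering index". If all the columns mentioned (anywhere) in the SELECT are present in a secondary index, then it is "covering". But first, let me back up.

    When looking something up in a secondary index, it finds the PRIMARY KEY at the bottom of the BTree. It then looks up the other columns by drilling down the BTree of the clustered PK. This extra drill-down can be costly. But...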

    If the index is "covering", there is no need for the extra BTree drill-down.
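
To see the covering effect in isolation, one can select only columns that live in the secondary index. This is a sketch, assuming an index on (is_mongo_synced, id, completed_at); when an index covers a query, the Extra column of EXPLAIN normally includes "Using index".

```sql
-- Only id, completed_at and is_mongo_synced are referenced, all of which are
-- in the assumed index, so no lookup into the clustered PK should be needed.
EXPLAIN
SELECT id, completed_at
FROM completions
WHERE is_mongo_synced = 0
  AND completed_at IS NOT NULL
ORDER BY id ASC
LIMIT 10;
```

A query written as select completions.* is covered only if the table has no columns outside the index, which may not hold for the full (non-simplified) table.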

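    You accidentally got a "covering" index, but not in the best order. The whole index has to be scanned and then sorted. Each of my indexes avoids scanning the whole index, so it may be faster.

    By adding the extra column, I have two (competing) covering indexes: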

    KEY mci (is_mongo_synced, completed_at, id)
    KEY mic (is_mongo_synced, id, completed_at)
    
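    An aside: because the PK is automatically appended to every secondary key, these 3-column indexes effectively exist even if I name only the first 2 columns. So if you try both the 2-column and the 3-column version and see no difference, don't be confused.

    For clarity, I will keep all 3 columns explicit in "mci" and "mic".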
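
Since the optimizer may not choose well between the two, a direct comparison is one way to decide. The sketch below uses the index names mci and mic from above and assumes it is run on a test copy of the table, because adding indexes to a table of this size takes time and mic duplicates the column list of any existing (is_mongo_synced, id, completed_at) index.

```sql
-- Create both candidate indexes (names taken from the text above).
ALTER TABLE completions
  ADD INDEX mci (is_mongo_synced, completed_at, id),
  ADD INDEX mic (is_mongo_synced, id, completed_at);

-- Plan with mci: expect a range on completed_at followed by a sort on id.
EXPLAIN
SELECT completions.*
FROM completions FORCE INDEX (mci)
WHERE completed_at IS NOT NULL
  AND is_mongo_synced = 0
ORDER BY id ASC
LIMIT 10;

-- Plan with mic: rows are read in id order, so the sort can be skipped.
EXPLAIN
SELECT completions.*
FROM completions FORCE INDEX (mic)
WHERE completed_at IS NOT NULL
  AND is_mongo_synced = 0
ORDER BY id ASC
LIMIT 10;
```

Comparing the Extra column (for example, whether "Using filesort" appears) and the measured run times of the two forced queries shows which column order suits the actual data.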

    Re-analyzing them:

    "mci" will scan the part of the index that has is_mongo_sync