MySQL: Optimizing a query with a large NOT IN statement

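I am trying to find source sites that exist only before a certain timestamp. This query seems very poor for the job. Any idea how to optimize it, or which index might improve it?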

select distinct sourcesite 
  from contentmeta 
  where timestamp <= '2011-03-15'
  and sourcesite not in (
    select distinct sourcesite 
      from contentmeta 
      where timestamp>'2011-03-15'
  );
There is an index on sourcesite and timestamp, but the query still takes a long time.

mysql> EXPLAIN select distinct sourcesite from contentmeta where timestamp <= '2011-03-15' and sourcesite not in (select distinct sourcesite from contentmeta where timestamp>'2011-03-15');
+----+--------------------+-------------+----------------+---------------+----------+---------+------+--------+-------------------------------------------------+
| id | select_type        | table       | type           | possible_keys | key      | key_len | ref  | rows   | Extra                                           |
+----+--------------------+-------------+----------------+---------------+----------+---------+------+--------+-------------------------------------------------+
|  1 | PRIMARY            | contentmeta | index          | NULL          | sitetime | 14      | NULL | 725697 | Using where; Using index                        |
|  2 | DEPENDENT SUBQUERY | contentmeta | index_subquery | sitetime      | sitetime | 5       | func |     48 | Using index; Using where; Full scan on NULL key |
+----+--------------------+-------------+----------------+---------------+----------+---------+------+--------+-------------------------------------------------+

I find that NOT IN does not optimize well in many databases. Use a LEFT OUTER JOIN instead.

The join-based version assumes that sourcesite is never NULL.
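
To illustrate why the NULL assumption matters when comparing NOT IN with a join, here is a minimal sketch using literal values rather than the question's table. It only demonstrates standard three-valued logic: a NULL on the right-hand side of NOT IN makes the comparison unknown, so the row is filtered out.

```sql
-- No NULL in the list: 2 is not in (1, 3), so one row comes back.
SELECT 'kept' AS result FROM DUAL WHERE 2 NOT IN (1, 3);

-- A NULL in the list: 2 <> NULL is unknown, so the predicate is never true
-- and no rows come back.
SELECT 'kept' AS result FROM DUAL WHERE 2 NOT IN (1, NULL);
```

If sourcesite can be NULL in contentmeta, the NOT IN query and the LEFT JOIN ... IS NULL query may therefore return different rows. The "Full scan on NULL key" note in the question's EXPLAIN output suggests the column is declared nullable, though that is an inference, not something the question states.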

This should work:

SELECT DISTINCT c1.sourcesite
FROM contentmeta c1
LEFT JOIN contentmeta c2
  ON c2.sourcesite = c1.sourcesite
  AND c2.timestamp > '2011-03-15'
WHERE c1.timestamp <= '2011-03-15'
  AND c2.sourcesite IS NULL
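For best performance, have a multi-column index on contentmeta (sourcesite, timestamp).

In general, joins perform better than subqueries, because derived tables cannot use indexes.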
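
The multi-column index suggested in this answer could be created roughly as follows. The index name idx_contentmeta_site_time is hypothetical. The EXPLAIN output in the question already uses a key called sitetime, which may well cover the same two columns, so it is worth checking the existing indexes first.

```sql
-- Check which indexes already exist, to avoid creating a duplicate.
SHOW INDEX FROM contentmeta;

-- Hypothetical index name. Column order matters: sourcesite first so the
-- join can look up a site, then timestamp so the range check uses the index.
CREATE INDEX idx_contentmeta_site_time
    ON contentmeta (sourcesite, timestamp);
```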

The subquery does not need DISTINCT, and the outer query does not need its WHERE clause, since you are already filtering with NOT IN.

Try:

select distinct sourcesite
from contentmeta
where sourcesite not in (
    select sourcesite
    from contentmeta
    where timestamp > '2011-03-15'
);

This can be done without NOT IN, because it is one of the most expensive MySQL operations.

@MoyedAnsari: NOT IN is not the most expensive operation. This problem needs a NOT IN or NOT EXISTS subquery, or a LEFT JOIN ... IS NULL query.
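
The NOT EXISTS form mentioned in the comments is not spelled out in the thread. A minimal sketch, assuming the same table and column names as the question, might look like this:

```sql
-- Sketch only: returns sites that have rows on or before the cutoff
-- and no rows after it.
SELECT DISTINCT c1.sourcesite
FROM contentmeta c1
WHERE c1.timestamp <= '2011-03-15'
  AND NOT EXISTS (
      SELECT 1
      FROM contentmeta c2
      WHERE c2.sourcesite = c1.sourcesite
        AND c2.timestamp > '2011-03-15'
  );
```

Unlike NOT IN, NOT EXISTS is not thrown off by NULL values produced by the subquery. Whether it runs faster than the other forms depends on the MySQL version and the data, so comparing EXPLAIN output for each variant is the safer way to decide.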