MySQL: optimizing a query with a large NOT IN clause
I'm trying to find source sites that existed only before a certain timestamp. This query seems to perform very poorly for the job. Any idea how to optimize it or improve the indexing?
select distinct sourcesite
from contentmeta
where timestamp <= '2011-03-15'
and sourcesite not in (
select distinct sourcesite
from contentmeta
where timestamp>'2011-03-15'
);
There is an index on sourcesite and timestamp, but the query still takes a long time:
mysql> EXPLAIN select distinct sourcesite from contentmeta where timestamp <= '2011-03-15' and sourcesite not in (select distinct sourcesite from contentmeta where timestamp>'2011-03-15');
+----+--------------------+-------------+----------------+---------------+----------+---------+------+--------+-------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-------------+----------------+---------------+----------+---------+------+--------+-------------------------------------------------+
| 1 | PRIMARY | contentmeta | index | NULL | sitetime | 14 | NULL | 725697 | Using where; Using index |
| 2 | DEPENDENT SUBQUERY | contentmeta | index_subquery | sitetime | sitetime | 5 | func | 48 | Using index; Using where; Full scan on NULL key |
+----+--------------------+-------------+----------------+---------------+----------+---------+------+--------+-------------------------------------------------+
I've found that NOT IN is not optimized well by many databases. Use a LEFT OUTER JOIN instead. This assumes sourcesite is never NULL. This should work:
SELECT DISTINCT c1.sourcesite
FROM contentmeta c1
LEFT JOIN contentmeta c2
ON c2.sourcesite = c1.sourcesite
AND c2.timestamp > '2011-03-15'
WHERE c1.timestamp <= '2011-03-15'
AND c2.sourcesite IS NULL
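A minimal sketch to check that the LEFT JOIN ... IS NULL anti-join returns the same sites as the question's NOT IN query. It assumes SQLite (through Python's built-in sqlite3 module) as a stand-in for MySQL, so it checks results, not speed, and it assumes timestamps are stored as ISO-8601 strings. The sample rows and site names are hypothetical. The last step also shows why the NULL caveat matters: once the subquery yields a NULL sourcesite, NOT IN matches nothing, while the join form still works.

```python
import sqlite3

# Assumption: SQLite (via Python's sqlite3) stands in for MySQL, and timestamps
# are ISO-8601 strings, so string comparison matches chronological order.
CUTOFF = "2011-03-15"

# Hypothetical sample rows: (sourcesite, timestamp).
ROWS = [
    ("old.example", "2011-01-10"),
    ("old.example", "2011-02-20"),
    ("both.example", "2011-03-01"),
    ("both.example", "2011-04-01"),
    ("new.example", "2011-05-05"),
]

# The question's query: NOT IN against a subquery.
NOT_IN_SQL = """
SELECT DISTINCT sourcesite
FROM contentmeta
WHERE timestamp <= :cutoff
  AND sourcesite NOT IN (
      SELECT DISTINCT sourcesite FROM contentmeta WHERE timestamp > :cutoff
  )
"""

# The answer's anti-join: keep c1 rows that have no matching later row c2.
ANTI_JOIN_SQL = """
SELECT DISTINCT c1.sourcesite
FROM contentmeta c1
LEFT JOIN contentmeta c2
  ON c2.sourcesite = c1.sourcesite
 AND c2.timestamp > :cutoff
WHERE c1.timestamp <= :cutoff
  AND c2.sourcesite IS NULL
"""


def make_db(rows):
    """Create an in-memory contentmeta table holding the given rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE contentmeta (sourcesite TEXT, timestamp TEXT)")
    conn.executemany("INSERT INTO contentmeta VALUES (?, ?)", rows)
    return conn


def run(conn, sql):
    """Run a query with the cutoff bound and return the sorted site names."""
    return sorted(row[0] for row in conn.execute(sql, {"cutoff": CUTOFF}))


conn = make_db(ROWS)
print(run(conn, NOT_IN_SQL))     # ['old.example']
print(run(conn, ANTI_JOIN_SQL))  # ['old.example']

# One row with a NULL sourcesite after the cutoff: the subquery now yields NULL,
# so "x NOT IN (..., NULL)" is never true and NOT IN returns nothing.
conn.execute("INSERT INTO contentmeta VALUES (NULL, '2011-06-01')")
print(run(conn, NOT_IN_SQL))     # []
print(run(conn, ANTI_JOIN_SQL))  # ['old.example']
```

The join form is unaffected because a NULL sourcesite never satisfies `c2.sourcesite = c1.sourcesite`, so it cannot knock out any c1 row.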
For best performance, have a multi-column index on contentmeta (sourcesite, timestamp).
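A minimal sketch of the index advice: it creates a composite index on (sourcesite, timestamp) and prints the query plan for the LEFT JOIN query. The index name idx_contentmeta_site_time is hypothetical, and SQLite (through Python's sqlite3) stands in for MySQL, which is an assumption: MySQL's EXPLAIN output looks different, though the same CREATE INDEX statement is valid MySQL. Putting the equality column (sourcesite) first and the range column (timestamp) second lets each c2 lookup seek on the site and then range-scan its timestamps.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contentmeta (sourcesite TEXT, timestamp TEXT)")

# Composite index: equality column first, range column second.
# The index name is hypothetical.
conn.execute(
    "CREATE INDEX idx_contentmeta_site_time ON contentmeta (sourcesite, timestamp)"
)

# Hypothetical sample rows: (sourcesite, timestamp).
conn.executemany(
    "INSERT INTO contentmeta VALUES (?, ?)",
    [("old.example", "2011-01-10"), ("both.example", "2011-04-01")],
)

ANTI_JOIN_SQL = """
SELECT DISTINCT c1.sourcesite
FROM contentmeta c1
LEFT JOIN contentmeta c2
  ON c2.sourcesite = c1.sourcesite
 AND c2.timestamp > '2011-03-15'
WHERE c1.timestamp <= '2011-03-15'
  AND c2.sourcesite IS NULL
"""

# Each EXPLAIN QUERY PLAN row has four columns; the last one is the
# human-readable description of that step of the plan.
plan = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + ANTI_JOIN_SQL)]
for step in plan:
    print(step)
```

In the printed plan, the c2 step should show a search on idx_contentmeta_site_time with a condition like `sourcesite=? AND timestamp>?`; the exact wording varies between SQLite versions.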
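This can be done without NOT IN, because it's one of the most expensive MySQL operations.
@MoyedAnsari: NOT IN is not the most expensive operation. This problem requires a NOT IN or NOT EXISTS subquery, or a LEFT JOIN ... IS NULL query.
Generally, joins perform better than subqueries, because (in older MySQL versions) derived tables can't use indexes. You don't need DISTINCT in the subquery, and you don't need the WHERE clause on the outer query, since you're already filtering with NOT IN. Try: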
select distinct sourcesite
from contentmeta
where sourcesite not in (
select sourcesite
from contentmeta
where timestamp > '2011-03-15'
);
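The comments name NOT EXISTS as the other standard way to write this anti-join. A minimal sketch, again assuming SQLite through Python's sqlite3 as a stand-in for MySQL and using hypothetical rows, checks that the question's query, the simplified NOT IN above, and a NOT EXISTS version return the same sites. It compares results only, not performance.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; timestamps are ISO-8601 strings.
CUTOFF = "2011-03-15"

# Hypothetical sample rows: (sourcesite, timestamp).
ROWS = [
    ("old.example", "2011-01-10"),
    ("old.example", "2011-02-20"),
    ("both.example", "2011-03-01"),
    ("both.example", "2011-04-01"),
    ("new.example", "2011-05-05"),
]

# The question's original query.
ORIGINAL_SQL = """
SELECT DISTINCT sourcesite
FROM contentmeta
WHERE timestamp <= :cutoff
  AND sourcesite NOT IN (
      SELECT DISTINCT sourcesite FROM contentmeta WHERE timestamp > :cutoff
  )
"""

# Simplified NOT IN: no DISTINCT in the subquery, no outer timestamp filter.
SIMPLE_NOT_IN_SQL = """
SELECT DISTINCT sourcesite
FROM contentmeta
WHERE sourcesite NOT IN (
    SELECT sourcesite FROM contentmeta WHERE timestamp > :cutoff
)
"""

# NOT EXISTS: keep a site only if it has no row after the cutoff.
NOT_EXISTS_SQL = """
SELECT DISTINCT c1.sourcesite
FROM contentmeta c1
WHERE NOT EXISTS (
    SELECT 1
    FROM contentmeta c2
    WHERE c2.sourcesite = c1.sourcesite
      AND c2.timestamp > :cutoff
)
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contentmeta (sourcesite TEXT, timestamp TEXT)")
conn.executemany("INSERT INTO contentmeta VALUES (?, ?)", ROWS)


def run(sql):
    """Return the sorted site names produced by one query form."""
    return sorted(row[0] for row in conn.execute(sql, {"cutoff": CUTOFF}))


QUERIES = {
    "original": ORIGINAL_SQL,
    "simplified NOT IN": SIMPLE_NOT_IN_SQL,
    "NOT EXISTS": NOT_EXISTS_SQL,
}
for name, sql in QUERIES.items():
    print(name, run(sql))  # each form prints ['old.example']
```

Unlike NOT IN, the NOT EXISTS form is not defeated by a NULL sourcesite among the later rows, which is one reason it is often preferred for anti-joins.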