MySQL query performance with OR
Maybe this is a basic question, but I haven't been able to answer it, and I'd appreciate your help :) I have the following table in MySQL:
create table anotation
(
chromosome enum
(
'Chr1',
'Chr2',
'Chr3',
'Chr4',
'Chr5',
'ChrC',
'ChrM'
),
version varchar(10),
type enum
(
'CDS',
'chromosome',
'exon',
'five_prime_UTR',
'gene',
'mRNA',
'mRNA_TE_gene',
'miRNA',
'ncRNA',
'protein',
'pseudogene',
'pseudogenic_exon',
'pseudogenic_transcript',
'rRNA',
'snRNA',
'snoRNA',
'tRNA',
'three_prime_UTR',
'transposable_element_gene'
),
strand enum
(
'+',
'-'
),
phase tinyint,
atrributes text
);
It has about 600,000 rows, and I'm running the following query:
select distinct
anot_1.chromosome,
anot_1.start,
anot_1.end,
anot_1.atrributes
from
anotation anot_1,
anotation anot_2
where
anot_1.type='CDS'
and
anot_2.type='protein'
and
anot_1.chromosome!='ChrM'
and
anot_1.chromosome!='ChrC'
and
anot_1.chromosome=anot_2.chromosome
and
(
(
anot_2.start=anot_1.start
and
anot_1.end!=anot_2.end
and
anot_2.strand='+'
)
or
(
anot_2.start!=anot_1.start
and
anot_1.end=anot_2.end
and
anot_2.strand='-'
)
);
This takes a very long time to finish, but when I run the same query with one of the two OR branches removed, it runs almost instantly:
select distinct
anot_1.chromosome,
anot_1.start,
anot_1.end,
anot_1.atrributes
from
anotation anot_1,
anotation anot_2
where
anot_1.type='CDS'
and
anot_2.type='protein'
and
anot_1.chromosome!='ChrM'
and
anot_1.chromosome!='ChrC'
and
anot_1.chromosome=anot_2.chromosome
and
anot_2.start=anot_1.start
and
anot_1.end!=anot_2.end
and
anot_2.strand='+';
Does anyone know what is going on, and if so, how can I fix it?
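Thank you.
This isn't a solution (I agree with the comments above about indexes), but I have rewritten your SQL to use a pair of LEFT OUTER JOINs instead of a single join with an OR in its condition.
Although I don't think this will make much difference to performance, it may help you and others see what the query is doing: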
SELECT DISTINCT anot_1.chromosome, anot_1.start, anot_1.end, anot_1.atrributes
FROM anotation anot_1
-- anot_2: proteins on the '+' strand that share the CDS start but not its end
LEFT OUTER JOIN anotation anot_2 ON anot_1.chromosome = anot_2.chromosome AND anot_1.start = anot_2.start AND anot_1.end != anot_2.end AND anot_2.strand = '+' AND anot_2.type = 'protein'
-- anot_3: proteins on the '-' strand that share the CDS end but not its start
LEFT OUTER JOIN anotation anot_3 ON anot_1.chromosome = anot_3.chromosome AND anot_1.end = anot_3.end AND anot_1.start != anot_3.start AND anot_3.strand = '-' AND anot_3.type = 'protein'
WHERE anot_1.type = 'CDS'
AND anot_1.chromosome != 'ChrM'
AND anot_1.chromosome != 'ChrC'
-- keep a CDS row only if at least one of the two joins found a match
AND (anot_2.chromosome IS NOT NULL
OR anot_3.chromosome IS NOT NULL);
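A related formulation, not taken from the thread: each half of the OR can be written as its own correlated EXISTS subquery, so each CDS row is tested for a matching protein rather than being multiplied by every match before DISTINCT removes the copies. It assumes the same start/end columns the queries above rely on (they are not in the posted CREATE TABLE), and whether MySQL actually runs it faster depends on the server version and the available indexes, so it is only a sketch to compare with EXPLAIN.

```sql
-- Sketch: semi-join via EXISTS instead of joining and de-duplicating.
-- Assumes anotation has start/end columns (used above, missing from the CREATE TABLE).
SELECT DISTINCT c.chromosome, c.start, c.end, c.atrributes
FROM anotation c
WHERE c.type = 'CDS'
  AND c.chromosome NOT IN ('ChrM', 'ChrC')
  AND (
        -- '+' strand: a protein with the same start but a different end
        EXISTS (SELECT 1 FROM anotation p
                WHERE p.type = 'protein' AND p.strand = '+'
                  AND p.chromosome = c.chromosome
                  AND p.start = c.start AND p.end != c.end)
        -- '-' strand: a protein with the same end but a different start
     OR EXISTS (SELECT 1 FROM anotation p
                WHERE p.type = 'protein' AND p.strand = '-'
                  AND p.chromosome = c.chromosome
                  AND p.end = c.end AND p.start != c.start)
      );
```

DISTINCT is kept only so the result matches the original query when several CDS rows are identical; each CDS row is otherwise returned at most once.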
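First, I'd do a general cleanup of your query: split the two cases apart and then combine them with a UNION (UNION also removes duplicate rows, so DISTINCT is no longer needed):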
select
CDS.chromosome,
CDS.start,
CDS.end,
CDS.atrributes
from
(
-- CDS rows outside the chloroplast and mitochondrial chromosomes
select
a.chromosome,
a.start,
a.end,
a.atrributes
from
anotation a
where
a.type='CDS'
and
a.chromosome not in ('ChrM', 'ChrC')
) CDS
join
(
-- every column used in the join condition must be selected here
select
a.chromosome,
a.start,
a.end,
a.strand
from
anotation a
where
a.type='protein'
) Protein
on
CDS.chromosome = Protein.chromosome
and
CDS.start = Protein.start
and
CDS.end != Protein.end
where
Protein.strand = '+'
union
select
CDS.chromosome,
CDS.start,
CDS.end,
CDS.atrributes
from
(
select
a.chromosome,
a.start,
a.end,
a.atrributes
from
anotation a
where
a.type='CDS'
and
a.chromosome not in ('ChrM', 'ChrC')
) CDS
join
(
select
a.chromosome,
a.start,
a.end,
a.strand
from
anotation a
where
a.type='protein'
) Protein
on
CDS.chromosome = Protein.chromosome
and
CDS.start != Protein.start
and
CDS.end = Protein.end
where
Protein.strand = '-';
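Each branch of the UNION is an equality join on (chromosome, start) or on (chromosome, end), with type and strand compared against constants. Composite indexes matching those keys should let the optimizer find the protein rows for each CDS row with a single index lookup. A minimal sketch; the index names are hypothetical, and it assumes the start/end columns that the queries use but the posted CREATE TABLE omits. Separate single-column indexes on start, end, type and chromosome (as mentioned in the comments) usually serve a multi-column lookup like this less well.

```sql
-- Hypothetical index names; assumes anotation has start/end columns.
-- '+' branch: protein rows found by (type, strand, chromosome, start).
CREATE INDEX idx_anot_type_strand_chr_start
    ON anotation (type, strand, chromosome, `start`);

-- '-' branch: protein rows found by (type, strand, chromosome, end).
CREATE INDEX idx_anot_type_strand_chr_end
    ON anotation (type, strand, chromosome, `end`);
```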
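It would help if you simplified the query a bit (removing everything that isn't needed to demonstrate the problem) and formatted the SQL neatly, so that the subtle differences between the queries are more obvious.
Could you post the output of EXPLAIN? (Just add EXPLAIN to the start of the query, as in EXPLAIN SELECT ...)
Have you added indexes?
Out of interest, how many results do you get with the OR compared with the simplified query? Maybe you are simply querying a much larger set (though it should still run quickly).
Yes, I have added indexes on start, end, type and chromosome.
First, thanks for the edit; next time I'll paste my code in that format. And yes, as I said, the join works very well, and that's what I have been using. But what I wanted to ask is why the OR causes a performance problem while the simple query works fine.
@JorgeKageyama, I think that, essentially, when your conditions operate on two opposite sets that aren't clearly separated, it is hard for the engine to tell them apart and choose the best way to optimize, especially when the join conditions are mixed with the query's restrictions. Query optimization isn't magic; it works well when we give it the right help.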
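Following the EXPLAIN suggestion above, a sketch of how to inspect the slow query's plan. In the output row for anot_2, the key, ref and rows columns show which index, if any, is used to find protein rows. If only chromosome (or type) appears as the lookup key, each CDS row would be compared against every protein on the same chromosome, which would be consistent with the slowdown described, though only the real EXPLAIN output can confirm it. Running the same EXPLAIN on the fast query gives the comparison.

```sql
-- EXPLAIN reports the execution plan instead of returning the query's rows.
EXPLAIN
SELECT DISTINCT anot_1.chromosome, anot_1.start, anot_1.end, anot_1.atrributes
FROM anotation anot_1, anotation anot_2
WHERE anot_1.type = 'CDS'
  AND anot_2.type = 'protein'
  AND anot_1.chromosome NOT IN ('ChrM', 'ChrC')
  AND anot_1.chromosome = anot_2.chromosome
  -- the OR below is what the fast version of the query drops
  AND (   (anot_2.start = anot_1.start AND anot_1.end != anot_2.end AND anot_2.strand = '+')
       OR (anot_2.start != anot_1.start AND anot_1.end = anot_2.end AND anot_2.strand = '-'));
```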