MySQL query performance with OR


This may be a basic question, but I haven't been able to answer it myself, so I'd appreciate your help :)

I have the following table in MySQL:

create table anotation
    ( 
        chromosome enum
            (
                'Chr1',
                'Chr2',
                'Chr3',
                'Chr4',
                'Chr5',
                'ChrC',
                'ChrM'
            ), 
        version varchar(10),
        type enum
            (
                'CDS',
                'chromosome',
                'exon',
                'five_prime_UTR',
                'gene',
                'mRNA',
                'mRNA_TE_gene',
                'miRNA',
                'ncRNA',
                'protein',
                'pseudogene',
                'pseudogenic_exon',
                'pseudogenic_transcript',
                'rRNA',
                'snRNA',
                'snoRNA',
                'tRNA',
                'three_prime_UTR',
                'transposable_element_gene'
            ), 
        strand enum
            (
                 '+',
                 '-'
            ), 
        phase tinyint, 
        atrributes text
    );

It has about 600,000 rows, and I'm running the following query:

select distinct
            anot_1.chromosome,
            anot_1.start,
            anot_1.end,
            anot_1.atrributes 
    from 
            anotation anot_1,
            anotation anot_2 
    where
            anot_1.type='CDS'
        and
            anot_2.type='protein'
        and 
            anot_1.chromosome!='ChrM'
        and
            anot_1.chromosome!='ChrC'
        and
            anot_1.chromosome=anot_2.chromosome
        and 
        (
            (
                anot_2.start=anot_1.start
            and
                anot_1.end!=anot_2.end
            and
                anot_2.strand='+'
            ) 
        or 
            (
                anot_2.start!=anot_1.start
            and
                anot_1.end=anot_2.end
            and
                anot_2.strand='-'
            )
        );

This takes a very long time to finish. But when I run the same query with one of the two conditions removed from the OR, it returns almost immediately:

select distinct
            anot_1.chromosome,
            anot_1.start,
            anot_1.end,
            anot_1.atrributes
    from
            anotation anot_1, 
            anotation anot_2
    where
            anot_1.type='CDS'
        and 
            anot_2.type='protein'
        and 
            anot_1.chromosome!='ChrM'
        and
            anot_1.chromosome!='ChrC'
        and
            anot_1.chromosome=anot_2.chromosome
        and
            anot_2.start=anot_1.start
        and
            anot_1.end!=anot_2.end
        and 
            anot_2.strand='+';

Does anyone know what is going on and, if so, how I can fix it? Thank you.

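This isn't a solution (I agree with the comments above about indexes), but I've rewritten your SQL to use a pair of LEFT OUTER JOINs in place of your current version, which has an OR in the join condition.

I don't expect this to make much difference to performance, but it may help you and others see what the query is doing: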

SELECT DISTINCT anot_1.chromosome, anot_1.start, anot_1.end, anot_1.atrributes
FROM anotation anot_1
-- '+' strand proteins: same start, different end
LEFT OUTER JOIN anotation anot_2 ON anot_1.chromosome = anot_2.chromosome AND anot_1.start = anot_2.start AND anot_1.end != anot_2.end AND anot_2.strand = '+' AND anot_2.type = 'protein'
-- '-' strand proteins: same end, different start
LEFT OUTER JOIN anotation anot_3 ON anot_1.chromosome = anot_3.chromosome AND anot_1.end = anot_3.end AND anot_1.start != anot_3.start AND anot_3.strand = '-' AND anot_3.type = 'protein'
WHERE anot_1.type = 'CDS'
AND anot_1.chromosome != 'ChrM'
AND anot_1.chromosome != 'ChrC'
AND (anot_2.chromosome IS NOT NULL
OR anot_3.chromosome IS NOT NULL);

First, I would clean up your query overall, split the two cases into separate queries, and then UNION them:

select
            CDS.chromosome,
            CDS.start,
            CDS.end,
            CDS.atrributes
    from
            (
            select
                        a.chromosome,
                        a.start,
                        a.end,
                        a.atrributes
                from
                        anotation a
                where
                        a.type='CDS'
                    and
                        a.chromosome not in ('ChrM', 'ChrC')
            ) CDS
        join
            (
            select
                        a.chromosome,
                        a.start,
                        a.end,
                        a.strand
                from
                        anotation a
                where
                        a.type='protein'
            ) Protein
                on
                        CDS.chromosome = Protein.chromosome
                   and
                        CDS.start = Protein.start
                   and
                        CDS.end != Protein.end
    where
            Protein.strand = '+'
union
    select
            CDS.chromosome,
            CDS.start,
            CDS.end,
            CDS.atrributes
    from
            (
            select
                        a.chromosome,
                        a.start,
                        a.end,
                        a.atrributes
                from
                        anotation a
                where
                        a.type='CDS'
                    and
                        a.chromosome not in ('ChrM', 'ChrC')
            ) CDS
        join
            (
            select
                        a.chromosome,
                        a.start,
                        a.end,
                        a.strand
                from
                        anotation a
                where
                        a.type='protein'
            ) Protein
                on
                        CDS.chromosome = Protein.chromosome
                   and
                        CDS.start != Protein.start
                   and
                        CDS.end = Protein.end
    where
            Protein.strand = '-';
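
The same split-and-UNION idea can also be written against the base table directly, without derived tables. This is only a sketch: the aliases `cds` and `prot` are made up for illustration, and whether it runs faster than the derived-table form depends on your MySQL version (some versions materialize derived tables, which can keep the base table's indexes from being used) and on which indexes exist.

```sql
-- Branch 1: '+' strand proteins sharing the CDS start but not its end.
SELECT cds.chromosome, cds.start, cds.end, cds.atrributes
FROM anotation AS cds
JOIN anotation AS prot
  ON  prot.chromosome = cds.chromosome
  AND prot.start = cds.start
  AND prot.end <> cds.end
WHERE cds.type = 'CDS'
  AND cds.chromosome NOT IN ('ChrM', 'ChrC')
  AND prot.type = 'protein'
  AND prot.strand = '+'

UNION  -- UNION (not UNION ALL) removes duplicates, replacing SELECT DISTINCT

-- Branch 2: '-' strand proteins sharing the CDS end but not its start.
SELECT cds.chromosome, cds.start, cds.end, cds.atrributes
FROM anotation AS cds
JOIN anotation AS prot
  ON  prot.chromosome = cds.chromosome
  AND prot.end = cds.end
  AND prot.start <> cds.start
WHERE cds.type = 'CDS'
  AND cds.chromosome NOT IN ('ChrM', 'ChrC')
  AND prot.type = 'protein'
  AND prot.strand = '-';
```

Each branch has exactly one equality on `start` or `end`, so the optimizer can treat it like the fast single-condition query from the question.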

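It would help if you simplified the query a bit (removing all the cruft that isn't needed to demonstrate the problem) and formatted the SQL neatly, so that the subtle differences between the queries are easier to see.

Could you post the output of EXPLAIN? (Just add EXPLAIN to the start of the query, as in EXPLAIN SELECT …)

Have you added any indexes?

Out of interest, how many results do you get with the OR compared with the simplified query? Maybe you are simply querying a larger set (although it should still run quickly).

Yes, I've added indexes on start, end, type and chromosome.

First, thanks for the edit; next time I'll paste my code in that format. Yes, as I said, the join works very well, and that's what I've been using. What I'm asking is why the OR causes a performance problem while the simple query works fine.

@JorgeKageyama, I think that, essentially, when your conditions work on two opposing sets with no clear distinction between them, it is hard for the engine to work out the best criteria to optimize on, especially when the join conditions are mixed in with the query's restrictions. Query optimization isn't magic, but it does well if we give it the right help.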
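
To follow up on the EXPLAIN and index suggestions in the comments, here is a minimal sketch. The index names `idx_type_chr_start` and `idx_type_chr_end` are hypothetical, and the composite column order is an assumption based on the columns the question's queries filter and join on. Whether the optimizer actually picks these indexes depends on the MySQL version and the data distribution, so compare the EXPLAIN output before and after.

```sql
-- Hypothetical composite indexes, one per join pattern
-- (match on start for '+' strand, match on end for '-' strand).
CREATE INDEX idx_type_chr_start ON anotation (`type`, `chromosome`, `start`);
CREATE INDEX idx_type_chr_end   ON anotation (`type`, `chromosome`, `end`);

-- Prefix the query with EXPLAIN to see the chosen join order and indexes.
EXPLAIN
SELECT DISTINCT anot_1.chromosome, anot_1.start, anot_1.end, anot_1.atrributes
FROM anotation anot_1
JOIN anotation anot_2
  ON  anot_2.chromosome = anot_1.chromosome
  AND anot_2.start = anot_1.start
  AND anot_2.end <> anot_1.end
WHERE anot_1.type = 'CDS'
  AND anot_2.type = 'protein'
  AND anot_2.strand = '+'
  AND anot_1.chromosome NOT IN ('ChrM', 'ChrC');
```

In the EXPLAIN output, the `key` and `rows` columns for `anot_2` are the ones to watch. One plausible reason for the slow OR version is that no single index lookup covers both "same start" and "same end", so that form may fall back to scanning many more rows per CDS row.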