MySQL query optimization for LIMIT/OFFSET queries

I have a particular query:

SELECT users.role, users.first_name, users.last_name, users.email,
   projects.project_id, projects.reminder, projects.title, projects.user_id,
   COUNT(DISTINCT CASE WHEN citations.deleted=0 THEN citations.citation_id ELSE NULL END) AS nr_citations,
   COUNT(DISTINCT CASE WHEN citations.deleted=1 THEN citations.citation_id ELSE NULL END) AS nr_citations_deleted,
   COUNT(DISTINCT CASE WHEN citations.deleted=0 AND authors.first_name != "" AND authors.last_name !="" AND authors.last_name NOT LIKE "author_lastname%" AND authors.last_name NOT LIKE "author_firstname%" THEN citations.citation_id ELSE NULL END) AS nr_citations_filled,
   COUNT(DISTINCT CASE WHEN citations.deleted=0 AND citations.user_comment IS NOT NULL THEN citations.citation_id ELSE NULL END) AS nr_comments,
   (CASE WHEN user_stats.type IN (4,66,67,68,73,74) THEN user_stats.type ELSE NULL END) AS source,
   COUNT(DISTINCT CASE WHEN user_stats.type=1 THEN user_stats.id ELSE NULL END) AS nr_export_word,
   MAX(CASE WHEN user_stats.type=1 THEN user_stats.timestamp ELSE NULL END) AS last_export_word,
   COUNT(DISTINCT CASE WHEN user_stats.type=3 THEN user_stats.id ELSE NULL END) AS nr_export_email,
   MAX(CASE WHEN user_stats.type=3 THEN user_stats.timestamp ELSE NULL END) AS last_export_email,
   MAX(export_format_class_name) as exported_style
FROM projects
LEFT JOIN projects_styles ON projects_styles.project_id = projects.project_id
LEFT JOIN users ON users.user_id = projects.user_id
LEFT JOIN user_stats ON user_stats.project_id = projects.project_id
LEFT JOIN citations ON citations.project_id = projects.project_id
LEFT JOIN citations_authors ON citations_authors.citation_id = citations.citation_id
LEFT JOIN authors ON authors.author_id = citations_authors.author_id
GROUP BY projects.project_id
ORDER BY projects.project_id DESC
LIMIT 0,4000;
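This works fine for low offsets, but performs badly for offsets like 12000 or 16000. I know some slowdown is normal, but the time grows exponentially with each offset step, and I don't think that is normal. I suspect my query isn't as good as I thought.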

Later edit: here is the EXPLAIN output for my query:

id | select_type | table             | type   | possible_keys                       | key                                 | key_len | ref                                     | rows | filtered | Extra
1  | SIMPLE      | projects          | index  | NULL                                | PRIMARY                             | 4       | NULL                                    | 102  | 55850.00 |
1  | SIMPLE      | projects_styles   | ref    | projects_styles_project_id_index    | projects_styles_project_id_index    | 4       | citelighter.projects.project_id         | 1    | 100.00   |
1  | SIMPLE      | users             | eq_ref | PRIMARY                             | PRIMARY                             | 4       | citelighter.projects.user_id            | 1    | 100.00   |
1  | SIMPLE      | user_stats        | ref    | user_stats_project_id_index         | user_stats_project_id_index         | 5       | citelighter.projects.project_id         | 13   | 100.00   |
1  | SIMPLE      | citations         | ref    | citations_project_id_index          | citations_project_id_index          | 4       | citelighter.projects.project_id         | 3    | 100.00   |
1  | SIMPLE      | citations_authors | ref    | citations_authors_citation_id_index | citations_authors_citation_id_index | 4       | citelighter.citations.citation_id       | 1    | 100.00   |
1  | SIMPLE      | authors           | eq_ref | PRIMARY                             | PRIMARY                             | 4       | citelighter.citations_authors.author_id | 1    | 100.00   |

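Here is how to view a query's execution plan: just prefix the query with the EXPLAIN or EXPLAIN EXTENDED keyword. It is a good way to check whether the query hits the indexes you think it hits, or whether it has to do a full table scan, and so on. See the MySQL documentation for example output. Note that EXPLAIN EXTENDED is deprecated in MySQL 5.7, where plain EXPLAIN already includes the extended information, and it was removed in MySQL 8.0. This should be your first step in optimizing the query: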

EXPLAIN EXTENDED SELECT users.role, users.first_name, users.last_name, users.email,
    projects.project_id, projects.reminder, projects.title, projects.user_id,
    COUNT(DISTINCT CASE WHEN citations.deleted=0 THEN citations.citation_id ELSE NULL END) AS nr_citations,
    COUNT(DISTINCT CASE WHEN citations.deleted=1 THEN citations.citation_id ELSE NULL END) AS nr_citations_deleted,
    COUNT(DISTINCT CASE WHEN citations.deleted=0 AND authors.first_name != "" AND authors.last_name !="" AND authors.last_name NOT LIKE "author_lastname%" AND authors.last_name NOT LIKE "author_firstname%" THEN citations.citation_id ELSE NULL END) AS nr_citations_filled,
    COUNT(DISTINCT CASE WHEN citations.deleted=0 AND citations.user_comment IS NOT NULL THEN citations.citation_id ELSE NULL END) AS nr_comments,
    (CASE WHEN user_stats.type IN (4,66,67,68,73,74) THEN user_stats.type ELSE NULL END) AS source,
    COUNT(DISTINCT CASE WHEN user_stats.type=1 THEN user_stats.id ELSE NULL END) AS nr_export_word,
    MAX(CASE WHEN user_stats.type=1 THEN user_stats.timestamp ELSE NULL END) AS last_export_word,
    COUNT(DISTINCT CASE WHEN user_stats.type=3 THEN user_stats.id ELSE NULL END) AS nr_export_email,
    MAX(CASE WHEN user_stats.type=3 THEN user_stats.timestamp ELSE NULL END) AS last_export_email,
    MAX(export_format_class_name) as exported_style
FROM projects
LEFT JOIN projects_styles ON projects_styles.project_id = projects.project_id
LEFT JOIN users ON users.user_id = projects.user_id
LEFT JOIN user_stats ON user_stats.project_id = projects.project_id
LEFT JOIN citations ON citations.project_id = projects.project_id
LEFT JOIN citations_authors ON citations_authors.citation_id = citations.citation_id
LEFT JOIN authors ON authors.author_id = citations_authors.author_id
GROUP BY projects.project_id
ORDER BY projects.project_id DESC
LIMIT 0,4000;
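MySQL's EXPLAIN output can't be reproduced here without the original server. You can still sketch the habit of checking the plan before and after an index change with Python's built-in sqlite3 module. This is an analogue, not MySQL: SQLite's EXPLAIN QUERY PLAN reports an explicit sort as "USE TEMP B-TREE FOR ORDER BY". That roughly corresponds to "Using filesort" in MySQL's Extra column, which is empty for every table in the question's EXPLAIN. The created column, the index name and the helper functions are hypothetical.

```python
import sqlite3


def plan(conn, sql):
    """Return the detail strings of SQLite's EXPLAIN QUERY PLAN for `sql`."""
    # The human-readable detail text is the last column in all SQLite versions.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


def uses_sort(conn, sql):
    """True if SQLite needs a temporary B-tree (an explicit sort) for the query."""
    return any("TEMP B-TREE" in detail for detail in plan(conn, sql))


conn = sqlite3.connect(":memory:")
# Hypothetical table: `created` is not part of the original schema.
conn.execute("CREATE TABLE projects (project_id INTEGER PRIMARY KEY, created INTEGER)")
query = "SELECT project_id, created FROM projects ORDER BY created DESC LIMIT 10"

before = uses_sort(conn, query)  # no index on created: the rows must be sorted
conn.execute("CREATE INDEX idx_projects_created ON projects (created)")
after = uses_sort(conn, query)   # the covering index can be read in order instead

if __name__ == "__main__":
    print(before, after)
    print(plan(conn, query))
```

The point is the comparison rather than the exact wording of the plan. Run the plan for two variants of a query (for example, two offsets) and check whether the access path or the sort step changes.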

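It is slow because the server has to compute all of the skipped offset rows before it can return the rows you want. You could try adding a WHERE condition on a unique key instead. However, turning a page number into a key value that way assumes the key has no gaps, which is unlikely. For a CSV export you don't need page numbers, though: just create a new unique column and filter on the last value you read: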

WHERE  unique_col > 10000 ORDER BY unique_col LIMIT 4000;
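For the question's export, the same idea means remembering the last project_id of each batch and starting the next batch just below it. The question orders by projects.project_id DESC, so the condition is < rather than the > in the fragment above. Against MySQL, the equivalent change would be to add WHERE projects.project_id < (last id of the previous batch) before the GROUP BY and keep ORDER BY projects.project_id DESC LIMIT 4000 with no offset. Below is a minimal sketch. It assumes SQLite as a stand-in for MySQL and cuts the projects table down to project_id and title; export_keyset and BATCH_SIZE are hypothetical names.

```python
import csv
import io
import sqlite3

BATCH_SIZE = 4000  # same batch size as in the question


def export_keyset(conn, out, batch_size=BATCH_SIZE):
    """Write every row of `projects` to `out` as CSV, highest project_id first,
    using keyset pagination (WHERE project_id < last_id) instead of OFFSET.
    Returns the number of non-empty batches fetched."""
    writer = csv.writer(out)
    writer.writerow(["project_id", "title"])
    last_id = None
    batches = 0
    while True:
        if last_id is None:
            # First batch: nothing seen yet, start at the top of the index.
            rows = conn.execute(
                "SELECT project_id, title FROM projects "
                "ORDER BY project_id DESC LIMIT ?",
                (batch_size,),
            ).fetchall()
        else:
            # Later batches: continue strictly below the last key written,
            # so no rows are computed and then thrown away.
            rows = conn.execute(
                "SELECT project_id, title FROM projects "
                "WHERE project_id < ? ORDER BY project_id DESC LIMIT ?",
                (last_id, batch_size),
            ).fetchall()
        if not rows:
            break
        writer.writerows(rows)
        batches += 1
        last_id = rows[-1][0]  # smallest project_id in this batch
    return batches


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE projects (project_id INTEGER PRIMARY KEY, title TEXT)")
    # Deliberately gappy keys: 3, 6, 9, ..., 30000 (10000 rows).
    conn.executemany(
        "INSERT INTO projects VALUES (?, ?)",
        [(i, f"project {i}") for i in range(3, 30001, 3)],
    )
    buf = io.StringIO()
    print(export_keyset(conn, buf))  # 3
```

Each batch starts from a key rather than a row count. As a result, gaps in project_id (the demo uses only multiples of 3) don't cause rows to be skipped or repeated, and every batch costs about the same regardless of how far into the table it is.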

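Why do you need an offset of 10000? Have you looked at the query's execution plan to see why it runs so slowly with that offset? At first glance I don't see anything that would make the duration grow exponentially. To me it looks like a straight line, albeit a steep one. MySQL may pick a different execution plan depending on certain row counts, but I doubt the cutoff lies somewhere between 4 and 10000. Compare the EXPLAIN output for both offsets to be sure.

Argh! I know this doesn't answer your specific question, but your GROUP BY is wrong. I know MySQL lets you use GROUP BY this way, but it is illogical and wrong: every non-aggregated column should be included in your GROUP BY.

I have to export all the data to a CSV file. These are the execution times:
offset 0: 26 s
offset 4000: 1 min 33 s
offset 8000: 3 min 41 s
offset 12000: 7 min 20 s
offset 16000: 13 min 05 s
offset 20000: 19 min 22 s

So should I use a WHERE clause for pagination?

Not for pagination. That won't work, because people skip pages, so you can't know in advance which key a given page starts at. You could compute it when they click, but then again, who paginates through thousands of rows? That's your real problem.

I want to get all the data in the table into a CSV file, so I'm doing it by fetching 4000 rows from the database at each step.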
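You can put numbers on the point that skipped OFFSET rows are still computed. The sketch below uses a deliberately simplified cost model, which is an assumption and not a measurement. In this model, a batch fetched with LIMIT offset, batch costs offset + batch rows, while a keyset batch costs only the rows it returns. It does not try to reproduce the exact timings listed, which also depend on the joins and on how MySQL builds the groups. The function names are hypothetical.

```python
def rows_produced_with_offset(total_rows, batch):
    """Rows built when every batch is fetched with LIMIT offset, batch.

    Simplified model (assumption): the `offset` skipped rows are computed and
    discarded, so a batch at a given offset costs offset + batch rows."""
    produced = 0
    for offset in range(0, total_rows, batch):
        produced += min(offset + batch, total_rows)
    return produced


def rows_produced_with_keyset(total_rows, batch):
    """Rows built when every batch starts at WHERE key < last_key:
    each batch only builds the rows it returns."""
    produced = 0
    for start in range(0, total_rows, batch):
        produced += min(batch, total_rows - start)
    return produced


if __name__ == "__main__":
    # Six batches of 4000, i.e. offsets 0..20000 as in the timings above.
    print(rows_produced_with_offset(24000, 4000))  # 84000
    print(rows_produced_with_keyset(24000, 4000))  # 24000
```

Under this model, exporting n full batches with OFFSET builds batch × n(n+1)/2 rows in total, which grows quadratically with the number of batches. The keyset approach stays linear in the number of rows exported.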