MySQL query optimization for LIMIT offset queries


mysql sql


I have a particular query:

SELECT users.role, users.first_name, users.last_name, users.email,
   projects.project_id, projects.reminder, projects.title, projects.user_id,
   COUNT(DISTINCT CASE WHEN citations.deleted=0 THEN citations.citation_id ELSE NULL END) AS nr_citations,
   COUNT(DISTINCT CASE WHEN citations.deleted=1 THEN citations.citation_id ELSE NULL END) AS nr_citations_deleted,
   COUNT(DISTINCT CASE WHEN citations.deleted=0 AND authors.first_name != "" AND authors.last_name !="" AND authors.last_name NOT LIKE "author_lastname%" AND authors.last_name NOT LIKE "author_firstname%" THEN citations.citation_id ELSE NULL END) AS nr_citations_filled,
   COUNT(DISTINCT CASE WHEN citations.deleted=0 AND citations.user_comment IS NOT NULL THEN citations.citation_id ELSE NULL END) AS nr_comments,
   (CASE WHEN user_stats.type IN (4,66,67,68,73,74) THEN user_stats.type ELSE NULL END) AS source,
   COUNT(DISTINCT CASE WHEN user_stats.type=1 THEN user_stats.id ELSE NULL END) AS nr_export_word,
   MAX(CASE WHEN user_stats.type=1 THEN user_stats.timestamp ELSE NULL END) AS last_export_word,
   COUNT(DISTINCT CASE WHEN user_stats.type=3 THEN user_stats.id ELSE NULL END) AS nr_export_email,
   MAX(CASE WHEN user_stats.type=3 THEN user_stats.timestamp ELSE NULL END) AS last_export_email,
   MAX(export_format_class_name) as exported_style
FROM projects
LEFT JOIN projects_styles ON projects_styles.project_id = projects.project_id
LEFT JOIN users ON users.user_id = projects.user_id
LEFT JOIN user_stats ON user_stats.project_id = projects.project_id
LEFT JOIN citations ON citations.project_id = projects.project_id
LEFT JOIN citations_authors ON citations_authors.citation_id = citations.citation_id
LEFT JOIN authors ON authors.author_id = citations_authors.author_id
GROUP BY projects.project_id
ORDER BY projects.project_id DESC
LIMIT 0,4000;

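It works well for low offsets, but performs very badly at offsets such as 12000 or 16000. I know some slowdown is normal, but the time for each offset seems to grow exponentially, and I don't think that is normal. I suspect my query is not as good as I thought.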

Later edit: here is the EXPLAIN output for my query:

id,select_type,table,type,possible_keys,key,key_len,ref,rows,filtered,Extra
"1","SIMPLE","projects","index",NULL,"PRIMARY","4",NULL,"102","55850.00",""
"1","SIMPLE","projects_styles","ref","projects_styles_project_id_index","projects_styles_project_id_index","4","citelighter.projects.project_id","1","100.00",""
"1","SIMPLE","users","eq_ref","PRIMARY","PRIMARY","4","citelighter.projects.user_id","1","100.00",""
"1","SIMPLE","user_stats","ref","user_stats_project_id_index","user_stats_project_id_index","5","citelighter.projects.project_id","13","100.00",""
"1","SIMPLE","citations","ref","citations_project_id_index","citations_project_id_index","4","citelighter.projects.project_id","3","100.00",""
"1","SIMPLE","citations_authors","ref","citations_authors_citation_id_index","citations_authors_citation_id_index","4","citelighter.citations.citation_id","1","100.00",""
"1","SIMPLE","authors","eq_ref","PRIMARY","PRIMARY","4","citelighter.citations_authors.author_id","1","100.00",""

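Here is how to view the query's execution plan: just prefix the query with the EXPLAIN or EXPLAIN EXTENDED keyword. It is a good way to see whether the query uses the indexes you think it uses, or whether it has to do a full table scan, and so on. See the MySQL documentation for examples of the output. (In MySQL 5.7 and later, plain EXPLAIN already includes the extended information, and the EXTENDED keyword was removed in MySQL 8.0.) This should be the first step in optimizing the query: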

EXPLAIN EXTENDED SELECT users.role, users.first_name, users.last_name, users.email,
    projects.project_id, projects.reminder, projects.title, projects.user_id,
    COUNT(DISTINCT CASE WHEN citations.deleted=0 THEN citations.citation_id ELSE NULL END) AS nr_citations,
    COUNT(DISTINCT CASE WHEN citations.deleted=1 THEN citations.citation_id ELSE NULL END) AS nr_citations_deleted,
    COUNT(DISTINCT CASE WHEN citations.deleted=0 AND authors.first_name != "" AND authors.last_name !="" AND authors.last_name NOT LIKE "author_lastname%" AND authors.last_name NOT LIKE "author_firstname%" THEN citations.citation_id ELSE NULL END) AS nr_citations_filled,
    COUNT(DISTINCT CASE WHEN citations.deleted=0 AND citations.user_comment IS NOT NULL THEN citations.citation_id ELSE NULL END) AS nr_comments,
    (CASE WHEN user_stats.type IN (4,66,67,68,73,74) THEN user_stats.type ELSE NULL END) AS source,
    COUNT(DISTINCT CASE WHEN user_stats.type=1 THEN user_stats.id ELSE NULL END) AS nr_export_word,
    MAX(CASE WHEN user_stats.type=1 THEN user_stats.timestamp ELSE NULL END) AS last_export_word,
    COUNT(DISTINCT CASE WHEN user_stats.type=3 THEN user_stats.id ELSE NULL END) AS nr_export_email,
    MAX(CASE WHEN user_stats.type=3 THEN user_stats.timestamp ELSE NULL END) AS last_export_email,
    MAX(export_format_class_name) as exported_style
FROM projects
LEFT JOIN projects_styles ON projects_styles.project_id = projects.project_id
LEFT JOIN users ON users.user_id = projects.user_id
LEFT JOIN user_stats ON user_stats.project_id = projects.project_id
LEFT JOIN citations ON citations.project_id = projects.project_id
LEFT JOIN citations_authors ON citations_authors.citation_id = citations.citation_id
LEFT JOIN authors ON authors.author_id = citations_authors.author_id
GROUP BY projects.project_id
ORDER BY projects.project_id DESC
LIMIT 0,4000;

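It is slow because MySQL has to compute all of the offset rows before it can return the rows you asked for. You could try adding a WHERE condition on a unique key instead, but deriving the starting value from a page number assumes there are no gaps in the key, which is not realistic. For a CSV export, however, you read the batches in order, so each batch can start right after the last key value of the previous one. Just create a new unique column (if you don't already have one) and use it: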

WHERE  unique_col > 10000 ORDER BY unique_col LIMIT 4000;
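The batched export described above can be sketched as a loop that remembers the last key it wrote instead of passing a growing OFFSET. This is a minimal sketch under stated assumptions: SQLite stands in for MySQL only so the example runs without a server, the two-column `projects` table is a hypothetical simplification of the asker's schema, and `export_keyset` and `BATCH_SIZE` are illustrative names. Because the asker orders by `projects.project_id DESC` (which the EXPLAIN shows is the PRIMARY key), the condition becomes `project_id < last_id` rather than `>`; in the full aggregate query, the same condition would go in a WHERE clause before the GROUP BY.

```python
import csv
import io
import sqlite3

BATCH_SIZE = 4000  # hypothetical name; same batch size the asker uses


def export_keyset(conn, out, batch_size=BATCH_SIZE):
    """Write every row of the (hypothetical) projects table to `out` as CSV,
    newest project_id first, fetching at most batch_size rows per query."""
    writer = csv.writer(out)
    writer.writerow(["project_id", "title"])
    last_id = None  # key of the last row written so far
    while True:
        if last_id is None:
            rows = conn.execute(
                "SELECT project_id, title FROM projects "
                "ORDER BY project_id DESC LIMIT ?",
                (batch_size,),
            ).fetchall()
        else:
            # Seek directly past the last key instead of skipping OFFSET rows,
            # so gaps in project_id do not matter.
            rows = conn.execute(
                "SELECT project_id, title FROM projects "
                "WHERE project_id < ? ORDER BY project_id DESC LIMIT ?",
                (last_id, batch_size),
            ).fetchall()
        if not rows:
            break
        writer.writerows(rows)
        last_id = rows[-1][0]


# Demo: an in-memory stand-in table whose ids have gaps (1, 4, 7, ...).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE projects (project_id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO projects VALUES (?, ?)",
    [(i, f"project {i}") for i in range(1, 30001, 3)],
)
buf = io.StringIO()
export_keyset(conn, buf)
```

Each query can use the primary-key index to jump straight to its starting point, so later batches should not get slower the way OFFSET batches do; the trade-off is that you can only move forward from a known key, which suits a sequential export but not jumping to an arbitrary page.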

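Why do you need an offset of 10000?

Have you tried looking at the query's execution path to see why it runs so slowly with an offset of 10000? At first glance I don't see anything that would make the duration grow exponentially. It looks like a straight line to me, although a steep one. MySQL may choose a different execution path depending on certain row counts, but I doubt the cutoff falls between 4 and 10000. Compare the EXPLAIN results to be sure.

Argh! I know this doesn't answer your specific question, but your GROUP BY is wrong. I know MySQL lets you use GROUP BY this way, but it is illogical and wrong: all non-aggregated columns should be included in your GROUP BY.

I have to export all the data to a CSV file. The execution times are:
Offset 0: 26 s
Offset 4000: 1 min 33 s
Offset 8000: 3 min 41 s
Offset 12000: 7 min 20 s
Offset 16000: 13 min 05 s
Offset 20000: 19 min 22 s

So I should use a WHERE clause for pagination?

Not for pagination. It doesn't work there, because people skip pages, so you cannot know the starting value in advance. You could compute it when they click, but then again, who uses pagination over thousands of rows? That's your problem.

I want to get all the data from the table into a CSV, so I do it by fetching 4000 rows from the database at each step.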
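The disagreement over whether the reported timings grow "exponentially" or in "a straight line" can be checked with simple arithmetic. The sketch below only converts the times from the comment to seconds and compares consecutive batches, which are all 4000 rows apart. Growing differences mean each batch costs more than the previous one (steeper than a straight line, and consistent with OFFSET making MySQL build and discard more grouped rows as the offset rises), while falling ratios suggest the growth is not exponential, since exponential growth over equal offset steps would keep the ratio roughly constant. `TIMINGS` and `to_seconds` are illustrative names.

```python
# Timings reported in the comments, as (offset, "minutes:seconds").
TIMINGS = [
    (0, "0:26"),
    (4000, "1:33"),
    (8000, "3:41"),
    (12000, "7:20"),
    (16000, "13:05"),
    (20000, "19:22"),
]


def to_seconds(mmss):
    """Convert a "m:ss" string to a number of seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


secs = [to_seconds(t) for _, t in TIMINGS]
# First differences: how much slower each batch is than the previous one.
diffs = [b - a for a, b in zip(secs, secs[1:])]
# Ratios: exponential growth in the offset would keep these roughly constant.
ratios = [round(b / a, 2) for a, b in zip(secs, secs[1:])]

print("seconds:", secs)        # [26, 93, 221, 440, 785, 1162]
print("differences:", diffs)   # [67, 128, 219, 345, 377]
print("ratios:", ratios)       # [3.58, 2.38, 1.99, 1.78, 1.48]
print("total export time (s):", sum(secs))  # 2727
```

Summed over all six batches, the export of these 24000 rows takes 2727 s, about 45 minutes, which is why avoiding the growing OFFSET matters more than tuning any single batch.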