How to optimize count(*) queries on TiDB
I have a table with about 3,000,000 rows, described as follows:
Field |Type |Null|Key|Default|Extra |
------------------|------------|----|---|-------|--------------|
id |bigint(20) |NO |PRI| |auto_increment|
entity_name |varchar(50) |YES | | | |
field_type |varchar(50) |YES |MUL| | |
kv_key |text |YES |MUL| | |
kv_value |text |YES |MUL| | |
value_type |int(11) |YES | | | |
dataset_id |varchar(255)|YES |MUL| | |
dataset_version_id|varchar(255)|YES |MUL| | |
experiment_id |varchar(255)|YES |MUL| | |
experiment_run_id |varchar(255)|YES |MUL| | |
job_id |varchar(255)|YES |MUL| | |
project_id |varchar(255)|YES |MUL| | |
It has the following indexes:
Table |Non_unique|Key_name |Seq_in_index|Column_name |Collation|Cardinality|Sub_part|Packed|Null|Index_type|Comment|Index_comment|
--------|----------|-------------|------------|------------------|---------|-----------|--------|------|----|----------|-------|-------------|
keyvalue| 0|PRIMARY | 1|id |A | 0| | | |BTREE | | |
keyvalue| 1|kv_dsv_id | 1|dataset_version_id|A | 0| | |YES |BTREE | | |
keyvalue| 1|kv_p_id | 1|project_id |A | 0| | |YES |BTREE | | |
keyvalue| 1|kv_j_id | 1|job_id |A | 0| | |YES |BTREE | | |
keyvalue| 1|kv_e_id | 1|experiment_id |A | 0| | |YES |BTREE | | |
keyvalue| 1|kv_d_id | 1|dataset_id |A | 0| | |YES |BTREE | | |
keyvalue| 1|kv_er_id | 1|experiment_run_id |A | 0| | |YES |BTREE | | |
keyvalue| 1|kv_field_type| 1|field_type |A | 0| | |YES |BTREE | | |
keyvalue| 1|kv_kv_val | 1|kv_value |A | 0| 255| |YES |BTREE | | |
keyvalue| 1|kv_kv_key | 1|kv_key |A | 0| 255| |YES |BTREE | | |
EXPLAIN on select count(*) from keyvalue returns in 178 ms with the following plan:
id |count |task|operator info |
------------------|----------|----|-----------------------------------------------------------------------------|
StreamAgg_48 |1.00 |root|funcs:count(col_0) |
└─IndexReader_49 |1.00 |root|index:StreamAgg_8 |
└─StreamAgg_8 |1.00 |cop |funcs:count(1) |
└─IndexScan_39|2964754.00|cop |table:keyvalue, index:dataset_version_id, range:[NULL,+inf], keep order:false|
The actual query, however, takes about 2.6 seconds:
trace format='row' select count(*) from keyvalue;
operation |startTS |duration |
---------------------|---------------|------------|
session.getTxnFuture |20:21:00.074939|6.455µs |
├─session.Execute |20:21:00.074937|999.484µs |
├─session.ParseSQL |20:21:00.074980|17.226µs |
├─executor.Compile |20:21:00.075010|340.281µs |
├─session.runStmt |20:21:00.075370|525.307µs |
├─session.CommitTxn|20:21:00.075882|3.542µs |
├─recordSet.Next |20:21:00.075946|2.585509798s|
├─streamAgg.Next |20:21:00.075948|2.585497556s|
├─tableReader.Next |20:21:00.075950|2.585418751s|
├─tableReader.Next |20:21:02.661433|2.77µs |
├─recordSet.Next |20:21:02.661488|11.319µs |
└─streamAgg.Next |20:21:02.661491|587ns |
My TiDB setup is as follows:
storage--tidb-discovery-f96cbd845-kgbvx 1/1 Running 0 94d
storage--tidb-operator--controller-manager-fff86dd78-b7rmh 1/1 Running 0 3d19h
storage--tidb-pd-0 1/1 Running 0 3d18h
storage--tidb-pd-1 1/1 Running 0 3d18h
storage--tidb-pd-2 1/1 Running 0 3d18h
storage--tidb-tidb-0 2/2 Running 0 3d18h
storage--tidb-tidb-1 2/2 Running 0 3d18h
storage--tidb-tidb-initializer-9fff8f78d-gh4pr 1/1 Running 0 3d22h
storage--tidb-tikv-0 1/1 Running 0 3d18h
storage--tidb-tikv-1 1/1 Running 0 3d18h
storage--tidb-tikv-2 1/1 Running 0 3d18h
storage--tidb-tikv-3 1/1 Running 0 3d18h
TiDB version:
version() |
------------------|
5.7.25-TiDB-v3.0.4|
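How can I speed up the query? I am also curious why the query picks the index it does.

I am a TiDB developer. Regarding your questions:
How to speed up the query
The table has 3,000,000 rows and the SQL is very simple. The execution plan is already the best one available (the partial aggregation is pushed down to TiKV). So I suggest increasing the execution concurrency. The relevant variables are shown below: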
tidb(localhost:4000) > show variables like "%concurrency%";
+------------------------------------+-------+
| Variable_name | Value |
+------------------------------------+-------+
| innodb_commit_concurrency | 0 |
| innodb_concurrency_tickets | 5000 |
| innodb_thread_concurrency | 0 |
| thread_concurrency | 10 |
| tidb_build_stats_concurrency | 4 |
| tidb_checksum_table_concurrency | 4 |
| tidb_distsql_scan_concurrency | 15 |
| tidb_hash_join_concurrency | 5 |
| tidb_hashagg_final_concurrency | 4 |
| tidb_hashagg_partial_concurrency | 4 |
| tidb_index_lookup_concurrency | 4 |
| tidb_index_lookup_join_concurrency | 4 |
| tidb_index_serial_scan_concurrency | 1 |
| tidb_opt_concurrency_factor | 3 |
| tidb_projection_concurrency | 4 |
| tidb_window_concurrency | 4 |
+------------------------------------+-------+
16 rows in set (0.01 sec)
For your SQL, tidb_distsql_scan_concurrency may help. It is best to set it to (number of CPU cores / 8 * 15). You can change it with set session/global tidb_distsql_scan_concurrency=?
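As a rough illustration of that formula, the sketch below assumes a machine with 16 CPU cores (a made-up figure, not taken from the question), which gives 16 / 8 * 15 = 30. It changes the variable for the current session only, so the effect can be measured before touching the global setting.

```sql
-- Assumption: 16 CPU cores, so 16 / 8 * 15 = 30. Substitute your own core count.
SET SESSION tidb_distsql_scan_concurrency = 30;

-- Confirm the new value for this session.
SHOW VARIABLES LIKE 'tidb_distsql_scan_concurrency';

-- Re-run the query and compare the timing with the earlier 2.6 s.
SELECT COUNT(*) FROM keyvalue;
```

If the session-level change helps, the same statement with GLOBAL instead of SESSION would apply it to new connections.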
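Why the query chooses the index
Because count(*) is equivalent to count(1), the optimizer only needs to count entries, and the key-value pairs of an index are smaller in bytes than the key-value pairs read by a table scan. Here are some blogs for reference: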