InfluxDB:特定碎片的高基数
我正在查询来自不同碎片的数据,并使用InfluxDB:特定碎片的高基数,influxdb,Influxdb,我正在查询来自不同碎片的数据,并使用EXPLAIN检查针对特定日期范围提取了多少序列 > SHOW SHARDS . . 658 mydb autogen 658 2019-07-22T00:00:00Z 2019-07-29T00:00:00Z 2020-07-27T00:00:00Z 676 mydb autogen 676 2019-07-29T00:00:00Z 2019-08-05T00:00:00Z 2020-
EXPLAIN
检查针对特定日期范围提取了多少序列
> SHOW SHARDS
.
.
658 mydb autogen 658 2019-07-22T00:00:00Z 2019-07-29T00:00:00Z 2020-07-27T00:00:00Z
676 mydb autogen 676 2019-07-29T00:00:00Z 2019-08-05T00:00:00Z 2020-08-03T00:00:00Z
.
.
对来自shard的数据执行EXPLAIN
658,就序列数而言给出了预期结果SensorId
是唯一的标记键,由于日期范围落入唯一的碎片,它给出的序列号为:1
> EXPLAIN select "kWh" from Reading where (SensorId =~ /^1186$/) AND time >= '2019-07-27 00:00:00' AND time <= '2019-07-28 00:00:00' limit 10;
QUERY PLAN
----------
EXPRESSION: <nil>
AUXILIARY FIELDS: "kWh"::float
NUMBER OF SHARDS: 1
NUMBER OF SERIES: 1
CACHED VALUES: 0
NUMBER OF FILES: 2
NUMBER OF BLOCKS: 4
SIZE OF BLOCKS: 32482
更新-2
我已经重建了索引,但基数仍然很高
更新-3
我发现shard有“SensorId”作为标记和字段,在使用“SensorId”过滤器进行查询时会导致高基数
但当我用键'SensorId'检查标记值时,它不会显示上面查询中存在的空字符串
> show tag values with key = "SensorId"
name: Reading
key value
--- -----
SensorId 10034
SensorId 10037
SensorId 10038
SensorId 10039
SensorId 10040
SensorId 10041
.
.
.
SensorId 9938
SensorId 9939
SensorId 9941
SensorId 9942
SensorId 9944
SensorId 9949
更新-4
使用influx\u inspect dumptsm
检查数据,并重新验证是否存在空标记值
$ influx_inspect dumptsm -index -filter-key "" /var/lib/influxdb/data/mydb/autogen/235/000008442-000000013.tsm
Index:
Pos Min Time Max Time Ofs Size Key Field
1 2019-08-01T01:46:31Z 2019-08-01T17:42:03Z 5 103 Reading 1001
2 2019-08-01T01:46:31Z 2019-08-01T17:42:03Z 108 275 Reading 2001
3 2019-08-01T01:46:31Z 2019-08-01T17:42:03Z 383 248 Reading 2002
4 2019-08-01T01:46:31Z 2019-08-01T17:42:03Z 631 278 Reading 2003
5 2019-08-01T01:46:31Z 2019-08-01T17:42:03Z 909 278 Reading 2004
6 2019-08-01T01:46:31Z 2019-08-01T17:42:03Z 1187 184 Reading 2005
7 2019-08-01T01:46:31Z 2019-08-01T17:42:03Z 1371 103 Reading 2006
8 2019-08-01T01:46:31Z 2019-08-01T17:42:03Z 1474 250 Reading 2007
9 2019-08-01T01:46:31Z 2019-08-01T17:42:03Z 1724 103 Reading 2008
10 2019-08-01T01:46:31Z 2019-08-01T17:42:03Z 1827 275 Reading 2012
11 2019-08-01T01:46:31Z 2019-08-01T17:42:03Z 2102 416 Reading 2101
12 2019-08-01T01:46:31Z 2019-08-01T17:42:03Z 2518 103 Reading 2692
13 2019-08-01T01:46:31Z 2019-08-01T17:42:03Z 2621 101 Reading SensorId
14 2019-07-29T00:00:05Z 2019-07-29T05:31:07Z 2722 1569 Reading,SensorId=10034 2005
15 2019-07-29T05:31:26Z 2019-07-29T11:03:54Z 4291 1467 Reading,SensorId=10034 2005
16 2019-07-29T11:04:14Z 2019-07-29T17:10:16Z 5758 1785 Reading,SensorId=10034 2005
如果需要减少大量时间序列的RAM使用,请查看其他时间序列数据库,如TimescaleDB或VictoriaMetrics。请参阅,它比较了不同基数级别的InfluxDB和VictoriaMetrics的RAM使用情况。这并不是时间序列的数量太多,而是一个错误,它在不同的碎片中显示了两个不同的序列基数(具有相等的数据分布)。
> SELECT COUNT("SensorId") from Reading GROUP BY "SensorId";
name: Reading
tags: SensorId=
time count
---- -----
1970-01-01T00:00:00Z 40
> show tag values with key = "SensorId"
name: Reading
key value
--- -----
SensorId 10034
SensorId 10037
SensorId 10038
SensorId 10039
SensorId 10040
SensorId 10041
.
.
.
SensorId 9938
SensorId 9939
SensorId 9941
SensorId 9942
SensorId 9944
SensorId 9949
$ influx_inspect dumptsm -index -filter-key "" /var/lib/influxdb/data/mydb/autogen/235/000008442-000000013.tsm
Index:
Pos Min Time Max Time Ofs Size Key Field
1 2019-08-01T01:46:31Z 2019-08-01T17:42:03Z 5 103 Reading 1001
2 2019-08-01T01:46:31Z 2019-08-01T17:42:03Z 108 275 Reading 2001
3 2019-08-01T01:46:31Z 2019-08-01T17:42:03Z 383 248 Reading 2002
4 2019-08-01T01:46:31Z 2019-08-01T17:42:03Z 631 278 Reading 2003
5 2019-08-01T01:46:31Z 2019-08-01T17:42:03Z 909 278 Reading 2004
6 2019-08-01T01:46:31Z 2019-08-01T17:42:03Z 1187 184 Reading 2005
7 2019-08-01T01:46:31Z 2019-08-01T17:42:03Z 1371 103 Reading 2006
8 2019-08-01T01:46:31Z 2019-08-01T17:42:03Z 1474 250 Reading 2007
9 2019-08-01T01:46:31Z 2019-08-01T17:42:03Z 1724 103 Reading 2008
10 2019-08-01T01:46:31Z 2019-08-01T17:42:03Z 1827 275 Reading 2012
11 2019-08-01T01:46:31Z 2019-08-01T17:42:03Z 2102 416 Reading 2101
12 2019-08-01T01:46:31Z 2019-08-01T17:42:03Z 2518 103 Reading 2692
13 2019-08-01T01:46:31Z 2019-08-01T17:42:03Z 2621 101 Reading SensorId
14 2019-07-29T00:00:05Z 2019-07-29T05:31:07Z 2722 1569 Reading,SensorId=10034 2005
15 2019-07-29T05:31:26Z 2019-07-29T11:03:54Z 4291 1467 Reading,SensorId=10034 2005
16 2019-07-29T11:04:14Z 2019-07-29T17:10:16Z 5758 1785 Reading,SensorId=10034 2005