InfluxDB: High cardinality in a specific shard


I am querying data from different shards and using EXPLAIN to check how many series are fetched for a specific date range.

> SHOW SHARDS
.
.
658 mydb autogen          658         2019-07-22T00:00:00Z 2019-07-29T00:00:00Z 2020-07-27T00:00:00Z
676 mydb autogen          676         2019-07-29T00:00:00Z 2019-08-05T00:00:00Z 2020-08-03T00:00:00Z
.
.
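Running EXPLAIN on data from shard 658 gives the expected result in terms of series count. SensorId is the only tag key, and since the date range falls within a single shard, the series count is 1: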

> EXPLAIN select "kWh" from Reading where (SensorId =~ /^1186$/) AND time >= '2019-07-27 00:00:00' AND time <= '2019-07-28 00:00:00' limit 10;
QUERY PLAN
----------
EXPRESSION: <nil>
AUXILIARY FIELDS: "kWh"::float
NUMBER OF SHARDS: 1
NUMBER OF SERIES: 1
CACHED VALUES: 0
NUMBER OF FILES: 2
NUMBER OF BLOCKS: 4
SIZE OF BLOCKS: 32482
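
As a rough check of why EXPLAIN reports NUMBER OF SHARDS: 1 for this query, the Python sketch below finds which shards a query's time range overlaps, using the start and end times from the SHOW SHARDS rows above. It assumes each shard covers a half-open range [start, end) and that the query's time literals are interpreted as UTC; the helper names are hypothetical and not part of any InfluxDB client.

```python
from datetime import datetime, timezone

def parse_ts(s: str) -> datetime:
    # Accepts "2019-07-22T00:00:00Z" and "2019-07-27 00:00:00", both treated as UTC (assumption)
    s = s.replace("Z", "").replace("T", " ")
    return datetime.strptime(s, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)

# (shard id, start time, end time) copied from the SHOW SHARDS rows above
SHARDS = [
    (658, "2019-07-22T00:00:00Z", "2019-07-29T00:00:00Z"),
    (676, "2019-07-29T00:00:00Z", "2019-08-05T00:00:00Z"),
]

def shards_for_range(shards, tmin: str, tmax: str) -> list[int]:
    """Return ids of shards whose [start, end) range overlaps the inclusive query range [tmin, tmax]."""
    lo, hi = parse_ts(tmin), parse_ts(tmax)
    hits = []
    for shard_id, start, end in shards:
        if parse_ts(start) <= hi and lo < parse_ts(end):
            hits.append(shard_id)
    return hits

print(shards_for_range(SHARDS, "2019-07-27 00:00:00", "2019-07-28 00:00:00"))  # [658]
print(shards_for_range(SHARDS, "2019-07-28 00:00:00", "2019-07-30 00:00:00"))  # [658, 676]
```

A range that crosses 2019-07-29T00:00:00Z touches both shards, so EXPLAIN for such a query would sum series from shard 658 and shard 676.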

Update-2

I have rebuilt the index, but the cardinality is still high.

Update-3

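I found that the shard has "SensorId" as both a tag and a field, which causes the high cardinality when querying with a "SensorId" filter.

However, when I list the tag values for key "SensorId", the empty-string value returned by SELECT COUNT("SensorId") from Reading GROUP BY "SensorId" (tags: SensorId=) does not appear: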

> show tag values with key = "SensorId"
name: Reading
key      value
---      -----
SensorId 10034
SensorId 10037
SensorId 10038
SensorId 10039
SensorId 10040
SensorId 10041
.
.
.
SensorId 9938
SensorId 9939
SensorId 9941
SensorId 9942
SensorId 9944
SensorId 9949
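
A simplified Python model (not InfluxDB's actual implementation) of why an empty tag value can show up under GROUP BY but not in SHOW TAG VALUES: a point written with SensorId as a field and no SensorId tag lands in the untagged series Reading, and grouping by a tag that the point lacks puts it in the group whose tag value is the empty string. The sample points and function names are hypothetical.

```python
from collections import Counter

# Hypothetical sample points: (measurement, tags, fields).
# The third point was written with SensorId as a *field* instead of a tag.
POINTS = [
    ("Reading", {"SensorId": "10034"}, {"2005": 1.5}),
    ("Reading", {"SensorId": "10034"}, {"2005": 1.7}),
    ("Reading", {}, {"SensorId": 10034.0, "2005": 1.6}),
]

def series_key(measurement, tags):
    # In this model a series is the measurement plus its sorted, non-empty tag set.
    parts = [measurement] + [f"{k}={v}" for k, v in sorted(tags.items()) if v != ""]
    return ",".join(parts)

def show_tag_values(points, key):
    # Only tag values that actually exist on some point are listed; "missing" is not a value.
    return sorted({tags[key] for _, tags, _ in points if tags.get(key, "") != ""})

def count_group_by_tag(points, field, tag):
    # Roughly SELECT COUNT(field) ... GROUP BY tag: points without the tag
    # are grouped under the empty-string tag value.
    counts = Counter()
    for _, tags, fields in points:
        if field in fields:
            counts[tags.get(tag, "")] += 1
    return dict(counts)

print(sorted({series_key(m, t) for m, t, _ in POINTS}))    # ['Reading', 'Reading,SensorId=10034']
print(show_tag_values(POINTS, "SensorId"))                 # ['10034']
print(count_group_by_tag(POINTS, "SensorId", "SensorId"))  # {'': 1}
```

In this model COUNT("SensorId") counts the field, so only the untagged points are counted, and they all fall into a single SensorId= group, which is the shape of the COUNT output in this post.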

Update-4

Using influx_inspect dumptsm to inspect the data and re-verify whether an empty tag value exists:

$ influx_inspect dumptsm -index -filter-key "" /var/lib/influxdb/data/mydb/autogen/235/000008442-000000013.tsm

Index:

  Pos   Min Time                Max Time                Ofs     Size    Key                     Field
  1     2019-08-01T01:46:31Z    2019-08-01T17:42:03Z    5       103     Reading                 1001
  2     2019-08-01T01:46:31Z    2019-08-01T17:42:03Z    108     275     Reading                 2001
  3     2019-08-01T01:46:31Z    2019-08-01T17:42:03Z    383     248     Reading                 2002
  4     2019-08-01T01:46:31Z    2019-08-01T17:42:03Z    631     278     Reading                 2003
  5     2019-08-01T01:46:31Z    2019-08-01T17:42:03Z    909     278     Reading                 2004
  6     2019-08-01T01:46:31Z    2019-08-01T17:42:03Z    1187    184     Reading                 2005
  7     2019-08-01T01:46:31Z    2019-08-01T17:42:03Z    1371    103     Reading                 2006
  8     2019-08-01T01:46:31Z    2019-08-01T17:42:03Z    1474    250     Reading                 2007
  9     2019-08-01T01:46:31Z    2019-08-01T17:42:03Z    1724    103     Reading                 2008
  10    2019-08-01T01:46:31Z    2019-08-01T17:42:03Z    1827    275     Reading                 2012
  11    2019-08-01T01:46:31Z    2019-08-01T17:42:03Z    2102    416     Reading                 2101
  12    2019-08-01T01:46:31Z    2019-08-01T17:42:03Z    2518    103     Reading                 2692
  13    2019-08-01T01:46:31Z    2019-08-01T17:42:03Z    2621    101     Reading                 SensorId
  14    2019-07-29T00:00:05Z    2019-07-29T05:31:07Z    2722    1569    Reading,SensorId=10034  2005
  15    2019-07-29T05:31:26Z    2019-07-29T11:03:54Z    4291    1467    Reading,SensorId=10034  2005
  16    2019-07-29T11:04:14Z    2019-07-29T17:10:16Z    5758    1785    Reading,SensorId=10034  2005
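
In the index above, rows 1 to 13 have the bare series key Reading with no tag set, row 13 stores a field named SensorId, and rows 14 onward carry SensorId as a tag. The Python sketch below flags both patterns in saved dumptsm -index output. It assumes series keys contain no spaces or escaped commas, and the function names are hypothetical.

```python
# Excerpt of the dumptsm -index output above (three rows kept for brevity)
SAMPLE = """\
  Pos   Min Time                Max Time                Ofs     Size    Key                     Field
  1     2019-08-01T01:46:31Z    2019-08-01T17:42:03Z    5       103     Reading                 1001
  13    2019-08-01T01:46:31Z    2019-08-01T17:42:03Z    2621    101     Reading                 SensorId
  14    2019-07-29T00:00:05Z    2019-07-29T05:31:07Z    2722    1569    Reading,SensorId=10034  2005
"""

def parse_index(text: str) -> list[tuple[str, str]]:
    """Return (series key, field) pairs from dumptsm -index output."""
    rows = []
    for line in text.splitlines():
        cols = line.split()
        # Data rows have exactly 7 columns and a numeric Pos; headers and blank lines are skipped.
        if len(cols) == 7 and cols[0].isdigit():
            rows.append((cols[5], cols[6]))
    return rows

def diagnose(rows: list[tuple[str, str]]) -> dict[str, list[str]]:
    tag_keys, fields, untagged = set(), set(), set()
    for key, field in rows:
        # Naive split: assumes no escaped commas in the series key.
        measurement, *tags = key.split(",")
        if tags:
            tag_keys.update(t.split("=", 1)[0] for t in tags)
        else:
            untagged.add(measurement)  # series key with no tag set at all
        fields.add(field)
    return {
        "untagged_series": sorted(untagged),
        "tag_field_collisions": sorted(tag_keys & fields),  # names used as both tag and field
    }

print(diagnose(parse_index(SAMPLE)))
# {'untagged_series': ['Reading'], 'tag_field_collisions': ['SensorId']}
```

To check a full dump, save the output of the influx_inspect command to a file and pass its contents to parse_index.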


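If you need to reduce RAM usage for a large number of time series, consider other time series databases such as TimescaleDB or VictoriaMetrics. See [link], which compares the RAM usage of InfluxDB and VictoriaMetrics at different cardinality levels.

The issue isn't that there are too many time series; it's a bug that reports two different series cardinalities in different shards (with an equal data distribution).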
Output of the COUNT query referred to in Update-3:

> SELECT COUNT("SensorId") from Reading GROUP BY "SensorId";
name: Reading
tags: SensorId=
time                 count
----                 -----
1970-01-01T00:00:00Z 40