Strange behavior of GeoMesa HBase queries

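I have a question about HBase queries. I am seeing a lot of data scanned for a small spatial query. I ran a geospatial query against the OSMNodes table; the query and table details are below. HBase reports a total of 5553421708 read requests, and I see requests on most regions and region servers. Why does this query scan every region of the entire table?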

**Query**:

"DWITHIN(geometry, POINT(-122.332426 47.607282), 50, meters) AND ingestionTimestamp <= '2020-05-27 16:59:31' AND nextTimestamp > '2020-05-27 16:59:31'"

Other Query [1]

While experimenting, I found that adding a lower bound on the timestamp reduced the latency from 2-3 hours to 10-20 minutes.

geomesa-hbase explain -c atlas -f OSMNodes -q "DWITHIN(geometry, POINT(-122.332426 47.607282), 50, meters) AND ingestionTimestamp <= '2020-05-27 16:59:31' AND ingestionTimestamp >= '2019-05-27 16:59:31' AND nextTimestamp > '2020-05-27 16:59:31'"


Planning 'OSMNodes' ((DWITHIN(geometry, POINT (-122.332426 47.607282), 50.0, meters) AND ingestionTimestamp <= 2020-05-27T16:59:31+00:00) AND ingestionTimestamp >= 2019-05-27T16:59:31+00:00) AND nextTimestamp > 2020-05-27T16:59:31+00:00
  Original filter: ((DWITHIN(geometry, POINT (-122.332426 47.607282), 50.0, meters) AND ingestionTimestamp <= '2020-05-27 16:59:31') AND ingestionTimestamp >= '2019-05-27 16:59:31') AND nextTimestamp > '2020-05-27 16:59:31'
  Hints: bin[false] arrow[false] density[false] stats[false] sampling[none]
  Sort: none
  Transforms: none
  Strategy selection:
    Query processing took 24ms for 1 options
    Filter plan: FilterPlan[Z3Index(geometry,ingestionTimestamp)[DWITHIN(geometry, POINT (-122.332426 47.607282), 50.0, meters) AND (ingestionTimestamp >= 2019-05-27T16:59:31+00:00 AND ingestionTimestamp <= 2020-05-27T16:59:31+00:00)][nextTimestamp > 2020-05-27T16:59:31+00:00]]
    Strategy selection took 2ms for 1 options
  Strategy 1 of 1: Z3Index(geometry,ingestionTimestamp)
    Strategy filter: Z3Index(geometry,ingestionTimestamp)[DWITHIN(geometry, POINT (-122.332426 47.607282), 50.0, meters) AND (ingestionTimestamp >= 2019-05-27T16:59:31+00:00 AND ingestionTimestamp <= 2020-05-27T16:59:31+00:00)][nextTimestamp > 2020-05-27T16:59:31+00:00]
    Geometries: FilterValues(List(POLYGON ((-122.3317610175119 47.607282, -122.33177379496394 47.60715226835226, -122.33181163628976 47.607027522218985, -122.33187308726842 47.606912555524126, -122.33195578637329 47.606811786373285, -122.33205655552413 47.60672908726843, -122.33217152221899 47.60666763628976, -122.33229626835225 47.606629794963936, -122.332426 47.606617017511894, -122.33255573164774 47.606629794963936, -122.33268047778101 47.60666763628976, -122.33279544447586 47.60672908726843, -122.33289621362671 47.606811786373285, -122.33297891273158 47.606912555524126, -122.33304036371024 47.607027522218985, -122.33307820503606 47.60715226835226, -122.3330909824881 47.607282, -122.33307820503606 47.60741173164774, -122.33304036371024 47.60753647778101, -122.33297891273158 47.60765144447587, -122.33289621362671 47.60775221362671, -122.33279544447586 47.60783491273157, -122.33268047778101 47.60789636371024, -122.33255573164774 47.60793420503606, -122.332426 47.6079469824881, -122.33229626835225 47.60793420503606, -122.33217152221899 47.60789636371024, -122.33205655552413 47.60783491273157, -122.33195578637329 47.60775221362671, -122.33187308726842 47.60765144447587, -122.33181163628976 47.60753647778101, -122.33177379496394 47.60741173164774, -122.3317610175119 47.607282))),true,false)
    Intervals: FilterValues(List([2019-05-27T16:59:31Z,2020-05-27T16:59:31Z]),true,false)
    Plan: ScanPlan
      Tables: atlas_OSMNodes_z3_geometry_ingestionTimestamp_v6
      Ranges (404100): [%00;%0a;4$A%08;%00;%00;%00;%00;%00;::%00;%0a;4$A%0c;], [%01;%0a;4$A%08;%00;%00;%00;%00;%00;::%01;%0a;4$A%0c;], [%02;%0a;4$A%08;%00;%00;%00;%00;%00;::%02;%0a;4$A%0c;], [%03;%0a;4$A%08;%00;%00;%00;%00;%00;::%03;%0a;4$A%0c;], [%04;%0a;4$A%08;%00;%00;%00;%00;%00;::%04;%0a;4$A%0c;]
      Scans (4080): [2%0a;/la%08;%00;%00;%00;%00;%00;::2%0a;0da%9c;], [%18;%0a;$mA%08;%00;%00;%00;%00;%00;::%18;%0a;%eA%9c;], [%03;%0a;@%e%08;%00;%00;%00;%00;%00;::%03;%0a;@me%9c;], [%0f;%0a;%eE%08;%00;%00;%00;%00;%00;::%0f;%0a;&-E%9c;], ['%0a;Bda%08;%00;%00;%00;%00;%00;::'%0a;C,a%9c;]
      Column families: d
      Remote filters: MultiRowRangeFilter, Z3HBaseFilter[(epoch,2577:2629),(zt,1410483:2097151,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0:2009670),(zxy,335934:1603233:335941:1603248)], CqlFilter[(DWITHIN(geometry, POINT (-122.332426 47.607282), 50.0, meters) AND (ingestionTimestamp >= 2019-05-27T16:59:31+00:00 AND ingestionTimestamp <= 2020-05-27T16:59:31+00:00)) AND nextTimestamp > 2020-05-27T16:59:31+00:00]
    Plan creation took 475ms
  Query planning took 813ms

What is the reason for such a large difference?

First, thank you for providing the index information and the query explain output. That makes it much easier for us to answer.

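When the Z3 index is used with only an upper bound (or, similarly, only a lower bound) on the date range, many ranges in the index space are implicated. For each split, the same pattern of Z3 ranges is scanned, so having 60 splits produces a large number of ranges to scan, and those ranges are likely to be spread across the HBase cluster.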
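
To make the fan-out concrete, the counts reported by the two explain outputs in this thread can be divided by the 60 splits mentioned above. This is only back-of-the-envelope arithmetic on the posted numbers, assuming the ranges are spread evenly across splits; the variable names are illustrative.

```python
# Counts copied from the "Ranges (...)" and "Scans (...)" lines
# of the two explain outputs posted in this thread.
SPLITS = 60  # number of splits mentioned in the answer

plans = {
    "upper bound only": {"ranges": 7440, "scans": 120},
    "upper and lower bound": {"ranges": 404100, "scans": 4080},
}

# Integer division gives the per-split share of each count.
per_split = {
    name: {kind: count // SPLITS for kind, count in counts.items()}
    for name, counts in plans.items()
}

for name, counts in per_split.items():
    print(f"{name}: {counts['ranges']} ranges and {counts['scans']} scans per split")
```

All four counts divide evenly by 60, which is consistent with the same pattern of ranges being repeated for every split.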

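There are a few possible approaches to try:
1. Re-tune so that fewer ranges are generated.
2. Add a Z2 spatial index to help with these types of queries. The spatial predicate will return a small number of records, which will then be filtered further.
3. Add a lower time bound, if you can. This may not be possible, but in some use cases it does make sense.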
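
The effect of a lower time bound shows up in the `epoch` values on the `Z3HBaseFilter` lines of the explain output. The sketch below assumes the Z3 index uses week-long time bins counted from the Unix epoch (GeoMesa's default Z3 interval; the schema in question may be configured differently). Under that assumption it reproduces the `2577:2629` bounds reported for the query with both time bounds. The helper name `week_bin` is made up for illustration.

```python
from datetime import datetime, timezone

SECONDS_PER_WEEK = 7 * 24 * 60 * 60

def week_bin(ts: datetime) -> int:
    """Whole weeks between the Unix epoch and ts (the assumed Z3 time bin)."""
    return int(ts.timestamp()) // SECONDS_PER_WEEK

# Time bounds taken from Other Query [1].
lower = datetime(2019, 5, 27, 16, 59, 31, tzinfo=timezone.utc)
upper = datetime(2020, 5, 27, 16, 59, 31, tzinfo=timezone.utc)

bins = range(week_bin(lower), week_bin(upper) + 1)
print(f"first bin {bins.start}, last bin {bins[-1]}, {len(bins)} bins in total")
```

With both bounds, the plan names 53 weekly bins explicitly, whereas the upper-bound-only plan reports the open interval `(-∞,2020-05-27T16:59:31Z]`.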

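What if I run Other Query [1] instead of the one above, so that the ingestion timestamp always has ingestionTimestamp >= '2019-05-27 16:59:31'? Would it improve performance? The query plan is attached above.

ingestionTimestamp = '2019-05-27 16:59:31' logically means that ingestionTimestamp is exactly that timestamp. If you want to query by an exact date like that, you might consider an attribute index on ingestionTimestamp. You can have Z2 as a compound "secondary" index, which would let you query an exact time and a geometry as quickly as possible. Beyond that, I would guess that Other Query [1] is likely to be faster, since it should scan less data. That said, the number of ranges is very high compared to the number of scans, which means some unnecessary data may be scanned. GeoMesa tries to avoid creating too many scans, and it merges ranges to achieve that. In cases like these, 60 splits may cause a slowdown. Still, this is an experiment with only one variable!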
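
The ranges-versus-scans comparison can be read straight off the explain outputs in this thread. The snippet below only divides the posted counts; it does not model how GeoMesa groups ranges into scans.

```python
# (ranges, scans) copied from the "Ranges (...)" and "Scans (...)" lines
# of the explain outputs in this thread.
plans = {
    "upper bound only": (7440, 120),
    "upper and lower bound": (404100, 4080),
}

ratios = {name: ranges / scans for name, (ranges, scans) in plans.items()}

for name, ratio in ratios.items():
    print(f"{name}: about {ratio:.0f} ranges per scan")
```
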
**Query Plan (through GeoMesa CLI):**
geomesa-hbase explain -c atlas -f OSMNodes -q "DWITHIN(geometry, POINT(-122.332426 47.607282), 50, meters) AND ingestionTimestamp <= '2020-05-27 16:59:31' AND nextTimestamp > '2020-05-27 16:59:31'"

Planning 'OSMNodes' (DWITHIN(geometry, POINT (-122.332426 47.607282), 50.0, meters) AND ingestionTimestamp <= 2020-05-27T16:59:31+00:00) AND nextTimestamp > 2020-05-27T16:59:31+00:00
  Original filter: (DWITHIN(geometry, POINT (-122.332426 47.607282), 50.0, meters) AND ingestionTimestamp <= '2020-05-27 16:59:31') AND nextTimestamp > '2020-05-27 16:59:31'
  Hints: bin[false] arrow[false] density[false] stats[false] sampling[none]
  Sort: none
  Transforms: none
  Strategy selection:
    Query processing took 17ms for 1 options
    Filter plan: FilterPlan[Z3Index(geometry,ingestionTimestamp)[DWITHIN(geometry, POINT (-122.332426 47.607282), 50.0, meters) AND ingestionTimestamp <= 2020-05-27T16:59:31+00:00][nextTimestamp > 2020-05-27T16:59:31+00:00]]
    Strategy selection took 1ms for 1 options
  Strategy 1 of 1: Z3Index(geometry,ingestionTimestamp)
    Strategy filter: Z3Index(geometry,ingestionTimestamp)[DWITHIN(geometry, POINT (-122.332426 47.607282), 50.0, meters) AND ingestionTimestamp <= 2020-05-27T16:59:31+00:00][nextTimestamp > 2020-05-27T16:59:31+00:00]
    Geometries: FilterValues(List(POLYGON ((-122.3317610175119 47.607282, -122.33177379496394 47.60715226835226, -122.33181163628976 47.607027522218985, -122.33187308726842 47.606912555524126, -122.33195578637329 47.606811786373285, -122.33205655552413 47.60672908726843, -122.33217152221899 47.60666763628976, -122.33229626835225 47.606629794963936, -122.332426 47.606617017511894, -122.33255573164774 47.606629794963936, -122.33268047778101 47.60666763628976, -122.33279544447586 47.60672908726843, -122.33289621362671 47.606811786373285, -122.33297891273158 47.606912555524126, -122.33304036371024 47.607027522218985, -122.33307820503606 47.60715226835226, -122.3330909824881 47.607282, -122.33307820503606 47.60741173164774, -122.33304036371024 47.60753647778101, -122.33297891273158 47.60765144447587, -122.33289621362671 47.60775221362671, -122.33279544447586 47.60783491273157, -122.33268047778101 47.60789636371024, -122.33255573164774 47.60793420503606, -122.332426 47.6079469824881, -122.33229626835225 47.60793420503606, -122.33217152221899 47.60789636371024, -122.33205655552413 47.60783491273157, -122.33195578637329 47.60775221362671, -122.33187308726842 47.60765144447587, -122.33181163628976 47.60753647778101, -122.33177379496394 47.60741173164774, -122.3317610175119 47.607282))),true,false)
    Intervals: FilterValues(List((-∞,2020-05-27T16:59:31Z]),true,false)
    Plan: ScanPlan
      Tables: atlas_OSMNodes_z3_geometry_ingestionTimestamp_v6
      Ranges (7440): [%00;%0a;E$A%08;%00;%00;%00;%00;%00;::%00;%0a;E$A%0c;], [%01;%0a;E$A%08;%00;%00;%00;%00;%00;::%01;%0a;E$A%0c;], [%02;%0a;E$A%08;%00;%00;%00;%00;%00;::%02;%0a;E$A%0c;], [%03;%0a;E$A%08;%00;%00;%00;%00;%00;::%03;%0a;E$A%0c;], [%04;%0a;E$A%08;%00;%00;%00;%00;%00;::%04;%0a;E$A%0c;]
      Scans (120): ['%0a;ElA%98;%00;%00;%00;%00;%00;::'%0a;Ema%8c;], [:%0a;ElA%98;%00;%00;%00;%00;%00;:::%0a;Ema%8c;], [%14;::%14;%0a;ElA%8c;], [(%0a;ElA%98;%00;%00;%00;%00;%00;::(%0a;Ema%8c;], [%12;%0a;ElA%98;%00;%00;%00;%00;%00;::%12;%0a;Ema%8c;]
      Column families: d
      Remote filters: MultiRowRangeFilter, Z3HBaseFilter[(epoch,2629:2629),(zt,0:2009670),(zxy,335934:1603233:335941:1603248)], CqlFilter[(DWITHIN(geometry, POINT (-122.332426 47.607282), 50.0, meters) AND ingestionTimestamp <= 2020-05-27T16:59:31+00:00) AND nextTimestamp > 2020-05-27T16:59:31+00:00]
    Plan creation took 135ms
  Query planning took 433ms