Database design: Cassandra data model options, lots of columns for all potential reading types, or a map collection?


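We are planning to store time-series sensor data in Cassandra. Each sensor can report multiple data points at each sample time. I would like to store all the data points for each device together.

One idea I had is to create every possible column for the various data types we might collect: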

CREATE TABLE ddata (
  deviceID int,
  day timestamp,
  timepoint timestamp, 
  aparentPower int,
  actualPower int,
  actualEnergy int,
  temperature float,
  humidity float,
  ppmCO2 int,
  -- etc, etc, etc...
  PRIMARY KEY ((deviceID,day),timepoint)
) WITH
  clustering order by (timepoint DESC);

insert into ddata (deviceID,day,timepoint,temperature,humidity) values (1000001,'2013-09-02','2013-09-02 00:00:04',93,97.3);

 deviceid | day                      | timepoint                | actualenergy | actualpower | aparentpower | event | humidity | ppmco2 | temperature
----------+--------------------------+--------------------------+--------------+-------------+--------------+-------+----------+--------+-------------
  1000001 | 2013-09-02 00:00:00-0700 | 2013-09-02 00:00:04-0700 |         null |        null |         null |  null |     97.3 |   null |          93
  1000001 | 2013-09-02 00:00:00-0700 | 2013-09-02 00:00:03-0700 |         null |        null |         null |  null |     null |   null |          92
  1000001 | 2013-09-02 00:00:00-0700 | 2013-09-02 00:00:02-0700 |         null |        null |         null |  null |     null |   null |          91
  1000001 | 2013-09-02 00:00:00-0700 | 2013-09-02 00:00:01-0700 |         null |        null |         null |  null |     null |   null |          90

The other idea is to create a map collection of the various data points a given device might report:

CREATE TABLE ddata (
  deviceID int,
  day timestamp,
  timepoint timestamp, 
  feeds map<text,int>,
  PRIMARY KEY ((deviceID,day),timepoint)
) WITH
  clustering order by (timepoint DESC);

insert into ddata (deviceID,day,timepoint,feeds) values (1000001,'2013-09-01','2013-09-01 00:00:04',{'temp':73,'humidity':99});

 deviceid | day                      | timepoint                | event      | feeds
----------+--------------------------+--------------------------+------------+----------------------------------------------------------
  1000001 | 2013-09-02 00:00:00-0700 | 2013-09-02 00:00:04-0700 |       null |                             {'humidity': 97, 'temp': 93}
  1000001 | 2013-09-02 00:00:00-0700 | 2013-09-02 00:00:03-0700 |       null |                                             {'temp': 92}
  1000001 | 2013-09-02 00:00:00-0700 | 2013-09-02 00:00:02-0700 |       null |                                             {'temp': 91}
  1000001 | 2013-09-02 00:00:00-0700 | 2013-09-02 00:00:01-0700 |       null |                                             {'temp': 90}
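
What are people's thoughts on these two options?

  • From what I can see, the first option allows better typing of the different data types (int vs. float), but it makes the table somewhat ugly.
  • Would performance be better if I avoid collection types?
  • Is continually adding columns as new sensor data types appear a problem?
  • What other factors should I consider?
  • What other data modeling ideas do people have for this scenario?

Thanks,
Chris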

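Essentially, since we do not know how many measurements will arrive, we need a dynamic way to describe them in the column family.

As you point out in your second example, CQL provides the map data type for holding dynamic collections.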


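The second option is preferred, but it also depends on the queries you are likely to issue. To get 'temp' out of 'feeds', the application has to parse the returned map.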
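
To make the query difference concrete, here is a minimal sketch using the two ddata definitions from the question. Both layouts share the table name ddata in the question, so each statement assumes the matching definition is the one in place; the device ID and day values are simply reused from the question's sample rows.

```cql
-- Wide-column layout: select only the metric you need.
SELECT timepoint, temperature
  FROM ddata
 WHERE deviceID = 1000001 AND day = '2013-09-02';

-- Map layout: the whole map comes back for each row,
-- and the application looks up the 'temp' key itself.
SELECT timepoint, feeds
  FROM ddata
 WHERE deviceID = 1000001 AND day = '2013-09-02';
```

If most reads only need one or two metrics, the first form transfers less data per row; if readers usually want everything a device reported at a time point, the difference matters much less.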

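The immediate pros and cons I can see:

  • Using a map column lets you have an "unlimited" number of metrics. (Note: I think there is a limit on how much data can be stored in a map.)
  • You cannot read a single value out of a map, whereas with one column per metric you can read just the one value you need. You can still update a single value in a map.
  • As you mentioned in your question, a map restricts you to a single value type.

Those are the most obvious differences I can see.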
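
As a rough illustration of the update and schema-growth points, the statements below sketch what each layout requires when data changes. The key values are taken from the question's sample insert; the column name pressure is a hypothetical new metric, not part of the original schema.

```cql
-- Map layout: set or overwrite one entry without rewriting the whole map.
UPDATE ddata
   SET feeds['temp'] = 74
 WHERE deviceID = 1000001
   AND day = '2013-09-01'
   AND timepoint = '2013-09-01 00:00:04';

-- Wide-column layout: a new sensor type means a schema change.
-- "pressure" is a hypothetical metric used only for illustration.
ALTER TABLE ddata ADD pressure float;
```

Note that with feeds declared as map<text,int>, a reading such as 97.3 cannot be stored as-is. If every metric can reasonably be represented as a float, declaring the column as map<text,float> is one possible compromise, at the cost of losing the int/float distinction that the wide-column layout keeps.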