Cassandra data model options: a large number of columns for every potential reading type, or a map collection?

database-design cassandra

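We plan to store time-series sensor data in Cassandra. Each sensor can report multiple data points at each sample time point. I would like to store all the data points for each device together.

One idea I had was to create every possible column for the various data types we might collect: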

CREATE TABLE ddata (
  deviceID int,
  day timestamp,
  timepoint timestamp, 
  aparentPower int,
  actualPower int,
  actualEnergy int,
  temperature float,
  humidity float,
  ppmCO2 int,
  -- etc, etc, etc...
  PRIMARY KEY ((deviceID,day),timepoint)
) WITH
  clustering order by (timepoint DESC);

insert into ddata (deviceID,day,timepoint,temperature,humidity) values (1000001,'2013-09-02','2013-09-02 00:00:04',93,97.3);

 deviceid | day                      | timepoint                | actualenergy | actualpower | aparentpower | event | humidity | ppmco2 | temperature
----------+--------------------------+--------------------------+--------------+-------------+--------------+-------+----------+--------+-------------
  1000001 | 2013-09-02 00:00:00-0700 | 2013-09-02 00:00:04-0700 |         null |        null |         null |  null |     97.3 |   null |          93
  1000001 | 2013-09-02 00:00:00-0700 | 2013-09-02 00:00:03-0700 |         null |        null |         null |  null |     null |   null |          92
  1000001 | 2013-09-02 00:00:00-0700 | 2013-09-02 00:00:02-0700 |         null |        null |         null |  null |     null |   null |          91
  1000001 | 2013-09-02 00:00:00-0700 | 2013-09-02 00:00:01-0700 |         null |        null |         null |  null |     null |   null |          90

Another idea is to create a map collection of the various data points a given device may report:

CREATE TABLE ddata (
  deviceID int,
  day timestamp,
  timepoint timestamp, 
  feeds map<text,int>,
  PRIMARY KEY ((deviceID,day),timepoint)
) WITH
  clustering order by (timepoint DESC);

insert into ddata (deviceID,day,timepoint,feeds) values (1000001,'2013-09-01','2013-09-01 00:00:04',{'temp':73,'humidity':99});

 deviceid | day                      | timepoint                | event      | feeds
----------+--------------------------+--------------------------+------------+----------------------------------------------------------
  1000001 | 2013-09-02 00:00:00-0700 | 2013-09-02 00:00:04-0700 |       null |                             {'humidity': 97, 'temp': 93}
  1000001 | 2013-09-02 00:00:00-0700 | 2013-09-02 00:00:03-0700 |       null |                                             {'temp': 92}
  1000001 | 2013-09-02 00:00:00-0700 | 2013-09-02 00:00:02-0700 |       null |                                             {'temp': 91}
  1000001 | 2013-09-02 00:00:00-0700 | 2013-09-02 00:00:01-0700 |       null |                                             {'temp': 90}

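What are people's thoughts on these two options?

From what I can see, the first option allows better typing of the different data types (int vs. float), but it makes the table a bit ugly.
Will performance be better if I avoid using collection types?
Should I just keep adding extra columns whenever a new sensor data type is added?
What other data modeling ideas do people have for this scenario?

Thanks,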

Chris

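Essentially, since we do not know how many measurements will arrive, we need a dynamic way to describe them in the column family.

As you pointed out in your second example, CQL provides the map data type for holding dynamic collections.

The second option is preferred, but it also depends on the queries you are likely to issue. To get "temp" out of "feeds", the application has to parse the map output.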
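As a rough sketch of that difference, here is how the same read might look against each schema from the question. Both variants are named `ddata` in the question, so the two statements would run against different tables. The key values are taken from the sample rows.

```cql
-- Column-per-metric variant: only the requested column is returned.
SELECT timepoint, temperature
FROM ddata
WHERE deviceID = 1000001 AND day = '2013-09-02';

-- Map variant: the whole map is returned for each row, and the
-- application looks up the 'temp' entry itself.
SELECT timepoint, feeds
FROM ddata
WHERE deviceID = 1000001 AND day = '2013-09-02';
```

In the map variant, the application reads the `temp` key from each returned map on the client side.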

The immediate pros and cons I can see:

Using a `map` column will allow you to have "unlimited" metrics. (Note: I believe there is a limit on how much data can be stored in a `map`.)



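You will not be able to read a single value out of the map, which you can do if each metric has its own column. You can still update a single value within the map.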


As you mentioned in your question, you [...] in the `map`.

Those are the most obvious differences I can see.
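To illustrate the point about updating a single map value, here is a minimal sketch assuming the map-based `ddata` table from the question (`feeds map<text,int>`). The key values come from the sample rows, and the new reading of 94 is made up for the example.

```cql
-- Set (or overwrite) one entry in the map without touching the others.
UPDATE ddata
SET feeds['temp'] = 94
WHERE deviceID = 1000001
  AND day = '2013-09-02'
  AND timepoint = '2013-09-02 00:00:04';

-- Remove one entry from the map, leaving the rest in place.
DELETE feeds['humidity']
FROM ddata
WHERE deviceID = 1000001
  AND day = '2013-09-02'
  AND timepoint = '2013-09-02 00:00:04';
```

Both statements name the full primary key (partition key plus clustering column), so each affects exactly one row.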