Cassandra data model options: a large number of columns for all potential reading types, or a map collection?
We plan to store time-series sensor data in Cassandra. Each sensor can have multiple data points at each sampling time point. I would like to store all the data points for each device together.

One idea I had is to create every possible column for the various data types we might collect:
CREATE TABLE ddata (
  deviceID int,
  day timestamp,
  timepoint timestamp,
  aparentPower int,
  actualPower int,
  actualEnergy int,
  temperature float,
  humidity float,
  ppmCO2 int,
  etc, etc, etc...
  PRIMARY KEY ((deviceID,day),timepoint)
) WITH
  clustering order by (timepoint DESC);

insert into ddata (deviceID,day,timepoint,temperature,humidity) values (1000001,'2013-09-02','2013-09-02 00:00:04',93,97.3);

 deviceid | day                      | timepoint                | actualenergy | actualpower | aparentpower | event | humidity | ppmco2 | temperature
----------+--------------------------+--------------------------+--------------+-------------+--------------+-------+----------+--------+-------------
  1000001 | 2013-09-02 00:00:00-0700 | 2013-09-02 00:00:04-0700 |         null |        null |         null |  null |     97.3 |   null |          93
  1000001 | 2013-09-02 00:00:00-0700 | 2013-09-02 00:00:03-0700 |         null |        null |         null |  null |     null |   null |          92
  1000001 | 2013-09-02 00:00:00-0700 | 2013-09-02 00:00:02-0700 |         null |        null |         null |  null |     null |   null |          91
  1000001 | 2013-09-02 00:00:00-0700 | 2013-09-02 00:00:01-0700 |         null |        null |         null |  null |     null |   null |          90

Another idea is to create a map collection of the various data points that a given device might report:

CREATE TABLE ddata (
  deviceID int,
  day timestamp,
  timepoint timestamp,
  feeds map<text,int>,
  PRIMARY KEY ((deviceID,day),timepoint)
) WITH
  clustering order by (timepoint DESC);

insert into ddata (deviceID,day,timepoint,feeds) values (1000001,'2013-09-01','2013-09-01 00:00:04',{'temp':73,'humidity':99});

 deviceid | day                      | timepoint                | event | feeds
----------+--------------------------+--------------------------+-------+------------------------------
  1000001 | 2013-09-02 00:00:00-0700 | 2013-09-02 00:00:04-0700 |  null | {'humidity': 97, 'temp': 93}
  1000001 | 2013-09-02 00:00:00-0700 | 2013-09-02 00:00:03-0700 |  null | {'temp': 92}
  1000001 | 2013-09-02 00:00:00-0700 | 2013-09-02 00:00:02-0700 |  null | {'temp': 91}
  1000001 | 2013-09-02 00:00:00-0700 | 2013-09-02 00:00:01-0700 |  null | {'temp': 90}
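
What are people's thoughts on these two options?
- From what I can see, the first option allows better typing of the different data types (int vs. float), but makes the table a bit ugly.
- Would performance be better if I avoid collection types?
- Is continually adding extra columns as new sensor data types are added a concern? What other factors should I consider?
- What other data modeling ideas do people have for this scenario?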
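Chris

Essentially, since we don't know how many measurements will arrive, we need a dynamic way of describing what is in the column family. As you point out in your second example, CQL provides the map data type for holding dynamic collections.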
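
To make "dynamic" concrete, here is a rough sketch of what adding a new metric looks like under each model. It assumes the two ddata definitions from the question (they share a name, so each half applies to a different variant of the table); the 'ppmCO2' map key and the pressure column are illustrative names, not something taken from the question's data.

```cql
-- Map model (feeds map<text,int>): a new metric is just a new key,
-- so no schema change is needed. 'ppmCO2' is an illustrative key.
INSERT INTO ddata (deviceID, day, timepoint, feeds)
VALUES (1000001, '2013-09-02', '2013-09-02 00:00:05', {'temp': 94, 'ppmCO2': 410});

-- Column-per-metric model: a new metric needs its column added first.
-- 'pressure' is a hypothetical metric name.
ALTER TABLE ddata ADD pressure float;

INSERT INTO ddata (deviceID, day, timepoint, pressure)
VALUES (1000001, '2013-09-02', '2013-09-02 00:00:05', 1013.2);
```

Note that with map<text,int> every value has to be an int, which is the typing trade-off the question already mentions.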
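
The second is preferred, but it also depends on the queries you are likely to issue. To get 'temp' out of 'feeds', the application has to parse the map output.

The immediate pros and cons I can see: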
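- Using a map column will allow you to have "unlimited" metrics. (Note: I think there is a limit on how much data can be stored in a map.)
- You won't be able to read a single value out of a map; if you have a column per metric, you can read just one value at a time. You can still update a single value in a map.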
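
The read and update points above can be sketched in CQL as follows. This is only an illustration against the two ddata variants from the question, and it reflects the CQL behaviour described in this thread; the values used (such as 94) are made up for the example.

```cql
-- Column-per-metric table: select only the metric you need.
SELECT timepoint, temperature
FROM ddata
WHERE deviceID = 1000001 AND day = '2013-09-02';

-- Map table: the whole map is returned for each row,
-- and the application picks 'temp' out of it.
SELECT timepoint, feeds
FROM ddata
WHERE deviceID = 1000001 AND day = '2013-09-02';

-- Map table: a single entry can still be written
-- without rewriting the other entries in the map.
UPDATE ddata SET feeds['temp'] = 94
WHERE deviceID = 1000001
  AND day = '2013-09-02'
  AND timepoint = '2013-09-02 00:00:04';
```

If most queries only ever need one or two metrics, the first form moves less data per row; if new metrics show up often, the map form avoids schema changes.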