MongoDB: How to query a time series with incomplete data?
I store time series data in a MongoDB collection, one data point every 15 minutes. Sometimes, because of bad conditions, some data points are lost. My dataset looks like this:
{"device_id": "ABC","temp": 12,"timestamp": 2020-01-04T17:48:09.000+00:00}
{"device_id": "ABC","temp": 10,"timestamp": 2020-01-04T18:03:09.000+00:00}
{"device_id": "ABC","temp": 14,"timestamp": 2020-01-04T18:18:09.000+00:00}
missing frame
missing frame
{"device_id": "ABC","temp": 13,"timestamp": 2020-01-04T19:03:09.000+00:00}
{"device_id": "ABC","temp": 15,"timestamp": 2020-01-04T19:18:09.000+00:00}
missing frame
{"device_id": "ABC","temp": 10,"timestamp": 2020-01-04T19:48:09.000+00:00}
{"device_id": "ABC","temp": 11,"timestamp": 2020-01-04T20:03:09.000+00:00}
...
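I don't know how to query this collection so that it produces a continuous list of values, one every 15 minutes, which I can plot while showing the missing messages (changing the background color of the graph where a message is missing). I would like one result per aligned 15-minute slot (summing the values between t and t+15 minutes), like this: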
{"timestamp": 2020-01-04T17:45:00.000+00:00, "temp": 12, missing: false}
{"timestamp": 2020-01-04T18:00:00.000+00:00, "temp": 10, missing: false}
{"timestamp": 2020-01-04T18:15:00.000+00:00, "temp": 14, missing: false}
{"timestamp": 2020-01-04T18:30:00.000+00:00, "temp": 0, missing: true}
{"timestamp": 2020-01-04T18:45:00.000+00:00, "temp": 0, missing: true}
{"timestamp": 2020-01-04T19:00:00.000+00:00, "temp": 13, missing: false}
{"timestamp": 2020-01-04T19:15:00.000+00:00, "temp": 15, missing: false}
{"timestamp": 2020-01-04T19:30:00.000+00:00, "temp": 0, missing: true}
{"timestamp": 2020-01-04T19:45:00.000+00:00, "temp": 10, missing: false}
{"timestamp": 2020-01-04T20:00:00.000+00:00, "temp": 11, missing: false}
Any ideas? Thanks in advance for your help.

Here is an aggregation for the approach I mentioned in my first comment:
db.collection.aggregate( [
  // Sort by time so that $first / $last below give the covered range.
  {
    $sort: { timestamp: 1 }
  },
  // Collect all documents into one array; compute start and end as epoch seconds.
  {
    $group: {
      _id: null,
      docs: { $push: { timestamp: "$timestamp", device_id: "$device_id", temp: "$temp", missing: false } },
      device_id: { $first: "$device_id" },
      start: { $first: { $toInt: { $divide: [ { "$toLong": "$timestamp" }, 1000 ] } } },
      end: { $last: { $toInt: { $divide: [ { "$toLong": "$timestamp" }, 1000 ] } } }
    }
  },
  // Generate every expected timestamp (step 900 s = 15 minutes) and
  // look up the document with exactly that timestamp, if there is one.
  {
    $addFields: {
      docs: {
        $map: {
          input: { $range: [ { $toInt: "$start" }, { $add: [ { $toInt: "$end" }, 900 ] }, 900 ] },
          as: "ts",
          in: {
            ts_exists: { $arrayElemAt: [
              { $filter: {
                input: "$docs", as: "d",
                cond: { $eq: [ { $toInt: { $divide: [ { "$toLong": "$$d.timestamp" }, 1000 ] } },
                               "$$ts"
                ] }
              }},
              0 ] },
            ts: "$$ts"
          }
        }
      }
    }
  },
  {
    $unwind: "$docs"
  },
  // Keep the real document when one was found; otherwise emit a placeholder flagged as missing.
  {
    $addFields: {
      docs: {
        $ifNull: [ "$docs.ts_exists", { timestamp: { $toDate: { $multiply: [ "$docs.ts", 1000 ] } },
                                        temp: 0, device_id: "$device_id", missing: true
                                      }
        ]
      }
    }
  },
  {
    $replaceRoot: { newRoot: "$docs" }
  }
] ).pretty()
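To make the logic of the aggregation easier to follow, here is a rough in-memory equivalent in plain Python. It is only a sketch: `fill_gaps` and `STEP` are hypothetical names that do not appear in the answer. Like the pipeline, it assumes the stored timestamps already sit exactly on the 15-minute grid that starts at the first document.

```python
from datetime import datetime, timedelta, timezone

STEP = timedelta(minutes=15)  # same interval as the 900-second step in the pipeline


def fill_gaps(docs, step=STEP):
    """Return one document per step between the first and last timestamp."""
    if not docs:
        return []
    docs = sorted(docs, key=lambda d: d["timestamp"])   # mirrors $sort
    by_ts = {}
    for d in docs:
        by_ts.setdefault(d["timestamp"], d)             # keep the first match, like $arrayElemAt 0
    device_id = docs[0]["device_id"]                    # mirrors $first
    start, end = docs[0]["timestamp"], docs[-1]["timestamp"]

    out = []
    ts = start
    while ts < end + step:                              # same bound as $range: [start, end + 900, 900]
        found = by_ts.get(ts)
        if found is not None:
            out.append({**found, "missing": False})
        else:
            # Placeholder for a slot with no data, as in the $ifNull stage.
            out.append({"timestamp": ts, "device_id": device_id, "temp": 0, "missing": True})
        ts += step
    return out


if __name__ == "__main__":
    def at(hour, minute):
        return datetime(2020, 1, 4, hour, minute, tzinfo=timezone.utc)

    sample = [
        {"device_id": "ABC", "temp": 12, "timestamp": at(17, 45)},
        {"device_id": "ABC", "temp": 10, "timestamp": at(18, 0)},
        {"device_id": "ABC", "temp": 4, "timestamp": at(18, 30)},
        {"device_id": "ABC", "temp": 23, "timestamp": at(18, 45)},
    ]
    for row in fill_gaps(sample):
        print(row["timestamp"].isoformat(), row["temp"], row["missing"])
```

The dictionary lookup replaces the `$filter` over the whole array, so the Python version does one pass over the grid instead of scanning all documents for every slot.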
Running the aggregation against the following input documents:
{"device_id": "ABC","temp": 12,"timestamp": ISODate("2020-01-04T17:45:00.000+00:00") },
{"device_id": "ABC","temp": 10,"timestamp": ISODate("2020-01-04T18:00:00.000+00:00") },
{"device_id": "ABC","temp": 4,"timestamp": ISODate("2020-01-04T18:30:00.000+00:00") },
{"device_id": "ABC","temp": 23,"timestamp": ISODate("2020-01-04T18:45:00.000+00:00") }
Result:
{
"timestamp" : ISODate("2020-01-04T17:45:00Z"),
"device_id" : "ABC",
"temp" : 12,
"missing" : false
}
{
"timestamp" : ISODate("2020-01-04T18:00:00Z"),
"device_id" : "ABC",
"temp" : 10,
"missing" : false
}
{
"timestamp" : ISODate("2020-01-04T18:15:00Z"),
"temp" : 0,
"device_id" : "ABC",
"missing" : true
}
{
"timestamp" : ISODate("2020-01-04T18:30:00Z"),
"device_id" : "ABC",
"temp" : 4,
"missing" : false
}
{
"timestamp" : ISODate("2020-01-04T18:45:00Z"),
"device_id" : "ABC",
"temp" : 23,
"missing" : false
}
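The sample input used here is already aligned to 15-minute boundaries, whereas the documents in the question carry timestamps such as 17:48:09. One possible preprocessing step, sketched below in plain Python, is to floor each timestamp to the start of its 15-minute slot and sum the values that fall into the same slot, as the question asks. The names `floor_to_step` and `sum_per_bucket` are made up for this illustration, and the sketch assumes timezone-aware UTC datetimes.

```python
from collections import defaultdict
from datetime import datetime, timezone


def floor_to_step(ts, step_seconds=900):
    """Round a timezone-aware datetime down to the start of its slot."""
    epoch = int(ts.timestamp())                 # whole seconds since 1970-01-01 UTC
    return datetime.fromtimestamp(epoch - epoch % step_seconds, tz=timezone.utc)


def sum_per_bucket(docs, step_seconds=900):
    """Sum `temp` per (device_id, aligned slot), sorted by device and time."""
    totals = defaultdict(int)
    for d in docs:
        key = (d["device_id"], floor_to_step(d["timestamp"], step_seconds))
        totals[key] += d["temp"]
    return [
        {"device_id": dev, "timestamp": ts, "temp": temp}
        for (dev, ts), temp in sorted(totals.items())
    ]


if __name__ == "__main__":
    raw = datetime(2020, 1, 4, 17, 48, 9, tzinfo=timezone.utc)
    print(floor_to_step(raw).isoformat())       # 2020-01-04T17:45:00+00:00
```

Once the timestamps are aligned this way, an exact-match comparison like the one in the aggregation can find them on the grid.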
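You can achieve this with an aggregation query. The query can take the start and end timestamps and the time interval (i.e. 15 minutes) as input and return the output you posted. The timestamps can be converted to milliseconds; for the given timestamp range and interval, find all possible timestamps, then find which of them are missing from the available data.

Thanks! Yes, I know the approach is an aggregation query. But I don't know how to put a condition on the timestamp and return a special value when no data is found.

It gives the desired result. I may make some improvements to the query (I will update the post later).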
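The first comment describes the idea in words: take a start, an end and an interval, convert everything to milliseconds, enumerate the expected timestamps and see which ones are absent. A minimal in-memory sketch of that idea could look like the following. `to_ms`, `from_ms` and `missing_timestamps` are hypothetical helper names, and the timestamps are assumed to be timezone-aware and already aligned to the interval.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_ms(ts):
    """Milliseconds since the Unix epoch, using exact integer arithmetic."""
    return (ts - EPOCH) // timedelta(milliseconds=1)


def from_ms(ms):
    return EPOCH + timedelta(milliseconds=ms)


def missing_timestamps(start, end, interval_ms, available):
    """List every expected timestamp in [start, end] that is not in `available`."""
    expected = range(to_ms(start), to_ms(end) + 1, interval_ms)
    present = {to_ms(ts) for ts in available}
    return [from_ms(ms) for ms in expected if ms not in present]


if __name__ == "__main__":
    def at(hour, minute):
        return datetime(2020, 1, 4, hour, minute, tzinfo=timezone.utc)

    have = [at(17, 45), at(18, 0), at(18, 30), at(18, 45)]
    for ts in missing_timestamps(at(17, 45), at(19, 0), 15 * 60 * 1000, have):
        print(ts.isoformat())
```

Passing the range explicitly also reports empty slots before the first or after the last stored document, which a range derived from the first and last documents cannot do.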