Formatting data to match a chart using the MongoDB aggregation framework

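I'm new to aggregation in MongoDB, and I'd like to know whether there is a way to format the data at the MongoDB level so that the results are ready to plot.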

I have a time series dataset (I'm not sure this is the best way to store time series data, so any suggestions are welcome) that looks like this:

{
    "dataset_id" : "850919c9-30e4-46f1-b962-e6b16cd30c60",
    "time_stamp" : 1600624542,
    "series" : [ 
        {
            "name" : "serie_0",
            "value" : 935.0
        }, 
        {
            "name" : "serie_1",
            "value" : 780.0
        }, 
        ...
        {
            "name" : "serie_n",
            "value" : <value_n>
        }, 
     ]
}

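With performance in mind, though, I think moving the Python formatting code to the MongoDB side could make the query faster. As I said before, this project is starting from scratch, so I'm open to suggestions about the document schema, keeping in mind that I need good read performance but am not concerned about write performance for now.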
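
For reference, the kind of client-side formatting described here might look roughly like the sketch below. The original Python code is not shown, so `format_for_chart` is a hypothetical helper, and the second sample document (its timestamp and values) is made up purely for illustration. It assumes the goal is one entry per series, each holding a list of `{x, y}` points with `x` in milliseconds.

```python
from collections import defaultdict


def format_for_chart(documents):
    """Group per-timestamp documents into one list of {x, y} points per series."""
    points_by_series = defaultdict(list)
    # Walk the documents in chronological order so each series stays sorted.
    for doc in sorted(documents, key=lambda d: d["time_stamp"]):
        for entry in doc["series"]:
            points_by_series[entry["name"]].append(
                # time_stamp is in Unix seconds; charts usually want milliseconds.
                {"x": doc["time_stamp"] * 1000, "y": entry["value"]}
            )
    # One output entry per series, ordered by series name.
    return [
        {"label": name, "data": points}
        for name, points in sorted(points_by_series.items())
    ]


# Sample input: the first document mirrors the one above, the second is invented.
docs = [
    {
        "dataset_id": "850919c9-30e4-46f1-b962-e6b16cd30c60",
        "time_stamp": 1600624602,
        "series": [
            {"name": "serie_0", "value": 940.0},
            {"name": "serie_1", "value": 775.0},
        ],
    },
    {
        "dataset_id": "850919c9-30e4-46f1-b962-e6b16cd30c60",
        "time_stamp": 1600624542,
        "series": [
            {"name": "serie_0", "value": 935.0},
            {"name": "serie_1", "value": 780.0},
        ],
    },
]

result = format_for_chart(docs)
```

Doing this in Python means every matching document has to be transferred to the client before it is reshaped, which is the cost that moving the work into the database is meant to avoid.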

I ended up doing it this way. I'm still not sure it's the most efficient approach, but performance is very good for now:

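def dataset_info(dataset_id, sample_size):
    """Return one ChartJS-style dataset per series for the given dataset."""
    stages = [
        # Keep only the documents that belong to this dataset.
        {"$match": {"dataset_id": str(dataset_id)}},
        # Randomly pick at most sample_size documents.
        {"$sample": {"size": sample_size}},
        # Order by time so the points of each series are pushed chronologically.
        {"$sort": {"time_stamp": 1}},
        # Emit one document per element of the "series" array.
        {"$unwind": {"path": "$series"}},
        {
            "$project": {
                "label": "$series.name",
                "data": {
                    # time_stamp is stored in seconds; convert to milliseconds.
                    "x": {"$multiply": ["$time_stamp", 1000]},
                    "y": "$series.value",
                },
            }
        },
        # Collect all points that share the same series name.
        {"$group": {"_id": "$label", "data": {"$push": "$data"}}},
        {
            "$project": {
                "_id": 0,
                "label": "$_id",
                "data": 1,
                # A bare True/False here would mean "include/exclude the field",
                # so $toBool is used to produce constant boolean values.
                "showLine": {"$toBool": 1},
                "fill": {"$toBool": 0},
            }
        },
        {"$sort": {"label": 1}},
    ]

    # DB_POOL_COLLECTION is the pymongo collection, defined elsewhere.
    data = DB_POOL_COLLECTION.aggregate(stages)
    return list(data)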

The aggregation returns a data structure that can be used directly with ChartJS, such as:

[
    {
      label: '<series_0_name>',
      showLine: true,
      fill: false,
      data: [
        {x: <timestamp in ms>, y: <value>},
        ...
        {x: <timestamp in ms>, y: <value>},
       ]
    },
    ...
    {
      label: '<series_n_name>',
      showLine: true,
      fill: false,
      data: [
        {x: <timestamp in ms>, y: <value>},
        ...
        {x: <timestamp in ms>, y: <value>},
       ]
    },
]
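
As a rough sketch of how this result could be handed to the front end: the snippet below assumes the list is sent as JSON and placed in the `data.datasets` field of a Chart.js scatter configuration. `build_chart_config` and the sample values are hypothetical and only illustrate the shape.

```python
import json


def build_chart_config(datasets):
    """Wrap the aggregated datasets in a Chart.js-style scatter configuration."""
    return {"type": "scatter", "data": {"datasets": datasets}}


# Hypothetical aggregation result in the shape shown above.
datasets = [
    {
        "label": "serie_0",
        "showLine": True,
        "fill": False,
        "data": [{"x": 1600624542000, "y": 935.0}],
    },
]

# Python booleans are serialized as JSON true/false, which is what the chart expects.
payload = json.dumps(build_chart_config(datasets))
```

Because `_id` is excluded in the final `$project` stage, the documents contain only JSON-friendly values and need no custom encoder.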
