Computing a first derivative with the MongoDB aggregation framework (tags: python, mongodb, mapreduce, pymongo, aggregation-framework)


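Is it possible to compute a first derivative using the aggregation framework?

For example, I have the following data:

{time_series : [10,20,40,70,110]}

I am trying to get output like this:

{derivative : [10,20,30,40]}

A bit dirty, but perhaps something like this: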

use test_db
db['data'].remove({})
db['data'].insert({id: 1, time_series: [10,20,40,70,110]})

var mapF = function() {
    // Emit twice on purpose: reduceF is only invoked for keys with more than one value.
    emit(this.id, this.time_series);
    emit(this.id, this.time_series);
};

var reduceF = function(key, values){
    var n = values[0].length;
    var ret = [];
    for(var i = 0; i < n-1; i++){
        ret.push( values[0][i+1] - values[0][i] );
    }
    return {'gradient': ret};
};

var finalizeF = function(key, val){
    return val.gradient;
}

db['data'].mapReduce(
    mapF,
    reduceF,
    { out: 'data_d1', finalize: finalizeF }
)

db['data_d1'].find({})

Alternatively, all of the processing can be moved into the finalizer (reduceF is not called here, because mapF is assumed to emit unique keys):

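use test_db
db['data'].remove({})
db['data'].insert({id: 1, time_series: [10,20,40,70,110]})

var mapF = function() {
    emit(this.id, this.time_series);
};

var reduceF = function(key, values){
};

var finalizeF = function(key, val){
    var x = val;
    var n = x.length;
    var ret = [];
    // Same differencing loop as in the first version, now run in the finalizer.
    for(var i = 0; i < n-1; i++){
        ret.push( x[i+1] - x[i] );
    }
    return ret;
}

db['data'].mapReduce(mapF, reduceF, { out: 'data_d1', finalize: finalizeF })
db['data_d1'].find({})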
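
The difference between the two map/reduce variants comes down to one documented behavior: MongoDB does not call the reduce function for a key that has only a single value. The toy simulation below imitates just that rule in plain Python so both variants can be compared side by side. It is a rough sketch, not MongoDB's implementation; run_map_reduce, diff and the lambda names are hypothetical helpers introduced only for this illustration.

```python
from collections import defaultdict


def run_map_reduce(docs, map_fn, reduce_fn, finalize_fn):
    """Toy model: reduce_fn runs only for keys with two or more emitted values."""
    emitted = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            emitted[key].append(value)
    results = {}
    for key, values in emitted.items():
        # A key with a single emitted value skips reduce_fn entirely.
        reduced = reduce_fn(key, values) if len(values) > 1 else values[0]
        results[key] = finalize_fn(key, reduced)
    return results


def diff(series):
    """Differences between consecutive elements."""
    return [b - a for a, b in zip(series, series[1:])]


docs = [{"id": 1, "time_series": [10, 20, 40, 70, 110]}]

# Variant 1: emit twice so that reduce_fn is invoked and does the differencing.
double_emit = lambda doc: [(doc["id"], doc["time_series"])] * 2
reduce_diff = lambda key, values: {"gradient": diff(values[0])}
finalize_unwrap = lambda key, val: val["gradient"]
via_reduce = run_map_reduce(docs, double_emit, reduce_diff, finalize_unwrap)

# Variant 2: emit once; reduce_fn is never called, so finalize does the work.
single_emit = lambda doc: [(doc["id"], doc["time_series"])]
unused_reduce = lambda key, values: None
finalize_diff = lambda key, val: diff(val)
via_finalize = run_map_reduce(docs, single_emit, unused_reduce, finalize_diff)

print(via_reduce)    # {1: [10, 20, 30, 40]}
print(via_finalize)  # {1: [10, 20, 30, 40]}
```

In the simulation, dropping one of the two emits from variant 1 would hand the raw array straight to the finalizer, which then fails on val["gradient"]; that is the reason the first version emits twice.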


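We can do this with an aggregation pipeline in version 3.4+.
In the pipeline, we use the $addFields stage together with the $range operator to add an array of the indexes of the "time_series" elements to the document. We also reverse the time series array with $reverseArray and add it to the document.
We reverse the array because, in the original order, the element at position p is always smaller than the element at position p+1 (the example series is increasing), which means that [p] - [p+1] < 0, and we do not want to use $multiply here to flip the sign (see the pipeline for version 3.2).
Next, we $zip the time series data with the indexes array and use the $map operator to apply a $subtract expression to the resulting array.
We then $slice the result to discard the null/None value from the array, and reverse the result again.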
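
The steps just described can be traced in plain Python to check the reasoning. This is only an illustration of the logic, not the pipeline itself; the function name derivative_via_reversal is hypothetical, and the operator names in the comments are my reading of which pipeline step each line corresponds to.

```python
def derivative_via_reversal(time_series):
    """Plain-Python trace of the reverse / zip / subtract / slice / reverse steps."""
    indexes = list(range(len(time_series)))   # index array (the $range step)
    reversed_series = time_series[::-1]       # reversed copy (the $reverseArray step)

    diffs = []
    for value, index in zip(reversed_series, indexes):   # pair values with indexes, then map
        if index + 1 < len(reversed_series):
            # In the reversed, increasing series this difference is positive.
            diffs.append(value - reversed_series[index + 1])
        else:
            diffs.append(None)                # no next element for the last entry

    trimmed = diffs[:len(time_series) - 1]    # keep the first n-1 results, dropping the None
    return trimmed[::-1]                      # restore the original order


print(derivative_via_reversal([10, 20, 40, 70, 110]))  # [10, 20, 30, 40]
```

Working on the reversed array means each subtraction already has the right sign for an increasing series, so no extra multiplication by -1 is needed.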

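In 3.2, we can use the $unwind operator to unwind the array and include the index of each element, by specifying a document as the operand instead of the traditional "path" string prefixed by $.
Next in the pipeline, we need to $group the documents and use the $push accumulator operator to return an array of subdocuments that looks like this: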
{
    "_id" : ObjectId("57c11ddbe860bd0b5df6bc64"),
    "time_series" : [
        { "value" : 10, "index" : NumberLong(0) },
        { "value" : 20, "index" : NumberLong(1) },
        { "value" : 40, "index" : NumberLong(2) },
        { "value" : 70, "index" : NumberLong(3) },
        { "value" : 110, "index" : NumberLong(4) }
    ]
}

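Finally comes the $project stage. In this stage, we use the $map operator to apply a series of expressions to each element of the array newly computed in the $group stage.
Here is what happens inside the $map "in" expression (think of $map as a for loop):
For each subdocument, we use the $let variable operator to bind the next element of the array to a variable, then subtract that element's "value" field from the current element's "value".
Since the next element in the array is the one at the current index plus one, all we need is the $arrayElemAt operator and a simple $add of the current element's index and 1.
The $subtract expression returns a negative value, so we multiply it by -1 using the $multiply operator.

We also need to $filter the resulting array, because its last element is None or null. The reason is that when the current element is the last element, $subtract returns None, because the index of the next element equals the size of the array.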
db.collection.aggregate([
  {
    "$unwind": {
      "path": "$time_series",
      "includeArrayIndex": "index"
    }
  },
  {
    "$group": {
      "_id": "$_id",
      "time_series": {
        "$push": {
          "value": "$time_series",
          "index": "$index"
        }
      }
    }
  },
  {
    "$project": {
      "time_series": {
        "$filter": {
          "input": {
            "$map": {
              "input": "$time_series",
              "as": "el",
              "in": {
                "$multiply": [
                  {
                    "$subtract": [
                      "$$el.value",
                      {
                        "$let": {
                          "vars": {
                            "nextElement": {
                              "$arrayElemAt": [
                                "$time_series",
                                {
                                  "$add": [
                                    "$$el.index",
                                    1
                                  ]
                                }
                              ]
                            }
                          },
                          "in": "$$nextElement.value"
                        }
                      }
                    ]
                  },
                  -1
                ]
              }
            }
          },
          "as": "item",
          "cond": {
            "$gte": [
              "$$item",
              0
            ]
          }
        }
      }
    }
  }
])
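
To make the $map / $filter logic above easier to follow, here is a plain-Python imitation of the $project stage. It is a sketch under the assumption that the array has the [{"value": ..., "index": ...}] shape produced by the $unwind and $group stages; project_stage is a hypothetical name, and nothing here talks to MongoDB.

```python
def project_stage(time_series):
    """Plain-Python imitation of the $map / $filter logic; not MongoDB code."""
    # Shape produced by $unwind + $group: [{"value": v, "index": i}, ...]
    elements = [{"value": v, "index": i} for i, v in enumerate(time_series)]

    mapped = []
    for el in elements:
        nxt = el["index"] + 1
        if nxt < len(elements):
            # (current - next) * -1, as in the $multiply / $subtract pair
            mapped.append((el["value"] - elements[nxt]["value"]) * -1)
        else:
            mapped.append(None)  # no next element, so the result is null

    # The filter condition is "item >= 0": null and negative values are both dropped.
    return [item for item in mapped if item is not None and item >= 0]


print(project_stage([10, 20, 40, 70, 110]))  # [10, 20, 30, 40]
print(project_stage([10, 30, 20]))           # [20]  (the -10 step is filtered out)
```

The second call shows a side effect of the $gte condition: it removes the trailing null, but it also removes any negative difference, so the pipeline as written suits increasing series such as the example data.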


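Another option, which I think is less efficient, is to perform a map/reduce operation on the collection using the map_reduce method (the PyMongo session is shown further down).

Is there a reason you want to do this in the aggregation framework rather than with a robust Python library implementation?
@JohnnyHK Can you give me an example of a Python library implementation? My current workaround is to fetch all the fields with pymongo and compute the derivative in Python. It turned out to be very slow (bound by network bandwidth?), which made me look around for alternatives.
@JohnnyHK I think the aggregation framework is the best option here. It is even faster than mine. I will add benchmark results to my answer.
@Styvane Don't get me wrong, I was the first to upvote both answers because they are both great, but the "best" option is not only about performance. A well-tested library call is simpler, easier to understand and cleaner than a complicated aggregation pipeline.
@JohnnyHK I completely agree. Not everything in programming is about performance. It is a shame that MongoDB does not provide an operator for this. By the way, I forgot the curly braces several times while writing this.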

>>> import pymongo
>>> from bson.code import Code
>>> client = pymongo.MongoClient()
>>> db = client.test
>>> collection = db.collection
>>> mapper = Code("""
...               function() {
...                 var derivatives = [];
...                 for (var index=1; index<this.time_series.length; index++) {
...                   derivatives.push(this.time_series[index] - this.time_series[index-1]);
...                 }
...                 emit(this._id, derivatives);
...               }
...               """)
>>> reducer = Code("""
...                function(key, value) {}
...                """)
>>> for res in collection.map_reduce(mapper, reducer, out={'inline': 1})['results']:
...     print(res)  # or do something with the document.
... 
{'value': [10.0, 20.0, 30.0, 40.0], '_id': ObjectId('57c11ddbe860bd0b5df6bc64')}

import numpy as np


for document in collection.find({}, {'time_series': 1}):
    result = np.diff(document['time_series'])