MapReduce with Python PyMongo

My Mongo collection, Impressions, contains documents in the following format:

   {
        _uid: 10,
        "impressions": [
            {
                "pos": 6,
                "id": 123,
                "service": "furniture"
            },
            {
                "pos": 0,
                "id": 128,
                "service": "electronics"
            },
            {
                "pos": 2,
                "id": 127,
                "service": "furniture"
            },
            {
                "pos": 2,
                "id": 125,
                "service": "electronics"
            },
            {
                "pos": 10,
                "id": 124,
                "service": "electronics"
            }
        ]
      },
     {
        _uid: 11,
        "impressions": [
            {
                "pos": 1,
                "id": 124,
                "service": "furniture"
            },
            {
                "pos": 10,
                "id": 124,
                "service": "electronics"
            },
            {
                "pos": 1,
                "id": 123,
                "service": "furniture"
            },
            {
                "pos": 21,
                "id": 122,
                "service": "furniture"
            },
            {
                "pos": 3,
                "id": 125,
                "service": "electronics"
            },
            {
                "pos": 10,
                "id": 121,
                "service": "electronics"
            }
            ]
         },
            .
            .
            .
            .
            .
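Each document in the collection has an "impressions" key, which is an array of dictionaries. In each dictionary, "id" is the id of the entity, "service" is the service type, and "pos" is the position of the item in the search results page. My goal is to find the number of impressions for each "id" in each category. So for the data above with "service" = "furniture", I would like this as my aggregation result: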

[
{"id": 123,"impressions_count":2},
{"id": 127,"impressions_count":1},
{"id": 124,"impressions_count":1},
{"id": 122,"impressions_count":1}
]
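To pin down the target, the same count can be sketched in plain Python over in-memory documents. This only illustrates the desired result and is not a MongoDB query; count_impressions and the docs list (a copy of the two sample documents above) are hypothetical names introduced here.

```python
from collections import Counter

# The two sample documents shown above.
docs = [
    {"_uid": 10, "impressions": [
        {"pos": 6, "id": 123, "service": "furniture"},
        {"pos": 0, "id": 128, "service": "electronics"},
        {"pos": 2, "id": 127, "service": "furniture"},
        {"pos": 2, "id": 125, "service": "electronics"},
        {"pos": 10, "id": 124, "service": "electronics"},
    ]},
    {"_uid": 11, "impressions": [
        {"pos": 1, "id": 124, "service": "furniture"},
        {"pos": 10, "id": 124, "service": "electronics"},
        {"pos": 1, "id": 123, "service": "furniture"},
        {"pos": 21, "id": 122, "service": "furniture"},
        {"pos": 3, "id": 125, "service": "electronics"},
        {"pos": 10, "id": 121, "service": "electronics"},
    ]},
]


def count_impressions(documents, service):
    """Count how often each id appears among impressions of one service."""
    counts = Counter(
        imp["id"]
        for doc in documents
        for imp in doc.get("impressions", [])
        if imp.get("service") == service
    )
    # Counter keeps first-seen order, so ids come out in document order.
    return [{"id": k, "impressions_count": v} for k, v in counts.items()]


furniture_counts = count_impressions(docs, "furniture")
```

Every impression is counted individually here, including repeats of the same id inside one document.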
I tried to aggregate by "id" using MapReduce through the following function in my Python script:

def fetch_impressions():
    try:
        imp_collection = get_mongo_connection('Impressions')
        map = Code("""
                function(){
                    for( x in this.impressions){
                        var flat_id = x['id'];
                        var service_type = x['service']
                        emit(parseInt(flat_id),1);
                        }
                    };
                """)

        reduce = Code("""
                        function(a,b){
                            return Array.sum(b);
                            };
                        """)

        results = imp_collection.map_reduce(map, reduce, 'aggregation_result')
        return results
    except Exception as e:
        raise Exception(e)
But the result I get is None, probably because something is wrong with the map function. I am new to JavaScript and Mongo, please help.
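A likely cause is the loop in the map function: in JavaScript, "for (x in this.impressions)" iterates over the array's index strings ("0", "1", ...), not its elements, so x['id'] is undefined and parseInt(undefined) is NaN. The service filter is also never applied. Below is a hedged sketch of a corrected version. It sends the mapReduce command through db.command rather than Collection.map_reduce, because that helper was removed in PyMongo 4.0; note too that the server has deprecated mapReduce since MongoDB 5.0. The names MAP_JS, REDUCE_JS and build_map_reduce_command are illustrative, db is assumed to be a pymongo Database supplied by the caller, and the functions are passed as plain strings on the assumption that the server accepts them (wrap them in bson.code.Code otherwise).

```python
# JavaScript source for the map step. forEach visits each array element,
# whereas `for (x in arr)` would only visit the index strings.
MAP_JS = """
function () {
    (this.impressions || []).forEach(function (imp) {
        if (imp.service === service) {   // `service` is injected via `scope`
            emit(imp.id, 1);
        }
    });
}
"""

# The reduce step receives one key and the array of values emitted for it.
REDUCE_JS = """
function (key, values) {
    return Array.sum(values);
}
"""


def build_map_reduce_command(collection_name, service):
    """Build an inline mapReduce command document (key order matters)."""
    return {
        "mapReduce": collection_name,   # the command name must be the first key
        "map": MAP_JS,
        "reduce": REDUCE_JS,
        "out": {"inline": 1},           # return results instead of writing a collection
        "scope": {"service": service},  # made available as a global in the JS code
    }


def fetch_impressions(db, service="furniture"):
    """Run the command on a Database handle and reshape the inline results."""
    response = db.command(build_map_reduce_command("Impressions", service))
    # Each inline result row has the shape {"_id": <id>, "value": <count>}.
    return [
        {"id": row["_id"], "impressions_count": int(row["value"])}
        for row in response["results"]
    ]
```

Passing the service name through scope keeps the JavaScript source constant, so the same functions can be reused for "electronics" or any other category.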

You can use the aggregation framework, most efficiently with the $map and $setDifference operators:

col.aggregate([
    {"$project": {
        "impressions": {"$setDifference": [
            {"$map": {
                "input": "$impressions",
                "as": "imp",
                "in": {"$cond": {
                    "if": {"$eq": ["$$imp.service", "furniture"]},
                    "then": "$$imp.id",
                    "else": 0
                }}
            }},
            [0]
        ]}
    }},
    { "$unwind": "$impressions" }, 
    { "$group": { "_id": "$impressions", "impressions_count": { "$sum": 1 }}}
])
This yields:

{'_id': 122.0, 'impressions_count': 1}
{'_id': 124.0, 'impressions_count': 1}
{'_id': 127.0, 'impressions_count': 1}
{'_id': 123.0, 'impressions_count': 2}
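One caveat with the $setDifference version: it treats the mapped array as a set, so an id that occurs more than once inside a single document is counted only once for that document, and it relies on 0 never being a real id. If repeats within a document should count separately, a plain $match/$unwind/$group pipeline is a possible alternative. A minimal sketch, where build_pipeline and count_by_id are illustrative names and col is assumed to be a pymongo collection:

```python
def build_pipeline(service):
    """Aggregation stages that count impressions per id for one service."""
    return [
        # Pre-filter: skip documents with no impression for this service.
        {"$match": {"impressions.service": service}},
        # Produce one document per element of the impressions array.
        {"$unwind": "$impressions"},
        # Keep only the unwound elements that belong to the service.
        {"$match": {"impressions.service": service}},
        # Count how many times each id occurs.
        {"$group": {"_id": "$impressions.id", "impressions_count": {"$sum": 1}}},
    ]


def count_by_id(col, service="furniture"):
    """Run the pipeline and rename _id to id, as in the desired output."""
    return [
        {"id": row["_id"], "impressions_count": row["impressions_count"]}
        for row in col.aggregate(build_pipeline(service))
    ]
```

The first $match is optional; it only reduces the number of documents that reach $unwind.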

I made a tool that lets you run MongoDB Map/Reduce in Python.


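What are you trying to do? What is the expected result?
@user3100115 I have updated the question, sorry for the delay!
This still does not answer how we would use the map_reduce() API to solve this.
Why use map_reduce when there is a better way?
@Ayushksinging I completely agree, but before offering a better solution we should also answer the question using map reduce, so that the user who asked leaves the site knowing not only a new and better approach but also what was wrong in their own solution and how to fix it.
Welcome to Stack Overflow! Simply linking to your own library or tutorial is not a good answer. Linking to it, explaining why it solves the problem, providing code showing how to do so, and disclosing that you wrote it makes for a better answer.
I did explain why it solves the problem: it lets you run map/reduce in Python, as the OP asked. I also said that I wrote the tool.
@Dharman Earl Walton is my other account, which I was logged into by accident. I have also updated the code so that it solves the OP's specific problem: total impressions per ID.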
import threading

import pymongo

import mreduce


mongo_client = pymongo.MongoClient("mongodb://your_mongodb_server")

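def map_func(document):
    # Emit (id, 1) once for every impression in the document.
    for impression in document["impressions"]:
        yield impression["id"], 1

def reduce_func(id, counts):
    # Sum the 1s emitted for this id to get its total impressions.
    return sum(counts)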

worker_functions = {
    "exampleMap": map_func,
    "exampleReduce": reduce_func
}

api = mreduce.API(
    api_key = "...",
    mongo_client = mongo_client
)

project_id = "..."

thread = threading.Thread(
    target=api.run,
    args=[project_id, worker_functions]
)
thread.start()

job = api.submit_job(
    projectId=project_id,
    mapFunctionName="exampleMap",
    reduceFunctionName="exampleReduce",
    inputDatabase="db",
    inputCollection="impressions",
    outputDatabase="db",
    outputCollection="impressions_results"
)
result = job.wait_for_result()
for key, value in result:
    # Keys are numeric ids, so convert them before concatenating.
    print("Key: " + str(key) + ", Value: " + str(value))