MapReduce with Python PyMongo
My Mongo collection Impressions contains documents in the following format:
{
_uid: 10,
"impressions": [
{
"pos": 6,
"id": 123,
"service": "furniture"
},
{
"pos": 0,
"id": 128,
"service": "electronics"
},
{
"pos": 2,
"id": 127,
"service": "furniture"
},
{
"pos": 2,
"id": 125,
"service": "electronics"
},
{
"pos": 10,
"id": 124,
"service": "electronics"
}
]
},
{
_uid: 11,
"impressions": [
{
"pos": 1,
"id": 124,
"service": "furniture"
},
{
"pos": 10,
"id": 124,
"service": "electronics"
},
{
"pos": 1,
"id": 123,
"service": "furniture"
},
{
"pos": 21,
"id": 122,
"service": "furniture"
},
{
"pos": 3,
"id": 125,
"service": "electronics"
},
{
"pos": 10,
"id": 121,
"service": "electronics"
}
]
},
.
.
.
.
.
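Every document in the collection has an "impressions" key, which is an array of dictionaries. In each dictionary, "id" is the id of the entity, "service" is the service type, and "pos" is the position of the item in the search results page. My goal is to find the number of impressions for each "id" in each category.
So for the above data with "service" = "furniture", I want this as my aggregation result: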
[
{"id": 123,"impressions_count":2},
{"id": 127,"impressions_count":1},
{"id": 124,"impressions_count":1},
{"id": 122,"impressions_count":1}
]
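For reference, the counting I am after can be sketched in plain Python, without MongoDB. The `docs` list below simply copies the two sample documents above, and `count_impressions` is a hypothetical helper name used only for illustration, not part of PyMongo.

```python
from collections import Counter

# The two sample documents from above, as plain Python dicts.
docs = [
    {"_uid": 10, "impressions": [
        {"pos": 6, "id": 123, "service": "furniture"},
        {"pos": 0, "id": 128, "service": "electronics"},
        {"pos": 2, "id": 127, "service": "furniture"},
        {"pos": 2, "id": 125, "service": "electronics"},
        {"pos": 10, "id": 124, "service": "electronics"},
    ]},
    {"_uid": 11, "impressions": [
        {"pos": 1, "id": 124, "service": "furniture"},
        {"pos": 10, "id": 124, "service": "electronics"},
        {"pos": 1, "id": 123, "service": "furniture"},
        {"pos": 21, "id": 122, "service": "furniture"},
        {"pos": 3, "id": 125, "service": "electronics"},
        {"pos": 10, "id": 121, "service": "electronics"},
    ]},
]


def count_impressions(documents, service):
    """Count how often each id appears in impressions of the given service."""
    counts = Counter()
    for doc in documents:
        for imp in doc.get("impressions", []):
            if imp["service"] == service:
                counts[imp["id"]] += 1
    # Ids come out in the order they were first seen.
    return [{"id": k, "impressions_count": v} for k, v in counts.items()]


furniture_counts = count_impressions(docs, "furniture")
```

This only pins down the expected numbers; it would not scale to a large collection, which is why I want the aggregation to run inside MongoDB.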
I tried to aggregate "id" using MAPREDUCE through the following function in a Python script:
from bson.code import Code

def fetch_impressions():
    try:
        imp_collection = get_mongo_connection('Impressions')
        map = Code("""
            function(){
                for( x in this.impressions){
                    var flat_id = x['id'];
                    var service_type = x['service']
                    emit(parseInt(flat_id),1);
                }
            };
        """)
        reduce = Code("""
            function(a,b){
                return Array.sum(b);
            };
        """)
        results = imp_collection.map_reduce(map, reduce, 'aggregation_result')
        return results
    except Exception as e:
        raise Exception(e)
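But the result I get is None, probably because something is wrong with the map function. I am new to Javascript and Mongo, please help.
You can use the aggregation framework, or more efficiently the $map and $setDifference operators: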
col.aggregate([
    {"$project": {"impressions": {"$setDifference": [
        {"$map": {
            "input": "$impressions",
            "as": "imp",
            "in": {"$cond": {
                "if": {"$eq": ["$$imp.service", "furniture"]},
                "then": "$$imp.id",
                "else": 0
            }}
        }},
        [0]
    ]}}},
    {"$unwind": "$impressions"},
    {"$group": {"_id": "$impressions", "impressions_count": {"$sum": 1}}}
])
This yields:
{'_id': 122.0, 'impressions_count': 1}
{'_id': 124.0, 'impressions_count': 1}
{'_id': 127.0, 'impressions_count': 1}
{'_id': 123.0, 'impressions_count': 2}
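One caveat with the pipeline above: `$setDifference` treats its inputs as sets, so an id that appears more than once in the same document is counted only once for that document, and an id of 0 would be dropped along with the placeholder. If repeats within a document should count, a plainer `$unwind`, `$match`, `$group` pipeline is an option. A minimal sketch, where `build_pipeline` is a hypothetical helper name and `col` is assumed to be the same PyMongo collection as above:

```python
def build_pipeline(service):
    """Build an aggregation pipeline counting impressions per id for one service."""
    return [
        # Early filter: skip documents that have no impression of this service.
        {"$match": {"impressions.service": service}},
        # Produce one document per element of the impressions array.
        {"$unwind": "$impressions"},
        # Keep only the impressions of the requested service.
        {"$match": {"impressions.service": service}},
        # Count how many times each id occurs.
        {"$group": {"_id": "$impressions.id", "impressions_count": {"$sum": 1}}},
    ]


pipeline = build_pipeline("furniture")

# Usage, assuming `col` is a pymongo Collection:
# for row in col.aggregate(pipeline):
#     print(row)
```

The first `$match` is optional; it only lets the server discard irrelevant documents before unwinding.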
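I made a tool that lets you run MongoDB Map/Reduce in Python.
What are you trying to do? What is the expected result?
@user3100115 Updated the question, sorry for the delay!
This still doesn't answer how we would solve this using the map_reduce() api.
Why use map_reduce when there is a better way?
@Ayushksinging Totally agree with that, but before offering a better solution we should also answer the question using map reduce, so that the user who asked leaves the site knowing not only a new and better way, but also what was wrong in their own solution and how to fix it.
Welcome to Stack Overflow! Just linking to your own library or tutorial is not a good answer. Linking to it, explaining why it solves the problem, providing code on how to do so, and disclosing that you wrote it makes for a better answer.
I did explain why it solves the problem: it lets you run map/reduce in Python, as the OP asked. I also said that I wrote the tool.
@Dharman Earl Walton is my other account, which I logged into by accident. I have also updated the code so that it solves the OP's specific problem, total impressions per ID.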
import threading

import pymongo
import mreduce

mongo_client = pymongo.MongoClient("mongodb://your_mongodb_server")

def map_func(document):
    # Emit one (id, 1) pair per impression in the document.
    for impression in document["impressions"]:
        yield impression["id"], 1

def reduce_func(id, counts):
    # Sum the emitted 1s to get the total impressions for this id.
    return sum(counts)

worker_functions = {
    "exampleMap": map_func,
    "exampleReduce": reduce_func
}

api = mreduce.API(
    api_key = "...",
    mongo_client = mongo_client
)

project_id = "..."

# Run the worker in a background thread so the job can be submitted from here.
thread = threading.Thread(
    target=api.run,
    args=[project_id, worker_functions]
)
thread.start()

job = api.submit_job(
    projectId=project_id,
    mapFunctionName="exampleMap",
    reduceFunctionName="exampleReduce",
    inputDatabase="db",
    inputCollection="impressions",
    outputDatabase="db",
    outputCollection="impressions_results"
)
result = job.wait_for_result()
for key, value in result:
    print("Key: " + str(key) + ", Value: " + str(value))
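As for the map_reduce() question raised in the comments: the map function in the question uses `for (x in this.impressions)`, which in JavaScript iterates over the array indexes rather than the elements, so `x['id']` is undefined and `parseInt` turns it into NaN. A sketch of a corrected version follows, under several assumptions: `db` is a PyMongo `Database` handle, the server still supports the `mapReduce` command (it is deprecated since MongoDB 5.0), and the helper name `build_map_reduce_command` is hypothetical. `Collection.map_reduce` was removed in PyMongo 4, so the sketch goes through `db.command` instead.

```python
# JavaScript map function: iterate over the elements themselves, not the indexes.
# `service` is supplied through the mapReduce "scope" option below.
MAP_JS = """
function () {
    this.impressions.forEach(function (imp) {
        if (imp.service === service) {
            emit(imp.id, 1);
        }
    });
}
"""

# JavaScript reduce function: add up the 1s emitted for each id.
REDUCE_JS = """
function (key, values) {
    return Array.sum(values);
}
"""


def build_map_reduce_command(collection_name, service):
    """Build a mapReduce command document; the command name must be the first key."""
    return {
        "mapReduce": collection_name,
        "map": MAP_JS,
        "reduce": REDUCE_JS,
        "scope": {"service": service},
        "out": {"inline": 1},
    }


command = build_map_reduce_command("Impressions", "furniture")

# Usage, assuming `db` is a pymongo Database:
# result = db.command(command)
# for row in result["results"]:
#     print(row)
```

If a server rejects plain strings for `map` and `reduce`, wrapping them in `bson.code.Code` is the usual alternative. For new code the aggregation pipeline is still the better choice.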