Ruby on Rails: MongoDB aggregation query with multiple value counts
I am using Mongoid to back MongoDB in one of my Rails applications.
class Tracking
  include Mongoid::Document
  include Mongoid::Timestamps
  field :article_id, type: String
  field :action, type: String # like | comment
  field :actor_gender, type: String # male | female | unknown
  field :city, type: String
  field :state, type: String
  field :country, type: String
end
From this I want to produce records in tabular form, like this:
article_id | state | male_like_count | female_like_count | unknown_gender_like_count | date
juhkwu2367 | California | 21 | 7 | 1 | 11-20-2015
juhkwu2367 | New York | 62 | 23 | 3 | 11-20-2015
juhkwu2367 | Vermont | 48 | 27 | 3 | 11-20-2015
juhkwu2367 | California | 21 | 7 | 1 | 11-21-2015
juhkwu2367 | New York | 62 | 23 | 3 | 11-21-2015
juhkwu2367 | Vermont | 48 | 27 | 3 | 11-21-2015
The inputs to the query are:
article_id
country
date range (from and to)
action (is `like` in this scenario)
sort_by [ date | state | male_like_count | female_like_count ]
Tracking.collection.aggregate([
  { "$match" => {
    "created_at" => { "$gte" => startDate, "$lt" => endDate },
    "country" => "US",
    "action" => "like"
  }},
  { "$group" => {
    "_id" => {
      # Round created_at down to midnight UTC using date math
      "date" => {
        "$add" => [
          { "$subtract" => [
            { "$subtract" => [ "$created_at", Time.at(0).utc.to_datetime ] },
            { "$mod" => [
              { "$subtract" => [ "$created_at", Time.at(0).utc.to_datetime ] },
              1000 * 60 * 60 * 24
            ]}
          ]},
          Time.at(0).utc.to_datetime
        ]
      },
      "article_id" => "$article_id",
      "state" => "$state"
    },
    # The model declares the field as actor_gender, so that is the path used here
    "male_like_count" => {
      "$sum" => {
        "$cond" => [
          { "$eq" => [ "$actor_gender", "male" ] },
          1,
          0
        ]
      }
    },
    "female_like_count" => {
      "$sum" => {
        "$cond" => [
          { "$eq" => [ "$actor_gender", "female" ] },
          1,
          0
        ]
      }
    },
    "unknown_like_count" => {
      "$sum" => {
        "$cond" => [
          { "$eq" => [ "$actor_gender", "unknown" ] },
          1,
          0
        ]
      }
    }
  }},
  { "$sort" => {
    "_id.date" => 1,
    "_id.article_id" => 1,
    "_id.state" => 1,
    "male_like_count" => 1,
    "female_like_count" => 1
  }}
])
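This is what I have been trying so far, working from a reference.
So what should I put at the `?` positions to compare the counts by gender, and how do I add a clause for the sort options?

You are mainly looking for the `$cond` operator, which evaluates a condition and returns whether a particular counter should be incremented or not. There are some other aggregation concepts missing here as well: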
db.trackings.aggregate([
  { "$match": {
    "created_at": { "$gte": startDate, "$lt": endDate },
    "country": "US",
    "action": "like"
  }},
  { "$group": {
    "_id": {
      "date": {
        "month": { "$month": "$created_at" },
        "day": { "$dayOfMonth": "$created_at" },
        "year": { "$year": "$created_at" }
      },
      "article_id": "$article_id",
      "state": "$state"
    },
    // The model declares the field as actor_gender, so that is the path used here
    "male_like_count": {
      "$sum": {
        "$cond": [
          { "$eq": [ "$actor_gender", "male" ] },
          1,
          0
        ]
      }
    },
    "female_like_count": {
      "$sum": {
        "$cond": [
          { "$eq": [ "$actor_gender", "female" ] },
          1,
          0
        ]
      }
    },
    "unknown_like_count": {
      "$sum": {
        "$cond": [
          { "$eq": [ "$actor_gender", "unknown" ] },
          1,
          0
        ]
      }
    }
  }},
  { "$sort": {
    "_id.date.year": 1,
    "_id.date.month": 1,
    "_id.date.day": 1,
    "_id.article_id": 1,
    "_id.state": 1,
    "male_like_count": 1,
    "female_like_count": 1
  }}
])
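First, you basically want `$match`, which is how you supply "query" conditions to an aggregation pipeline. It can appear at basically any pipeline stage, but when it is used first it filters the input considered by the operations that follow. Here that means the required date range and country, plus dropping anything that is not a "like", since you do not care about those counts.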
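As a rough illustration of turning the listed inputs into that first stage, here is a minimal sketch. The helper name `build_match_stage` and its keyword arguments are made up for this example, and the `article_id` condition is an assumption taken from the list of inputs; the pipeline listings in this thread do not filter on it.

```ruby
# Hypothetical helper (not part of Mongoid): builds the `$match` stage
# from the query inputs listed in the question.
def build_match_stage(article_id:, country:, from:, to:, action: "like")
  {
    "$match" => {
      "article_id" => article_id, # assumption: filter on the requested article
      "country"    => country,
      "action"     => action,
      # Half-open range: includes `from`, excludes `to`
      "created_at" => { "$gte" => from, "$lt" => to }
    }
  }
end

stage = build_match_stage(
  article_id: "juhkwu2367",
  country:    "US",
  from:       Time.utc(2015, 11, 20),
  to:         Time.utc(2015, 11, 22)
)
```

The returned hash could be placed as the first element of the array passed to `Tracking.collection.aggregate`.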
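Then `$group` collects all items by the respective "key" in `_id`. That key can be, and here is, a compound field, mainly because all of those field values are considered part of the grouping key, and also for a little organization.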
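To make the conditional counting concrete, the sketch below mimics in plain Ruby what `$group` with `$sum` and `$cond` computes on the server. The sample rows and the helper `conditional_count` are invented for illustration; this runs in memory and is not how MongoDB evaluates the pipeline.

```ruby
# Made-up in-memory stand-ins for Tracking documents.
docs = [
  { state: "California", actor_gender: "male" },
  { state: "California", actor_gender: "female" },
  { state: "California", actor_gender: "male" },
  { state: "Vermont",    actor_gender: "unknown" }
]

# Plain-Ruby analogue of { "$sum" => { "$cond" => [ <test>, 1, 0 ] } }:
# each row contributes 1 when the test passes and 0 otherwise.
def conditional_count(rows, gender)
  rows.sum { |row| row[:actor_gender] == gender ? 1 : 0 }
end

# Analogue of grouping on a key and emitting one counter per gender.
counts = docs.group_by { |doc| doc[:state] }.map do |state, rows|
  {
    state:              state,
    male_like_count:    conditional_count(rows, "male"),
    female_like_count:  conditional_count(rows, "female"),
    unknown_like_count: conditional_count(rows, "unknown")
  }
end
```

Each group gets three independent counters from a single pass over its rows, which is the reason for using `$cond` inside `$sum` rather than running one query per gender.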
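You also seem to be asking for "distinct fields" in the output outside of `_id` itself. Don't do that. The data is already there, so there is no point in copying it. You could produce the same thing outside of `_id` with the `$first` aggregation operator, or even use a `$project` stage at the end of the pipeline to rename the fields. But it is better to drop the habit of thinking you need this, since it only costs time and space in getting the response.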
If anything, you seem to be after a "pretty date". I personally prefer to use "date math" for most manipulation, so a modified listing suited to Mongoid is:
Tracking.collection.aggregate([
  { "$match" => {
    "created_at" => { "$gte" => startDate, "$lt" => endDate },
    "country" => "US",
    "action" => "like"
  }},
  { "$group" => {
    "_id" => {
      # Round created_at down to midnight UTC using date math
      "date" => {
        "$add" => [
          { "$subtract" => [
            { "$subtract" => [ "$created_at", Time.at(0).utc.to_datetime ] },
            { "$mod" => [
              { "$subtract" => [ "$created_at", Time.at(0).utc.to_datetime ] },
              1000 * 60 * 60 * 24
            ]}
          ]},
          Time.at(0).utc.to_datetime
        ]
      },
      "article_id" => "$article_id",
      "state" => "$state"
    },
    # The model declares the field as actor_gender, so that is the path used here
    "male_like_count" => {
      "$sum" => {
        "$cond" => [
          { "$eq" => [ "$actor_gender", "male" ] },
          1,
          0
        ]
      }
    },
    "female_like_count" => {
      "$sum" => {
        "$cond" => [
          { "$eq" => [ "$actor_gender", "female" ] },
          1,
          0
        ]
      }
    },
    "unknown_like_count" => {
      "$sum" => {
        "$cond" => [
          { "$eq" => [ "$actor_gender", "unknown" ] },
          1,
          0
        ]
      }
    }
  }},
  { "$sort" => {
    "_id.date" => 1,
    "_id.article_id" => 1,
    "_id.state" => 1,
    "male_like_count" => 1,
    "female_like_count" => 1
  }}
])
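All this really does is obtain a DateTime object, suitable for use as a driver argument, that corresponds to the epoch date, and then apply the various operations. A `$subtract` of one BSON date from another yields a numeric value, which can then be rounded down to the current day with the math applied here. And when `$add` is used with a numeric timestamp value and a BSON date (again representing the epoch), the result is once more a BSON date object, now holding the adjusted, rounded value.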
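The same rounding can be tried out in plain Ruby to see what the pipeline expression produces. This is only a sketch of the arithmetic, with `floor_to_utc_day` and `MS_PER_DAY` as names chosen for the example; the server performs the equivalent steps on BSON dates.

```ruby
MS_PER_DAY = 1000 * 60 * 60 * 24

# Rounds a Time down to midnight UTC with the same steps as the pipeline.
def floor_to_utc_day(time)
  epoch = Time.at(0).utc
  ms_since_epoch = ((time - epoch) * 1000).to_i # like $subtract of two dates
  remainder = ms_since_epoch % MS_PER_DAY       # like $mod by one day
  # like $add: put the rounded offset (converted to seconds) back on the epoch
  epoch + (ms_since_epoch - remainder) / 1000
end

rounded = floor_to_utc_day(Time.utc(2015, 11, 20, 17, 45, 12))
# rounded is midnight UTC on 2015-11-20
```

Because the remainder is taken against the epoch in UTC, day boundaries are UTC midnights; grouping by a local calendar day would need an offset added before the modulo.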
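Then it is just a matter of applying `$sort`, again as an aggregation pipeline stage rather than as an external modifier. Much like the `$match` principle, an aggregation pipeline can sort anywhere, but a sort at the end always works on the final result.

Never thought someone would post an answer this neatly. Thanks a lot @blakes for also providing a solution for the pretty date. I have two questions: (1) What is the use of `_id.article_id` in the sort options? (2) I believe the sort options work in top-to-bottom order, meaning it first sorts by date, then by state, then by male like count and female like count. Is that right? And if I do not need that level of sorting, would passing only the required keys work?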
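For the `sort_by` input, one possible approach is to build the `$sort` stage from only the keys that were requested. The sketch below assumes a hypothetical `SORT_FIELDS` mapping and a `build_sort_stage` helper, neither of which comes from Mongoid; it relies on Ruby hashes preserving insertion order, so the first key listed is the primary sort key.

```ruby
# Hypothetical mapping from the `sort_by` input to fields in the grouped output.
SORT_FIELDS = {
  "date"              => "_id.date",
  "state"             => "_id.state",
  "male_like_count"   => "male_like_count",
  "female_like_count" => "female_like_count"
}.freeze

# Builds a `$sort` stage containing only the requested keys, in the given order.
# direction is 1 for ascending and -1 for descending.
def build_sort_stage(sort_by, direction = 1)
  keys = Array(sort_by).map do |name|
    SORT_FIELDS.fetch(name) { raise ArgumentError, "unknown sort key: #{name}" }
  end
  { "$sort" => keys.each_with_object({}) { |key, spec| spec[key] = direction } }
end

by_state_then_males = build_sort_stage(%w[state male_like_count])
```

Rejecting unknown names with `ArgumentError` keeps arbitrary user input from being passed through as a field path.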