Node.js MongoDB aggregation - $lookup performance


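I am using MongoDB 3.6 aggregation with $lookup to join two collections (users and suscriptionusers).

My goal is to query suscriptionusers and join the users collection, matching on a start and end date, to get some subscription analytics such as the country, age range and gender of the subscribed users, and to show the data in a line chart. This is how I am doing it: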

db.getCollection('suscriptionusers').aggregate([
    // Keep only active subscriptions to one channel within the date range.
    {$match: {
        'channel_id': ObjectId('......'),
        'subscribed_at': {
            $gte: new Date('2018-01-01'),
            $lte: new Date('2019-01-01')
        },
        'subscribed': true
    }},
    // Join each subscription with its user document.
    {$lookup: {
        from: "users",
        localField: "user_id",
        foreignField: "_id",
        as: "users"
    }},
    /* Using this form instead of the one above makes the process even slower :(
    {$lookup: {
        from: "users",
        let: { user_id: "$user_id" },
        pipeline: [
            { $match: { $expr: { $eq: [ "$_id", "$$user_id" ] } } },
            { $project: { age_range: 1, country: 1, gender: 1 } }
        ],
        as: "users"
    }},*/
    // One output document per joined user; drop subscriptions with no match.
    {$unwind: {
        path: "$users",
        preserveNullAndEmptyArrays: false
    }},
    // Keep the demographic fields and reduce the dates to year-month.
    {$project: {
        'users.age_range': 1,
        'users.country': 1,
        'users.gender': 1,
        '_id': 1,
        'subscribed_at': { $dateToString: { format: "%Y-%m", date: "$subscribed_at" } },
        'unsubscribed_at': { $dateToString: { format: "%Y-%m", date: "$unsubscribed_at" } }
    }}
])

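My main concern is performance. For example, for about 150,000 subscribers the query takes roughly 7~8 seconds to return the information, and I am worried about what will happen with millions of subscribers, because even if I limit the records (for example, only retrieving data between two months), there could still be hundreds of subscribers in that period.

I have already tried creating an index on the user_id field of the suscriptionusers collection, but it made no difference: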

db.getCollection('suscriptionusers').ensureIndex({user_id: 1});

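My question is: should I also store those fields (country, age range and gender) in the suscriptionusers collection? If I run the query without the $lookup on the users collection, it is fast enough.

Or is there a better way to improve performance with my current schema?

Thanks a lot :)

Edit: Keep in mind that a user can subscribe to multiple channels, which is why subscriptions are not stored in the users collection. Well, it may not be the best approach, but I simply included the fields I need from UserSchema in SuscriptionUsersSchema. This is noticeably faster for analytics. I also realized that analytics records must stay as they were at the time they were generated. Stored this way, the data remains unchanged even if a user later changes their information or deletes their account. If you have any suggestions, feel free to share :)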
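With the demographic fields stored on each subscription, the analytics query no longer needs $lookup at all. A minimal sketch of what that could look like follows. It assumes the line chart only needs counts per month and demographic bucket, which the question does not state, and `buildStatsPipeline` is a hypothetical helper name, not part of MongoDB or Mongoose.

```javascript
// Sketch only: assumes gender, age_range and country live on each
// suscriptionusers document, as described in the edit above.
// buildStatsPipeline is a hypothetical name used for illustration.
function buildStatsPipeline(channelId, from, to) {
  return [
    // Same filter as the original query.
    { $match: {
      channel_id: channelId,
      subscribed_at: { $gte: from, $lte: to },
      subscribed: true
    } },
    // Count subscriptions per month and demographic bucket on the server,
    // so only the aggregated rows are sent back to the application.
    { $group: {
      _id: {
        month: { $dateToString: { format: '%Y-%m', date: '$subscribed_at' } },
        country: '$country',
        gender: '$gender',
        age_range: '$age_range'
      },
      count: { $sum: 1 }
    } },
    // Order the buckets chronologically for the chart.
    { $sort: { '_id.month': 1 } }
  ];
}

// mongo shell usage (channel id elided, as in the question):
// db.getCollection('suscriptionusers').aggregate(
//   buildStatsPipeline(ObjectId('......'), new Date('2018-01-01'), new Date('2019-01-01')))
```

Grouping on the server is a design choice here, not something the question requires. If the chart needs the individual documents, the $group stage can be swapped for the original $project.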

For reference, my SuscriptionUsersSchema now looks like this:

var mongoose = require('mongoose');

var SuscriptionUsersSchema = mongoose.Schema({
  user_id: {
    ref: 'Users',
    type: mongoose.Schema.ObjectId
  },
  channel_id: {
    ref: 'Channels',
    type: mongoose.Schema.ObjectId
  },
  subscribed: {type: Boolean, default: false},
  // Fields copied from UserSchema so analytics queries can skip the $lookup.
  gender: {type: String, enum: ['male', 'female', 'unknown'], default: 'unknown'},
  age_range: {type: String, enum: [12, 16, 18], default: 18},
  country: {type: String, default: 'co'},
  unsubscribed_at: Date,
  subscribed_at: Date
});
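Since the copied fields are meant to be a snapshot taken when the subscription is created, the copy has to happen at write time. The following is a rough sketch of that step under stated assumptions: `buildSubscriptionDoc` and `SNAPSHOT_FIELDS` are hypothetical names, and the user object is assumed to carry gender, age_range and country under those same field names.

```javascript
// Hypothetical helper: builds a plain subscription object and copies the
// demographic fields from the user at subscription time.
var SNAPSHOT_FIELDS = ['gender', 'age_range', 'country'];

function buildSubscriptionDoc(user, channelId, now) {
  var doc = {
    user_id: user._id,
    channel_id: channelId,
    subscribed: true,
    subscribed_at: now
  };
  // Copy only the fields the user actually has; anything left out can
  // fall back to the schema defaults when the document is saved.
  SNAPSHOT_FIELDS.forEach(function (field) {
    if (user[field] !== undefined) {
      doc[field] = user[field];
    }
  });
  return doc;
}
```

The returned object is independent of the user object, so later changes to the user's profile do not alter subscriptions that were already built.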

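Have you created an index on the subscribed_at field? You can also use the newer $lookup syntax to $project the fields inside the pipeline.

Thanks for your help @AnthonyWinzlet. I have implemented your suggestions (see the update), but the response time is almost the same. I have created indexes on subscribed_at, subscribed and channel_id, and even ran reIndex(), but it is still the same. Any other suggestions? :)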
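One thing that might still be worth trying is a single compound index covering all three $match fields, rather than separate single-field indexes. The sketch below follows the common guideline of putting equality fields before the range field. The variable name `matchIndexKeys` is made up for illustration, and whether this helps depends on the data, so checking the explain output is advisable. Note that the $lookup itself matches on `users._id`, which MongoDB always indexes, so an index on `user_id` in suscriptionusers would not be expected to change the join.

```javascript
// Assumed compound index for the $match stage: the two equality fields
// (channel_id, subscribed) come first, the range field (subscribed_at) last.
var matchIndexKeys = { channel_id: 1, subscribed: 1, subscribed_at: 1 };

// mongo shell usage:
// db.getCollection('suscriptionusers').createIndex(matchIndexKeys)
//
// To see which index the $match stage ends up using, run the same
// pipeline with the explain option:
// db.getCollection('suscriptionusers').aggregate(pipeline, { explain: true })
```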
