MongoDB inner join on two fields (MongoDB, Mongoose, Aggregation Framework)

I have the following schemas:

var mongoose = require('mongoose');
var Schema = mongoose.Schema;

var User = new Schema({
    email: { type: String, trim: true, index: true, unique: true, sparse: true },
    password: String,
    name: { type: String, trim: true, index: true, unique: true, sparse: true },
    gender: String,
});

var Song = new Schema({
    track: { type: Schema.Types.ObjectId, ref: 'track' }, // Track can be deleted
    author: { type: Schema.Types.ObjectId, ref: 'user' },
    url: String,
    title: String,
    photo: String,
    publishDate: Date,
    views: [{ type: Schema.Types.ObjectId, ref: 'user' }],
    likes: [{ type: Schema.Types.ObjectId, ref: 'user' }],
    collaborators: [{ type: Schema.Types.ObjectId, ref: 'user' }],
});

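I want to select all users (without the password value), but I want each user to come with all the songs in which he is the author or one of the collaborators, and which were published in the last two weeks.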

What is the best strategy for doing this (the binding between user.id and song.collaborators)? Can it be done in a single query?

It is certainly possible in one request, and the basic tool for this in MongoDB is $lookup.

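I think it actually makes more sense to query from the Song collection instead, since your condition is that the users must be listed in one of two properties of that collection.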

Optimal INNER Join - Reversed

Assuming the actual "model" names are the ones listed above:

var today = Date.now(),   // milliseconds; Date.now() is a plain function, not a constructor
    oneDay = 1000 * 60 * 60 * 24,
    twoWeeksAgo = new Date(today - ( oneDay * 14 ));

var userIds;   // Should be assigned as an Array of ObjectId values, even if only one

Song.aggregate([
  // Songs authored or co-written by one of the selected users in the last two weeks
  { "$match": { 
    "$or": [
      { "author": { "$in": userIds } },
      { "collaborators": { "$in": userIds } }
    ],
    "publishDate": { "$gt": twoWeeksAgo }
  }},
  // Unique list of author + collaborators, reduced to the selected users only
  { "$addFields": { 
    "users": { 
      "$setIntersection": [ 
        userIds,
        { "$setUnion": [ ["$author"], "$collaborators" ] }
      ]
    }
  }},
  // Join those ids against the users collection
  { "$lookup": {
    "from": User.collection.name,
    "localField": "users",
    "foreignField": "_id",
    "as": "users"
  }},
  // One document per (song, user) pair
  { "$unwind": "$users" },
  // Regroup by user; only public fields are copied, so password is never returned
  { "$group": {
    "_id": "$users._id",
    "email": { "$first": "$users.email" },
    "name": { "$first": "$users.name" },
    "gender": { "$first": "$users.gender" },
    "songs": {
      "$push": {
        "_id": "$_id",
        "track": "$track",
        "author": "$author",
        "url": "$url",
        "title": "$title",
        "photo": "$photo",
        "publishDate": "$publishDate",
        "views": "$views",
        "likes": "$likes",
        "collaborators": "$collaborators"
      }
    }
  }}
])

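To me this is the most logical process, as long as what you want from the result is an "INNER join", meaning that "every user must be mentioned in at least one song" through either of the two properties involved.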

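$setUnion takes a "unique list" of the two elements combined (ObjectId values are unique anyway). So if an "author" is also a "collaborator", they are only listed once for that song.

$setIntersection "filters" that combined list down to only the ids specified in the query condition. This removes any other "collaborator" entries that were not part of the selection.

$lookup performs the "join" on that combined data to get the users, and $unwind is used because you want the User to be the main detail. So we basically reverse the "array of users" in the result into an "array of songs" per user, which $group then builds.

Also, since the main criteria come from Song, it makes sense to query from that collection as the direction.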
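To make the stage-by-stage reasoning concrete, here is a rough in-memory emulation of the same INNER join in plain JavaScript. It is only an illustrative sketch: it does not talk to MongoDB, string ids such as "u1" stand in for ObjectId values, and the sample data and the innerJoinSongsToUsers name are hypothetical. MongoDB's set operators do not guarantee element order, whereas this sketch keeps insertion order.

```javascript
// Hypothetical sample data; strings stand in for ObjectId values.
const users = [
  { _id: "u1", email: "ann@example.com", name: "Ann", gender: "f", password: "secret1" },
  { _id: "u2", email: "bob@example.com", name: "Bob", gender: "m", password: "secret2" },
  { _id: "u3", email: "cy@example.com", name: "Cy", gender: "m", password: "secret3" },
];

const oneDay = 1000 * 60 * 60 * 24;
const now = Date.now();
const twoWeeksAgo = new Date(now - oneDay * 14);

const songs = [
  { _id: "s1", title: "Recent", author: "u1", collaborators: ["u2"], publishDate: new Date(now - oneDay) },
  { _id: "s2", title: "Old", author: "u2", collaborators: [], publishDate: new Date(now - oneDay * 30) },
  { _id: "s3", title: "Shared", author: "u2", collaborators: ["u2", "u3"], publishDate: new Date(now - oneDay * 2) },
];

// Mimics the Song.aggregate() pipeline: $match, $addFields, $lookup, $unwind, $group.
function innerJoinSongsToUsers(users, songs, userIds, since) {
  const wanted = new Set(userIds);
  const usersById = new Map(users.map((u) => [u._id, u]));
  const grouped = new Map(); // user _id -> output document

  for (const song of songs) {
    // $setUnion: author plus collaborators, each id listed once.
    const combined = [...new Set([song.author, ...song.collaborators])];
    // $setIntersection: keep only the ids we asked for.
    const linked = combined.filter((id) => wanted.has(id));
    // $match: at least one wanted user and published after the cutoff.
    if (linked.length === 0 || !(song.publishDate > since)) continue;

    for (const id of linked) {
      // $lookup + $unwind: one row per matching user (a missing user yields no row).
      const user = usersById.get(id);
      if (!user) continue;
      // $group: first-seen user fields (no password), songs pushed into an array.
      if (!grouped.has(id)) {
        grouped.set(id, { _id: id, email: user.email, name: user.name, gender: user.gender, songs: [] });
      }
      grouped.get(id).songs.push(song);
    }
  }
  return [...grouped.values()];
}

const result = innerJoinSongsToUsers(users, songs, ["u1", "u2"], twoWeeksAgo);
for (const u of result) {
  console.log(u.name, u.songs.map((s) => s.title));
}
// Ann [ 'Recent' ]
// Bob [ 'Recent', 'Shared' ]
```

Song s2 is dropped by the date condition, and Cy never appears because u3 was not in userIds, which is the INNER join behaviour described above.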

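Optional LEFT Join

Alternatively, the other way is a "LEFT join" for "all users", regardless of whether they have any associated songs: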

User.aggregate([
  { "$lookup": {
    "from": Song.collection.name,
    "localField": "_id",
    "foreignField": "author",
    "as": "authors"
  }},
  { "$lookup": {
    "from": Song.collection.name,
    "localField": "_id",
    "foreignField": "collaborators",
    "as": "collaborators"
  }},
  { "$project": {
    "email": 1,
    "name": 1,
    "gender": 1,
    "songs": { "$setUnion": [ "$authors", "$collaborators" ] }
  }}
])

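So the listing "looks" shorter, but it forces "two" $lookup stages to get the results for possible "authors" and "collaborators" instead of one. As a result, the actual "join" operations can be costly in execution time.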

The rest applies in the same way, but this time to the "result arrays" rather than to the original data source.
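For contrast, here is a similarly hypothetical in-memory sketch of the LEFT join: every user is kept, the two lookups run separately, and their result arrays are merged. As before, string ids replace ObjectId values and the function name is made up. One difference to keep in mind: MongoDB's $setUnion compares documents by value, whereas the JavaScript Set used here deduplicates by object identity, which works only because both lookups return the same song objects.

```javascript
// Hypothetical sample data; strings stand in for ObjectId values.
const users = [
  { _id: "u1", name: "Ann", password: "secret1" },
  { _id: "u2", name: "Bob", password: "secret2" },
  { _id: "u3", name: "Cy", password: "secret3" },
];
const songs = [
  { _id: "s1", author: "u1", collaborators: ["u1", "u2"] },
  { _id: "s2", author: "u2", collaborators: [] },
];

// Mimics User.aggregate() with two $lookup stages and a $setUnion in $project.
function leftJoinUsersToSongs(users, songs) {
  return users.map((user) => {
    // First $lookup: localField "_id" against foreignField "author".
    const authors = songs.filter((s) => s.author === user._id);
    // Second $lookup: "_id" against the "collaborators" array.
    const collaborators = songs.filter((s) => s.collaborators.includes(user._id));
    // $setUnion of the two result arrays: a song found by both lookups appears once.
    const merged = [...new Set([...authors, ...collaborators])];
    // $project keeps only the listed fields, so password is not returned.
    return { _id: user._id, name: user.name, songs: merged };
  });
}

for (const u of leftJoinUsersToSongs(users, songs)) {
  console.log(u.name, u.songs.map((s) => s._id));
}
// Ann [ 's1' ]
// Bob [ 's2', 's1' ]
// Cy []
```

Cy is still returned with an empty songs array, which is exactly what distinguishes this LEFT join from the INNER join version.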

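If you want "query" conditions similar to the above to "filter" the "songs", rather than the User documents that are actually returned, then for the LEFT join you actually need to filter the array contents "post" $lookup.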

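This means that with the LEFT join condition all User documents are returned, but the only ones containing any "songs" will be those that meet the "filter" condition of being part of the supplied userIds. And even the users included in that list will only show "songs" within the required range of publishDate.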

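The main addition is a set operator, which is a short way of comparing the list supplied in userIds with the "combined" list of the two fields in each document. Note that the "current user" is necessarily "related" to every song in the array, because of the earlier join conditions for each user.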
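Since the text describes filtering the joined array after the two $lookup stages, here is one possible shape for such a stage, written as a plain function that returns the stage object. It is a sketch under assumptions, not a reproduction of the answer's own listing: it assumes the lookups wrote to "authors" and "collaborators", that userIds holds ObjectId values, and that collaborators is always an array (Mongoose defaults array paths to []). The comparison here uses $setIntersection plus $size; the function name buildSongFilterProjection is hypothetical.

```javascript
// Sketch of a post-$lookup filter for the LEFT join variant.
// Assumes the two $lookup stages wrote their results to "authors" and "collaborators".
function buildSongFilterProjection(userIds, since) {
  return {
    $project: {
      email: 1,
      name: 1,
      gender: 1, // password is not listed, so it is excluded
      songs: {
        $filter: {
          input: { $setUnion: ["$authors", "$collaborators"] },
          as: "song",
          cond: {
            $and: [
              // Keep songs whose author or collaborators overlap userIds.
              {
                $gt: [
                  {
                    $size: {
                      $setIntersection: [
                        userIds,
                        { $setUnion: [["$$song.author"], "$$song.collaborators"] },
                      ],
                    },
                  },
                  0,
                ],
              },
              // Keep songs published after the cutoff date.
              { $gt: ["$$song.publishDate", since] },
            ],
          },
        },
      },
    },
  };
}

console.log(JSON.stringify(buildSongFilterProjection([], new Date(0)), null, 2));
```

It would take the place of the plain $project in the LEFT join listing, after the two $lookup stages; users outside userIds still come back, just with an empty songs array.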

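MongoDB 3.6 Preview

MongoDB 3.6 introduces a new "sub-pipeline" syntax for $lookup. This means that instead of the "two" $lookup stages shown in the LEFT join variant, you can construct it as a "sub-pipeline" that optimally filters the content before the results are returned: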

User.aggregate([
  { "$lookup": {
    "from": Song.collection.name,
    "let": {
      "user": "$_id"
    },
    "pipeline": [
      { "$match": {
        "$or": [
          { "author": { "$in": userIds } },
          { "collaborators": { "$in": userIds } }
        ],
        "publishDate": { "$gt": twoWeeksAgo },
        "$expr": {
          "$or": [
            { "$eq": [ "$$user", "$author" ] },
            { "$setIsSubset": [ ["$$user"], "$collaborators" ] }
          ]
        }
      }}
    ],
    "as": "songs"
  }}
])

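In this case that is all there is to it, because $expr allows the $$user variable declared in "let" to be compared against each entry in the song collection, selecting only those that match in addition to the other query conditions. The result is only the matching songs for each user, or an empty array. So the whole "sub-pipeline" comes down to a single $match expression, essentially the same query as before plus the additional logic, instead of fixed local and foreign keys.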

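So you could even add a $match stage to the pipeline after the $lookup to filter out any "empty" array results, making the whole result an INNER join.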
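Adding that stage could look something like the sketch below, which packages the 3.6 sub-pipeline as a plain function returning the pipeline array, with the extra $match controlled by a flag. It is illustrative only: the function name and the innerJoin flag are hypothetical, the leading $project that drops password is an addition (the question asks for users without the password), and "songs.0": { $exists: true } is simply one common way to test for a non-empty array.

```javascript
// Sketch: MongoDB 3.6+ sub-pipeline $lookup with an optional INNER join filter.
// songsCollection would be Song.collection.name; userIds should hold ObjectId values.
function buildUserSongsPipeline(songsCollection, userIds, since, innerJoin) {
  const pipeline = [
    { $project: { password: 0 } }, // addition: never return the password field
    {
      $lookup: {
        from: songsCollection,
        let: { user: "$_id" },
        pipeline: [
          {
            $match: {
              $or: [{ author: { $in: userIds } }, { collaborators: { $in: userIds } }],
              publishDate: { $gt: since },
              $expr: {
                $or: [
                  { $eq: ["$$user", "$author"] },
                  { $setIsSubset: [["$$user"], "$collaborators"] },
                ],
              },
            },
          },
        ],
        as: "songs",
      },
    },
  ];
  if (innerJoin) {
    // "songs.0" exists only when the songs array has at least one element.
    pipeline.push({ $match: { "songs.0": { $exists: true } } });
  }
  return pipeline;
}

console.log(buildUserSongsPipeline("songs", [], new Date(0), true).length);
```

With Mongoose this might be called as User.aggregate(buildUserSongsPipeline(Song.collection.name, userIds, twoWeeksAgo, true)); passing false keeps the LEFT join behaviour.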

So personally I would use the first approach whenever you can, and only use the second approach when you need to.

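NOTE: There are a couple of options here that don't really apply. The first is a special case (coalescing $unwind and $match into the preceding $lookup) where the basic case applies to the initial INNER join example but cannot be applied to the LEFT join case.

This is because, to get a LEFT join, the $unwind must be used with preserveNullAndEmptyArrays: true, and that breaks the rule that allows the $unwind and $match to be "rolled up" into the $lookup and applied to the foreign collection "before" results are returned.

That is why it is not applied in the example, and the filtering is done on the returned array instead: there is no optimal operation that can be applied to the foreign collection "before" the results are returned, and nothing stops every song that matches only on the foreign key from being returned. INNER joins are of course different.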
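To see why preserveNullAndEmptyArrays matters for the LEFT join, here is a simplified, hypothetical in-memory version of $unwind. It only approximates the server's behaviour (options such as includeArrayIndex are ignored), and the sample data uses string ids.

```javascript
// Simplified, hypothetical stand-in for the $unwind stage.
function unwind(docs, field, preserveNullAndEmptyArrays = false) {
  const out = [];
  for (const doc of docs) {
    const value = doc[field];
    if (Array.isArray(value) && value.length > 0) {
      // One output document per array element.
      for (const item of value) out.push({ ...doc, [field]: item });
    } else if (value !== undefined && value !== null && !Array.isArray(value)) {
      // A non-array value is treated like a one-element array.
      out.push({ ...doc });
    } else if (preserveNullAndEmptyArrays) {
      // Missing, null or empty: keep the document instead of dropping it.
      const copy = { ...doc };
      if (Array.isArray(value)) delete copy[field]; // an empty array field is removed
      out.push(copy);
    }
    // Otherwise the document is dropped, which is what makes the result an INNER join.
  }
  return out;
}

// A user with two joined songs and a user with none (string ids for illustration).
const joined = [
  { _id: "u1", songs: ["s1", "s2"] },
  { _id: "u2", songs: [] },
];

console.log(unwind(joined, "songs").map((d) => d._id)); // [ 'u1', 'u1' ]
console.log(unwind(joined, "songs", true).map((d) => d._id)); // [ 'u1', 'u1', 'u2' ]
```

With the default, u2 disappears; with the option set, u2 survives, which is the LEFT join behaviour that cannot simply be rolled up into the $lookup.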
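The other case is Mongoose's .populate(). The most important distinction is that .populate() is not a single request but a programming "shorthand" for actually issuing multiple queries. So multiple queries are issued in any case, and all of the results are always needed before any filtering can be applied.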
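This limits where the filtering can actually be applied, and generally means you cannot really implement "paging" when using "client-side joins" that need conditions applied to the foreign collection.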
There are more details about this here, and about how …
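By way of contrast with client-side joins, when the join and its conditions run inside one aggregation, paging can be appended as further stages. The helper below is a hypothetical sketch that works on any User-side pipeline, such as one that ends by dropping users with empty song arrays; its name and the zero-based page convention are assumptions.

```javascript
// Hypothetical helper: append server-side paging to a join pipeline.
// Because the song conditions already ran inside the aggregation, $skip and
// $limit count only the users that passed them. Pages are zero-based.
function withPaging(pipeline, page, pageSize) {
  return [
    ...pipeline, // the caller's array is not modified
    { $sort: { _id: 1 } }, // a stable order so pages don't overlap between requests
    { $skip: page * pageSize },
    { $limit: pageSize },
  ];
}

const base = [{ $match: { "songs.0": { $exists: true } } }];
console.log(withPaging(base, 2, 10).slice(1));
```

The $sort is there because $group output, and therefore page boundaries, would otherwise have no guaranteed order.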