MongoDB: building a matching query for a recommendation system


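I am trying to build a user profile recommendation system with MongoDB (for a dating app along the lines of Tinder). For a given user profile, I want a query that:

  • filters out users who have already been visited
  • sorts the results by similar interests (the intersection of the users' interests) and returns the top N

MongoDB collections:

UserProfile collection: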

    [
        {
            userId: "user_1",
            interest1: ["interest_1_a", "interest_1_b", "interest_1_c"],
            interest2: ["interest_2_a", "interest_2_b", "interest_2_c"]
        },
        {
            userId: "user_2",
            interest1: ["interest_1_b", "interest_1_d", "interest_1_e"],
            interest2: ["interest_2_l", "interest_2_k", "interest_2_j"]
        },
        {
            userId: "user_3",
            interest1: ["interest_1_d", "interest_1_g", "interest_1_x"],
            interest2: ["interest_2_f", "interest_2_o", "interest_2_v"]
        },
        {
            userId: "user_4",
            interest1: ["interest_1_q", "interest_1_w", "interest_1_u"],
            interest2: ["interest_2_c", "interest_2_l", "interest_2_l"]
        },
        {
            userId: "user_5",
            interest1: ["interest_1_q", "interest_1_b", "interest_1_x"],
            interest2: ["interest_2_u", "interest_2_c", "interest_2_z"]
        },
        {
            userId: "user_6",
            interest1: ["interest_1_q", "interest_1_b", "interest_1_x"],
            interest2: ["interest_2_u", "interest_2_c", "interest_2_z"]
        },
        ....
    ]
    
UserEvent collection:

    [
        {
            _id: "event_1",
            userId: "user_1",
            visitedUserIds: ["user_3", "user_7"]
        },
        {
            _id: "event_2",
            userId: "user_1",
            visitedUserIds: ["user_5"]
        }
    ]
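
Since one user can have several event documents, the visited IDs have to be merged across them before anything can be filtered. A minimal in-memory sketch of that merge in plain JavaScript, using the two sample events (the helper name `collectVisitedIds` is hypothetical and not part of any MongoDB API):

```javascript
// Sample UserEvent documents from above, with the IDs written as strings.
const userEvents = [
  { _id: "event_1", userId: "user_1", visitedUserIds: ["user_3", "user_7"] },
  { _id: "event_2", userId: "user_1", visitedUserIds: ["user_5"] },
];

// Hypothetical helper: merge visitedUserIds across all events of one user,
// dropping duplicates while keeping first-seen order.
function collectVisitedIds(events, userId) {
  const visited = new Set();
  for (const event of events) {
    if (event.userId !== userId) continue; // skip other users' events
    for (const id of event.visitedUserIds) visited.add(id);
  }
  return [...visited];
}

console.log(collectVisitedIds(userEvents, "user_1"));
// prints [ 'user_3', 'user_7', 'user_5' ]
```

On the server side, the same merge could probably be done with `distinct` on the `visitedUserIds` field filtered by `userId`, since `distinct` treats each array element as a separate value.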
    
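So, for the user with id user_1, the task is:

  • First, filter out the users already visited: [user_3, user_7, user_5] (these are aggregated from the UserEvent collection)
  • Sort the remaining users by the intersection of the interest1 and interest2 fields and return them. So the result might be:
  • As an additional goal, the intersections may carry different weights ("coefficients") when ordering the results (for example, interest1 counts twice as much as interest2)
  • Limit the result to some N, say 100

I know this is not the best use of MongoDB, but it would be interesting to see whether this database can handle such a task with good performance at a certain scale (say 100k users), where each user runs about 5 of these matching queries per day. I would also be glad to hear suggestions for improving the performance of this query, such as indexes.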
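
To make the steps above concrete, here is one possible shape for such a query, as a sketch rather than a tested solution. It assumes the visited IDs have already been fetched in a separate step, and it builds an aggregation pipeline using `$setIntersection`, `$size`, `$sort` and `$limit`. The function name `buildMatchPipeline`, the default weights, and the collection names in the comments are assumptions made for illustration.

```javascript
// Hypothetical helper: builds an aggregation pipeline for the UserProfile collection.
// `me` is the requesting user's profile document, `visitedIds` the IDs to exclude.
function buildMatchPipeline(me, visitedIds, limit = 100,
                            weights = { interest1: 2, interest2: 1 }) {
  // Weighted size of the overlap between a candidate's field and my own values.
  // $ifNull guards against profiles where the field is missing.
  const overlap = (field) => ({
    $multiply: [
      weights[field],
      { $size: { $setIntersection: [
        { $ifNull: ["$" + field, []] },
        { $literal: me[field] },
      ] } },
    ],
  });

  return [
    // 1. Drop the requesting user and everyone already visited.
    { $match: { userId: { $nin: [me.userId, ...visitedIds] } } },
    // 2. Score each remaining candidate by weighted interest overlap.
    { $addFields: { score: { $add: [overlap("interest1"), overlap("interest2")] } } },
    // 3. Best matches first; userId breaks ties deterministically.
    { $sort: { score: -1, userId: 1 } },
    // 4. Keep only the top N.
    { $limit: limit },
  ];
}

const me = {
  userId: "user_1",
  interest1: ["interest_1_a", "interest_1_b", "interest_1_c"],
  interest2: ["interest_2_a", "interest_2_b", "interest_2_c"],
};
const pipeline = buildMatchPipeline(me, ["user_3", "user_7", "user_5"]);
console.log(JSON.stringify(pipeline, null, 2));

// In mongosh this might be run as (collection names are assumptions):
//   const visited = db.userEvents.distinct("visitedUserIds", { userId: "user_1" });
//   db.userProfiles.aggregate(buildMatchPipeline(me, visited));
```

Because the score is computed per document, the sort cannot use an index, so each query scans every non-excluded profile. An index on `userId` in the events collection would likely help the first step. Whether a full scan is acceptable at 100k profiles would need to be measured.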

Thanks

P.S. I am using the latest version of MongoDB.