Optimizing a "repost" MySQL query


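I have a Post table, a Repost table, and a table representing the "user follows" relationship.

I want to do something like Twitter, where I show all posts or reposts from the users being followed.

I want each post to appear only once, at its first appearance, so that if several users repost it, it shows up only the first time.

To speed up this query, whenever a post is created I also insert it into the Repost table, so that a corresponding repost (by the author) is created as well.

My schema looks like this: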

Table Post
id: INT
userId: INT
time: INT

Table Repost
id: INT
postId: INT
userId: INT
time: INT

Table users_following
userId: INT
followerId: INT
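The "author repost" trick described above can be sketched like this. It is a minimal sketch, not the poster's actual code: it assumes both id columns are AUTO_INCREMENT and that time holds a Unix timestamp (the schema only says INT). It uses temporary tables so it can be tried without touching real data, and user 42 is a made-up example.

```sql
-- Hypothetical minimal tables mirroring the schema above; the real Post
-- table presumably has more columns (content etc.). Temporary tables
-- shadow real tables of the same name for this session only.
CREATE TEMPORARY TABLE Post (
  id     INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  userId INT NOT NULL,
  time   INT NOT NULL
);
CREATE TEMPORARY TABLE Repost (
  id     INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  postId INT NOT NULL,
  userId INT NOT NULL,
  time   INT NOT NULL
);

START TRANSACTION;

-- Create the post (user 42 is a hypothetical author).
INSERT INTO Post (userId, time) VALUES (42, UNIX_TIMESTAMP());

-- Capture the new post id before the next INSERT changes LAST_INSERT_ID().
SET @post_id = LAST_INSERT_ID();

-- Mirror the post as the author's own "repost", so the feed query only
-- ever needs to read Repost.
INSERT INTO Repost (postId, userId, time)
SELECT id, userId, time FROM Post WHERE id = @post_id;

COMMIT;
```

Doing both inserts in one transaction keeps a post from ever existing without its matching author repost.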

My query looks like this:

SELECT sr.* FROM Repost sr
INNER JOIN (
    SELECT MIN(ir.time) min_time, ir.postId FROM Repost ir
    WHERE ir.userId IN (
        SELECT uf.userId FROM users_following uf WHERE
        ir.userId = uf.userId AND uf.followerId = 1
    )
    OR ir.userId = 1
    GROUP BY ir.postId
) rr ON rr.postId = sr.postId AND sr.time = rr.min_time

The idea is:

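  • SELECT FROM users_following uf: select all the user IDs the viewer follows.
  • SELECT FROM Repost ir: select the minimum repost time for each post, where the reposter is a followed user or the viewer.
  • SELECT FROM Repost sr: use an INNER JOIN to select the repost that has the minimum time for each post.

This works, but the third stage is slow. I believe this is because once we have a large list of min_time values, we can't use any index to select from that subquery, so everything has to be scanned. Is there a way to structure this query to improve its performance?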
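To make the three stages concrete, here is the question's query run over a handful of hypothetical rows supplied through CTEs (needs MySQL 8.0+), so nothing has to be created. Viewer 1 follows users 2 and 3; all values are invented for illustration, and an ORDER BY was added only to make the output deterministic.

```sql
-- Toy data (hypothetical); the CTE names shadow the real tables.
WITH users_following (userId, followerId) AS (
  SELECT 2, 1 UNION ALL            -- viewer 1 follows user 2
  SELECT 3, 1 UNION ALL            -- viewer 1 follows user 3
  SELECT 4, 5                      -- an unrelated follow
),
Repost (id, postId, userId, time) AS (
  SELECT 1, 10, 2, 100 UNION ALL   -- user 2 writes post 10 (author "repost")
  SELECT 2, 10, 3, 150 UNION ALL   -- user 3 reposts post 10 later
  SELECT 3, 11, 4, 120 UNION ALL   -- user 4 (not followed) writes post 11
  SELECT 4, 11, 3, 200 UNION ALL   -- user 3 reposts post 11
  SELECT 5, 12, 1, 130             -- the viewer writes post 12
)
-- The question's query, plus ORDER BY for a stable result.
SELECT sr.* FROM Repost sr
INNER JOIN (
    SELECT MIN(ir.time) min_time, ir.postId FROM Repost ir
    WHERE ir.userId IN (
        SELECT uf.userId FROM users_following uf WHERE
        ir.userId = uf.userId AND uf.followerId = 1
    )
    OR ir.userId = 1
    GROUP BY ir.postId
) rr ON rr.postId = sr.postId AND sr.time = rr.min_time
ORDER BY sr.id;
-- Expected ids: 1, 4, 5
```

Post 11 comes back through user 3's repost (id 4, time 200) rather than the original at time 120, because user 4 is not followed: "first appearance" here means first appearance in this viewer's feed.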

Below are the full EXPLAIN and SHOW CREATE TABLE outputs for hardcore readers.

EXPLAIN

    +----+--------------------+------------+------------+--------+-------------------------------------------------------------+----------------------+---------+---------------------------------+--------+----------+--------------------------+
    | id | select_type        | table      | partitions | type   | possible_keys                                               | key                  | key_len | ref                             | rows   | filtered | Extra                    |
    +----+--------------------+------------+------------+--------+-------------------------------------------------------------+----------------------+---------+---------------------------------+--------+----------+--------------------------+
    |  1 | PRIMARY            | <derived2> | NULL       | ALL    | NULL                                                        | NULL                 | NULL    | NULL                            | 797455 |   100.00 | Using where              |
    |  1 | PRIMARY            | sr         | NULL       | ref    | IDX_DA9843F3E094D20D,repost_time_idx,repost_stream_idx      | repost_time_idx      | 4       | rr.min_time                     |      1 |     4.92 | Using where              |
    |  2 | DERIVED            | ir         | NULL       | index  | IDX_DA9843F364B64DCC,IDX_DA9843F3E094D20D,repost_stream_idx | IDX_DA9843F3E094D20D | 4       | NULL                            | 797456 |   100.00 | Using where              |
    |  3 | DEPENDENT SUBQUERY | uf         | NULL       | eq_ref | PRIMARY,IDX_17C2F70264B64DCC,IDX_17C2F702F542AA03           | PRIMARY              | 8       | prose_2_24_2021.ir.userId,const |      1 |   100.00 | Using where; Using index |
    +----+--------------------+------------+------------+--------+-------------------------------------------------------------+----------------------+---------+---------------------------------+--------+----------+--------------------------+
    
SHOW CREATE TABLE users_following

    CREATE TABLE `users_following` (
      `userId` int(11) NOT NULL,
      `followerId` int(11) NOT NULL,
      PRIMARY KEY (`userId`,`followerId`),
      KEY `IDX_17C2F70264B64DCC` (`userId`),
      KEY `IDX_17C2F702F542AA03` (`followerId`),
      CONSTRAINT `FK_17C2F70264B64DCC` FOREIGN KEY (`userId`) REFERENCES `ProseUser` (`id`),
      CONSTRAINT `FK_17C2F702F542AA03` FOREIGN KEY (`followerId`) REFERENCES `ProseUser` (`id`)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci 
    
EDIT

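Adjusting the query like this gives faster results, although adding the ORDER BY makes it slow again. Without the ORDER BY, this query performs very well: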

    SELECT sr.* FROM Repost sr
    INNER JOIN (
        SELECT MIN(ir.time) min_time, ir.postId FROM Repost ir
        INNER JOIN users_following uf ON ir.userId = uf.userId AND uf.followerId = 1
        GROUP BY ir.postId
    ) rr ON rr.postId = sr.postId AND sr.time = rr.min_time
    ORDER BY sr.time desc
    LIMIT 10
    
Here is the EXPLAIN for this query:

    +----+-------------+------------+------------+--------+--------------------------------------------------------------------------------+----------------------+---------+---------------------------+------+----------+----------------------------------------------+
    | id | select_type | table      | partitions | type   | possible_keys                                                                  | key                  | key_len | ref                       | rows | filtered | Extra                                        |
    +----+-------------+------------+------------+--------+--------------------------------------------------------------------------------+----------------------+---------+---------------------------+------+----------+----------------------------------------------+
    |  1 | PRIMARY     | <derived2> | NULL       | ALL    | NULL                                                                           | NULL                 | NULL    | NULL                      |  691 |   100.00 | Using where; Using temporary; Using filesort |
    |  1 | PRIMARY     | sr         | NULL       | ref    | IDX_DA9843F3E094D20D,repost_time_idx,repost_stream_idx,repost_stream2_idx      | repost_stream2_idx   | 8       | rr.min_time,rr.postId     |    1 |   100.00 | NULL                                         |
    |  2 | DERIVED     | uf         | NULL       | ref    | PRIMARY,IDX_17C2F70264B64DCC,IDX_17C2F702F542AA03                              | IDX_17C2F702F542AA03 | 4       | const                     |  145 |   100.00 | Using index; Using temporary; Using filesort |
    |  2 | DERIVED     | ir         | NULL       | ref    | IDX_DA9843F364B64DCC,IDX_DA9843F3E094D20D,repost_stream_idx,repost_stream2_idx | IDX_DA9843F364B64DCC | 4       | prose_2_24_2021.uf.userId |    9 |   100.00 | NULL                                         |
    |  2 | DERIVED     | rp         | NULL       | eq_ref | PRIMARY,post_spotlight_idx,post_time_idx,post_trending_idx                     | PRIMARY              | 4       | prose_2_24_2021.ir.postId |    1 |    50.00 | Using where                                  |
    +----+-------------+------------+------------+--------+--------------------------------------------------------------------------------+----------------------+---------+---------------------------+------+----------+----------------------------------------------+
    
    
So, the way I would typically write this kind of ranking query is:

    select id, postid, userid, time
    from
    (
      select rp.*, min(time) over (partition by postid) as first_time
      from repost rp
      where userid = 1 
      or userid in (select userid from users_following where followerid = 1)
    ) numbered
    where time = first_time;
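One property of the time = first_time filter: if two followed users repost the same post in the same second, both rows have the minimum time and the post appears twice. A variant with ROW_NUMBER() (also MySQL 8.0+) keeps exactly one row per post. Breaking ties by id is an assumption (any deterministic rule would do), and the rows below are hypothetical.

```sql
-- Hypothetical toy data: users 2 and 3, both followed by viewer 1,
-- share post 10 at the same second.
WITH users_following (userId, followerId) AS (
  SELECT 2, 1 UNION ALL
  SELECT 3, 1
),
Repost (id, postId, userId, time) AS (
  SELECT 1, 10, 2, 100 UNION ALL
  SELECT 2, 10, 3, 100 UNION ALL   -- same time as id 1: a tie
  SELECT 3, 11, 3, 150
)
SELECT id, postId, userId, time
FROM
(
  -- Number each post's visible reposts by time; the lower id wins a tie.
  SELECT rp.*,
         ROW_NUMBER() OVER (PARTITION BY postId ORDER BY time, id) AS rn
  FROM Repost rp
  WHERE rp.userId = 1
     OR rp.userId IN (SELECT userId FROM users_following WHERE followerId = 1)
) numbered
WHERE rn = 1
ORDER BY id;
-- Expected ids: 1, 3 (the min(time) version would also return id 2)
```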
    
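Sometimes the optimizer has trouble with OR: it doesn't see that it could run through the table twice, once per condition, if it considered that faster. In that case we can use UNION ALL as a hint: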

    select id, postid, userid, time
    from
    (
      select rp.*, min(time) over (partition by postid) as first_time
      from
      (
        select *
        from repost
        where userid = 1 
        union all
        select *
        from repost
        where userid in (select userid from users_following where followerid = 1)
      ) rp
    ) numbered
    where time = first_time;
    
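MySQL used to be notorious for having problems with IN clauses. I don't think that is the case anymore. If your DBMS does have problems with them, you can use EXISTS instead, for example in the inner query: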

    select rp.*, min(time) over (partition by postid) as first_time
    from repost rp
    where rp.userid = 1
    or exists
    (
      select null
      from users_following uf
      where uf.userid = rp.userid
      and uf.followerid = 1
    )
    
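In MySQL versions before 8.0, analytic (window) functions such as MIN() OVER are not available. In those versions you have to find the minimum time per post first and then read the table again. A straightforward way: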

    select *
    from repost
    where (postid, time) in
    (
      select postid, min(time)
      from repost
      where userid = 1 
      or userid in (select userid from users_following where followerid = 1)
      group by postid
    );
    
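In any case, you want indexes that make looking up the followed users fast. The DBMS is free either to take a reposting user and check whether user #1 follows them, or to take user #1 and look up all the users they follow. So I would offer both indexes: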

    create index idx1 on users_following (userid, followerid);
    create index idx2 on users_following (followerid, userid);
    
Then you want to find their reposts quickly, group them by post ID, and order them by time. An index for this:

    create index idx3 on repost (userid, postid, time);
    
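Another possibility: if the DBMS reads through the whole table and keeps only the rows of the wanted users, it helps if those rows are already sorted by postid and time. So, just in case: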

    create index idx4 on repost (postid, time);

This last index is for the case where the DBMS does a full index scan.
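Before creating these, it may be worth checking what already exists. The SHOW CREATE TABLE in the question already has PRIMARY KEY (userId, followerId) on users_following, which has the same columns in the same order as idx1. In InnoDB every secondary index also stores the primary-key columns, so the existing KEY on (followerId) can already serve followerId lookups that return userId (the "Using index" in the second EXPLAIN is consistent with that). A minimal check, using the question's table names:

```sql
-- Seq_in_index gives each column's position inside each index;
-- compare these against idx1..idx4 before adding duplicates.
SHOW INDEX FROM users_following;
SHOW INDEX FROM Repost;
```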

An index is an offer to the DBMS. The DBMS can accept the offer and use the index, or not. What I often do:

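  • Think about the order in which the DBMS might access the tables
  • Provide indexes for those access paths
  • Use EXPLAIN to see which indexes were actually used
  • Drop the others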
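The routine above, sketched as SQL under a few assumptions: MySQL 8.0.18 or later for EXPLAIN ANALYZE, performance_schema enabled so the sys view has data, and idx2 used only as an example name from the index list above.

```sql
-- EXPLAIN ANALYZE actually executes the query and prints the chosen plan
-- with real row counts and timings for each step.
EXPLAIN ANALYZE
SELECT id, postid, userid, time
FROM
(
  SELECT rp.*, MIN(time) OVER (PARTITION BY postid) AS first_time
  FROM Repost rp
  WHERE userid = 1
     OR userid IN (SELECT userid FROM users_following WHERE followerid = 1)
) numbered
WHERE time = first_time;

-- After the application has run for a while, this sys view lists indexes
-- that performance_schema has not seen used since server startup.
SELECT object_name, index_name
FROM sys.schema_unused_indexes
WHERE object_schema = DATABASE();

-- Then drop only what the optimizer never takes (illustrative only):
-- DROP INDEX idx2 ON users_following;
```

The usage counters reset on restart, so an index that serves a rare but important query can look unused; check before dropping.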

Relevant keys on Repost (from its SHOW CREATE TABLE):

      PRIMARY KEY (`id`),
      KEY `IDX_DA9843F364B64DCC` (`userId`),
      KEY `IDX_DA9843F3E094D20D` (`postId`),
      KEY `repost_time_idx` (`time`),
      KEY `repost_stream_idx` (`time`,`userId`,`postId`),
    

(I don't know whether the others are useful.)

Change IN ( SELECT … to EXISTS ( SELECT 1 …

OR is a performance killer. Time the query with one side of the OR, then with the other.

Assuming Repost currently has these keys:

      PRIMARY KEY (`id`),
      KEY `IDX_DA9843F364B64DCC` (`userId`),
      KEY `IDX_DA9843F3E094D20D` (`postId`),
      KEY `repost_time_idx` (`time`),
      KEY `repost_stream_idx` (`time`,`userId`,`postId`),

change the primary key like this:

      PRIMARY KEY(postId, userId, time, id),   -- `id` is for uniqueness
      INDEX(id)  -- to keep AUTO_INCREMENT happy
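A hedged sketch of applying that suggestion in one statement. It assumes Repost.id is an INT AUTO_INCREMENT column and that no foreign keys reference Repost.id; the index name repost_id_idx is made up. Doing the drop and both adds in a single ALTER matters, because dropping the primary key on its own would leave the AUTO_INCREMENT column without an index, which MySQL rejects. On a large table this rebuilds the table, so try it on a copy first.

```sql
-- Hypothetical migration; assumes Repost.id is INT AUTO_INCREMENT.
-- All three changes happen in one statement, so id is never left
-- without an index that starts with it (InnoDB requires one).
ALTER TABLE Repost
  DROP PRIMARY KEY,
  ADD PRIMARY KEY (postId, userId, time, id),  -- InnoDB clusters rows by post
  ADD INDEX repost_id_idx (id);                -- keeps AUTO_INCREMENT valid
```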