文章/答案/技术大牛

发布

社区首页 >问答首页 >优化“repost”MySQL查询

问优化“repost”MySQL查询
EN

Stack Overflow用户

提问于 2021-02-25 06:05:46

回答 2查看 66关注 0票数 1

我有一个帖子表，一个转贴表，以及一个代表“用户跟随”状态的表格。

我想做一些像Twitter这样的事情，在那里我展示了所有跟踪用户的帖子或转贴。

我希望文章在第一次出现时出现，这样，如果多个用户重新发布该帖子，它只会在第一次出现。

为了加快查询速度，每当创建post时，我都会将其插入到repost表中，这样也会创建相应的repost (来自作者)。

我的模式如下所示：

Table Post
id: INT
userId: INT
time: INT

Table Repost
id: INT
postId: INT
userId: INT
time: INT

Table users_following
userId: INT
followerId: INT

我的查询看起来是这样的。

SELECT sr.* FROM Repost sr
INNER JOIN (
    SELECT MIN(ir.time) min_time, ir.postId FROM Repost ir
    WHERE ir.userId IN (
        SELECT uf.userId FROM users_following uf WHERE
        ir.userId = uf.userId AND uf.followerId = 1
    )
    OR ir.userId = 1
    GROUP BY ir.postId
) rr ON rr.postId = sr.postId AND sr.time = rr.min_time

他们的想法是：

SELECT从viewer.
SELECT uf.选择所有的用户id，然后选择来自Repost的viewer.
SELECT。为给定的帖子选择最小的重发时间，其中，重新发帖id要么是跟随的用户，要么是repost的viewer.
SELECT。使用内部连接为给定的帖子选择最短的时间。

这是可行的，但第三阶段是缓慢的。我相信这是因为一旦我们有了一个大的min_times列表，我们就不能使用任何索引从这个子查询中进行选择，这意味着我们需要扫描所有的东西。是否有一种方法来构造这个查询，使其更具表现性？

这是为核心读者提供的完整的EXPLAIN和SHOW CREATE TABLE。

解释

+----+--------------------+------------+------------+--------+-------------------------------------------------------------+----------------------+---------+---------------------------------+--------+----------+--------------------------+
| id | select_type        | table      | partitions | type   | possible_keys                                               | key                  | key_len | ref                             | rows   | filtered | Extra                    |
+----+--------------------+------------+------------+--------+-------------------------------------------------------------+----------------------+---------+---------------------------------+--------+----------+--------------------------+
|  1 | PRIMARY            | <derived2> | NULL       | ALL    | NULL                                                        | NULL                 | NULL    | NULL                            | 797455 |   100.00 | Using where              |
|  1 | PRIMARY            | sr         | NULL       | ref    | IDX_DA9843F3E094D20D,repost_time_idx,repost_stream_idx      | repost_time_idx      | 4       | rr.min_time                     |      1 |     4.92 | Using where              |
|  2 | DERIVED            | ir         | NULL       | index  | IDX_DA9843F364B64DCC,IDX_DA9843F3E094D20D,repost_stream_idx | IDX_DA9843F3E094D20D | 4       | NULL                            | 797456 |   100.00 | Using where              |
|  3 | DEPENDENT SUBQUERY | uf         | NULL       | eq_ref | PRIMARY,IDX_17C2F70264B64DCC,IDX_17C2F702F542AA03           | PRIMARY              | 8       | prose_2_24_2021.ir.userId,const |      1 |   100.00 | Using where; Using index |
+----+--------------------+------------+------------+--------+-------------------------------------------------------------+----------------------+---------+---------------------------------+--------+----------+--------------------------+

显示创建表Repost

CREATE TABLE `Repost` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `postId` int(11) NOT NULL,
  `userId` int(11) NOT NULL,
  `time` int(11) NOT NULL,
  `isRepost` int(11) NOT NULL,
  PRIMARY KEY (`id`),
  KEY `IDX_DA9843F364B64DCC` (`userId`),
  KEY `IDX_DA9843F3E094D20D` (`postId`),
  KEY `repost_time_idx` (`time`),
  KEY `repost_stream_idx` (`time`,`userId`,`postId`),
  CONSTRAINT `FK_DA9843F364B64DCC` FOREIGN KEY (`userId`) REFERENCES `ProseUser` (`id`),
  CONSTRAINT `FK_DA9843F3E094D20D` FOREIGN KEY (`postId`) REFERENCES `Post` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=809018 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

显示创建表users_following

CREATE TABLE `users_following` (
  `userId` int(11) NOT NULL,
  `followerId` int(11) NOT NULL,
  PRIMARY KEY (`userId`,`followerId`),
  KEY `IDX_17C2F70264B64DCC` (`userId`),
  KEY `IDX_17C2F702F542AA03` (`followerId`),
  CONSTRAINT `FK_17C2F70264B64DCC` FOREIGN KEY (`userId`) REFERENCES `ProseUser` (`id`),
  CONSTRAINT `FK_17C2F702F542AA03` FOREIGN KEY (`followerId`) REFERENCES `ProseUser` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

编辑

像这样调整查询会产生更快的结果，尽管添加ORDER BY会使其变得更慢。没有ORDER BY，这个查询是很棒的。

SELECT sr.* FROM Repost sr
INNER JOIN (
    SELECT MIN(ir.time) min_time, ir.postId FROM Repost ir
    INNER JOIN users_following uf ON ir.userId = uf.userId AND uf.followerId = 1
    GROUP BY ir.postId
) rr ON rr.postId = sr.postId AND sr.time = rr.min_time
ORDER BY sr.time desc
LIMIT 10

以下是对此查询的解释：

+----+-------------+------------+------------+--------+--------------------------------------------------------------------------------+----------------------+---------+---------------------------+------+----------+----------------------------------------------+
| id | select_type | table      | partitions | type   | possible_keys                                                                  | key                  | key_len | ref                       | rows | filtered | Extra                                        |
+----+-------------+------------+------------+--------+--------------------------------------------------------------------------------+----------------------+---------+---------------------------+------+----------+----------------------------------------------+
|  1 | PRIMARY     | <derived2> | NULL       | ALL    | NULL                                                                           | NULL                 | NULL    | NULL                      |  691 |   100.00 | Using where; Using temporary; Using filesort |
|  1 | PRIMARY     | sr         | NULL       | ref    | IDX_DA9843F3E094D20D,repost_time_idx,repost_stream_idx,repost_stream2_idx      | repost_stream2_idx   | 8       | rr.min_time,rr.postId     |    1 |   100.00 | NULL                                         |
|  2 | DERIVED     | uf         | NULL       | ref    | PRIMARY,IDX_17C2F70264B64DCC,IDX_17C2F702F542AA03                              | IDX_17C2F702F542AA03 | 4       | const                     |  145 |   100.00 | Using index; Using temporary; Using filesort |
|  2 | DERIVED     | ir         | NULL       | ref    | IDX_DA9843F364B64DCC,IDX_DA9843F3E094D20D,repost_stream_idx,repost_stream2_idx | IDX_DA9843F364B64DCC | 4       | prose_2_24_2021.uf.userId |    9 |   100.00 | NULL                                         |
|  2 | DERIVED     | rp         | NULL       | eq_ref | PRIMARY,post_spotlight_idx,post_time_idx,post_trending_idx                     | PRIMARY              | 4       | prose_2_24_2021.ir.postId |    1 |    50.00 | Using where                                  |
+----+-------------+------------+------------+--------+--------------------------------------------------------------------------------+----------------------+---------+---------------------------+------+----------+----------------------------------------------+

mysql

query-optimization

回答 2

Stack Overflow用户

发布于 2021-02-25 23:51:00

因此，我编写这样的排名查询的典型方式是：

select id, postid, userid, time
from
(
  select rp.*, min(time) over (partition by postid) as first_time
  from repost rp
  where userid = 1 
  or userid in (select userid from users_following where followerid = 1)
) numbered
where time = first_time;

有时候优化器在OR上有问题，如果他们认为这样做更快的话，他们无法看到他们可以在表中运行两次。在这种情况下，我们可以使用UNION提示

select id, postid, userid, time
from
(
  select rp.*, min(time) over (partition by postid) as first_time
  from
  (
    select *
    from repost
    where userid = 1 
    union all
    select *
    from repost
    where userid in (select userid from users_following where followerid = 1)
  ) rp
) numbered
where time = first_time;

有一次，MySQL因处理IN子句有问题而闻名。我不认为现在是这样了。如果DBMS确实有问题，可以使用EXISTS：

from repost rp
where exists 
(
  select null
  from users_following uf
  where uf.userid = rp.userid 
  and uf.followerid = 1
)

在版本8之前的MySQL版本中，MIN OVER这样的分析函数是不可用的。在这些版本中，您必须找到每个帖子的最短时间，然后再读一次表。一个直截了当的方法：

select *
from repost
where (postid, time) in
(
  select postid, min(time)
  from repost
  where userid = 1 
  or userid in (select userid from users_following where followerid = 1)
  group by postid
);

在任何情况下，您都需要一个索引来查找跟踪用户。DBMS可以随重发用户而来，并检查它们是否后面跟着用户#1，或者接受用户#1并查找所有跟随的用户。因此，我提供了两个索引：

create index idx1 on users_following (userid, followerid);
create index idx2 on users_following (followerid, userid);

然后，您希望快速找到他们的帖子，然后按post ID和order按时间分组。这方面的指数：

create index idx3 on repost (userid, postid, time);

另一种看待这个问题的方法是:如果我们遍历整个表并为所需的用户保留行，那么如果已经按照postid、time对行进行排序，那就太好了。所以以防万一：

create index idx3 on repost (postid, time);

进行完整的索引扫描。

索引是提供给DBMS的。DBMS可以接受此建议并使用索引或不使用索引。我经常做的事：

考虑DBMS可以访问这些路由的
索引的顺序。
使用EXPLAIN查看我的哪些索引被使用。
删除其他的。

。

票数 1

Stack Overflow用户

发布于 2021-02-26 20:58:00

Repost需要对来自

  PRIMARY KEY (`id`),
  KEY `IDX_DA9843F364B64DCC` (`userId`),
  KEY `IDX_DA9843F3E094D20D` (`postId`),
  KEY `repost_time_idx` (`time`),
  KEY `repost_stream_idx` (`time`,`userId`,`postId`),

至

  PRIMARY KEY(postId, userId, time, id),   -- `id` is for uniqueness
  INDEX(id)  -- to keep AUTO_INCREMENT happy

(我不知道其他人是否有用。)

将IN ( SELECT ... )更改为EXISTS ( SELECT 1 ... )

OR是性能杀手。用OR的一边计时查询，然后用另一侧计时。假设这些时间的总和比你当前的时间要快，那么把这些时间加在一起吧。如果可行的话，简化每个查询。给我看看结果，我可能有更多的索引建议。

票数 1

页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持

原文链接：

https://stackoverflow.com/questions/66363209

复制

相似问题

问优化“repost”MySQL查询
EN

回答 2

Stack Overflow用户

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问优化“repost”MySQL查询EN

回答 2

Stack Overflow用户

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问优化“repost”MySQL查询
EN