left join左表一定是驱动表吗?
日常工作中,遇到很多left join的SQL,今天对left join的这种语法进行简单讲解。刚开始接触MySQL的时候,我也认为使用left join的时候,是左表驱动右表的,但是随着对MySQL理解的深入,时间长了发现这个理解是错误的。
我们先来看个例子:
mysql> create table a(f1 int, f2 int, index(f1))engine=innodb;
Query OK, 0 rows affected (0.01 sec)
mysql> create table b(f1 int, f2 int)engine=innodb;
Query OK, 0 rows affected (0.01 sec)
mysql> insert into a values(1,1),(2,2),(3,3),(4,4),(5,5),(6,6);
Query OK, 6 rows affected (0.00 sec)
Records: 6 Duplicates: 0 Warnings: 0
mysql> insert into b values(3,3),(4,4),(5,5),(6,6),(7,7),(8,8);
Query OK, 6 rows affected (0.02 sec)
Records: 6 Duplicates: 0 Warnings: 0
首先我们创建2个表,表a和表b,2个表的结构一致,其中表a的f1字段有索引,表b没有索引。
来看下面两条SQL
select * from a left join b on(a.f1=b.f1)
and (a.f2=b.f2); /*SQL 1*/
select * from a left join b on(a.f1=b.f1)
where (a.f2=b.f2);/*SQL 2*/
mysql> select * from a left join b on(a.f1=b.f1) and (a.f2=b.f2);
+------+------+------+------+
| f1 | f2 | f1 | f2 |
+------+------+------+------+
| 3 | 3 | 3 | 3 |
| 4 | 4 | 4 | 4 |
| 5 | 5 | 5 | 5 |
| 6 | 6 | 6 | 6 |
| 1 | 1 | NULL | NULL |
| 2 | 2 | NULL | NULL |
+------+------+------+------+
6 rows in set (0.01 sec)
mysql> select * from a left join b on(a.f1=b.f1) where (a.f2=b.f2);
+------+------+------+------+
| f1 | f2 | f1 | f2 |
+------+------+------+------+
| 3 | 3 | 3 | 3 |
| 4 | 4 | 4 | 4 |
| 5 | 5 | 5 | 5 |
| 6 | 6 | 6 | 6 |
+------+------+------+------+
4 rows in set (0.01 sec)
从结果可以看到,这两条SQL返回的值是不一样的,SQL1中,将表b中不存在的记录用null来进行表示,和表a中的记录进行了连接查询。SQL2中,只将表a和表b中共有的记录进行了连接查询。
SQL2的原因可以解释为a.f2 != b.f2,所以不满足where条件。
接下来,来看这两个SQL的执行计划,先看SQL1:
mysql> explain select * from a left join b on(a.f1=b.f1) and (a.f2=b.f2);
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+----------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+----------------------------------------------------+
| 1 | SIMPLE | a | NULL | ALL | NULL | NULL | NULL | NULL | 6 | 100.00 | NULL |
| 1 | SIMPLE | b | NULL | ALL | NULL | NULL | NULL | NULL | 6 | 100.00 | Using where; Using join buffer (Block Nested Loop) |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+----------------------------------------------------+
2 rows in set, 1 warning (0.00 sec)
SQL1的执行计划中不难看出来,表a作为了驱动表,表b作为了被驱动表,之所以做出这个判断,是因为在连接查询的执行计划中,每个表都会对应一条记录,这些记录的id列的值是相同的,出现在前边的表表示驱动表,出现在后边的表表示被驱动表。由于表b上的f1没有索引,所以使用了BNL算法,如果表b上的f1有索引,则会使用INLJ算法。
再来看SQL 2的执行计划:
mysql> explain select * from a left join b on(a.f1=b.f1) where (a.f2=b.f2);
+----+-------------+-------+------------+------+---------------+------+---------+-----------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+-----------+------+----------+-------------+
| 1 | SIMPLE | b | NULL | ALL | NULL | NULL | NULL | NULL | 6 | 100.00 | Using where |
| 1 | SIMPLE | a | NULL | ref | f1 | f1 | 5 | test.b.f1 | 1 | 16.67 | Using where |
+----+-------------+-------+------------+------+---------------+------+---------+-----------+------+----------+-------------+
2 rows in set, 1 warning (0.00 sec)
从这个执行计划中可以看到,表b作为了这个SQL的驱动表,表a作为了被驱动表,这个SQL的执行过程是这样的:顺序扫描表b,并将表b的字段放入join buffer,对于join buffer中表b的每一行用b.f1到表a中去查,匹配到记录后判断a.f2=b.f2是否满足,满足条件的话就作为结果集的一部分返回。
SQL2中,因为where条件中,NULL跟任何值执行等值判断和不等值判断的结果,都是NULL,而where null不会输出任何结果集,如下:
mysql> select * from a where null;
Empty set (0.00 sec)
mysql> select * from a ;
+------+------+
| f1 | f2 |
+------+------+
| 1 | 1 |
| 2 | 2 |
| 3 | 3 |
| 4 | 4 |
| 5 | 5 |
| 6 | 6 |
+------+------+
6 rows in set (0.00 sec)
所以,where条件得到的结果集中不会包含null值相关的列。
我们再来看看SQL2的explain结果中的warnings:
mysql> explain select * from a left join b on(a.f1=b.f1) where (a.f2=b.f2);
+----+-------------+-------+------------+------+---------------+------+---------+-----------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+-----------+------+----------+-------------+
| 1 | SIMPLE | b | NULL | ALL | NULL | NULL | NULL | NULL | 6 | 100.00 | Using where |
| 1 | SIMPLE | a | NULL | ref | f1 | f1 | 5 | test.b.f1 | 1 | 16.67 | Using where |
+----+-------------+-------+------------+------+---------------+------+---------+-----------+------+----------+-------------+
2 rows in set, 1 warning (0.00 sec)
mysql> show warnings;
+-------+------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message |
+-------+------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note | 1003 | /* select#1 */ select `test`.`a`.`f1` AS `f1`,`test`.`a`.`f2` AS `f2`,`test`.`b`.`f1` AS `f1`,`test`.`b`.`f2` AS `f2` from `test`.`a` join `test`.`b` where ((`test`.`a`.`f1` = `test`.`b`.`f1`) and (`test`.`a`.`f2` = `test`.`b`.`f2`)) |
+-------+------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
可以看到,MySQL的优化器把这条语句的left join改写成了join,然后因为表a的f1上有索引,就把表b作为驱动表,这样就可以用上表a的f1索引。
这个例子说明了两点
1、即使我们在SQL语句中写成left join,执行过程还是有可能不是从左到右连接的。也就是说,使用left join时,左边的表不一定是驱动表。
2、如果需要left join的语义,就不能把被驱动表的字段放在where条件里面做等值判断或不等值判断,必须都写在on里面。
如果我们将上面SQL中的left join写成join呢?
select * from a join b on(a.f1=b.f1)
and (a.f2=b.f2); /*SQL 1*/
select * from a join b on(a.f1=b.f1)
where (a.f2=b.f2);/*SQL 2*/
mysql> explain select * from a join b on(a.f1=b.f1) and (a.f2=b.f2);
+----+-------------+-------+------------+------+---------------+------+---------+-----------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+-----------+------+----------+-------------+
| 1 | SIMPLE | b | NULL | ALL | NULL | NULL | NULL | NULL | 6 | 100.00 | Using where |
| 1 | SIMPLE | a | NULL | ref | f1 | f1 | 5 | test.b.f1 | 1 | 16.67 | Using where |
+----+-------------+-------+------------+------+---------------+------+---------+-----------+------+----------+-------------+
2 rows in set, 1 warning (0.00 sec)
mysql> show warnings;
+-------+------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message |
+-------+------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note | 1003 | /* select#1 */ select `test`.`a`.`f1` AS `f1`,`test`.`a`.`f2` AS `f2`,`test`.`b`.`f1` AS `f1`,`test`.`b`.`f2` AS `f2` from `test`.`a` join `test`.`b` where ((`test`.`a`.`f2` = `test`.`b`.`f2`) and (`test`.`a`.`f1` = `test`.`b`.`f1`)) |
+-------+------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
mysql> explain select * from a join b on(a.f1=b.f1) where (a.f2=b.f2);
+----+-------------+-------+------------+------+---------------+------+---------+-----------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+-----------+------+----------+-------------+
| 1 | SIMPLE | b | NULL | ALL | NULL | NULL | NULL | NULL | 6 | 100.00 | Using where |
| 1 | SIMPLE | a | NULL | ref | f1 | f1 | 5 | test.b.f1 | 1 | 16.67 | Using where |
+----+-------------+-------+------------+------+---------------+------+---------+-----------+------+----------+-------------+
2 rows in set, 1 warning (0.00 sec)
mysql> show warnings;
+-------+------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message |
+-------+------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note | 1003 | /* select#1 */ select `test`.`a`.`f1` AS `f1`,`test`.`a`.`f2` AS `f2`,`test`.`b`.`f1` AS `f1`,`test`.`b`.`f2` AS `f2` from `test`.`a` join `test`.`b` where ((`test`.`a`.`f1` = `test`.`b`.`f1`) and (`test`.`a`.`f2` = `test`.`b`.`f2`)) |
+-------+------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
可以通过结果看到,优化器将这两条SQL改写成了相同的结果。