MySQL Optimization, Part 13 (Hash Join)

1 Applicable scenarios

Pure equi-join, no usable index

Starting with MySQL 8.0.18, MySQL implements hash join for equality join conditions when no index can be used for the join condition, as in the following statement:

SELECT *
    FROM t1
    JOIN t2
        ON t1.c1=t2.c1;
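The build/probe idea behind a hash join can be sketched in a few lines. This is a simplified in-memory model, not MySQL's implementation: rows are assumed to be dicts keyed by qualified column names such as "t1.c1", and memory limits are ignored (MySQL caps a hash join's memory with join_buffer_size and falls back to files on disk beyond it).

```python
from collections import defaultdict
from typing import Any, Dict, List

Row = Dict[str, Any]  # assumed row shape (illustration only): {"t1.c1": 1, "t1.c2": 10}


def hash_join(build: List[Row], probe: List[Row], build_key: str, probe_key: str) -> List[Row]:
    """Inner equi-join: rows where build[build_key] = probe[probe_key]."""
    # Build phase: hash every build-side row by its join-key value.
    table: Dict[Any, List[Row]] = defaultdict(list)
    for row in build:
        if row[build_key] is not None:  # NULL = anything is never true in SQL
            table[row[build_key]].append(row)

    # Probe phase: one hash lookup per probe-side row instead of a scan of the build side.
    result: List[Row] = []
    for row in probe:
        for match in table.get(row[probe_key], []):
            result.append({**match, **row})
    return result


t1 = [{"t1.c1": 1, "t1.c2": 10}, {"t1.c1": 2, "t1.c2": 20}]
t2 = [{"t2.c1": 2, "t2.c2": 5}, {"t2.c1": 3, "t2.c2": 7}]
# SELECT * FROM t1 JOIN t2 ON t1.c1 = t2.c1, hashing t1 and probing with t2
print(hash_join(t1, t2, "t1.c1", "t2.c1"))
```

Running it prints the single matching pair, the t1 row with c1 = 2 joined to the t2 row with c1 = 2.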

Equi-join with an index in use

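Hash join can also be used when one or more indexes can be used for single-table predicates. The original wording in the MySQL manual is: "A hash join can also be used when there are one or more indexes that can be used for single-table predicates." A single-table predicate is a condition that refers to only one table, such as t1.c2 > 50 in a WHERE clause. An index may be used to read the rows of that table that satisfy it, while the join between the tables is still performed as a hash join, because no index is used for the join condition itself.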
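To illustrate the sentence above, here is a sketch in which a hypothetical index on t1.c2 serves the single-table predicate t1.c2 > 50, while the join condition t1.c1 = t2.c1, which has no usable index, is still evaluated as a hash join. The index is modelled as a sorted list searched with bisect. The predicate, the index, the helper name index_range_greater_than and the sample rows are all assumptions for illustration, not MySQL internals.

```python
import bisect
from collections import defaultdict
from typing import Any, Dict, List

Row = Dict[str, Any]  # assumed row shape (illustration only): {"t1.c1": 1, "t1.c2": 10}


def index_range_greater_than(rows: List[Row], col: str, low: Any) -> List[Row]:
    """Hypothetical helper: rows with row[col] > low, found through a sorted 'index' on col."""
    # The "index": (value, row position) pairs sorted by value; NULLs are not indexed here.
    index = sorted((r[col], pos) for pos, r in enumerate(rows) if r[col] is not None)
    # (low, len(rows)) sorts after every (low, pos) entry, so start is the first value > low.
    start = bisect.bisect_right(index, (low, len(rows)))
    return [rows[pos] for _, pos in index[start:]]


def hash_join(build: List[Row], probe: List[Row], build_key: str, probe_key: str) -> List[Row]:
    """Inner equi-join: hash the build side, then probe it."""
    table: Dict[Any, List[Row]] = defaultdict(list)
    for row in build:
        if row[build_key] is not None:
            table[row[build_key]].append(row)
    return [{**m, **row} for row in probe for m in table.get(row[probe_key], [])]


t1 = [{"t1.c1": 1, "t1.c2": 10}, {"t1.c1": 2, "t1.c2": 60}, {"t1.c1": 3, "t1.c2": 90}]
t2 = [{"t2.c1": 1}, {"t2.c1": 2}, {"t2.c1": 3}]

# SELECT * FROM t1 JOIN t2 ON t1.c1 = t2.c1 WHERE t1.c2 > 50
filtered_t1 = index_range_greater_than(t1, "t1.c2", 50)  # index used for the single-table predicate
print(hash_join(filtered_t1, t2, "t1.c1", "t2.c1"))      # the join itself is still a hash join
```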

Compared with the Block Nested-Loop algorithm (BNL below), hash join is faster and covers the same use cases, so starting with 8.0.20 BNL has been removed and hash join is used in its place.
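Why hash join outperforms BNL becomes clear by counting work. The sketch below is a simplified model, not MySQL's code. BNL buffers a block of outer rows (block_size stands in for the join buffer) and compares every inner row against every buffered row, so the number of comparisons is len(outer) * len(inner) whatever the block size. Hash join makes one build pass and then performs a single hash lookup per probe row. Dict rows and sample data are assumptions.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]  # assumed row shape (illustration only): {"t1.c1": 1}


def bnl_join(outer: List[Row], inner: List[Row], outer_key: str, inner_key: str,
             block_size: int) -> Tuple[List[Row], int]:
    """Block nested-loop equi-join; also returns the number of key comparisons."""
    result: List[Row] = []
    comparisons = 0
    for start in range(0, len(outer), block_size):
        block = outer[start:start + block_size]  # rows held in the "join buffer"
        for irow in inner:                        # one full scan of inner per block
            for orow in block:
                comparisons += 1
                if orow[outer_key] is not None and orow[outer_key] == irow[inner_key]:
                    result.append({**orow, **irow})
    return result, comparisons


def hash_join(build: List[Row], probe: List[Row], build_key: str,
              probe_key: str) -> Tuple[List[Row], int]:
    """Hash equi-join; also returns the number of hash-table lookups."""
    table: Dict[Any, List[Row]] = defaultdict(list)
    for row in build:
        if row[build_key] is not None:
            table[row[build_key]].append(row)
    result: List[Row] = []
    lookups = 0
    for row in probe:
        lookups += 1
        result.extend({**match, **row} for match in table.get(row[probe_key], []))
    return result, lookups


t1 = [{"t1.c1": i} for i in range(100)]
t2 = [{"t2.c1": i} for i in range(0, 200, 2)]
bnl_rows, bnl_work = bnl_join(t1, t2, "t1.c1", "t2.c1", block_size=10)
hash_rows, hash_work = hash_join(t1, t2, "t1.c1", "t2.c1")
print(len(bnl_rows), bnl_work)    # 50 10000
print(len(hash_rows), hash_work)  # 50 100
```

Both produce the same 50 rows (in a different order), but BNL does 10,000 comparisons against 100 lookups for the hash join.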

In EXPLAIN output, the Extra column then usually shows:

Extra: Using where; Using join buffer (hash join)

which indicates that a hash join was used.

Multiple join conditions with at least one equality (non-equality conditions allowed)

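Hash join is designed for equi-joins, but in principle, when there are multiple join conditions, MySQL can still use a hash join to speed up the query as long as each pair of joined tables has at least one equality condition, for example: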

SELECT * FROM t1
    JOIN t2 ON (t1.c1 = t2.c1 AND t1.c2 < t2.c2)  -- this join condition includes a non-equality
    JOIN t3 ON (t2.c1 = t3.c1);

The EXPLAIN FORMAT=TREE output is:

EXPLAIN: -> Inner hash join (t3.c1 = t1.c1)  (cost=1.05 rows=1)
    -> Table scan on t3  (cost=0.35 rows=1)
    -> Hash
        -> Filter: (t1.c2 < t2.c2)  (cost=0.70 rows=1)
            -> Inner hash join (t2.c1 = t1.c1)  (cost=0.70 rows=1)
                -> Table scan on t2  (cost=0.35 rows=1)
                -> Hash
                    -> Table scan on t1  (cost=0.35 rows=1)
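The plan above can be mimicked with a build/probe sketch that takes an optional residual predicate for the non-equality part of the join condition. As in the tree output, the input listed under "-> Hash" is the one that gets hashed: t1 for the inner join, and the t1/t2 result for the outer one. Rows are assumed to be dicts with qualified column names and the data is made up. This is a conceptual model, not MySQL's code.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List, Optional

Row = Dict[str, Any]  # assumed row shape (illustration only): {"t1.c1": 1, "t1.c2": 1}


def hash_join(build: List[Row], probe: List[Row], build_key: str, probe_key: str,
              residual: Optional[Callable[[Row], bool]] = None) -> List[Row]:
    """Inner hash join on one equality, then an optional extra (non-equality) check."""
    table: Dict[Any, List[Row]] = defaultdict(list)
    for row in build:
        if row[build_key] is not None:
            table[row[build_key]].append(row)
    result: List[Row] = []
    for row in probe:
        for match in table.get(row[probe_key], []):
            joined = {**match, **row}
            # Plays the role of the "Filter: (t1.c2 < t2.c2)" node above the hash join.
            if residual is None or residual(joined):
                result.append(joined)
    return result


t1 = [{"t1.c1": 1, "t1.c2": 1}, {"t1.c1": 2, "t1.c2": 9}]
t2 = [{"t2.c1": 1, "t2.c2": 5}, {"t2.c1": 2, "t2.c2": 5}]
t3 = [{"t3.c1": 1}, {"t3.c1": 2}]

# Inner hash join (t2.c1 = t1.c1) + Filter: (t1.c2 < t2.c2); t1 is hashed, t2 probes
t1_t2 = hash_join(t1, t2, "t1.c1", "t2.c1", residual=lambda r: r["t1.c2"] < r["t2.c2"])
# Inner hash join (t3.c1 = t1.c1); the t1/t2 result is hashed, t3 probes
result = hash_join(t1_t2, t3, "t1.c1", "t3.c1")
print(result)
```

Only the combination with t1.c1 = 1 survives: for t1.c1 = 2 the equality holds, but 9 < 5 fails the residual filter.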

A pair of joined tables with no equality condition at all (8.0.20 and later)

Before MySQL 8.0.20, BNL was used if any pair of joined tables had no equality join condition. Starting with 8.0.20, hash join can be used for statements like the following as well:

mysql> EXPLAIN FORMAT=TREE
    -> SELECT * FROM t1
    ->     JOIN t2 ON (t1.c1 = t2.c1)
    ->     JOIN t3 ON (t2.c1 < t3.c1)\G   -- no equality in this join condition; it is applied as a filter
*************************** 1. row ***************************
EXPLAIN: -> Filter: (t1.c1 < t3.c1)  (cost=1.05 rows=1)
    -> Inner hash join (no condition)  (cost=1.05 rows=1)
        -> Table scan on t3  (cost=0.35 rows=1)
        -> Hash
            -> Inner hash join (t2.c1 = t1.c1)  (cost=0.70 rows=1)
                -> Table scan on t2  (cost=0.35 rows=1)
                -> Hash
                    -> Table scan on t1  (cost=0.35 rows=1)
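When there is no equality to hash on, "Inner hash join (no condition)" degenerates into pairing every probe row with every build row, and the Filter node evaluates the non-equality condition afterwards. The minimal sketch below assumes the t1/t2 hash join has already produced t12 (made-up dict rows). The filter is written as t1.c1 < t3.c1 to mirror the EXPLAIN output. Because t1.c1 = t2.c1 holds for every row of t12, it is equivalent to the original t2.c1 < t3.c1. The helper name is hypothetical.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]  # assumed row shape (illustration only): {"t1.c1": 1, "t2.c1": 1}


def hash_join_no_condition(build: List[Row], probe: List[Row]) -> List[Row]:
    """Hypothetical helper: hash join with an empty key, i.e. a Cartesian product."""
    bucket = list(build)  # with no key, every build row lands in the same single bucket
    return [{**b, **p} for p in probe for b in bucket]


# Output of "Inner hash join (t2.c1 = t1.c1)", assumed already computed
t12 = [{"t1.c1": 1, "t2.c1": 1}, {"t1.c1": 5, "t2.c1": 5}]
t3 = [{"t3.c1": 3}, {"t3.c1": 7}]

pairs = hash_join_no_condition(t12, t3)                  # 2 x 2 = 4 candidate rows
result = [r for r in pairs if r["t1.c1"] < r["t3.c1"]]   # Filter: (t1.c1 < t3.c1)
print(len(pairs), len(result))                           # 4 3
```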

Cartesian product

Hash join also applies to a Cartesian product, where no join condition is specified:

mysql> EXPLAIN FORMAT=TREE
    -> SELECT *
    ->     FROM t1
    ->     JOIN t2
    ->     WHERE t1.c2 > 50\G
*************************** 1. row ***************************
EXPLAIN: -> Inner hash join  (cost=0.70 rows=1)
    -> Table scan on t2  (cost=0.35 rows=1)
    -> Hash
        -> Filter: (t1.c2 > 50)  (cost=0.35 rows=1)  -- the WHERE condition filters t1 early, before it is hashed
            -> Table scan on t1  (cost=0.35 rows=1)
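Pushing the WHERE condition below the Hash node means rows failing t1.c2 > 50 never enter the hash table, so the Cartesian product is computed over fewer rows. The sketch below is a conceptual model, not MySQL code. It assumes dict rows keyed by qualified column names and made-up data, and cartesian_hash_join is a hypothetical name.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]  # assumed row shape (illustration only): {"t1.c2": 60}


def cartesian_hash_join(build: List[Row], probe: List[Row],
                        build_filter: Callable[[Row], bool]) -> List[Row]:
    """Hypothetical helper: join without a condition, filtering the build side first."""
    # Filter: (t1.c2 > 50) runs first, so rejected t1 rows are never hashed.
    bucket = [r for r in build if build_filter(r)]
    # No join condition: every t2 row pairs with every surviving t1 row.
    return [{**b, **p} for p in probe for b in bucket]


t1 = [{"t1.c2": 10}, {"t1.c2": 60}, {"t1.c2": 80}]
t2 = [{"t2.c1": 1}, {"t2.c1": 2}]

rows = cartesian_hash_join(t1, t2, lambda r: r["t1.c2"] > 50)
print(len(rows))  # 2 surviving t1 rows x 2 t2 rows = 4
```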

Plain inner join with no equality at all

mysql> EXPLAIN FORMAT=TREE SELECT * FROM t1 JOIN t2 ON t1.c1 < t2.c1\G
*************************** 1. row ***************************
EXPLAIN: -> Filter: (t1.c1 < t2.c1)  (cost=4.70 rows=12)  -- the join condition has become a filter
    -> Inner hash join (no condition)  (cost=4.70 rows=12)
        -> Table scan on t2  (cost=0.08 rows=6)
        -> Hash
            -> Table scan on t1  (cost=0.85 rows=6)

Semijoin (the EXPLAIN output in the MySQL documentation is wrong; corrected here)

mysql> EXPLAIN FORMAT=TREE SELECT * FROM t1
    ->     WHERE t1.c1 IN (SELECT t2.c2 FROM t2)\G
*************************** 1. row ***************************
| -> Filter: (t1.c1 < t2.c1)  (cost=0.70 rows=1)
    -> Inner hash join (no condition)  (cost=0.70 rows=1)
        -> Table scan on t2  (cost=0.35 rows=1)
        -> Hash
            -> Table scan on t1  (cost=0.35 rows=1)
 |
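The semantics of a hash semijoin for t1.c1 IN (SELECT t2.c2 FROM t2) can be sketched as follows. Hash the subquery's values, then keep each t1 row at most once if its value is found. Unlike an inner join, duplicate values in t2 do not duplicate t1 rows. Dict rows, the sample data and the helper name are assumptions; this is a conceptual model, not MySQL's implementation.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]  # assumed row shape (illustration only): {"t1.c1": 2}


def hash_semijoin(outer: List[Row], inner: List[Row], outer_key: str, inner_key: str) -> List[Row]:
    """Hypothetical helper: outer rows whose key appears in inner[inner_key] (SQL IN)."""
    # Build: a set is enough, because only the existence of a match matters.
    keys = {r[inner_key] for r in inner if r[inner_key] is not None}
    # Probe: each outer row is emitted at most once; a NULL key never matches.
    return [r for r in outer if r[outer_key] is not None and r[outer_key] in keys]


t1 = [{"t1.c1": 1}, {"t1.c1": 2}, {"t1.c1": None}]
t2 = [{"t2.c2": 2}, {"t2.c2": 2}, {"t2.c2": None}]

# SELECT * FROM t1 WHERE t1.c1 IN (SELECT t2.c2 FROM t2)
print(hash_semijoin(t1, t2, "t1.c1", "t2.c2"))  # [{'t1.c1': 2}]
```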

Antijoin (the EXPLAIN output in the MySQL documentation is wrong; corrected here)

mysql> EXPLAIN FORMAT=TREE SELECT * FROM t2
    ->     WHERE NOT EXISTS (SELECT * FROM t1 WHERE t1.col1 = t2.col1)\G
*************************** 1. row ***************************
| -> Hash antijoin (t1.c1 = t2.c2)  (cost=0.70 rows=1)
    -> Table scan on t2  (cost=0.35 rows=1)
    -> Hash
        -> Table scan on t1  (cost=0.35 rows=1)
 |
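Here is a sketch of the hash antijoin behind NOT EXISTS: hash the subquery table's join values and keep the outer rows that find no match. Note the NULL behaviour of NOT EXISTS. A t2 row whose t2.col1 is NULL matches nothing in the subquery, so it is kept. Column names follow the query text (col1). Dict rows, the sample data and the helper name are assumptions, not MySQL internals.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]  # assumed row shape (illustration only): {"t2.col1": 1}


def hash_antijoin(outer: List[Row], inner: List[Row], outer_key: str, inner_key: str) -> List[Row]:
    """Hypothetical helper: outer rows with no inner row where inner_key = outer_key."""
    # Build: hash the inner join values; a NULL can never satisfy '=', so it is dropped.
    keys = {r[inner_key] for r in inner if r[inner_key] is not None}
    # Probe: keep rows without a match; a NULL outer value is never in keys, so it is kept.
    return [r for r in outer if r[outer_key] not in keys]


t2 = [{"t2.col1": 1}, {"t2.col1": 2}, {"t2.col1": None}]
t1 = [{"t1.col1": 2}, {"t1.col1": None}]

# SELECT * FROM t2 WHERE NOT EXISTS (SELECT * FROM t1 WHERE t1.col1 = t2.col1)
print(hash_antijoin(t2, t1, "t2.col1", "t1.col1"))  # [{'t2.col1': 1}, {'t2.col1': None}]
```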

Left outer join

mysql> EXPLAIN FORMAT=TREE SELECT * FROM t1 LEFT JOIN t2 ON t1.c1 = t2.c1\G
*************************** 1. row ***************************
EXPLAIN: -> Left hash join (t2.c1 = t1.c1)  (cost=3.99 rows=36)
    -> Table scan on t1  (cost=0.85 rows=6)
    -> Hash
        -> Table scan on t2  (cost=0.14 rows=6)
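In the plan above, t2 (under "-> Hash") is hashed and t1, the outer table, probes it. A t1 row with no match is still returned, with NULLs for t2's columns. The sketch below models that logic with assumed dict rows, passing t2's column names explicitly so the NULL padding knows which keys to fill. It is a conceptual model, not MySQL's implementation.

```python
from collections import defaultdict
from typing import Any, Dict, List

Row = Dict[str, Any]  # assumed row shape (illustration only): {"t1.c1": 1}


def left_hash_join(outer: List[Row], inner: List[Row], outer_key: str, inner_key: str,
                   inner_columns: List[str]) -> List[Row]:
    """outer LEFT JOIN inner ON outer_key = inner_key, hashing the inner side."""
    table: Dict[Any, List[Row]] = defaultdict(list)
    for row in inner:
        if row[inner_key] is not None:
            table[row[inner_key]].append(row)
    null_padding = {col: None for col in inner_columns}
    result: List[Row] = []
    for row in outer:
        matches = table.get(row[outer_key], [])
        if matches:
            result.extend({**row, **m} for m in matches)
        else:
            result.append({**row, **null_padding})  # unmatched outer row is kept
    return result


t1 = [{"t1.c1": 1}, {"t1.c1": 2}]
t2 = [{"t2.c1": 2}, {"t2.c1": 3}]

# SELECT * FROM t1 LEFT JOIN t2 ON t1.c1 = t2.c1
print(left_hash_join(t1, t2, "t1.c1", "t2.c1", ["t2.c1"]))
```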

Right outer join (MySQL converts every right outer join into a left outer join):

mysql> EXPLAIN FORMAT=TREE SELECT * FROM t1 RIGHT JOIN t2 ON t1.c1 = t2.c1\G
*************************** 1. row ***************************
EXPLAIN: -> Left hash join (t1.c1 = t2.c1)  (cost=3.99 rows=36)
    -> Table scan on t2  (cost=0.85 rows=6)
    -> Hash
        -> Table scan on t1  (cost=0.14 rows=6)
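The conversion amounts to swapping the two sides. t1 RIGHT JOIN t2 ON t1.c1 = t2.c1 returns the same rows as t2 LEFT JOIN t1 ON t1.c1 = t2.c1, which is why the plan hashes t1 and probes with t2. Below is a sketch with assumed dict rows. right_hash_join is a hypothetical helper name for illustration only.

```python
from collections import defaultdict
from typing import Any, Dict, List

Row = Dict[str, Any]  # assumed row shape (illustration only): {"t2.c1": 1}


def left_hash_join(outer: List[Row], inner: List[Row], outer_key: str, inner_key: str,
                   inner_columns: List[str]) -> List[Row]:
    """outer LEFT JOIN inner on equality, hashing the inner side."""
    table: Dict[Any, List[Row]] = defaultdict(list)
    for row in inner:
        if row[inner_key] is not None:
            table[row[inner_key]].append(row)
    padding = {col: None for col in inner_columns}
    result: List[Row] = []
    for row in outer:
        matches = table.get(row[outer_key], [])
        # Matched rows are joined; an unmatched outer row is padded with NULLs.
        result.extend([{**row, **m} for m in matches] or [{**row, **padding}])
    return result


def right_hash_join(left: List[Row], right: List[Row], left_key: str, right_key: str,
                    left_columns: List[str]) -> List[Row]:
    """Hypothetical helper: left RIGHT JOIN right, evaluated as right LEFT JOIN left."""
    return left_hash_join(right, left, right_key, left_key, left_columns)


t1 = [{"t1.c1": 2}]
t2 = [{"t2.c1": 1}, {"t2.c1": 2}]

# SELECT * FROM t1 RIGHT JOIN t2 ON t1.c1 = t2.c1: every t2 row is kept
print(right_hash_join(t1, t2, "t1.c1", "t2.c1", ["t1.c1"]))
```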