https://dev.mysql.com/doc/refman/5.6/en/subquery-optimization.html

Semi-join in MySQL 5.6

 
The MySQL 5.6.5 Development Milestone Release has a whole new set of algorithms for processing subqueries. They are based on transforming a subquery into a semi-join operation, and then treating the semi-join like any other join operation throughout the optimizer.

A subquery can be transformed to a semi-join if it matches these criteria:

  • The subquery is part of an IN or =ANY predicate. It cannot be e.g. NOT IN.
  • The subquery consists of a single query block (it must not contain UNION).
  • The subquery does not contain GROUP BY or HAVING.
  • The subquery is not implicitly grouped (it contains no aggregate functions).
  • The subquery predicate is part of a WHERE clause.
  • The subquery predicate must not be part of a disjunctive or negated search condition.
  • Neither query block contains the STRAIGHT_JOIN qualifier.
  • The statement must be a SELECT or INSERT statement. Semi-joins are not allowed with UPDATE or DELETE statements.
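In short (see Wikipedia for the definition of a semi-join), MySQL converts a subquery to a semi-join only when all of the following hold:
# The subquery is used with IN or =ANY; NOT IN does not qualify.
# The subquery consists of a single query block, with no UNION or similar operation.
# The subquery contains no GROUP BY or HAVING.
# The subquery contains no aggregate functions.
# The subquery predicate is not part of a disjunctive (OR) or negated search condition.
# Neither query block uses the STRAIGHT_JOIN qualifier.
# The statement is a SELECT or INSERT; semi-joins are not used for UPDATE or DELETE.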
 
Example subquery that can be transformed to a semi-join:
 
  SELECT * FROM nation
  WHERE n_regionkey IN (SELECT r_regionkey FROM region
                        WHERE r_name='AFRICA');

What is a semi-join operation

A semi-join operation is similar to a regular join operation: It takes two (sets of) tables and combines them using a join condition.
 
Like an outer join, a semi-join is a noncommutative join operator. We call the tables of the containing query block the outer tables and the tables of the subquery the inner tables.
 
Two factors distinguish a semi-join from a regular join:
  • In a semi-join, the inner tables do not cause duplicates in the result.
  • No columns from the inner tables are added to the result of the operation.
This means that the result of the semi-join is a subset of the rows from the outer tables. It also means that much of the special handling of semi-join is about efficiently eliminating duplicates from the inner tables.
 
We can represent the above query using the artificial SEMI JOIN operator as follows:
 
  SELECT nation.*
  FROM nation SEMI JOIN region
       ON nation.n_regionkey = region.r_regionkey AND
          region.r_name='AFRICA';
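
The two distinguishing factors can be illustrated with a few lines of Python. This is only a sketch of the semantics, not of how MySQL executes anything: inner_join and semi_join are hypothetical helpers, and the rows are invented. The table and column names are borrowed from the DBT3 nation and customer tables used elsewhere in this post.

```python
# Toy rows; names follow the DBT3 schema, values are made up for illustration.
nation = [
    {"n_nationkey": 0, "n_name": "ALGERIA"},
    {"n_nationkey": 1, "n_name": "ARGENTINA"},
    {"n_nationkey": 14, "n_name": "KENYA"},
]
customer = [
    {"c_custkey": 1, "c_nationkey": 0},
    {"c_custkey": 2, "c_nationkey": 0},   # second customer in the same nation
    {"c_custkey": 3, "c_nationkey": 14},
]


def inner_join(outer, inner, on):
    """Regular join: one output row per matching (outer, inner) pair."""
    return [{**o, **i} for o in outer for i in inner if on(o, i)]


def semi_join(outer, inner, on):
    """Semi-join: each outer row at most once, carrying outer columns only."""
    return [o for o in outer if any(on(o, i) for i in inner)]


def same_nation(n, c):
    return n["n_nationkey"] == c["c_nationkey"]


joined = inner_join(nation, customer, same_nation)
semi = semi_join(nation, customer, same_nation)

print(len(joined))                      # ALGERIA appears twice in the regular join
print([row["n_name"] for row in semi])  # each qualifying nation appears once
```

The regular join repeats ALGERIA once per matching customer, while the semi-join returns a subset of the nation rows with no customer columns attached.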

Semi-join optimization

If possible, a subquery is transformed into a semi-join during the query resolution stage. The query blocks containing the outer tables and the inner tables are then combined into a larger query block.
 
The next step of the optimization is to determine which inner tables are candidates for table pullout. If the inner table of a semi-join has an eq-ref relationship to the outer part of the query, there can be no duplicates from the inner side, and the semi-join can be converted to a regular join. We call that a table pullout, because the table is "pulled out" from the semi-join and joined with the outer tables.
 
The above subquery can be transformed using table pullout, because there is a unique index on the r_regionkey column. The explain result of the query is as follows:
 
| id | select_type | table  | type | possible_keys | key           | key_len | ref                     | rows | Extra       |
+----+-------------+--------+------+---------------+---------------+---------+-------------------------+------+-------------+
|  1 | PRIMARY     | region | ALL  | PRIMARY       | NULL          | NULL    | NULL                    |    5 | Using where |
|  1 | PRIMARY     | nation | ref  | i_n_regionkey | i_n_regionkey | 5       | dbt3.region.r_regionkey |    2 |             |
 
 
As you can see, there is no difference between this explain result and the explain result of a regular join between these two tables.
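
Why table pullout is safe can be checked on toy data. The sketch below uses Python's built-in sqlite3 module purely as a convenient SQL engine to compare result sets. It says nothing about MySQL's planner, and the rows are invented. Because r_regionkey is a primary key, each nation row can match at most one region row, so the plain join returns the same rows as the IN subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE region (r_regionkey INTEGER PRIMARY KEY, r_name TEXT);
    CREATE TABLE nation (n_nationkey INTEGER PRIMARY KEY,
                         n_name TEXT, n_regionkey INTEGER);
    INSERT INTO region VALUES (0, 'AFRICA'), (1, 'AMERICA');
    INSERT INTO nation VALUES (0, 'ALGERIA', 0), (2, 'BRAZIL', 1), (14, 'KENYA', 0);
""")

# Original form: IN subquery (a semi-join candidate).
subquery_form = conn.execute("""
    SELECT n_name FROM nation
    WHERE n_regionkey IN (SELECT r_regionkey FROM region
                          WHERE r_name = 'AFRICA')
    ORDER BY n_name
""").fetchall()

# Pulled-out form: region joined like any other table.
pulled_out_form = conn.execute("""
    SELECT n_name FROM nation JOIN region ON n_regionkey = r_regionkey
    WHERE r_name = 'AFRICA'
    ORDER BY n_name
""").fetchall()

print(subquery_form == pulled_out_form)
print(subquery_form)
conn.close()
```

If the join column on the inner side were not unique, the second query could return the same nation more than once, and the rewrite would no longer be valid without duplicate removal.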
 
The cost-based optimization procedure is then applied to the tables of the query block. Tables taking part in semi-joins are combined in various ways, with the appropriate cost added to the total plan cost. The end result is an optimal table order and an optimal set of semi-join execution strategies.

Semi-join execution strategies

As said before, semi-join is a regular join operation, combined with removal of possible duplicates from the semi-join inner tables. MySQL implements four different semi-join execution strategies, which have different ways of removing duplicates:
  1. FirstMatch
  2. DuplicateWeedout
  3. Materialization
  4. LooseScan
I will handle the FirstMatch strategy in more detail below. The other semi-join strategies will be handled in later blogs. How semi-join strategies can be combined with join buffering will also be handled.

Controlling semi-join strategies

Use of semi-join is controlled with the optimizer_switch flag semijoin:
 
  set optimizer_switch='semijoin=on'
 
This command enables transformation of subqueries into semi-join operations. This flag is on by default in MySQL 5.6.5.
 
In addition, individual semi-join strategies can be turned on and off with the following optimizer_switch flags:
 
  firstmatch, materialization, loosescan
 
These flags enable or disable the use of the respective semi-join strategies. Note that there is no way to disable the DuplicateWeedout strategy, as this is the "last resort" strategy selected by the cost-based optimizer. There is also no way to disable the TablePullout strategy when semi-join is enabled.
 
The defaults for all semi-join related optimizer_switch flags are:
 
  'semijoin=on,firstmatch=on,materialization=on,loosescan=on'
 
Notice also that the optimizer_switch flag materialization has a second meaning: It controls the use of the subquery materialization feature (which should not be confused with the semi-join materialization strategy).
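
When experimenting with different flag combinations, it can help to generate the SET statements rather than type them. The helper below is a hypothetical convenience function, not part of MySQL or of any connector. It only builds the statement text from the four flags named in this post and does not talk to a server.

```python
# The four semi-join related optimizer_switch flags named in this post.
SEMIJOIN_FLAGS = ("semijoin", "firstmatch", "materialization", "loosescan")


def optimizer_switch_statement(**flags):
    """Build a SET optimizer_switch statement (hypothetical helper).

    Each keyword argument is a flag name mapped to True (on) or False (off).
    Flags that are not passed are left out, so the server keeps their
    current values.
    """
    unknown = set(flags) - set(SEMIJOIN_FLAGS)
    if unknown:
        raise ValueError(f"not a semi-join flag from this post: {sorted(unknown)}")
    settings = ",".join(
        f"{name}={'on' if flags[name] else 'off'}"
        for name in SEMIJOIN_FLAGS
        if name in flags
    )
    return f"SET optimizer_switch='{settings}';"


# Statement that turns off semi-join and materialization.
print(optimizer_switch_statement(semijoin=False, materialization=False))
```

The generated text can be pasted into a client session. Keep in mind that materialization affects both semi-join materialization and subquery materialization, as noted above.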

Semi-join FirstMatch strategy

The semi-join FirstMatch strategy executes a subquery in much the same way as the IN-TO-EXISTS strategy familiar from earlier versions of MySQL: for each matching row in the outer table, check for a match in the inner table. When a match is found, return the row from the outer table and move on to the next outer row; otherwise, continue scanning the inner table until reaching the end.
 
Here is a query that is processed using FirstMatch strategy:
 
  SELECT * FROM nation
  WHERE n_nationkey IN (SELECT c_nationkey FROM customer);
 
Notice the FirstMatch(nation) indication in explain output, which means that after a match has been found in table customer, the query executor goes back to scan more rows in table nation:
 
| id | select_type | table    | type | possible_keys | key           | key_len | ref                     | rows | Extra                           |
+----+-------------+----------+------+---------------+---------------+---------+-------------------------+------+---------------------------------+
|  1 | PRIMARY     | nation   | ALL  | PRIMARY       | NULL          | NULL    | NULL                    |   25 |                                 |
|  1 | PRIMARY     | customer | ref  | i_c_nationkey | i_c_nationkey | 5       | dbt3.nation.n_nationkey | 1115 | Using index; FirstMatch(nation) |
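
The control flow behind FirstMatch(nation) can be sketched in Python. This is an illustration of the idea under simplifying assumptions, not MySQL's implementation: first_match is a hypothetical function, a dictionary stands in for the i_c_nationkey index, and the rows are invented.

```python
from collections import defaultdict


def first_match(outer_rows, inner_rows, outer_key, inner_key):
    """Return (matching outer rows, number of inner rows examined)."""
    # Stand-in for an index lookup on the inner table's join column.
    index = defaultdict(list)
    for row in inner_rows:
        index[row[inner_key]].append(row)

    result, examined = [], 0
    for outer in outer_rows:
        for _inner in index.get(outer[outer_key], ()):
            examined += 1
            result.append(outer)
            break  # first match found: jump back to the next outer row
    return result, examined


nation = [
    {"n_nationkey": 0, "n_name": "ALGERIA"},
    {"n_nationkey": 1, "n_name": "ARGENTINA"},
    {"n_nationkey": 2, "n_name": "BRAZIL"},
]
customer = [
    {"c_custkey": 1, "c_nationkey": 0},
    {"c_custkey": 2, "c_nationkey": 0},
    {"c_custkey": 3, "c_nationkey": 0},
    {"c_custkey": 4, "c_nationkey": 2},
]

matched, examined = first_match(nation, customer, "n_nationkey", "c_nationkey")
print([n["n_name"] for n in matched])
print(examined)
```

Only one customer row is read per qualifying nation, even though ALGERIA has three customers in this toy data. That short-circuit is what removes the duplicates.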
 
Here is the result of the IN-TO-EXISTS transformation, which can still be enabled by setting these optimizer_switch flags:
 
  set optimizer_switch='semijoin=off,materialization=off';
 
| id | select_type        | table    | type           | possible_keys | key           | key_len | ref  | rows | Extra       |
+----+--------------------+----------+----------------+---------------+---------------+---------+------+------+-------------+
|  1 | PRIMARY            | nation   | ALL            | NULL          | NULL          | NULL    | NULL |   25 | Using where |
|  2 | DEPENDENT SUBQUERY | customer | index_subquery | i_c_nationkey | i_c_nationkey | 5       | func | 1115 | Using index |
 
As this is basically the same strategy that MySQL 5.5 would choose, there is usually no speedup to gain when FirstMatch is chosen. The advantage comes when another strategy is more efficient than FirstMatch, or when the cost-based optimizer chooses a more efficient table order. The optimizer can determine a better table order because it estimates a realistic cost for a semi-join operation. This contrasts with MySQL 5.5, where the placement of a subquery in the query plan was strictly rule-based.

Multiple subqueries

It is possible to combine multiple subqueries with AND and still have them transformed to a semi-join:
 
  SELECT * FROM nation
  WHERE n_regionkey IN (SELECT r_regionkey FROM region
                        WHERE r_name='AFRICA') AND
        n_nationkey IN (SELECT c_nationkey FROM customer);
 
The explain output shows one FirstMatch strategy:
 
| id | select_type | table    | type | possible_keys         | key           | key_len | ref                     | rows | Extra                           |
+----+-------------+----------+------+-----------------------+---------------+---------+-------------------------+------+---------------------------------+
|  1 | PRIMARY     | region   | ALL  | PRIMARY               | NULL          | NULL    | NULL                    |    5 | Using where                     |
|  1 | PRIMARY     | nation   | ref  | PRIMARY,i_n_regionkey | i_n_regionkey | 5       | dbt3.region.r_regionkey |    2 |                                 |
|  1 | PRIMARY     | customer | ref  | i_c_nationkey         | i_c_nationkey | 5       | dbt3.nation.n_nationkey | 1115 | Using index; FirstMatch(nation) |

Nested subqueries

MySQL can process nested subqueries and transform them into one semi-join operation. Here is a slightly artificial example:
 
  SELECT * FROM nation
  WHERE n_nationkey IN
     (SELECT c_nationkey FROM customer
      WHERE c_acctbal IN
         (SELECT o_totalprice FROM orders
          WHERE o_orderstatus = 'P'));
 
From the explain output, we see a FirstMatch strategy applied to tables customer and orders, jumping back to table nation when a match is found:
 
| id | select_type | table    | type | possible_keys | key           | key_len | ref                     | rows    | Extra                           |
+----+-------------+----------+------+---------------+---------------+---------+-------------------------+---------+---------------------------------+
|  1 | PRIMARY     | nation   | ALL  | PRIMARY       | NULL          | NULL    | NULL                    |      25 |                                 |
|  1 | PRIMARY     | customer | ref  | i_c_nationkey | i_c_nationkey | 5       | dbt3.nation.n_nationkey |    1115 |                                 |
|  1 | PRIMARY     | orders   | ALL  | NULL          | NULL          | NULL    | NULL                    | 1486350 | Using where; FirstMatch(nation) |
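
The "jump back to nation" behavior across two inner tables can be sketched as nested loops in Python. This is a simplified illustration with invented rows, and nested_first_match is a hypothetical function. A real execution would use the i_c_nationkey index instead of scanning customer.

```python
def nested_first_match(nation, customer, orders):
    """Return nations having a customer whose c_acctbal equals the
    o_totalprice of some order with o_orderstatus 'P'."""
    result = []
    for n in nation:
        matched = False
        for c in customer:
            if c["c_nationkey"] != n["n_nationkey"]:
                continue
            for o in orders:
                if o["o_orderstatus"] == "P" and o["o_totalprice"] == c["c_acctbal"]:
                    matched = True
                    break  # stop scanning orders
            if matched:
                break  # stop scanning customer too: jump back to nation
        if matched:
            result.append(n)
    return result


nation = [
    {"n_nationkey": 0, "n_name": "ALGERIA"},
    {"n_nationkey": 1, "n_name": "ARGENTINA"},
]
customer = [
    {"c_custkey": 1, "c_nationkey": 0, "c_acctbal": 100},
    {"c_custkey": 2, "c_nationkey": 0, "c_acctbal": 250},
    {"c_custkey": 3, "c_nationkey": 1, "c_acctbal": 75},
]
orders = [
    {"o_orderkey": 1, "o_totalprice": 250, "o_orderstatus": "P"},
    {"o_orderkey": 2, "o_totalprice": 75, "o_orderstatus": "F"},
]

print([n["n_name"] for n in nested_first_match(nation, customer, orders)])
```

As soon as one (customer, orders) pair satisfies both conditions, both inner loops are abandoned and processing continues with the next nation row, so each nation is returned at most once.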
