MySQL优化器join顺序

前一篇介绍了cost的计算方法，下面测试一下两表关联的查询：

测试用例

CREATE TABLE `xpchild` (

  `id` int(11) NOT NULL,

  `name` varchar(100) DEFAULT NULL,

  `c1` int(11) DEFAULT NULL,

  `c2` int(11) DEFAULT NULL,

  PRIMARY KEY (`id`),

  KEY `xpchild_name` (`name`),

  KEY `xpchild_id_c1` (`id`,`c1`)

) ENGINE=InnoDB DEFAULT CHARSET=latin1

 CREATE TABLE `xpchild_1` (

  `xxid` bigint(20) DEFAULT NULL,

  `name` varchar(100) DEFAULT NULL,

  KEY `xpchild_1_id` (`xxid`)

) ENGINE=InnoDB DEFAULT CHARSET=latin1

测试sql

select * from xpchild, xpchild_1 where xpchild.id=100 and xpchild.id=xpchild_1.xxid;

函数调用栈：
JOIN::optimize
make_join_statistics
update_ref_and_keys
get_quick_record_count
choose_plan

以上省略了JOIN::prepare的过程，prepare主要进行一些等级变化，上面的sql是一个比较简单的两表关联，并不会进行过多的变换。

step1: 初始化

make_join_statistics：
根据select_lex->leaf_tables初始化每个JOIN_TAB对象，至此，一个sql对于一个join，对应两个join_tab.
初始化quick_condition_rows为innodb中的stat统计信息中的record记录数。

step 2: 查询可用索引

update_ref_and_keys：根据where条件，选择可以使用的索引，加入到possible keys中，本例子加入的keys包括：
xpchild: primary，xpchild_id_c1
xpchild_1: xxid

(gdb) p *keyuse_array

$67 = {

  buffer = 0x8ca1fb58 "@24214",

  elements = 3,

  max_element = 20,

  alloc_increment = 64,

  size_of_element = 48

step 3：计算cost

get_quick_record_count:根据可选择的possible_keys计算cost。

1. xpchild表

因为有可以使用的primary，xpchild表s->type == JT_CONST，所以cost的计算为：

　　　　s->found_records=s->records=s->read_time=1。
所以，mysql对于使用primary，unique key的使用上比较有倾向性，而且可以节省大量的计算cost的时间。

2. xpchild_1表：

全表扫描的records && read_time是：
　　s->found_records= 1215
　　s->read_time= 5

计算xxid索引的cost：
　　get_quick_record_count
　　　　test_quick_select：
　　　　　　最终计算的cost：
　　　　　　　　estimated_records=1
　　　　　　　　best_read_time=2.21
　　具体的计算方式，可以参考前面一篇博客

到此：xpchild的JOIN_TAB结构中，比较简单，const table类型，cost=1;
xpchild_1的JOIN_TAB结构中，found_records=1, read_time=2.21;
对于单表的的查询access path已经是最优的。

step 4：join的顺序：

choose_join:

1. 如果是const table；不再进行join顺序的计算，直接选择使用当前positions。
memcpy((uchar*) join->best_positions,(uchar*) join->positions,sizeof(POSITION)*join->const_tables);
join->best_read=1.0;

2. 为非const table，选择最优的访问顺序
optimizer_search_depth：优化访问表join顺序的递归计算深度。
straight_join：按照sql的顺序，或者添加sql hint确定的顺序，默认不使用
greedy_search：贪婪算法，对于当前的查询，在候选的表中，选择一个最小cost添加到当前的plan中，递归完成。
best_extension_by_limited_search：根据current_record_count，与调用best_access_path得到的best_record_count进行比较，选择最优的路径。
best_access_path：table->quick_rows根据前面计算的records，得出cost，得到join->positions[idx]的最优路径。

join顺序选择的步骤：

1. 根据best_extension_by_limited_search在remaining table中选择cost最小的那个，本例中，xpchild的cost为：records=1 , read_time=0。所以选择为第一张表。

2. 然后从候选表中选择一个（只剩下xpchild_1表）加入到join的顺序中，并根据best_access_path选择一个cost最低的执行计划加入到plan中，这里选择xpchild_1_id的索引。

最后得到best plan，并赋值给last_query_cost;
　　　　join->thd->status_var.last_query_cost= join->best_read;

最后得到的best plan：
(gdb) p tab->join->best_read
$73 = 1.1990000000000001

(gdb) p tab->join->best_positions

$72 = {{

    records_read = 1,

    read_time = 0,

    table = 0x8ca06078, 'xpchild'

    key = 0x8ca1fb58,   'primary'

    ref_depend_map = 0

  }, {

    records_read = 1,

    read_time = 1,

    table = 0x8ca0620c, 'xpchild_1'

    key = 0x8ca1fbb8,    'xpchild_1_id'

    ref_depend_map = 0

  }

未完待续：