高性能MySQL笔记-第5章Indexing for High Performance-005聚集索引

一、聚集索引介绍

1.什么是聚集索引？

InnoDB’s clustered indexes actually store a B-Tree index and the rows together in the same structure.

2.为什么一张表只能一个聚集索引？

When a table has a clustered index, its rows are actually stored in the index’s leaf pages.The term “clustered” refers to the fact that rows with adjacent key values are stored close to each other. You can have only one clustered index per table, because you can’t store the rows in two places at once. (However, covering indexes let you emulate mul-
tiple clustered indexes; more on this later.)

3.聚集索引的优点

• You can keep related data close together. For example, when implementing a mailbox, you can cluster by user_id , so you can retrieve all of a single user’s messages by fetching only a few pages from disk. If you didn’t use clustering, each message might require its own disk I/O.
• Data access is fast. A clustered index holds both the index and the data together in one B-Tree, so retrieving rows from a clustered index is normally faster than a comparable lookup in a nonclustered index.
• Queries that use covering indexes can use the primary key values contained at the leaf node.

4.聚集索引的缺点

• Clustering gives the largest improvement for I/O-bound workloads. If the data fits in memory the order in which it’s accessed doesn’t really matter, so clustering doesn’t give much benefit.
• Insert speeds depend heavily on insertion order. Inserting rows in primary key order is the fastest way to load data into an InnoDB table. It might be a good idea to reorganize the table with OPTIMIZE TABLE after loading a lot of data if you didn’t load the rows in primary key order.
• Updating the clustered index columns is expensive, because it forces InnoDB to move each updated row to a new location.
• Tables built upon clustered indexes are subject to page splits when new rows are inserted, or when a row’s primary key is updated such that the row must be moved.A page split happens when a row’s key value dictates that the row must be placed into a page that is full of data. The storage engine must split the page into two to
accommodate the row. Page splits can cause a table to use more space on disk.
• Clustered tables can be slower for full table scans, especially if rows are less densely packed or stored nonsequentially because of page splits.
• Secondary (nonclustered) indexes can be larger than you might expect, because their leaf nodes contain the primary key columns of the referenced rows.

• Secondary index accesses require two index lookups instead of one.

二、聚集索引(用innodb)与非聚集索引(用MyISAM)的区别

表结构

 CREATE TABLE layout_test (

 col1 int NOT NULL,

 col2 int NOT NULL,

 PRIMARY KEY(col1),

 KEY(col2)

 );

1.MyISAM的结构

In fact, in MyISAM, there is no structural difference between a primary key and any other index. A primary key is simply a unique, nonnullable index named PRIMARY .

2.Innodb的结构

At first glance, that might not look very different from Figure 5-5. But look again, and notice that this illustration shows the whole table, not just the index. Because the clustered index “is” the table in InnoDB, there’s no separate row storage as there is for MyISAM.

Each leaf node in the clustered index contains the primary key value, the transaction ID, and rollback pointer InnoDB uses for transactional and MVCC purposes, and the rest of the columns (in this case, col2 ). If the primary key is on a column prefix, InnoDB includes the full column value with the rest of the columns.

Also in contrast to MyISAM, secondary indexes are very different from clustered indexes in InnoDB. Instead of storing “row pointers,” InnoDB’s secondary index leaf nodes contain the primary key values, which serve as the “pointers” to the rows. This strategy reduces the work needed to maintain secondary indexes when rows move or
when there’s a data page split. Using the row’s primary key values as the pointer makes the index larger, but it means InnoDB can move a row without updating pointers to it.

区别总结：

(1)第一个重大区别是InnoDB的数据文件本身就是索引文件。从上文知道，MyISAM索引文件和数据文件是分离的，索引文件仅保存数据记录的地址。而在InnoDB中，表数据文件本身就是按B+Tree组织的一个索引结构，这棵树的叶节点data域保存了完整的数据记录。这个索引的key是数据表的主键，因此InnoDB表数据文件本身就是主索引。

(2)第二个与MyISAM索引的不同是InnoDB的辅助索引data域存储相应记录主键的值而不是地址。换句话说，InnoDB的所有辅助索引都引用主键作为data域。所以辅助索引搜索需要检索两遍索引：首先检索辅助索引获得主键，然后用主键到主索引中检索获得记录。

(3)聚集索引一个表只能有一个，而非聚集索引一个表可以存在多个(因为前者的叶子结点就包含数据，而后者是指针)

(4)聚集索引存储记录是物理上连续存在，而非聚集索引是逻辑上的连续，物理存储并不连续(不知道对不对)

三、用聚集索引时，primary key是否连续的影响

Notice that not only does it take longer to insert the rows with the UUID primary key,but the resulting indexes are quite a bit bigger. Some of that is due to the larger primary key, but some of it is undoubtedly due to page splits and resultant fragmentation as well.

2.主键是否连续为什么会有差别？

连续主键的插入

不连续主键的插入

插入不连续主键的缺点：

• The destination page might have been flushed to disk and removed from the caches,or might not have ever been placed into the caches, in which case InnoDB will have to find it and read it from the disk before it can insert the new row. This causes a lot of random I/O.
• When insertions are done out of order, InnoDB has to split pages frequently to make room for new rows. This requires moving around a lot of data, and modifying at least three pages instead of one.
• Pages become sparsely and irregularly filled because of splitting, so the final data is fragmented.

高性能MySQL笔记-第5章Indexing for High Performance-005聚集索引的更多相关文章

高性能MySQL笔记-第5章Indexing for High Performance-004怎样用索引才高效
一.怎样用索引才高效 1.隔离索引列 MySQL generally can’t use indexes on columns unless the columns are isolated in t ...
高性能MySQL笔记-第5章Indexing for High Performance-001B-Tree indexes(B+Tree)
一. 1.什么是B-Tree indexes? The general idea of a B-Tree is that all the values are stored in order, and ...
高性能MySQL笔记-第5章Indexing for High Performance-002Hash indexes
一. 1.什么是hash index A hash index is built on a hash table and is useful only for exact lookups that u ...
高性能MySQL笔记-第5章Indexing for High Performance-003索引的作用
一. 1. 1). Indexes reduce the amount of data the server has to examine.2). Indexes help the server av ...
高性能MySQL笔记第5章创建高性能的索引
索引(index),在MySQL中也被叫做键(key),是存储引擎用于快速找到记录的一种数据结构.索引优化是对查询性能优化最有效的手段. 5.1 索引基础索引的类型索引是在存储引擎层而 ...
高性能MySQL笔记第6章查询性能优化
6.1 为什么查询速度会慢查询的生命周期大致可按照顺序来看:从客户端,到服务器,然后在服务器上进行解析,生成执行计划,执行,并返回结果给客户端.其中“执行”可以认为是整个生命周期中最重要的阶段. ...
高性能MySQL笔记第4章 Schema与数据类型优化
4.1 选择优化的数据类型通用原则更小的通常更好前提是要确保没有低估需要存储的值范围:因为它占用更少的磁盘.内存.CPU缓存,并且处理时需要的CPU周期也更少. 简单就好简 ...
高性能MySQL笔记-第1章MySQL Architecture and History-001
1.MySQL架构图 2.事务的隔离性事务的隔离性是specific rules for which changes are and aren’t visible inside and outsid ...
高性能MySQL笔记-第4章Optimizing Schema and Data Types
1.Good schema design is pretty universal, but of course MySQL has special implementation details to ...

随机推荐

Python-正则表达式及实战小例子
注意Python的字符串本身也用'\'转义,所以要特别注意,一般我们都建议使用Python的r前缀,就不用考虑转义的问题了 1,行的起始例子:匹配‘cat’ 开头 patt=re.compile( ...
Java NIO阻塞式通信
package com.nio.t; import java.io.IOException; import java.net.InetSocketAddress; import java.nio.By ...
linux C gbk utf-8编码转换
http://blog.csdn.net/sealyao/article/details/5043138
程序员如何编写好开发技术文档如何编写优质的API文档工作
编写技术文档,是令众多开发者望而生畏的任务之一.它本身是一件费时费力才能做好的工作.可是大多数时候,人们却总是想抄抄捷径,这样做的结果往往非常令人遗憾的,因为优质的技术文档是决定你的项目是否引人关注的 ...
Linux U盘启动盘
/****************************************************************************** * Linux U盘启动盘 * 说明: ...
UVA - 11212 Editing a Book (IDA*)
给你一个长度为n(n<=9)的序列,每次可以将一段连续的子序列剪切到其他地方,问最少多少次操作能将序列变成升序. 本题最大的坑点在于让人很容易想到许多感觉挺正确但实际却不正确的策略来避开一些看似 ...
Hat’s Words（字典树的运用）
个人心得:通过这道题,对于树的运用又加深了一点,字典树有着他独特的特点,那个指针的一直转换着实让我好生想半天, 不得不佩服这些发明算法人的大脑. 这题的解决方法还是从网上找到的,还好算法是自己实现得, ...
「新手向」koa2从起步到填坑
前传出于兴趣最近开始研究koa2,由于之前有过一些express经验,以为koa还是很好上手的,但是用起来发现还是有些地方容易懵逼,因此整理此文,希望能够帮助到一些新人. 如果你不懂javascri ...
内存优化总结:ptmalloc、tcmalloc和jemalloc
概述需求系统的物理内存是有限的,而对内存的需求是变化的, 程序的动态性越强,内存管理就越重要,选择合适的内存管理算法会带来明显的性能提升.比如nginx, 它在每个连接accept后会malloc ...
GXT4.0 BorderLayoutContainer布局
T字型布局. @Override public void onModuleLoad() { BorderLayoutContainer con = new BorderLayoutContainer( ...

高性能MySQL笔记-第5章Indexing for High Performance-005聚集索引

高性能MySQL笔记-第5章Indexing for High Performance-005聚集索引的更多相关文章

随机推荐

热门专题