To pack or not to pack – MyISAM Key compression
The MyISAM storage engine has key compression, which makes its indexes much smaller, so they fit better in caches and performance can improve dramatically. In fact, packed indexes, rather than slightly longer rows in InnoDB, are a frequent reason MyISAM performs better than InnoDB. In this article I’ll go into a bit more detail about packed keys and their performance implications.
First, let's see how key compression works. Compression is applied per index block: the first key in a block is stored in full, while each following key stores the number of leading bytes it shares with the previous key plus the bytes that differ. So if a page starts with the index values “aaaaaa” and “aaaaab”, they are encoded as “aaaaaa”, “<5>b”. Similar compression applies to the data pointer for the same key value. It can also be used for numbers, because numbers are stored high byte first. This is obviously a simplification, but it should be enough to understand the performance implications of the approach.
Note that trailing spaces in strings are not stored in the index even when it is not packed.
Compressed blocks need to be treated differently. For uncompressed index blocks MySQL can do a binary search inside the page: jump to the middle, look at the key value, then repeat for the left or right half. For a compressed block this does not work, and MySQL has to scan the key block from the start, uncompressing keys until it finds the matching key value. This means the following:
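To make the encoding concrete, here is a minimal Python sketch of prefix compression within one block. The function names `pack_keys` and `unpack_keys` and the (shared length, suffix) tuple layout are assumptions made for illustration. This is not MyISAM's actual on-disk format, only the idea described above.

```python
def pack_keys(keys):
    """Prefix-compress a sorted list of byte-string keys.

    Returns (shared_prefix_len, suffix) pairs. The first key always has
    shared_prefix_len == 0, i.e. it is stored in full.
    """
    packed = []
    prev = b""
    for key in keys:
        # Count the leading bytes this key shares with the previous one.
        shared = 0
        limit = min(len(prev), len(key))
        while shared < limit and prev[shared] == key[shared]:
            shared += 1
        packed.append((shared, key[shared:]))
        prev = key
    return packed


def unpack_keys(packed):
    """Rebuild the full keys by walking the block from its start."""
    keys = []
    prev = b""
    for shared, suffix in packed:
        key = prev[:shared] + suffix
        keys.append(key)
        prev = key
    return keys


block = [b"aaaaaa", b"aaaaab"]
packed = pack_keys(block)
print(packed)  # [(0, b'aaaaaa'), (5, b'b')]

# Integers stored high byte first share leading bytes when values are close.
ints = [n.to_bytes(4, "big") for n in (1000, 1001, 1002)]
print(pack_keys(ints))
```

Note that a key can only be rebuilt from the key before it, which is what drives the performance effects discussed next.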
- Forward range/index scans (read_next) are fast, as MyISAM remembers the previous position and does not uncompress the block from the start to retrieve the next value.
- Reverse range/index scans (read_prev) are slower, as MyISAM has to start uncompressing the block over again to get the previous value.
- Random single-row lookups by index (e.g. joins) also suffer, as the key block needs to be uncompressed for each of them.
- Larger pages mean slower index lookups, as you need to scan half the page on average to find the key value.
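The difference between the two lookup styles can be sketched in Python. This is a toy model under stated assumptions: the helper names (`pack`, `find_unpacked`, `find_packed`, `previous_key`) and the 100-key block are hypothetical, and real MyISAM key blocks are more involved. It only counts how many keys must be decoded in each case.

```python
from bisect import bisect_left


def pack(keys):
    """Prefix-compress sorted keys into (shared_len, suffix) pairs."""
    packed, prev = [], b""
    for key in keys:
        shared = 0
        while shared < min(len(prev), len(key)) and prev[shared] == key[shared]:
            shared += 1
        packed.append((shared, key[shared:]))
        prev = key
    return packed


def find_unpacked(keys, target):
    """Binary search over full keys: about log2(n) comparisons."""
    i = bisect_left(keys, target)
    return i if i < len(keys) and keys[i] == target else -1


def find_packed(packed, target):
    """Scan from the block start, rebuilding each key from the previous one.

    Returns (index or -1, number of keys decoded).
    """
    prev = b""
    for i, (shared, suffix) in enumerate(packed):
        key = prev[:shared] + suffix
        if key >= target:
            return (i if key == target else -1), i + 1
        prev = key
    return -1, len(packed)


def previous_key(packed, index):
    """Step back one key: decoding has to restart at the block start.

    Returns (key at index - 1, number of keys decoded).
    """
    prev = b""
    for shared, suffix in packed[:index]:
        prev = prev[:shared] + suffix
    return prev, index


keys = [b"key%04d" % n for n in range(100)]
packed = pack(keys)

print(find_unpacked(keys, b"key0050"))  # 50
print(find_packed(packed, b"key0050"))  # (50, 51)
print(previous_key(packed, 50))         # (b'key0049', 50)
```

In this model a random lookup decodes about half the block on average, and one step backwards costs as much as a fresh lookup, while a forward step only needs the key already in hand.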
So how do you control which indexes are packed and which are not? Unfortunately, at this point you can’t set compression on a per-index basis, nor can you set the index block size for the keys. So you can’t look at an index and say “OK, this is typically going to be scanned forward, so I can pack it.” The only setting available is the table option PACK_KEYS=0|1|DEFAULT. Value 0 disables compression for all keys. Value 1 forces it for all keys. Value DEFAULT uses key compression for character column types but not for numeric and other column types.
Also, according to my tests, it looks like there is something special in how an integer primary key is handled. I can’t see any index size difference between the DEFAULT and 1 values for such a key. However, if I change it to a plain “key”, the size drops with PACK_KEYS=1. It might be some kind of special hack.
The fact that MyISAM by default compresses string keys but not integer keys is the biggest reason joins on string keys are so much slower than joins on integer keys. Integer keys are still best, but for InnoDB, for example, the difference is not as large.
Let us try a couple of benchmarks now. I’ve created a simple table with 100,000 rows and two columns, and I set id=c to keep it simple:
CREATE TABLE `t1` (
`id` int(10) unsigned NOT NULL ,
`c` varchar(20) NOT NULL default '',
KEY `c` (`c`),
KEY `id` (`id`)
) ENGINE=MyISAM
Index size:
- PACK_KEYS=DEFAULT – 1550K
- PACK_KEYS=1 – 1453K
- PACK_KEYS=0 – 8176K
As we can see, the difference between 1 and DEFAULT is rather small in this case, as integers are relatively short.
Now let's run some benchmarks. I’m testing 4 types of queries:
- Q1: select count(*) from t1, t1 t2 where t1.c=t2.c - simple join on the string column
- Q2: select count(*) from t1, t1 t2 where t1.id=t2.id - the same, but using the id column
- Q3: select c from t1 where c like "%abc%" order by c limit 1 - forward index scan
- Q4: select c from t1 where c like "%abc%" order by c desc limit 1 - reverse index scan
Results:
- PACK_KEYS=DEFAULT – Q1: 3.50s, Q2: 0.35s, Q3: 0.12s, Q4: 1.26s
- PACK_KEYS=0 – Q1: 1.25s, Q2: 0.40s, Q3: 0.14s, Q4: 0.16s
- PACK_KEYS=1 – Q1: 3.50s, Q2: 1.55s, Q3: 0.12s, Q4: 1.26s
So we can see theory and practice go pretty much hand in hand here. With the DEFAULT configuration, lookups on integer keys are fast and so are forward scans. Reverse scans on the string index, however, are slow.
It is interesting to see that PACK_KEYS=0 does not always improve performance, even when the data fits in memory. Take a look at Q3, for example. The reason is perhaps the smaller index size, which means less memory traffic. With modern CPUs it might actually be faster to do a bit of uncompression than to traverse large memory areas. On the other hand, with PACK_KEYS=0 the reverse index scan got some 8 times faster and the join on the string value almost 3 times faster.
With PACK_KEYS=1 the join on the integer key ran more than 4 times slower than before (1.55s versus 0.35s); this is something to really watch out for.
Note: these numbers are for a CPU-bound workload. If your load is disk bound, there are more reasons to keep keys packed, so the cache hit rate is better. Especially if packing lets you fit hot indexes in memory, it is almost a no-brainer.
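The speedups and slowdowns quoted here can be checked with a few lines of Python. The numbers are copied from the measurements above. The dictionary layout and the `ratio` helper are only illustrative.

```python
# Timings in seconds and index sizes in KB, copied from the results above.
timings = {
    "DEFAULT": {"Q1": 3.50, "Q2": 0.35, "Q3": 0.12, "Q4": 1.26},
    "0": {"Q1": 1.25, "Q2": 0.40, "Q3": 0.14, "Q4": 0.16},
    "1": {"Q1": 3.50, "Q2": 1.55, "Q3": 0.12, "Q4": 1.26},
}
index_kb = {"DEFAULT": 1550, "0": 8176, "1": 1453}


def ratio(larger, smaller):
    """How many times `larger` exceeds `smaller`, to one decimal place."""
    return round(larger / smaller, 1)


# Reverse scan on the string index: packed (DEFAULT) vs unpacked (0).
print(ratio(timings["DEFAULT"]["Q4"], timings["0"]["Q4"]))  # 7.9
# String join: packed (DEFAULT) vs unpacked (0).
print(ratio(timings["DEFAULT"]["Q1"], timings["0"]["Q1"]))  # 2.8
# Integer join: packed integers (1) vs unpacked integers (DEFAULT).
print(ratio(timings["1"]["Q2"], timings["DEFAULT"]["Q2"]))  # 4.4
# Index size: everything unpacked (0) vs DEFAULT.
print(ratio(index_kb["0"], index_kb["DEFAULT"]))  # 5.3
```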
Summary:
- Key prefix compression is a cool feature and may give great space savings.
- The MyISAM defaults are good for most cases.
- Avoid joining on strings; it is much slower anyway.
- Watch for reverse index scans (see Handler_read_prev); these are much slower for packed keys.
- Be careful with PACK_KEYS=1, as it can slow down your integer joins a lot.
- Be careful with PACK_KEYS=0, as it can blow up your index size by as much as 10x.
Reference:
http://www.mysqlperformanceblog.com/2006/05/13/to-pack-or-not-to-pack-myisam-key-compression/