To pack or not to pack – MyISAM Key compression

The MyISAM storage engine has key compression, which makes its indexes much smaller, so they fit better in caches and performance can improve dramatically. In fact, packed indexes, not InnoDB’s somewhat longer rows, are frequently the reason MyISAM performs better than InnoDB. In this article I’ll go into a bit more detail about packed keys and their performance implications.

First, let’s see how key compression works. Key compression restarts at each block boundary: the first index value in a block is stored in full, while each following value is stored as the number of leading bytes it shares with the previous value plus the bytes that differ. So let’s say you have a page starting with the index values “aaaaaa” and “aaaaab”. They will be encoded as “aaaaaa”, “<5>b”. Similar compression applies to the data pointer for the same key value. It can also be used for numbers, as numbers are stored with the high byte first. This is obviously a bit of a simplification, but it should be enough to understand the performance implications of this approach.
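The encoding described above can be sketched in a few lines of Python. This is only an illustration of the idea, not MyISAM’s actual on-disk format: the function names (shared_prefix_len, pack_block, unpack_block) and the (shared length, suffix) tuple representation are assumptions made for this example.

```python
def shared_prefix_len(a, b):
    """Number of leading bytes that a and b have in common."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count


def pack_block(keys):
    """Prefix-compress the sorted keys of one block.

    The first key is stored in full (shared length 0); every later key is
    stored as (bytes shared with the previous key, remaining bytes).
    """
    packed = []
    previous = b""
    for key in keys:
        shared = shared_prefix_len(previous, key)
        packed.append((shared, key[shared:]))
        previous = key
    return packed


def unpack_block(packed):
    """Rebuild the full keys by walking the block from its start."""
    keys = []
    previous = b""
    for shared, suffix in packed:
        previous = previous[:shared] + suffix
        keys.append(previous)
    return keys


if __name__ == "__main__":
    # The string example from the text: "aaaaaa", "<5>b".
    print(pack_block([b"aaaaaa", b"aaaaab"]))

    # Integers stored high byte first share their leading bytes,
    # while the same values stored low byte first share nothing.
    numbers = (1000, 1001, 1002)
    print(pack_block([n.to_bytes(4, "big") for n in numbers]))
    print(pack_block([n.to_bytes(4, "little") for n in numbers]))
```

Note that unpack_block can only rebuild a key after it has rebuilt every key before it in the block, which is the property the rest of the article depends on.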

Note that trailing spaces in strings are not stored in the index even if it is not packed.

Compressed blocks need to be treated differently. For uncompressed index blocks MySQL can do a binary search inside the page: jump to the middle, look at the key value, and repeat for the left or right half. For a compressed block this is not going to work, and MySQL needs to scan the key block from the start, uncompressing keys until it finds the matching key value. This means the following will apply:

Forward range/index scans will be fast (read_next), as MyISAM remembers the previous position and does not uncompress the block from the start to retrieve the next value
Reverse range/index scans will be slower (read_prev), as MyISAM has to start uncompressing the block all over again to get the previous value
Random single-row lookups by index (i.e. joins) will also suffer, as the key block needs to be uncompressed for each of them
Larger pages mean slower index lookups, as on average you need to scan half the page to find the key value
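The cost difference behind these points can be made concrete with a small Python sketch that counts how many keys have to be examined. It is a toy model under stated assumptions, not MyISAM code: blocks are plain Python lists, and the helper names (pack_block, find_in_packed, find_in_unpacked, key_before) are invented for this illustration.

```python
def pack_block(keys):
    """Prefix-compress sorted keys as (shared prefix length, suffix) pairs."""
    packed, previous = [], b""
    for key in keys:
        shared = 0
        while shared < min(len(previous), len(key)) and previous[shared] == key[shared]:
            shared += 1
        packed.append((shared, key[shared:]))
        previous = key
    return packed


def find_in_packed(packed, target):
    """Scan from the block start, rebuilding keys until target is reached.

    Returns (position or None, number of keys decoded).
    """
    previous = b""
    for position, (shared, suffix) in enumerate(packed):
        previous = previous[:shared] + suffix
        if previous == target:
            return position, position + 1
        if previous > target:
            return None, position + 1
    return None, len(packed)


def find_in_unpacked(keys, target):
    """Binary search on full keys; returns (position or None, keys probed)."""
    low, high, probes = 0, len(keys), 0
    while low < high:
        mid = (low + high) // 2
        probes += 1
        if keys[mid] < target:
            low = mid + 1
        else:
            high = mid
    found = low < len(keys) and keys[low] == target
    return (low if found else None), probes


def key_before(packed, position):
    """Analogue of read_prev: rebuild keys from the block start again.

    Returns (key at position - 1, number of keys decoded); for position 0
    there is no previous key in this block and (b"", 0) is returned.
    """
    previous, decoded = b"", 0
    for shared, suffix in packed[:position]:
        previous = previous[:shared] + suffix
        decoded += 1
    return previous, decoded


if __name__ == "__main__":
    keys = [f"key{i:04d}".encode() for i in range(256)]
    packed = pack_block(keys)
    print("packed lookup of last key:  ", find_in_packed(packed, keys[-1]))
    print("unpacked lookup of last key:", find_in_unpacked(keys, keys[-1]))
    print("step back from last key:    ", key_before(packed, 255))
    average = sum(find_in_packed(packed, k)[1] for k in keys) / len(keys)
    print("average keys decoded per packed lookup:", average)
```

In this model a lookup in the packed block decodes about half of the keys on average, while the binary search needs only a handful of probes. A forward scan would simply keep the last decoded key and apply the next (shared, suffix) pair to it, which is why read_next stays cheap.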

So how do you control which indexes are packed and which are not? Unfortunately, at this point you can’t set compression on a per-index basis. Neither can you set the index block size for the keys. So you can’t look at your index and say “OK. This is going to be typically scanned forward, so I can pack it.” The only setting available is PACK_KEYS=0|1|DEFAULT. Value 0 disables compression for all keys. Value 1 forces it for all keys. Value DEFAULT uses key compression for character column types but not for numerical and other column types.

Also, according to my tests, it looks like there is something special in how an integer primary key is handled. I can’t see an index size difference between the DEFAULT and 1 values for such a key. However, if I change it to a plain “key”, the size shrinks with PACK_KEYS=1. It might be some kind of special hack.

The fact that MyISAM by default compresses string keys but not integer keys is the biggest reason joins on string keys are so much slower than joins on integer keys. Integer keys are still best, but for InnoDB, for example, the difference is not so large.

Let us try a couple of benchmarks now. I’ve created a simple table with 100,000 rows and two columns, and I set id=c to keep it simple:

CREATE TABLE `t1` (
  `id` int(10) unsigned NOT NULL,
  `c` varchar(20) NOT NULL default '',
  KEY `c` (`c`),
  KEY `id` (`id`)
) ENGINE=MyISAM;

Index size:

PACK_KEYS=DEFAULT – 1550K
PACK_KEYS=1 – 1453K
PACK_KEYS=0 – 8176K

As we can see, the difference between 1 and DEFAULT is rather small in this case, as integers are relatively short.

Now let’s do some benchmarks. I’m testing 4 types of queries:

Q1: select count(*) from t1, t1 t2 where t1.c=t2.c – simple join by string column
Q2: select count(*) from t1, t1 t2 where t1.id=t2.id – same, but using the id column
Q3: select c from t1 where c like '%abc%' order by c limit 1 – forward index scan
Q4: select c from t1 where c like '%abc%' order by c desc limit 1 – reverse index scan

PACK_KEYS   Q1      Q2      Q3      Q4
DEFAULT     3.50s   0.35s   0.12s   1.26s
0           1.25s   0.40s   0.14s   0.16s
1           3.50s   1.55s   0.12s   1.26s

So we can see that theory goes pretty much hand in hand with practice here. With the DEFAULT configuration, lookups on integer keys are fast and so are forward scans. It is, however, slow to do a reverse scan on the string index.

It is interesting to see that PACK_KEYS=0 does not always improve performance, even if the data fits in memory. Take a look at Q3, for example. The reason is perhaps the smaller index size, which means less memory traffic. With modern CPUs it might actually be faster to do a bit of uncompression than to traverse large memory areas. On the other hand, with PACK_KEYS=0 the reverse index scan got some 8 times faster and the join on the string value almost 3 times faster.
With PACK_KEYS=1 the join by integer key performs more than 4 times slower than it was (1.55s versus 0.35s); this is something to really watch out for.
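The ratios quoted here follow directly from the timings in the benchmark results. A short Python sketch of the arithmetic, using only the measured numbers from this article (the dictionary layout and the slowdown helper are just for this illustration):

```python
# Query times in seconds, as measured in the benchmark above.
timings = {
    "DEFAULT": {"Q1": 3.50, "Q2": 0.35, "Q3": 0.12, "Q4": 1.26},
    "0": {"Q1": 1.25, "Q2": 0.40, "Q3": 0.14, "Q4": 0.16},
    "1": {"Q1": 3.50, "Q2": 1.55, "Q3": 0.12, "Q4": 1.26},
}


def slowdown(setting, query):
    """Time under the given PACK_KEYS setting relative to DEFAULT.

    Values above 1 mean slower than DEFAULT, values below 1 mean faster.
    """
    return timings[setting][query] / timings["DEFAULT"][query]


if __name__ == "__main__":
    for setting in ("0", "1"):
        for query in ("Q1", "Q2", "Q3", "Q4"):
            ratio = slowdown(setting, query)
            print(f"PACK_KEYS={setting} {query}: {ratio:.2f}x the DEFAULT time")
```

With PACK_KEYS=0, Q4 takes roughly an eighth of the DEFAULT time and Q1 a bit over a third, while Q3 becomes slightly slower; with PACK_KEYS=1, only Q2 changes.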

Note: these numbers are for a CPU-bound workload. If your load is disk bound, there are more reasons to have keys packed so that the cache hit rate is better. Especially if packing lets you fit hot indexes in memory, it is almost a no-brainer.

Summary:

Key prefix compression is a cool feature and may give great space savings.
The defaults for MyISAM are good for most cases.
Avoid joining on strings; it is much slower anyway.
Watch for reverse index scans (see Handler_read_prev); these are much slower for packed keys.
Be careful with PACK_KEYS=1, as it can slow down your integer joins a lot.
Be careful with PACK_KEYS=0, as it can blow up your index size several times over (more than 5x in the test above).

Reference:

http://www.mysqlperformanceblog.com/2006/05/13/to-pack-or-not-to-pack-myisam-key-compression/
