MyISAM storage engine has key compression which makes its indexes much smaller, allowing better fit in caches and so improving performance dramatically. Actually packed indexes not a bit longer rows is frequent reason of MyISAM performing better than Innodb. In this article I’ll get in a bit more details about packed keys and performance implications it causes.

First lets see how key compression works. Key compression applies on the block boundaries – first index value is stored fully while for following value index will contain value of the bytes sames as in previous cases + the all bytes which are different. So lets say you have page starting with index values “aaaaaa” and “aaaaab”. They will be encoded as “aaaaaa”, “<5>b”. The similar compression applies to data pointer for the same key value. It also can be used for numbers as numbers stored with high byte first. This is obviously a bit of simplification but it should be enough to understand performance implications of this approach.

Note trailing spaces for strings are not stored in the index even if it is not packed.

Compressed blocks need to be treated differently. For uncompressed index blocks MySQL can do binary search inside the page – kind of jumping in the middle and looking at the key value and repeating it then repeating it for left or right half. For compressed block this is not going to work and MySQL will need to scan keyblock from the start uncompressing keys to find matching key value. This means the following will apply:

  • Forward range/index scan will be fast (read_next) as MyISAM will remember previous position and will not uncompress block from the start to retrieve next value
  • Reverse range/index scan will be slower (read_prev) as MyISAM will have to start bock uncompressing over to get previous value
  • Random single row lookup by index (ie joins) will also suffer as key block needs to be uncompressed for each of these
  • Larger pages means slower index lookup as you need to scan half the page in average to find the key value

So how you control which indexes are packed and which are not ? Unfortunately at this point you can’t set compression on per index basics. Neither you can set index block size for the keys. So you can’t look at your index and say “OK. This is going to be typically scanned forward, so I can pack it” The only settings available is PACK_KEYS=0|1|DEFAULT. Value 0 disables compression for all keys. Value 1 forces it for all keys. Value DEFAULT will use key compression for character column type but not for numerical and other column types.

Also according to my tests it looks there is something special in how integer primary key is handled. I can’t see the index size difference between DEFAULT and 1 values for such key. However it I change it to “key” the size reduces if PACK_KEYS=1. Might be it is some kind of special hack 
The fact MyISAM will by default compress string keys but not integer keys is the biggest reason joins using string keys is so much slower compared to integer keys. Integer keys still best but for Innodb for example difference is not so large.

Let us try couple of benchmarks now. I’ve created simple table with 100.000 rows with columns. And I set id=c to keep it simple:

CREATE TABLE `t1` (
`id` int(10) unsigned NOT NULL ,
`c` varchar(20) NOT NULL default ”,
KEY `c` (`c`),
KEY `id` (`id`)
) ENGINE=MyISAM

Index size:

  • PACK_KEYS=DEFAULT – 1550K
  • PACK_KEYS=1 – 1453K
  • PACK_KEYS=0 – 8176K

As we can see difference between 1 and DEFAULT is rather small for this case, as integers are relatively short.

Now lets do some benchmarks. I’m testing 4 types of queries:

  1. select count(*) from t1, t1 t2 where t1.c=t2.c - simple join by string column
  2. select count(*) from t1, t1 t2 where t1.id=t2.id - same but using id column
  3. select c from t1 where c like “%abc%” order by c limit 1 - forward index scan
  4. select c from t1 where c like “%abc%” order by c desc limit 1 – reverse index scan
  • PACK_KEYS=DEFAULT Q1: 3.50s Q2: 0.35s Q3: 0.12s Q4: 1.26s
  • PACK_KEYS=0 Q1: 1.25s Q2:0.40s Q3: 0.14s Q4: 0.16s
  • PACK_KEYS=1 Q1: 3.50s Q2:1.55s Q3: 0.12s Q4: 1.26s

So we can see theory goes pretty much in hand with practice here. With DEFAULT configuration lookups for integer keys are fast and forward scans are. It is however slow to do reverse scan on the string index.

It is interesting to see PACK_KEYS=0 does not always improve performance even if data fits in memory. Take a look at Q3 for. The the reason perhaps is – smaller index side, which means less traffic to memory. With modern CPUs it might be actually faster to do a bit of uncompression than to traverse large memory areas. Reverse index scan performance got some 8 times faster and join on string value is almost 3 times faster.
With PACK_KEYS=1 we got join by integer key performing almost 5 times slower than it was – this is something really to watch out for.

Note – these numbers are for CPU bound workload. If your load is disk bound there are more reasons to have keys packed so cache hit is better. Especially if you can get hot indexes in memory by packing them it is almost no brainer

Summary:

    • Key prefix compression is cool feature and may give great space savings.
    • Defaults for MyISAM are good for most cases
    • Avoid joining on strings, it is much slower anyway
    • Watch for reverse index scans (see Handler_read_prev) these are much slower for packed keys
    • Be careful with PACK_KEYS=1 as it can slow down your integer joins a lot
    • Be careful with PACK_KEYS=0 as it can blow up your index size 10x times.

参考:

http://www.mysqlperformanceblog.com/2006/05/13/to-pack-or-not-to-pack-myisam-key-compression/

To pack or not to pack – MyISAM Key compression的更多相关文章

  1. #pragma pack(push) 和#pragma pack(pop) 以及#pragma pack()

    我们知道结构体内存对齐字节可以通过#pragma pack(n) 的方式来指定. 但是,有没有想过一个问题,某些时候我想4字节对齐,有些时候我又想1字节或者8字节对齐,那么怎么解决这个问题呢? 此时, ...

  2. (转)MyISAM Key Cache详解及优化

    原文:http://huanghualiang.blog.51cto.com/6782683/1372721 一.MyISAM Key Cache详解: 为了最小化磁盘I/O,MyISAM将最频繁访问 ...

  3. MyISAM Key Buffer 读/写/利用率(%) MylSAM平均每秒Key Buffer利用率(%) MylSAM平均每秒Key Buffer读命中率(%) MylSAM平均每秒Key Buffer写命中率(%)

    MyISAM Key Buffer 读/写/利用率(%) MylSAM平均每秒Key Buffer利用率(%)MylSAM平均每秒Key Buffer读命中率(%)MylSAM平均每秒Key Buff ...

  4. 详解C/C++中的的:#pragma pack(push) 、#pragma pack(pop) 和#pragma pack()

    前言 我们知道结构体内存对齐字节可以通过#pragma pack(n) 的方式来指定. 但是,有没有想过一个问题,某些时候我想4字节对齐,有些时候我又想1字节或者8字节对齐,那么怎么解决这个问题呢? ...

  5. PHP: 深入pack/unpack

    https://my.oschina.net/goal/blog/195749 PHP作为一门为web而生的服务器端开发语言,被越来越多的公司所采用.其中不乏大公司,如腾迅.盛大.淘米.新浪等.在对性 ...

  6. Python学习——struct模块的pack、unpack示例

    he struct module includes functions for converting between strings of bytes and native Python data t ...

  7. PHP: 深入pack/unpack 字节序

    http://my.oschina.net/goal/blog/195749?p=1 目录[-] 写在前面的话 什么是字节序 MSB和LSB 大端序 小端序 网络字节序 主机字节序 总结 pack/u ...

  8. php pack、unpack、ord 函数使用方法

    string pack ( string $format [, mixed $args [, mixed $... ]] ) Pack given arguments into a binary st ...

  9. [转]PHP: 深入pack/unpack

    From : http://my.oschina.net/goal/blog/195749 http://www.w3school.com.cn/php/func_misc_pack.asp PHP作 ...

随机推荐

  1. leetcode-零钱兑换—int溢出

     零钱兑换 给定不同面额的硬币 coins 和一个总金额 amount.编写一个函数来计算可以凑成总金额所需的最少的硬币个数.如果没有任何一种硬币组合能组成总金额,返回 -1. 示例 1: 输入: c ...

  2. 「雅礼集训 2017 Day1」市场 (线段树除法,区间最小,区间查询)

    老师说,你们暴力求除法也整不了多少次就归一了,暴力就好了(应该只有log(n)次) 于是暴力啊暴力,结果我归天了. 好吧,在各种题解的摧残下,我终于出了一篇巨好看(chou lou)代码(很多结构体党 ...

  3. eos智能合约开发最佳实践

    安全问题 1.可能的错误 智能合约终止 限制转账限额 限制速率 有效途径来进行bug修复和提升 2.谨慎发布智能合约 对智能合约进行彻底的测试 并在任何新的攻击手法被发现后及时制止 赏金计划和审计合约 ...

  4. Python3 Tkinter-Grid

    1.创建 from tkinter import * root=Tk() lb1=Label(root,text='Hello') lb2=Label(root,text='Grid') lb1.gr ...

  5. [leetcode-663-Equal Tree Partition]

    Given a binary tree with n nodes, your task is to check if it's possible to partition the tree to tw ...

  6. var,let,const,三种申明变量的整理

    javascript,正在慢慢变成一个工业级语言,势力慢慢渗透ios,安卓,后台 首先let,是局部变量,块级作用域:var全局的,const是常量,也就是只读的: 一行demo说明 for (var ...

  7. Android 上实现非root的 Traceroute -- 非Root权限下移植可执行二进制文件 脚本文件

    作者 : 万境绝尘 转载请著名出处 : http://blog.csdn.net/shulianghan/article/details/36438365 示例代码下载 : -- CSDN : htt ...

  8. PAT1024 强行用链表写了一发。

    主要的思想还是 上课的那个PPT上面的 链表反转的思想. 然后加一点七七八八的 递推. 一层一层往下翻转就好啦. 1A 真开心. 代码:http://paste.ubuntu.net/16506799 ...

  9. .getClass()和.class的区别

    一直在想.class和.getClass()的区别,思索良久,有点思绪,然后有网上搜了搜,找到了如下的一篇文章,与大家分享. 原来为就是涉及到java的反射----- Java反射学习 所谓反射,可以 ...

  10. Divide two integers without using multiplication, division and mod operator.

    描述 不能使用乘法.除法和取模(mod)等运算,除开两个数得到结果,如果内存溢出则返回Integer类型的最大值.解释一下就是:输入两个数,第一个数是被除数dividend,第二个是除数divisor ...