What does working with large data sets in mySQL teach you ? Of course you have to learn a lot about query optimization, art of building summary tables and tricks of executing queries exactly as you want. I already wrote about development and configuration side of the problem so I will not go to details again.

Two great things you’ve got to learn when working with large data things in MySQL is patience and careful planning. Both of which relate two single property of large data sets – it can take hell a lot of time to deal with. This may sound obvious if you have some large data set experience but it is not the case for many people – I constantly run into the customers assuming it will be quick to rearrange their database or even restore from backup.

You need Patience simply because things are going to take a lot of time. Think about 500GB table for example – ALTER TABLE make take days or even weeks depending on your storage engine and set of indexes, Batch Jobs can take quite similar time. Binary Backup and restore will be faster but it can still take hours especially if database is already loaded. So operating wit such large databases you need to be patient and learn to have bunch of tasks running in the background while you’re doing something else.

You need Careful Planning because if you do not plan things properly you easily get into the trouble as well as because you can’t often use simple “online” solutions but have to do more complicated things instead. You typically can’t simply run ALTER TABLE because table will stay locked for too long you would need to do careful process of ALTERing table on the slave and switching roles or some other techniques. You can’t run many simple reporting queries because for MyISAM they will lock tables for very long time and Innodb can get too many old row versions to deal with which can slow down some queries considerably. You need to be planning for your handling of crashed MyISAM after power failure as check and repair may take long hours (this is indeed one of the big reasons to use Innodb even if you do not care about Table Locks or transactions).

Besides these various trips and gotchas you simply need to plan carefully how you’re going to alter your database because it takes a lot of time and may require waiting for maintainance window or bringing the site down. If you have tiny 1GB table you pretty much can use trial and error approach even for production – found some bad queries fixed them by adding indexes and got back to fix some more. For large data sets this does not work and you really need to have some playground with smaller data sets to play with different schema designs and index structures… this however results in the challenge as results you’ve gotten for small data set may not apply to large data set so you need to re-test your “final design” again with large set.

One thing I often find people miscount is assuming data management operations will be proportional to the database size. Say it takes 30 minutes to alter 10GB table so it will take 5 hours to alter 100GB one. It can be close to that if you’re lucky but it can be much much slower if you’re not. Many operations require certain size of table to fit in memory for decent performance. Typically it would be some portion of Index BTREE (even MyISAM which builds “normal” indexes by sort builds primary key and unique indexes using keycache) If it does not performance may drop performance order of magnitude.

This is actually one of the reasons I try to keep data in smaller tables whenever possible. But this is something I’ve written about in another article.

参考:

http://www.mysqlperformanceblog.com/2007/07/05/working-with-large-data-sets-in-mysql/

Working with large data sets in MySQL的更多相关文章

  1. Result window is too large, from + size must be less than or equal to: [10000] but was [78440]. See the scroll api for a more efficient way to request large data sets

    {"error":{"root_cause":[{"type":"query_phase_execution_exception& ...

  2. 最大信息系数(MIC)——Detecting Novel Associations in Large Data Sets

    本文介绍了一种发现两个随机变量之间依赖关系强度的度量MIC(最大信息系数,类似于相关系数的作用).MIC具有以下性质和优势: MIC度量具有普适性.其不仅可以发现变量间的线性函数关系,还能发现非线性函 ...

  3. -- Warning: Skipping the data of table mysql.event. Specify the --events option explicitly.

    [root@DB ~]# mysqldump -uroot -p123 --flush-logs --all-databases >fullbackup_sunday_11_PM.sql -- ...

  4. How to change data dir of mysql?

    # 1 copy orgin data dir of mysql to new one cp -R /var/lib/mysql /mysqldata chown mysql:mysql -R /my ...

  5. 无法为具有固定名称“MySql.Data.MySqlClient”的 ADO.NET 提供程序加载在应用程序配置文件中注册的实体框架提供程序类型“MySql.Data.MySqlClient.MySqlProviderServices,MySql.Data.Entity.EF6”

    "System.InvalidOperationException"类型的未经处理的异常在 mscorlib.dll 中发生 其他信息: 无法为具有固定名称"MySql. ...

  6. Spring Boot使用Spring Data Jpa对MySQL数据库进行CRUD操作

    只需两步!Eclipse+Maven快速构建第一个Spring Boot项目 构建了第一个Spring Boot项目. Spring Boot连接MySQL数据库 连接了MySQL数据库. 本文在之前 ...

  7. Spring Boot (五)Spring Data JPA 操作 MySQL 8

    一.Spring Data JPA 介绍 JPA(Java Persistence API)Java持久化API,是 Java 持久化的标准规范,Hibernate是持久化规范的技术实现,而Sprin ...

  8. Data Provider 中没有.net framework Data provider for Mysql 的解决方法

    近来做的一个项目中,数据库用的是 MySql, 而在项目使用 Entity Data Model 来做数据服务层,可是在项目中添加 Data Entty Model 时,一般我们都会选择从数据库中直接 ...

  9. How To Build Compelling Stories From Your Data Sets

    How To Build Compelling Stories From Your Data Sets Every number has a story. As a data scientist, y ...

随机推荐

  1. haproxy + keepalived 实现高可用负载均衡集群

    1. 首先准备两台tomcat机器,作为集群的单点server. 第一台: 1)tomcat,需要Java的支持,所以同样要安装Java环境. 安装非常简单. tar xf  jdk-7u65-lin ...

  2. css常用样式属性详细介绍

    对于初学css的来说,肯定会觉得这么多样式不好记,而且记住了也容易忘,其实刚开始我们不用去记这么多的样式,确实是记了也会忘,刚开始只需记住一些常用的就可以了,然后在慢慢的使用过程当中接触并学习一些高级 ...

  3. C语言实例解析精粹学习笔记——31

    实例31: 判断字符串是否是回文 思路解析: 引入两个指针变量(head和tail),开始时,两指针分别指向字符串的首末字符,当两指针所指字符相等时,两指针分别向后和向前移动一个字符位置,并继续比较, ...

  4. LeetCode:24. Swap Nodes in Pairs(Medium)

    1. 原题链接 https://leetcode.com/problems/swap-nodes-in-pairs/description/ 2. 题目要求 给定一个链表,交换相邻的两个结点.已经交换 ...

  5. C++各种类型的简单排序大汇总~

    啊,排序的技能点也太多了吧!!!LITTLESUN快要**在排序的技能场了啊!(划掉)经历了两天48小时2880分钟172800秒的艰苦奋斗,终于终于终于学的差不多了!明天就可以去打排序的小怪喽!(撒 ...

  6. shell -- for、while用法

    #数字段形式for i in {1..10}do   echo $idone #详细列出(字符且项数不多)for File in 1 2 3 4 5do    echo $Filedone #对存在的 ...

  7. MySQL数据库服务器逐渐变慢分析

    第一步 检查系统的状态 1.1 使用sar来检查操作系统是否存在IO问题 #sar -u 2 10 — 即每隔2秒检察一次,共执行20次. [root@CacheMemCache tester]# s ...

  8. 通过repcached实现memcached主从复制

    一.环境 服务器A:ubuntu server 12.04(192.168.1.111) 服务器B:ubuntu server 12.04 (47.50.13.111) 二.memcached安装 s ...

  9. jmeter实例,如果有说明错误,请各位大神批评

    首先我们打开jmeter,今天录制的脚本的是获取QQ头像,找了好久才找到可以免费试用的接口,如果有什么错误的地方,欢迎大家提出来,我会及时修改,也给自己一次进步的机会,希望大家不吝赐教!!!如果有什么 ...

  10. Spark性能超过Hadoop百倍

    Spark在偷换概念,Hadoop跑硬盘,Spark跑内存,地球人都知道,内存的速度可是远超硬盘一个量级,超过100倍又有什么奇怪的.如果要比,咱们都拿硬盘来跑跑看!