LOAD DATA INFILE – performance case study
Reposted from:
http://venublog.com/2007/11/07/load-data-infile-performance/
I often see people complain about LOAD DATA performance when loading a table with a large number of rows. Even today I saw a case where LOAD DATA on a simple 3-column table with about 5 million rows took ~15 minutes. This is because the server had not been tuned at all for bulk insertion.
Consider the following simple MyISAM table on Redhat Linux 32-bit.
CREATE TABLE load1 (
  `col1` varchar(100) NOT NULL default '',
  `col2` int(11) default NULL,
  `col3` char(1) default NULL,
  PRIMARY KEY (`col1`)
) ENGINE=MyISAM;
The table has a string key column. Here is the data file that I used for testing:
[vanugant@escapereply:t55 tmp]$ wc loaddata.csv
5164946 5164946 227257389 loaddata.csv
[vanugant@escapereply:t55 tmp]$ ls -alh loaddata.csv
-rw-r--r-- 1 vanugant users 217M Nov 6 14:42 loaddata.csv
[vanugant@escapereply:t55 tmp]$
Here are the default MySQL system variables related to LOAD DATA:
mysql> show variables;
+-------------------------+----------+
| Variable_name           | Value    |
+-------------------------+----------+
| bulk_insert_buffer_size | 8388608  |
| myisam_sort_buffer_size | 16777216 |
| key_buffer_size         | 33554432 |
+-------------------------+----------+
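The values above are raw byte counts, which are hard to compare with the size of the input file at a glance. As a small aid, here is a sketch in Python that converts them to MiB; the helper name `to_mib` is my own, and the numbers are copied from the listings above.

```python
# Default values as reported by SHOW VARIABLES above (in bytes).
defaults = {
    "bulk_insert_buffer_size": 8388608,
    "myisam_sort_buffer_size": 16777216,
    "key_buffer_size": 33554432,
}

# Size of loaddata.csv in bytes, taken from the wc output above.
data_file_bytes = 227257389


def to_mib(n_bytes: int) -> float:
    """Convert a byte count to mebibytes (1 MiB = 1024 * 1024 bytes)."""
    return n_bytes / (1024 * 1024)


for name, value in defaults.items():
    print(f"{name:<26}{to_mib(value):>6.0f} MiB")
print(f"{'loaddata.csv':<26}{to_mib(data_file_bytes):>6.0f} MiB")
```

With the defaults, every one of these buffers is a small fraction of the file being loaded.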
And here is the actual LOAD DATA statement that loads all ~5 million rows (~217M of data) into the table, with its timing:
mysql> LOAD DATA INFILE '/home/vanugant/tmp/loaddata.csv' IGNORE INTO TABLE load1 FIELDS TERMINATED BY ',';
Query OK, 4675823 rows affected (14 min 56.84 sec)
Records: 5164946 Deleted: 0 Skipped: 489123 Warnings: 0
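The "Skipped" figure comes from the IGNORE keyword: rows whose primary key (col1) duplicates an earlier row are dropped instead of raising an error, so rows affected plus skipped equals the records read (4675823 + 489123 = 5164946). A rough way to estimate that number before loading is to count repeated first-column values in the file. The sketch below is in Python; `count_skipped` is a hypothetical helper, and it compares keys as exact strings, which only approximates how the server's collation compares them (case and trailing spaces may be treated differently).

```python
import io


def count_skipped(lines, delimiter=","):
    """Count records, distinct first-column keys, and duplicate keys.

    Assumption: keys are compared as exact strings, which only approximates
    how the server compares primary-key values under its collation.
    """
    seen = set()
    records = 0
    for line in lines:
        records += 1
        # Everything before the first delimiter is the key column (col1).
        key, _, _ = line.rstrip("\n").partition(delimiter)
        seen.add(key)
    return records, len(seen), records - len(seen)


# Tiny in-memory sample: the key "a" appears twice, so one row would be skipped.
sample = io.StringIO("a,1,x\nb,2,y\na,3,z\n")
print(count_skipped(sample))  # (3, 2, 1)
```

For a real file, pass an open file object instead of the sample, for example `with open("loaddata.csv") as f: print(count_skipped(f))`.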
Now, let's experiment by raising bulk_insert_buffer_size and disabling the keys on the table before running LOAD DATA:
mysql> SET SESSION BULK_INSERT_BUFFER_SIZE=314572800;
Query OK, 0 rows affected (0.00 sec)
mysql> alter table load1 disable keys;
Query OK, 0 rows affected (0.00 sec)
mysql> LOAD DATA INFILE '/home/vanugant/tmp/loaddata.csv' IGNORE INTO TABLE load1 FIELDS TERMINATED BY ',';
Query OK, 4675823 rows affected (13 min 47.50 sec)
Records: 5164946 Deleted: 0 Skipped: 489123 Warnings: 0
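Not much use: the load takes only about 8% less time (13 min 47.50 sec versus 14 min 56.84 sec). That is not surprising, since DISABLE KEYS only affects non-unique indexes and this table has just a primary key. Now let's set realistic MyISAM buffer values and try again: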
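To put a number on the difference between the first two runs, the timings printed by the mysql client can be converted to seconds and compared. This is a minimal Python sketch; `to_seconds` and `reduction_pct` are names I made up, and the parser only handles the "N min S sec" and "S sec" forms that appear in this post.

```python
import re


def to_seconds(elapsed: str) -> float:
    """Parse a mysql client timing such as '14 min 56.84 sec' into seconds."""
    match = re.fullmatch(r"(?:(\d+) min )?(\d+(?:\.\d+)?) sec", elapsed)
    if match is None:
        raise ValueError(f"unrecognized timing: {elapsed!r}")
    minutes = int(match.group(1) or 0)
    return minutes * 60 + float(match.group(2))


def reduction_pct(before: str, after: str) -> float:
    """Percentage by which the elapsed time dropped from 'before' to 'after'."""
    b, a = to_seconds(before), to_seconds(after)
    return (b - a) / b * 100


baseline = "14 min 56.84 sec"       # default settings
disabled_keys = "13 min 47.50 sec"  # DISABLE KEYS + larger bulk insert buffer
print(f"{reduction_pct(baseline, disabled_keys):.1f}% less time")
```

The same two helpers can be reused for any other pair of timings in this post.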
mysql> SET SESSION BULK_INSERT_BUFFER_SIZE=256217728;
Query OK, 0 rows affected (0.00 sec)
mysql> set session MYISAM_SORT_BUFFER_SIZE=256217728;
Query OK, 0 rows affected (0.00 sec)
mysql> set global KEY_BUFFER_SIZE=256217728;
Query OK, 0 rows affected (0.05 sec)
mysql> alter table load1 disable keys;
Query OK, 0 rows affected (0.00 sec)
mysql> LOAD DATA INFILE '/home/vanugant/tmp/loaddata.csv' IGNORE INTO TABLE load1 FIELDS TERMINATED BY ',';
Query OK, 4675823 rows affected (1 min 55.05 sec)
Records: 5164946 Deleted: 0 Skipped: 489123 Warnings: 0
mysql> alter table load1 enable keys;
Query OK, 0 rows affected (0.00 sec)
Wow, that cuts the load time by about 87% (from 14 min 56.84 sec down to 1 min 55.05 sec). So disabling the keys on a MyISAM table is not the whole story; tuning the buffer sizes to fit the input data plays a major role.
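If you run this kind of load often, the statement sequence from the tuned MyISAM session can be generated rather than retyped. The following Python sketch only builds the SQL text; `bulk_load_script` is a hypothetical helper, it uses one buffer size for all three variables as the session above did, and it does no quoting or escaping, so it should only be fed trusted table names and paths.

```python
def bulk_load_script(table: str, csv_path: str, buffer_bytes: int) -> str:
    """Build the tuned MyISAM bulk-load statements as one SQL script.

    Assumption: inputs are trusted; nothing is escaped or validated here.
    """
    statements = [
        f"SET SESSION bulk_insert_buffer_size={buffer_bytes};",
        f"SET SESSION myisam_sort_buffer_size={buffer_bytes};",
        # key_buffer_size is a global setting, so this needs admin privileges.
        f"SET GLOBAL key_buffer_size={buffer_bytes};",
        f"ALTER TABLE {table} DISABLE KEYS;",
        f"LOAD DATA INFILE '{csv_path}' IGNORE INTO TABLE {table} "
        "FIELDS TERMINATED BY ',';",
        f"ALTER TABLE {table} ENABLE KEYS;",
    ]
    return "\n".join(statements)


# Same table, file and buffer size as in the session above.
print(bulk_load_script("load1", "/home/vanugant/tmp/loaddata.csv", 256217728))
```

The output can be saved to a file and fed to the mysql client. Remember that the global key_buffer_size stays changed after the load unless you set it back.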
For the same case with InnoDB, here are the results after setting innodb_buffer_pool_size=1G and innodb_log_file_size=256M, with innodb_flush_log_at_trx_commit=1.
mysql> show variables like '%innodb%size';
+---------------------------------+------------+
| Variable_name                   | Value      |
+---------------------------------+------------+
| innodb_additional_mem_pool_size | 26214400   |
| innodb_buffer_pool_size         | 1073741824 |
| innodb_log_buffer_size          | 8388608    |
| innodb_log_file_size            | 268435456  |
+---------------------------------+------------+
mysql> LOAD DATA INFILE '/home/vanugant/tmp/loaddata.csv' IGNORE INTO TABLE load1 FIELDS TERMINATED BY ',';
Query OK, 4675823 rows affected (2 min 37.53 sec)
Records: 5164946 Deleted: 0 Skipped: 489123 Warnings: 0
With innodb_flush_log_at_trx_commit=2, innodb_flush_method=O_DIRECT and innodb_doublewrite=0, throughput improves by roughly another 40% (use these settings with caution unless you know what you are doing):
mysql> LOAD DATA INFILE '/home/vanugant/tmp/loaddata.csv' IGNORE INTO TABLE load1 FIELDS TERMINATED BY ',';
Query OK, 4675823 rows affected (1 min 53.69 sec)
Records: 5164946 Deleted: 0 Skipped: 489123 Warnings: 0
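Since every run reads the same 5164946 records, the five timings can also be compared as rows per second. The Python sketch below does that; the run labels are my own shorthand for the configurations described in this post, and the timings are copied from the outputs.

```python
RECORDS = 5164946  # "Records:" value reported by every run

# (label, minutes, seconds) for each LOAD DATA run, in the order shown above.
runs = [
    ("MyISAM, defaults", 14, 56.84),
    ("MyISAM, DISABLE KEYS + bulk insert buffer", 13, 47.50),
    ("MyISAM, all three buffers raised", 1, 55.05),
    ("InnoDB, innodb_flush_log_at_trx_commit=1", 2, 37.53),
    ("InnoDB, relaxed flush/doublewrite settings", 1, 53.69),
]


def rows_per_second(minutes: int, seconds: float, records: int = RECORDS) -> float:
    """Records read per second of elapsed load time."""
    return records / (minutes * 60 + seconds)


for label, m, s in runs:
    print(f"{label:<45}{rows_per_second(m, s):>10,.0f} rows/s")
```

Measured this way, the tuned MyISAM run and the relaxed InnoDB run end up very close to each other, and the second InnoDB run processes a little under 1.4 times as many rows per second as the first.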