LOAD DATA INFILE – performance case study
转:
http://venublog.com/2007/11/07/load-data-infile-performance/
I often noticed that people complain about the LOAD DATA performance when loading the table with large number of rows of data. Even today I saw a case where the LOAD DATA on a simple 3 column table with about 5 million rows taking ~15 minutes of time. This is because the server did not had any tuning in regards to bulk insertion.
Consider the following simple MyISAM table on Redhat Linux 32-bit.
Shell
1
2
3
4
5
6
7
8
|
CREATE TABLE load1 (
`col1` varchar(100) NOT NULL default '',
`col2` int(11) default NULL,
`col3` char(1) default NULL,
PRIMARY KEY (`col1`)
) TYPE=MyISAM;
|
The table has a string key column. Here is the data file(download here) that I used it for testing:
Shell
1
2
3
4
5
6
7
|
[vanugant@escapereply:t55 tmp]$ wc loaddata.csv
5164946 5164946 227257389 loaddata.csv
[vanugant@escapereply:t55 tmp]$ ls -alh loaddata.csv
-rw-r--r-- 1 vanugant users 217M Nov 6 14:42 loaddata.csv
[vanugant@escapereply:t55 tmp]$
|
Here is the default mysql system variables related to LOAD DATA:
Shell
1
2
3
4
5
6
7
8
9
10
|
mysql> show variables;
+-------------------------+---------+
| Variable_name | Value |
+-------------------------+---------+
| bulk_insert_buffer_size | 8388608 |
| myisam_sort_buffer_size | 16777216 |
| key_buffer_size | 33554432 |
+-------------------------+----------+
|
and here is the actual LOAD DATA query to load all ~5m rows (~256M of data) to the table and its timing.
Shell
1
2
3
4
5
|
mysql> LOAD DATA INFILE '/home/vanugant/tmp/loaddata.csv' IGNORE INTO TABLE load1 FIELDS TERMINATED BY ',';
Query OK, 4675823 rows affected (14 min 56.84 sec)
Records: 5164946 Deleted: 0 Skipped: 489123 Warnings: 0
|
Now, lets experiment by disabling the keys in the table before running the LOAD DATA:
Shell
1
2
3
4
5
6
7
8
9
10
11
|
mysql> SET SESSION BULK_INSERT_BUFFER_SIZE=314572800;
Query OK, 0 rows affected (0.00 sec)
mysql> alter table load1 disable keys;
Query OK, 0 rows affected (0.00 sec)
mysql> LOAD DATA INFILE '/home/vanugant/tmp/loaddata.csv' IGNORE INTO TABLE load1 FIELDS TERMINATED BY ',';
Query OK, 4675823 rows affected (13 min 47.50 sec)
Records: 5164946 Deleted: 0 Skipped: 489123 Warnings: 0
|
No use, just 1% increase or same…., now lets set the real MyISAM values… and try again…
Shell
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
|
mysql> SET SESSION BULK_INSERT_BUFFER_SIZE=256217728;
Query OK, 0 rows affected (0.00 sec)
mysql> set session MYISAM_SORT_BUFFER_SIZE=256217728;
Query OK, 0 rows affected (0.00 sec)
mysql> set global KEY_BUFFER_SIZE=256217728;
Query OK, 0 rows affected (0.05 sec)
mysql> alter table load1 disable keys;
Query OK, 0 rows affected (0.00 sec)
mysql> LOAD DATA INFILE '/home/vanugant/tmp/loaddata.csv' IGNORE INTO TABLE load1 FIELDS TERMINATED BY ',';
Query OK, 4675823 rows affected (1 min 55.05 sec)
Records: 5164946 Deleted: 0 Skipped: 489123 Warnings: 0
mysql> alter table load1 enable keys;
Query OK, 0 rows affected (0.00 sec)
|
Wow…thats almost 90% increase in the performance. So, disabling the keys in MyISAM is not just the key, but tuning the buffer size does play role based on the input data.
For the same case with Innodb, here is the status by adjusting the Innodb_buffer_pool_size=1G andInnodb_log_file_size=256M along with innodb_flush_logs_at_trx_commit=1.
Shell
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
|
mysql> show variables like '%innodb%size';
+---------------------------------+------------+
| Variable_name | Value |
+---------------------------------+------------+
| innodb_additional_mem_pool_size | 26214400 |
| innodb_buffer_pool_size | 1073741824 |
| innodb_log_buffer_size | 8388608 |
| innodb_log_file_size | 268435456 |
+---------------------------------+------------+
mysql> LOAD DATA INFILE '/home/vanugant/tmp/loaddata.csv' IGNORE INTO TABLE load1 FIELDS TERMINATED BY ',';
Query OK, 4675823 rows affected (2 min 37.53 sec)
Records: 5164946 Deleted: 0 Skipped: 489123 Warnings: 0
|
With innodb_flush_logs_at_trx_commit=2, innodb_flush_method=O_DIRECT and innodb_doublewrite=0; it will be another 40% difference (use all these variables with caution, unless you know what you are doing)
Shell
1
2
3
4
5
|
mysql> LOAD DATA INFILE '/home/vanugant/tmp/loaddata.csv' IGNORE INTO TABLE load1 FIELDS TERMINATED BY ',';
Query OK, 4675823 rows affected (1 min 53.69 sec)
Records: 5164946 Deleted: 0 Skipped: 489123 Warnings: 0
|
LOAD DATA INFILE – performance case study的更多相关文章
- LOAD DATA INFILE Syntax--官方
LOAD DATA [LOW_PRIORITY | CONCURRENT] [LOCAL] INFILE 'file_name' [REPLACE | IGNORE] INTO TABLE tbl_n ...
- Data Visualization – Banking Case Study Example (Part 1-6)
python信用评分卡(附代码,博主录制) https://study.163.com/course/introduction.htm?courseId=1005214003&utm_camp ...
- Mysql load data infile 导入数据出现:Data truncated for column
[1]Mysql load data infile 导入数据出现:Data truncated for column .... 可能原因分析: (1)数据库表对应字段类型长度不够或修改为其他数据类型( ...
- Mysql load data infile 命令导入含中文csv源数据文件 【错误代码 1300】
[1]Load data infile 命令导入含中文csv源数据文件 报错:Invalid utf8 character string: '??֧' (1)问题现象 csv格式文件源数据: 导入SQ ...
- Mysql load data infile 命令格式
[1]Linux系统环境下 LOAD DATA INFILE /usr/LOCAL/lib/ubcsrvd/datacsv/201909_source.csv INTO TABLE np_cdr_20 ...
- Mysql 命令 load data infile 权限问题
[1]Mysql命令load data infile 执行权限问题 工作中,经常会遇到往线上环境mysql数据库批量导入源数据的场景. 针对这个场景问题,mysql有一个很高效的命令:load dat ...
- mysql load data infile的使用 和 SELECT into outfile备份数据库数据
LOAD DATA [LOW_PRIORITY | CONCURRENT] [LOCAL] INFILE 'file_name.txt' [REPLACE | IGNORE] INTO TABLE t ...
- SQL基本语句(3) LOAD DATA INFILE
使用LOAD语句批量录入数据 语法: LOAD DATA [LOW_PRIORITY | CONCURRENT] [LOCAL] INFILE 'file_name' [REPLACE | IGNOR ...
- mysql导入数据load data infile用法
mysql导入数据load data infile用法 基本语法: load data [low_priority] [local] infile 'file_name txt' [replace | ...
随机推荐
- vim不保存退出
对于刚开始使用vi/vim文本编辑器的新手来说,如何在不保存更改而退出vi/vim 文本编辑器呢? 当你使用linux vi/vim 文本编辑器对linux下某个配置文件做编辑操作,当你更改完之后,可 ...
- KETTLE、spoon使用
ETL是Extract”.“ Transform” .“Load”三个单词的首字母缩写分别代表了抽取.转换.装载.是数据仓库中重要的一环.ETL是数据的抽取清洗转换加载的过程,是数据进入数据仓库进行大 ...
- java开发--struts2 标签库使用
在工程中使用struts2标签 一.struts2标签定义文件在struts2-core-2.0.11.1\META-INF 下面,文件名为struts-tags.tld 二.如果工程使用了servl ...
- 美丽渐变色的Form
一直都非常喜欢渐变色的界面,但是没想到漂亮的渐变Form原来这么简单...实在是没想到...看来我不仅技术水平低,脑袋里的创意也是空空如也... --------------------------- ...
- 281. Zigzag Iterator
题目: Given two 1d vectors, implement an iterator to return their elements alternately. For example, g ...
- CentOS中的chkconfig命令
chkconfig: chkconfig命令主要用来更新(启动或停止)和查询系统服务的运行级信息.谨记chkconfig不是立即自动禁止或激活一个服务,它只是简单的改变了符号连接.语法: ...
- [转]Debian 安装与卸载包命令(APT&&DPKG)
转自:zhangjunhd 的BLOG 1.APT主要命令apt-cache search ------package 搜索包sudo apt-get install ------package 安 ...
- Android电源管理-休眠简要分析
一.开篇 1.Linux 描述的电源状态 - On(on) S0 - Working - Standb ...
- Server Profiler
Server Profiler 2014-10-31 工作原理 SQL Server Profiler这个工具是SQL Trace的一个GUI的版本,而SQL Trace是一组脚本,自SQL Serv ...
- 大数据工具——Splunk
Splunk是机器数据的引擎.使用 Splunk 可收集.索引和利用所有应用程序.服务器和设备(物理.虚拟和云中)生成的快速移动型计算机数据 .从一个位置搜索并分析所有实时和历史数据. 使用 Splu ...