半调子数据科学家又要折腾数据,拿到数据一看,3.6G的zip文件,解压看看,卧槽12个G的sql文件。好吧,又要折腾sql数据了。第一件事,肯定是搭一个数据库,导入数据咯。

折腾过sql导入的亲们都知道,mysql默认的参数,导入的速度还是很慢的,特别是数据忒多的情况。这次的数据,折腾完了之后,有1000W条那么多,不用猜也知道,慢的要死,所以需要对数据库做一些设置。

可以设置的地方有两个,第一个是innodb_flush_log_at_trx_commit。官方手册对各个值解释如下:

Controls the balance between strict ACID compliance for commit operations and higher performance that is possible when commit-related I/O operations are rearranged and done in batches. You can achieve better performance by changing the default value but then you can lose up to a second of transactions in a crash.

The default value of 1 is required for full ACID compliance. With this value, the contents of the InnoDB log buffer are written out to the log file at each transaction commit and the log file is flushed to disk.

With a value of 0, the contents of the InnoDB log buffer are written to the log file approximately once per second and the log file is flushed to disk. No writes from the log buffer to the log file are performed at transaction commit. Once-per-second flushing is not guaranteed to happen every second due to process scheduling issues. Because the flush to disk operation only occurs approximately once per second, you can lose up to a second of transactions with any mysqld process crash.

With a value of 2, the contents of the InnoDB log buffer are written to the log file after each transaction commit and the log file is flushed to disk approximately once per second. Once-per-second flushing is not 100% guaranteed to happen every second, due to process scheduling issues. Because the flush to disk operation only occurs approximately once per second, you can lose up to a second of transactions in an operating system crash or a power outage.

InnoDB log flushing frequency is controlled by innodb_flush_log_at_timeout, which allows you to set log flushing frequency to N seconds (where N is 1 ... 2700, with a default value of 1). However, any mysqld process crash can erase up to N seconds of transactions.

DDL changes and other internal InnoDB activities flush the InnoDB log independent of the innodb_flush_log_at_trx_commit setting.

InnoDB crash recovery works regardless of the innodb_flush_log_at_trx_commit setting. Transactions are either applied entirely or erased entirely.

For durability and consistency in a replication setup that uses InnoDB with transactions:

If binary logging is enabled, set sync_binlog=1.

Always set innodb_flush_log_at_trx_commit=1.

Caution
Many operating systems and some disk hardware fool the flush-to-disk operation. They may tell mysqld that the flush has taken place, even though it has not. In this case, the durability of transactions is not guaranteed even with the setting 1, and in the worst case, a power outage can corrupt InnoDB data. Using a battery-backed disk cache in the SCSI disk controller or in the disk itself speeds up file flushes, and makes the operation safer. You can also try to disable the caching of disk writes in hardware caches.

也就是

  • 1 默认值,最慢,每次事务提交都要写入log并刷新到磁盘上,这是最保险的方式
  • 0 最快,每隔1S将log刷新到磁盘,但是不保证。事务提交不会触发log写入。很不安全,mysql挂了,那么上一秒的数据就都丢了。
  • 2 折中的一种,事务提交会写入log,但是log刷新还是每秒一次,不保证。这种时候,就算mysql崩了,但是只要操作系统还在运转,数据还是会被写到磁盘上。

这里提到,有些磁盘系统,就算是刷新也无法保证数据确实被写入了,笔者就碰到过文件copy到硬盘(机械硬盘)上,机器死掉了,重启之后,只有不到一半的数据还在。查了才知道,数据只是被写入硬盘的缓存上了,还没有写入硬盘。

这个参数可以在my.ini里面设置,但是我们只是临时用一下,而且我本地用的是docker的mysql,弄配置文件比较麻烦,所以直接在mysql命令行里面设置就可以了。

mysql> set GLOBAL innodb_flush_log_at_trx_commit = 0;

第二个可以设置的地方,在导入sql时候使用的参数:

net_buffer_length

Each client thread is associated with a connection buffer and result buffer. Both begin with a size given by net_buffer_length but are dynamically enlarged up to max_allowed_packet bytes as needed. The result buffer shrinks to net_buffer_length after each SQL statement.

This variable should not normally be changed, but if you have very little memory, you can set it to the expected length of statements sent by clients. If statements exceed this length, the connection buffer is automatically enlarged. The maximum value to which net_buffer_length can be set is 1MB.

max_allowed_packet

The maximum size of one packet or any generated/intermediate string, or any parameter sent by the mysql_stmt_send_long_data() C API function. The default is 4MB.

The packet message buffer is initialized to net_buffer_length bytes, but can grow up to max_allowed_packet bytes when needed. This value by default is small, to catch large (possibly incorrect) packets.

You must increase this value if you are using large BLOB columns or long strings. It should be as big as the largest BLOB you want to use. The protocol limit for max_allowed_packet is 1GB. The value should be a multiple of 1024; nonmultiples are rounded down to the nearest multiple.

When you change the message buffer size by changing the value of the max_allowed_packet variable, you should also change the buffer size on the client side if your client program permits it. The default max_allowed_packet value built in to the client library is 1GB, but individual client programs might override this. For example, mysql and mysqldump have defaults of 16MB and 24MB, respectively. They also enable you to change the client-side value by setting max_allowed_packet on the command line or in an option file.

The session value of this variable is read only. The client can receive up to as many bytes as the session value. However, the server will not send to the client more bytes than the current global max_allowed_packet value. (The global value could be less than the session value if the global value is changed after the client connects.)

需要注意的事,需要先确定服务端的设置,客户端的设置不能大于服务端设置。

mysql>show variables like 'max_allowed_packet';
mysql>show variables like 'net_buffer_length';

事实上,我用的mariadb的docker,这两个值的设置已经非常大了。而且官方也提到,mysql命令行里面的默认设置是足够大的,不过我测试的结果,还是写上去,速度会快一点,不晓得为啥。

mysql -h127.0.0.1 -uroot -proot123 data_base_name --max_allowed_packet=16777216 --net_buffer_length=16384<your_sql_script.sql

不过,虽说速度快了很多,但是也是几个小时的功夫才折腾完,这一次的数据文本居多,不知道是不是因为这个,还是有什么别的设置我不知道的。

顺便说一句,后面为了方便还是把数据折腾到mongo里面了,数据占的空间大了挺多,但是同样是单线程操作,中间还加了挺多数据处理,但是一小时之内就搞定了。

半调子数据科学家,还要继续折腾数据。。。

(* ̄︿ ̄)

mysql导入太慢解决方法的更多相关文章

  1. MySQL mysqldump与source导入慢的解决方法

    Jquery中文网 >  数据库  >  mysql  >  正文 MySQL mysqldump与source导入慢的解决方法 MySQL mysqldump与source导入慢的 ...

  2. MySQL常见错误分析与解决方法总结

    MySQL常见错误分析与解决方法总结 一.Can't connect to MySQL server on 'localhost' (10061)翻译:不能连接到 localhost 上的mysql分 ...

  3. mysql导入数据大小设置方法

    MySQL导入数据库文件最大限制2048KB和phpmyadmin导入数据最大限制2048KB的解决方法 解决办法: 1.打开php.ini.找到 upload_max_filesize . memo ...

  4. Windows下mysql忘记密码的解决方法

    Windows下mysql忘记密码的解决方法 mysql5.0 http://www.jb51.net/article/21984.htm方法一: 1.在DOS窗口下输入 net stop mysql ...

  5. Mysql常见报错解决方法

    一:登录报错 ERROR 1045 (28000): Access denied for user 'mysql'@'localhost' (using password: NO) mysql日志文件 ...

  6. PHP+MYSQL 出现乱码的解决方法

    PHP+MYSQL 出现乱码的解决方法 使用PHP+MYSQL时遇到过字符乱问题,解决方法: 在mysql_connect后面加一句SET NAMES UTF8,即可使得UTF8的数据库消除乱码,对于 ...

  7. MYSQL转换编码的解决方法

    MYSQL转换编码的解决方法 一.在utf8的mysql下 得到中文‘游客’的gbk下的16进制编码 mysql> SELECT hex(CONVERT( '游客' USING gbk )); ...

  8. 远程首次连接mysql速度慢的解决方法:skip-name-resolve取消DNS的反向解析(转)

    PHP远程连接MYSQL速度慢,有时远程连接到MYSQL用时4-20秒不等,本地连接MYSQL正常,出现这种问题的主要原因是,默认安装的 MYSQL开启了DNS的反向解析,在MY.INI(WINDOW ...

  9. Adobe Illustrator CS6 界面文字按钮太小,高分屏win10PS/AI等软件界面字太小解决方法

    Adobe Illustrator CS6 界面文字按钮太小,高分屏win10PS/AI等软件界面字太小解决方法 Adobe App Scaling on High DPI Displays (FIX ...

随机推荐

  1. QPainter绘制遇到的小问题

    1.Qt绘图基础 (1)绘图需画笔和画布: QPainter 相当于Qt中的一个画笔,绘制时需要一块画布, Qt中扮演画布角色的组件为QPaintDevice和他的各个子类,如: QWidget, Q ...

  2. update_engine-整体结构(二)

    在update_engine-整体结构(一)中分析UpdateEngineDaemon::OnInit()的整体情况.下面先分析在该方法中涉及的DaemonStateAndroid和BinderUpd ...

  3. 使用OwnCloud搭建自己的云盘

    使用OwnCloud搭建自己的云盘 1.用途 ownCloud是一款开源的私有云框架,可以通过它实现个人网盘的功能,ownCloud提供了各个平台的文件同步客户端,因此搭建好ownCloud之后即可使 ...

  4. vue项目做seo优化(prerender-spa-plugin vue-meta-info)

    今天公司考虑seo设计方案,服务端渲染(ssr)和 预渲染的方式,不过只是打算对几个简单的页面seo,所以选择了使用预渲染的方式,以下是实现过程中遇到的问题,供大家查看,有不对的地方请指正: 使用pr ...

  5. django部署笔记

    在开发机上的准备工作: 确认项目没有bug. 用pip freeze > requirements.txt将当前环境的包导出到requirements.txt文件中,方便在部署的时候安装. 将项 ...

  6. SqlServer高版本数据备份还原到低版本(转)

    原文地址:https://www.jb51.net/article/96454.htm 想要将Sqlserver2014高版本备份的数据还原到低版本SqlServer2008R2上去,但是这在SqlS ...

  7. css -理解盒模型

    标准模式: 总宽度/高度=width/height+padding+border+margin 怪异模式(quirks 模式): 总宽度/高度=width/height + margin = 内容区宽 ...

  8. 通过c++11的condition_variable实现的有最大缓存限制的队列

    之前曾写过一个通过C++11的condition_variable实现的有最大缓存限制的队列,底层使用std::queue来实现,如果想要提升性能的话,可以考虑改用固定的长度环形数组.环形数组实现如下 ...

  9. 3、redis使用场景

    现代高并发复杂系统面临的挑战 现代系统随着功能的复杂化,各种各样需求层出不穷,面对愈加复杂话的业务系统.越来越庞大的用户群体,以及用户对体验的要求越来越高,性能就变得更加重要. 抛开代码逻辑.服务器性 ...

  10. rabbit原理及项目应用

    1.rabbitMQ是什么? mq是由erlang语言开发的开源的amqp的实现. 2.rabbitMQ的基本原理是什么? 使用RabbitMQ,首先需要与rabbitMQ的visiu host建立连 ...