http://geek.rohitkalhans.com/2013/09/enhancedMTS-deepdive.html   科学上网

Introduction

Re-applying binary logs generated from highly concurrent master on the slave has always been an area of focus. It is important for various reasons. First, in real-time systems, it becomes extremely important for the slave to keep up with the master. This can only be guaranteed if the slaves’ performance in reapplying the transactions from the binary log is similar (or at-least comparable) to that of master, which is accepting queries directly from multiple clients. Second, in synchronous replication scenarios, having a fast slaves, aids in reducing the response times as seen by the clients to the master. This can be made possible by applying transactions from the binary log in parallel. However if left uncontrolled, a simple round-robin multi-threaded applying will lead to inconsistency and the slave will no longer be the exact replica of the leader.

The infamous out of order commit problem

The Out of order execution of transaction on the slave if left uncontrolled will lead to the slave diverging from the master. Here is an example: consider two transactions T1 and T2 being applied on an initial state.

On Master we apply T1 and T2 in that order.

State0: x= 1, y= 1
T1: { x:= Read(y);
x:= x+1;
Write(x);
Commit; }
State1: x= 2, y= 1
T2: { y:= Read(x);
y:=y+1;
Write(y);
Commit; }
State2: x= 2, y= 3

On the slave however these two transactions commit out of order (Say T2 and then T1).

State0: x= 1, y= 1
T2: { y:= Read(x);
y:= y+1;
Write(y);
Commit; }
State1: x= 1, y= 2
T1: { x:= Read(y);
x:=x+1;
Write(x);
Commit; }
State2: x= 3, y= 2

As we see above the final state state 2 is different in the two cases. Needless to say that we need to control the transactions that can execute in parallel.

Controlled parallelization

The above problem can be solved by controlling what transactions can be executed in parallel with the ones being executed by the slave. This means we need to have some kind of information in the transactions themselves. Interesting to note that we can use the information of parallelization from the master on the slave. Since we have multiple transactions committing at the same time on the master, we can store the information of the transactions that were in the "process of committing" when this transaction committed. Now let's define the phrase "process of committing".
 

The process of committing: On the slave we need to make sure that the transactions that we schedule for parallel execution will be the one which do not have conflicting read and write set. This is the only and the necessary requirement for the slave  workers to work without conflicts. This also implies that if the transactions being executed in parallel do not have intersecting read and write sets, we don't care if they are committed out of order. Since MySQL uses lock based scheduling, all the transactions that have entered the prepared stage but not as yet committed will have disjoint read and write sets and hence can be executed in parallel.

 

Logical clock and commit parent

We have introduced a logical clock. Now before I am tackled by a mathematician from one side and a computer engineer from the other, let me explain. It is a simple counter which is stepped when a binlog group of transaction commits on the master. Essentially this clock is stepped every time the leader execute the flush stage of binlog group commit. The value of this clock is recorded on each transaction when it enters the prepare stage. This recorded value is the "commit parent"
 

The pseudo code is as follows.

During Prepare
trx.commit_parent= commit_clock.get_timestamp();
 
During Commit
for every binlog group
  commit_clock.step();
As it is evident by now the transactions with the same commit parent follow our guiding principle of slave side parallelization i.e. transactions that have entered the prepared stage but has not as yet committed, and hence can be executed in parallel.

In the example we will take up three transactions (T1 T2 T3), two of which have been committed as a part of the same binlog group. T1 enters the prepare stage and get the commit parent as 0 since none of the group have been committed as yet. T1 assigns itself as the leader and then goes on to flush its transaction/statement cache. In the meanwhile transaction T2 enters the prepare stage. It is also assigned the same commit parent "0"(CP as used in the figure) since the commit clock has not as yet been stepped. T2 then goes on a wait for the leader to flush its cache in to the binlog. After the flush has been completed by the leader, it signals T2 to continue and both of them enter the Sync stage, where the leader thread  calls fsync() there by finishing the binlog commit process. The  transaction T3 however enters the prepare stage after the previous group has been synced and there-by ends up getting the next CP.

Another thing to note here is that the "group" of transactions that are being executed in parallel are not bounded by binlog commit group. There is a possibility that a transaction have entered the binlog prepare stage but could not make it to the current binlog group. Our approach takes care of such cases and makes sure that we relax the boundary of the group being executed in parallel on the slave.

 
On the slave we use the existing infrastructure of DB partitioned MTS to execute the tranactions in parallel, simply by modifying the scheduling logic.
 

Conclusion

This feature provides the great enhancement to the existing MySQL replication. To know more about the configuration options of this enhancement refer to this post.
This feature is available inMySQL 5.7.2 release. You can try it out and let us know the feedback.

 

About the author

Rohit Kalhans is a Software Development Engineer based out of Banglore India. His area of focus revolves around Row-based replication and Multi-threaded parallel slave. His interests include system programming, highly-available and scalable systems. In his free time, he plays guitar and piano and loves to write short stories and poems. More Information can be found on his homepage.

MySQL 5.7: Enhanced Multi-threaded slaves的更多相关文章

  1. MySQL复制配置(多主一从)

    复制多主一从 replicaion 原理 复制有三个步骤:(分为三个线程 slave:io线程 sql线程 master:io线程) 1.master将改变记录到二进制日志(binary log)中( ...

  2. MySql集群FAQ----mysql主从配置与集群区别、集群中需要多少台计算机呢?为什么? 等

    抽取一部分显示在这里,如下, What's the difference in using Clustervs using replication? 在复制系统中,一个MySQL主服务器会更新一个或多 ...

  3. MySQL数据很大的时候

    众所周知,mysql在数据量很大的时候查询的效率是很低的,因为假如你需要 OFFSET 100000 LIMIT 5 这样的数据,数据库就需要跳过前100000条数据,才能返回给你你需要的5条数据.由 ...

  4. mysql 2006

    1.在my.ini文件中添加或者修改以下两个变量:wait_timeout=2880000interactive_timeout = 2880000 关于两个变量的具体说明可以google或者看官方手 ...

  5. MySQL高可用架构:mysql+keepalived实现

    系统环境及架构 #主机名 系统版本 mysql版本 ip地址 mysqlMaster <a href="https://www.linuxprobe.com/" title= ...

  6. MySQL之备份

    MySQL备份和备份 备份/还原 冷备:需要停止当前正在运行mysqld,然后直接拷贝或打包数据文件. 半热备:mysqldump+binlog --适合数据量比较小的应用 在线热备:AB复制 --实 ...

  7. linux MySql 的主从复制部署

    MySql 复制 mysql 复制:将某一台主机上的 Mysql 数据复制到其它主机(slaves)上,并重新执行一遍从而实现 当前主机上的 mysql 数据与(master)主机上数据保持一致的过程 ...

  8. java面试一日一题:讲对mysql的MVCC的理解

    问题:请讲下对mysql中MVCC的理解 分析:这个问题要回答的是对MVCC的理解,以及MVCC解决了什么问题这几个方面入手. 回答要点: 主要从以下几点去考虑, 1.什么是MVCC? 2.MVCC用 ...

  9. BlackArch-Tools

    BlackArch-Tools 简介 安装在ArchLinux之上添加存储库从blackarch存储库安装工具替代安装方法BlackArch Linux Complete Tools List 简介 ...

随机推荐

  1. again

    建立一个IPC连接: net use \\192.168.0.5\ipc$ "123456" /u:administrator

  2. OpenGL超级宝典第5版&&缓冲区

    缓冲区有很多用途:可以保存顶点数据,像素数据,纹理数据,着色器处理的输入,不同着色器阶段的输出. 缓冲区保存在GPU内存中,提供高速有效的访问.   像素缓冲区对象: GLuint pixBuffer ...

  3. 关于FireFox下 CSS3 transition 与其他浏览器的差异

    最近一个项目,动画效果全靠CSS3来做,用得比较多的transition,发现了一点火狐与其他浏览器的小差异. 首先我们写CSS的时候,一般为属性值为0的属性,我们一般会这样写 #id{ posito ...

  4. 【hadoop代码笔记】Mapreduce shuffle过程之Map输出过程

    一.概要描述 shuffle是MapReduce的一个核心过程,因此没有在前面的MapReduce作业提交的过程中描述,而是单独拿出来比较详细的描述. 根据官方的流程图示如下: 本篇文章中只是想尝试从 ...

  5. 30 分钟 Java Lambda 入门教程

    Lambda简介 Lambda作为函数式编程中的基础部分,在其他编程语言(例如:Scala)中早就广为使用,但在Java领域中发展较慢,直到java8,才开始支持Lambda. 抛开数学定义不看,直接 ...

  6. 成都Uber优步司机奖励政策(3月5日)

    滴快车单单2.5倍,注册地址:http://www.udache.com/ 如何注册Uber司机(全国版最新最详细注册流程)/月入2万/不用抢单:http://www.cnblogs.com/mfry ...

  7. GridControl表头全选操作实现之最优方法

    突然发现很久没有写博客了. 昨天整了个Windows Live Writer 就为了以后好好写写博客. 所以,开始咯. 为了积累,也为了分享. 之前在博客园中看到一篇文章:<Winform分页控 ...

  8. JavaIO(01)File类详解

    File类 file类中的主要方法和变量   常量: 表示路径的分割符:(windows) 作用:根据java可移植性的特点,编写路径一定要符合本地操作系统要求的分割符: public static ...

  9. WS103C8例程——串口2【worldsing笔记】

    在超MINI核心板 stm32F103C8最小系统板上调试Usart2功能:用Jlink 6Pin接口连接WStm32f103c8的Uart2,PC机向mcu发送数据,mcu收到数据后数据加1,回传给 ...

  10. POJ1228(稳定凸包问题)

    题目:Grandpa's Estate   题意:输入一个凸包上的点(没有凸包内部的点,要么是凸包顶点,要么是凸包边上的点),判断这个凸包是否稳定.所谓稳 定就是判断能不能在原有凸包上加点,得到一个更 ...