http://geek.rohitkalhans.com/2013/09/enhancedMTS-deepdive.html   科学上网

Introduction

Re-applying binary logs generated from highly concurrent master on the slave has always been an area of focus. It is important for various reasons. First, in real-time systems, it becomes extremely important for the slave to keep up with the master. This can only be guaranteed if the slaves’ performance in reapplying the transactions from the binary log is similar (or at-least comparable) to that of master, which is accepting queries directly from multiple clients. Second, in synchronous replication scenarios, having a fast slaves, aids in reducing the response times as seen by the clients to the master. This can be made possible by applying transactions from the binary log in parallel. However if left uncontrolled, a simple round-robin multi-threaded applying will lead to inconsistency and the slave will no longer be the exact replica of the leader.

The infamous out of order commit problem

The Out of order execution of transaction on the slave if left uncontrolled will lead to the slave diverging from the master. Here is an example: consider two transactions T1 and T2 being applied on an initial state.

On Master we apply T1 and T2 in that order.

State0: x= 1, y= 1
T1: { x:= Read(y);
x:= x+1;
Write(x);
Commit; }
State1: x= 2, y= 1
T2: { y:= Read(x);
y:=y+1;
Write(y);
Commit; }
State2: x= 2, y= 3

On the slave however these two transactions commit out of order (Say T2 and then T1).

State0: x= 1, y= 1
T2: { y:= Read(x);
y:= y+1;
Write(y);
Commit; }
State1: x= 1, y= 2
T1: { x:= Read(y);
x:=x+1;
Write(x);
Commit; }
State2: x= 3, y= 2

As we see above the final state state 2 is different in the two cases. Needless to say that we need to control the transactions that can execute in parallel.

Controlled parallelization

The above problem can be solved by controlling what transactions can be executed in parallel with the ones being executed by the slave. This means we need to have some kind of information in the transactions themselves. Interesting to note that we can use the information of parallelization from the master on the slave. Since we have multiple transactions committing at the same time on the master, we can store the information of the transactions that were in the "process of committing" when this transaction committed. Now let's define the phrase "process of committing".
 

The process of committing: On the slave we need to make sure that the transactions that we schedule for parallel execution will be the one which do not have conflicting read and write set. This is the only and the necessary requirement for the slave  workers to work without conflicts. This also implies that if the transactions being executed in parallel do not have intersecting read and write sets, we don't care if they are committed out of order. Since MySQL uses lock based scheduling, all the transactions that have entered the prepared stage but not as yet committed will have disjoint read and write sets and hence can be executed in parallel.

 

Logical clock and commit parent

We have introduced a logical clock. Now before I am tackled by a mathematician from one side and a computer engineer from the other, let me explain. It is a simple counter which is stepped when a binlog group of transaction commits on the master. Essentially this clock is stepped every time the leader execute the flush stage of binlog group commit. The value of this clock is recorded on each transaction when it enters the prepare stage. This recorded value is the "commit parent"
 

The pseudo code is as follows.

During Prepare
trx.commit_parent= commit_clock.get_timestamp();
 
During Commit
for every binlog group
  commit_clock.step();
As it is evident by now the transactions with the same commit parent follow our guiding principle of slave side parallelization i.e. transactions that have entered the prepared stage but has not as yet committed, and hence can be executed in parallel.

In the example we will take up three transactions (T1 T2 T3), two of which have been committed as a part of the same binlog group. T1 enters the prepare stage and get the commit parent as 0 since none of the group have been committed as yet. T1 assigns itself as the leader and then goes on to flush its transaction/statement cache. In the meanwhile transaction T2 enters the prepare stage. It is also assigned the same commit parent "0"(CP as used in the figure) since the commit clock has not as yet been stepped. T2 then goes on a wait for the leader to flush its cache in to the binlog. After the flush has been completed by the leader, it signals T2 to continue and both of them enter the Sync stage, where the leader thread  calls fsync() there by finishing the binlog commit process. The  transaction T3 however enters the prepare stage after the previous group has been synced and there-by ends up getting the next CP.

Another thing to note here is that the "group" of transactions that are being executed in parallel are not bounded by binlog commit group. There is a possibility that a transaction have entered the binlog prepare stage but could not make it to the current binlog group. Our approach takes care of such cases and makes sure that we relax the boundary of the group being executed in parallel on the slave.

 
On the slave we use the existing infrastructure of DB partitioned MTS to execute the tranactions in parallel, simply by modifying the scheduling logic.
 

Conclusion

This feature provides the great enhancement to the existing MySQL replication. To know more about the configuration options of this enhancement refer to this post.
This feature is available inMySQL 5.7.2 release. You can try it out and let us know the feedback.

 

About the author

Rohit Kalhans is a Software Development Engineer based out of Banglore India. His area of focus revolves around Row-based replication and Multi-threaded parallel slave. His interests include system programming, highly-available and scalable systems. In his free time, he plays guitar and piano and loves to write short stories and poems. More Information can be found on his homepage.

MySQL 5.7: Enhanced Multi-threaded slaves的更多相关文章

  1. MySQL复制配置(多主一从)

    复制多主一从 replicaion 原理 复制有三个步骤:(分为三个线程 slave:io线程 sql线程 master:io线程) 1.master将改变记录到二进制日志(binary log)中( ...

  2. MySql集群FAQ----mysql主从配置与集群区别、集群中需要多少台计算机呢?为什么? 等

    抽取一部分显示在这里,如下, What's the difference in using Clustervs using replication? 在复制系统中,一个MySQL主服务器会更新一个或多 ...

  3. MySQL数据很大的时候

    众所周知,mysql在数据量很大的时候查询的效率是很低的,因为假如你需要 OFFSET 100000 LIMIT 5 这样的数据,数据库就需要跳过前100000条数据,才能返回给你你需要的5条数据.由 ...

  4. mysql 2006

    1.在my.ini文件中添加或者修改以下两个变量:wait_timeout=2880000interactive_timeout = 2880000 关于两个变量的具体说明可以google或者看官方手 ...

  5. MySQL高可用架构:mysql+keepalived实现

    系统环境及架构 #主机名 系统版本 mysql版本 ip地址 mysqlMaster <a href="https://www.linuxprobe.com/" title= ...

  6. MySQL之备份

    MySQL备份和备份 备份/还原 冷备:需要停止当前正在运行mysqld,然后直接拷贝或打包数据文件. 半热备:mysqldump+binlog --适合数据量比较小的应用 在线热备:AB复制 --实 ...

  7. linux MySql 的主从复制部署

    MySql 复制 mysql 复制:将某一台主机上的 Mysql 数据复制到其它主机(slaves)上,并重新执行一遍从而实现 当前主机上的 mysql 数据与(master)主机上数据保持一致的过程 ...

  8. java面试一日一题:讲对mysql的MVCC的理解

    问题:请讲下对mysql中MVCC的理解 分析:这个问题要回答的是对MVCC的理解,以及MVCC解决了什么问题这几个方面入手. 回答要点: 主要从以下几点去考虑, 1.什么是MVCC? 2.MVCC用 ...

  9. BlackArch-Tools

    BlackArch-Tools 简介 安装在ArchLinux之上添加存储库从blackarch存储库安装工具替代安装方法BlackArch Linux Complete Tools List 简介 ...

随机推荐

  1. How to enable DateTimePicker to use both date and time z

    Recently in one of my project I needed to have an option to display the DateTimePicker allowing user ...

  2. UVA 11600-Masud Rana(状压,概率dp)

    题意: 有n个节点的图,开始有一些边存在,现在每天任意选择两点连一条边(可能已经连过),求使整个图联通的期望天数. 分析: 由于开始图可以看做几个连通分量,想到了以前做的一个题,一个点代表一个集合(这 ...

  3. hdu1722 bjfu1258 辗转相除法

    这题就是个公式,代码极简单.但我想,真正明白这题原理的人并不多.很多人只是随便网上一搜,找到公式a了就行,其实这样对自己几乎没有提高. 鉴于网上关于这题的解题报告中几乎没有讲解原理的,我就多说几句,也 ...

  4. IOS 第三方开源库记录

    网易客户端使用 1.ZipArchive 2.wax 3.TTTAttributedLabel 4.SSKeychain 5.SDWebImage 6.RegexKitLite 7.pop 8.NJK ...

  5. 简单把webdriver的find_element方法写成函数

    __author__ = 'jyd' from selenium.webdriver.common.by import By #driver webdriver实例化对象 #element 查询元素的 ...

  6. 【JSONCpp】简介及demo

    一.JSON简介 JSON 一种轻量级的数据交换格式,易于阅读.编写.解析,全称为JavsScript ObjectNotation. JSON由两种基本结构组成 ①   名字/值 对的集合,可以理解 ...

  7. Android-day02_广播

    1.什么是广播 貌似一个人大声喊一句话,别人听到了这就是广播 2.在android中广播有标准广播和有序广播 标准广播也就是发送一个广播,所有人都能同一时间接收到 有序广播则是有顺序的广播,发送的时候 ...

  8. CSS快速制作图片轮播的焦点

    来源:http://www.ido321.com/858.html 效果图: 演示地址:http://jsfiddle.net/Web_Code/q5qfd8aL/embedded/result/ 代 ...

  9. 【sgu282】Isomorphism

    题意: 给出n(n<=53)点的无向完全图 要将每条边染上m(m<=1000)种颜色的一种 只改变顶点编号的图视为同种方案 求本质不同方案数%p(p>n且为质树)的值 题解: 这题貌 ...

  10. 安卓app开发方式之webApp

    1.phonegap: 专注于webapp调用native的功能.2.ionic: 专注于webapp的前端ui技术,需要与phonegap(准确的说是和Cordova配合使用).ionic是一个专注 ...