MySQL MHA--Online master switch

Preconditions for an online master switch

1. All nodes are running normally: the original master, the new master, and every other slave.

  if ( $#dead_servers >= 0 ) {
    $log->error(
      "Switching master should not be started if one or more servers is down."
    );
    $log->info("Dead Servers:");
    $_server_manager->print_dead_servers();
    croak;
  }

2. The current master is healthy, and its details, such as the server ID and the binlog position, can be retrieved.

  $orig_master = $_server_manager->get_current_alive_master();
  if ( !$orig_master ) {
    $log->error(
"Failed to get current master. Maybe replication is misconfigured or configuration file is broken?"
    );
    croak;
  }

3. MHA Manager/Monitor is stopped. It must not be monitoring the master or running a failover.

  $log->info("Checking MHA is not monitoring or doing failover..");
  if ( $orig_master->get_monitor_advisory_lock() ) {
    $log->error(
"Getting advisory lock failed on the current master. MHA Monitor runs on the current master. Stop MHA Manager/Monitor and try again."
    );
    croak;
  }
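The Perl check above only shows that get_monitor_advisory_lock() returns a true value when the lock cannot be taken. Below is a minimal Python sketch of the same idea using MySQL's GET_LOCK(name, timeout). The lock name and the `run` helper are hypothetical, and it is an assumption that MHA uses exactly this call. GET_LOCK returns 1 when the lock is obtained. It returns 0 when it times out, for example because a running monitor already holds the lock, and NULL on error.

```python
from typing import Callable, Optional

# Hypothetical lock name for illustration; this article does not show the name MHA uses.
MONITOR_LOCK_NAME = "example_mha_monitor_lock"


def monitor_lock_failed(run: Callable[[str, tuple], Optional[int]], timeout: int = 0) -> bool:
    """Return True when the advisory lock could NOT be taken (mirrors the Perl check).

    `run` is an assumed helper that executes a single-value query on the current
    master and returns that value, with SQL NULL mapped to None.
    """
    value = run("SELECT GET_LOCK(%s, %s)", (MONITOR_LOCK_NAME, timeout))
    # GET_LOCK returns 1 if the lock was obtained, 0 on timeout, NULL on error.
    return value != 1
```

A named lock is held until RELEASE_LOCK() is called or the owning session ends. A monitor that keeps its connection open therefore keeps blocking the switch.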

4. No long-running updates on the current master and no long-running queries on the new master (the default is running_updates_limit=1).

  my @threads = $orig_master->get_running_update_threads( $g_running_updates_limit +  );
  if ( $#threads >= 0 ) {
    $log->error(
      sprintf(
"We should not start online master switch when one of connections are running long updates on the current master(%s). Currently %d update thread(s) are running.",
        $orig_master->get_hostinfo(),
        $#threads + 1
      )
    );
    MHA::DBHelper::print_threads_util( \@threads,  );
    croak;
  }

  my @threads = $new_master->get_running_threads($g_running_seconds_limit);
  if ( $#threads >= 0 ) {
    $log->error(
      sprintf(
"We should not start online master switch when one of connections are running long queries on the new master(%s). Currently %d thread(s) are running.",
        $new_master->get_hostinfo(),
        $#threads + 1
      )
    );
    MHA::DBHelper::print_threads_util( \@threads,  );
    croak;
  }
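The body of get_running_update_threads() is not shown here. The sketch below approximates the filter over SHOW PROCESSLIST rows (columns Id, Command, Time, Info). It keeps Query threads whose statement starts with a write verb and whose Time has reached the limit. The verb list, the >= comparison and the function name are choices made for illustration, not MHA's actual rules.

```python
from typing import Any, Dict, List

# Assumed set of statement prefixes that count as "updates"; MHA's own list may differ.
WRITE_PREFIXES = ("INSERT", "UPDATE", "DELETE", "REPLACE", "ALTER", "CREATE", "DROP", "TRUNCATE")


def long_running_updates(processlist: List[Dict[str, Any]], limit_seconds: int) -> List[Dict[str, Any]]:
    """Return SHOW PROCESSLIST rows that look like writes running for >= limit_seconds."""
    found = []
    for row in processlist:
        if row.get("Command") != "Query":
            continue  # skip idle (Sleep), replication and other non-query threads
        statement = (row.get("Info") or "").lstrip().upper()
        if (row.get("Time") or 0) >= limit_seconds and statement.startswith(WRITE_PREFIXES):
            found.append(row)
    return found
```

A non-empty result corresponds to `$#threads >= 0` in the Perl code, which aborts the switch.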

Steps of an online master switch

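1. Configuration Check Phase
- Check the replication topology and which servers are alive.
- Run FLUSH NO_WRITE_TO_BINLOG TABLES on the current master to close open tables.
- Check that replication is healthy.
- Choose the new master and check that it qualifies.
- Check the current master's replication filtering rules. SHOW SLAVE STATUS is empty on a master, so MHA temporarily points it at a dummy host with CHANGE MASTER (turning it into a "dummy slave") and then resets it.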
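The dummy slave exists because SHOW SLAVE STATUS returns an empty result on the master. Its Replicate_* filter columns cannot be read until a CHANGE MASTER has been executed. The sketch below assumes the goal is to make sure every server applies the same filtering rules. `filters_consistent` and its input shape (one SHOW SLAVE STATUS row per host, as a dict) are hypothetical.

```python
from typing import Dict, Tuple

FILTER_FIELDS = (
    "Replicate_Do_DB", "Replicate_Ignore_DB",
    "Replicate_Do_Table", "Replicate_Ignore_Table",
    "Replicate_Wild_Do_Table", "Replicate_Wild_Ignore_Table",
)


def filter_signature(slave_status: Dict[str, str]) -> Tuple[Tuple[str, ...], ...]:
    """Normalize the filter columns of one SHOW SLAVE STATUS row.

    Each column is a comma-separated list; sorting makes the comparison order-insensitive.
    """
    return tuple(
        tuple(sorted(item.strip() for item in (slave_status.get(field) or "").split(",") if item.strip()))
        for field in FILTER_FIELDS
    )


def filters_consistent(statuses: Dict[str, Dict[str, str]]) -> bool:
    """True if every server (keyed by host) reports identical replication filters."""
    return len({filter_signature(s) for s in statuses.values()}) <= 1
```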

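2. Rejecting updates Phase
- Try to call master_ip_online_change_script against the current master. Have this script block writes on the master and remove the VIP.
- Run FLUSH TABLES WITH READ LOCK to block all writes on the current master, then read its latest binlog position.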
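Phase 2 reduces to two statements on the original master. In this sketch, `run` is a hypothetical helper that executes one statement over a single, long-lived connection and returns rows as dicts. The master_ip_online_change_script call is left out. File and Position are the column names that SHOW MASTER STATUS returns.

```python
from typing import Any, Callable, Dict, List, Tuple

Run = Callable[[str], List[Dict[str, Any]]]


def reject_updates(run: Run) -> Tuple[str, int]:
    """Block writes on the original master and return its binlog (file, position).

    `run` must keep using the same session: the global read lock belongs to it.
    """
    run("FLUSH TABLES WITH READ LOCK")
    row = run("SHOW MASTER STATUS")[0]
    return row["File"], int(row["Position"])
```

The connection must stay open until phase 5. The global read lock is released as soon as that session runs UNLOCK TABLES or disconnects.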

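3. Activate the new master (switch_master)
- Wait for the new master to finish replicating, then read its latest binlog position.
- Try to call master_ip_online_change_script against the new master. Have this script grant write access on this former slave and bring up the VIP there.
- Turn off read_only on the new master.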
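The three bullets of phase 3 can be sketched the same way. This version uses a hypothetical `run(sql, params)` helper that returns rows as dicts, and it omits the master_ip_online_change_script call. The timeout of 0 mirrors the MASTER_POS_WAIT(?,?,0) query that MHA uses for this wait, meaning "wait without limit".

```python
from typing import Any, Callable, Dict, List, Tuple

Run = Callable[[str, tuple], List[Dict[str, Any]]]


def activate_new_master(run: Run, orig_file: str, orig_pos: int) -> Tuple[str, int]:
    """Sketch of phase 3 on the new master.

    1. Wait until it has applied the original master's binlog up to (orig_file, orig_pos).
    2. Read its own binlog coordinates: every other server replicates from here.
    3. Turn read_only off so it accepts writes.
    """
    result = run("SELECT MASTER_POS_WAIT(%s, %s, 0) AS Result", (orig_file, orig_pos))[0]["Result"]
    if result is None or result < 0:
        raise RuntimeError(f"MASTER_POS_WAIT did not succeed: {result!r}")
    status = run("SHOW MASTER STATUS", ())[0]
    run("SET GLOBAL read_only = 0", ())
    return status["File"], int(status["Position"])
```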

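4. Switching slaves in parallel
- Using the original master's final binlog position from step 2, wait until each slave has applied all of the binlog.
- Using the new master's starting binlog position from step 3, repoint every slave to the new master.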
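Phase 4 runs the same per-slave work on all slaves at once. The sketch below shows only the fan-out, using Python's concurrent.futures. `switch_one` is a hypothetical callable. It would wait with MASTER_POS_WAIT for the original master's final position from step 2, then CHANGE MASTER to the new master's starting position from step 3. For illustration, the sketch collects a result for every host instead of stopping at the first exception.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def switch_slaves_in_parallel(slaves: List[str], switch_one: Callable[[str], None]) -> Dict[str, str]:
    """Run `switch_one` for every slave concurrently; map each host to 'ok' or the error."""
    results: Dict[str, str] = {}
    if not slaves:
        return results  # ThreadPoolExecutor rejects max_workers=0
    with ThreadPoolExecutor(max_workers=len(slaves)) as pool:
        futures = {host: pool.submit(switch_one, host) for host in slaves}
        for host, future in futures.items():
            try:
                future.result()
                results[host] = "ok"
            except Exception as exc:  # record the failure and keep collecting
                results[host] = f"failed: {exc}"
    return results
```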

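5. Starting orig master as a new slave
- Run UNLOCK TABLES on the original master to release the lock.
- Using the new master's starting binlog position from step 3, configure the original master as a slave of the new master.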
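For phase 5, this sketch returns the statements to run on the original master, in order. Each is a (sql, params) pair for a PyMySQL-style client that interpolates %s placeholders. The function name is hypothetical. The port 3306 and position 120 in the usage below are placeholders, because the article's log has those numbers stripped.

```python
from typing import List, Tuple

Statement = Tuple[str, tuple]


def orig_master_as_slave_plan(new_host: str, new_port: int, new_file: str, new_pos: int,
                              user: str, password: str) -> List[Statement]:
    """Statements for phase 5, to be run on the ORIGINAL master.

    UNLOCK TABLES must go through the same session that issued
    FLUSH TABLES WITH READ LOCK, because the global read lock belongs to that session.
    """
    return [
        ("UNLOCK TABLES", ()),
        (
            "CHANGE MASTER TO MASTER_HOST=%s, MASTER_PORT=%s, "
            "MASTER_LOG_FILE=%s, MASTER_LOG_POS=%s, "
            "MASTER_USER=%s, MASTER_PASSWORD=%s",
            (new_host, new_port, new_file, new_pos, user, password),
        ),
        ("START SLAVE", ()),
    ]
```

For example, orig_master_as_slave_plan("192.168.0.60", 3306, "mysql-bin.000008", 120, "repl", "xxx") yields three statements. The first is UNLOCK TABLES, the second is a CHANGE MASTER carrying those six values, and the third is START SLAVE.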

6. New master cleanup phase
- Run STOP SLAVE to stop replication.
- On MySQL 5.5, run CHANGE MASTER TO MASTER_HOST='' to clear the old replication settings.
- Run RESET SLAVE /*! ALL */ to reset the replication configuration.
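The statements used in phase 6 depend on the server version. In this sketch, the version string is assumed to come from SELECT VERSION(), for example a value such as 5.6.x-log. The 5.5 branch follows the bullet list above. The RESET SLAVE ALL cutoff reflects that this form was added in MySQL 5.5.16. The function names are hypothetical.

```python
import re
from typing import List, Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Extract (major, minor, patch) from a string such as '5.6.10-log'."""
    m = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if not m:
        raise ValueError(f"unrecognized MySQL version: {version!r}")
    major, minor, patch = (int(g) for g in m.groups())
    return major, minor, patch


def new_master_cleanup_plan(version: str) -> List[str]:
    """Statements to clear the old replication settings on the new master."""
    v = parse_version(version)
    statements = ["STOP SLAVE"]
    if v[:2] == (5, 5):
        # Per the step list: on 5.5, blank out the old master host first.
        statements.append("CHANGE MASTER TO MASTER_HOST=''")
    # RESET SLAVE ALL exists from MySQL 5.5.16; older servers only have RESET SLAVE.
    statements.append("RESET SLAVE ALL" if v >= (5, 5, 16) else "RESET SLAVE")
    return statements
```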

Output log of an online master switch:

[root@DBproxy app1]# masterha_master_switch --conf=/data/masterha/app1/app1.cnf --master_state=alive --new_master_host=192.168.0.60 --orig_master_is_new_slave --running_updates_limit= --interactive=
Sat Jul  ::  - [info] MHA::MasterRotate version 0.56.
Sat Jul  ::  - [info] Starting online master switch..
Sat Jul  ::  - [info]
Sat Jul  ::  - [info] * Phase : Configuration Check Phase..
Sat Jul  ::  - [info]
Sat Jul  ::  - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sat Jul  ::  - [info] Reading application default configuration from /data/masterha/app1/app1.cnf..
Sat Jul  ::  - [info] Reading server configuration from /data/masterha/app1/app1.cnf..
Sat Jul  ::  - [info] GTID failover mode =
Sat Jul  ::  - [info] Current Alive Master: 192.168.0.50(192.168.0.50:)
Sat Jul  ::  - [info] Alive Slaves:
Sat Jul  ::  - [info]   192.168.0.60(192.168.0.60:)  Version=5.6.-log (oldest major version between slaves) log-bin:enabled
Sat Jul  ::  - [info]     Replicating from 192.168.0.50(192.168.0.50:)
Sat Jul  ::  - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Jul  ::  - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time..
Sat Jul  ::  - [info]  ok.
Sat Jul  ::  - [info] Checking MHA is not monitoring or doing failover..
Sat Jul  ::  - [info] Checking replication health on 192.168.0.60..
Sat Jul  ::  - [info]  ok.
Sat Jul  ::  - [info] 192.168.0.60 can be new master.
Sat Jul  ::  - [info]
From:
192.168.0.50(192.168.0.50:) (current master)
 +--192.168.0.60(192.168.0.60:)
To:
192.168.0.60(192.168.0.60:) (new master)
 +--192.168.0.50(192.168.0.50:)
Sat Jul  ::  - [info] Checking whether 192.168.0.60(192.168.0.60:) is ok for the new master..
Sat Jul  ::  - [info]  ok.
Sat Jul  ::  - [info] 192.168.0.50(192.168.0.50:): SHOW SLAVE STATUS returned empty result. To check replication filtering rules, temporarily executing CHANGE MASTER to a dummy host.
Sat Jul  ::  - [info] 192.168.0.50(192.168.0.50:): Resetting slave pointing to the dummy host.
Sat Jul  ::  - [info] ** Phase : Configuration Check Phase completed.
Sat Jul  ::  - [info]
Sat Jul  ::  - [info] * Phase : Rejecting updates Phase..
Sat Jul  ::  - [info]
Sat Jul  ::  - [warning] master_ip_online_change_script is not defined. Skipping disabling writes on the current master.
Sat Jul  ::  - [info] Locking all tables on the orig master to reject updates from everybody (including root):
Sat Jul  ::  - [info] Executing FLUSH TABLES WITH READ LOCK..
Sat Jul  ::  - [info]  ok.
Sat Jul  ::  - [info] Orig master binlog:pos is mysql-bin.:.
Sat Jul  ::  - [info]  Waiting to execute all relay logs on 192.168.0.60(192.168.0.60:)..
Sat Jul  ::  - [info]  master_pos_wait(mysql-bin.:) completed on 192.168.0.60(192.168.0.60:). Executed  events.
Sat Jul  ::  - [info]   done.
Sat Jul  ::  - [info] Getting new master's binlog name and position..
Sat Jul  ::  - [info]  mysql-bin.:
Sat Jul  ::  - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.0.60', MASTER_PORT=, MASTER_LOG_FILE='mysql-bin.000008', MASTER_LOG_POS=, MASTER_USER='repl', MASTER_PASSWORD='xxx';
Sat Jul  ::  - [info]
Sat Jul  ::  - [info] * Switching slaves in parallel..
Sat Jul  ::  - [info]
Sat Jul  ::  - [info] Unlocking all tables on the orig master:
Sat Jul  ::  - [info] Executing UNLOCK TABLES..
Sat Jul  ::  - [info]  ok.
Sat Jul  ::  - [info] Starting orig master as a new slave..
Sat Jul  ::  - [info]  Resetting slave 192.168.0.50(192.168.0.50:) and starting replication from the new master 192.168.0.60(192.168.0.60:)..
Sat Jul  ::  - [info]  Executed CHANGE MASTER.
Sat Jul  ::  - [info]  Slave started.
Sat Jul  ::  - [info] All new slave servers switched successfully.
Sat Jul  ::  - [info]
Sat Jul  ::  - [info] * Phase : New master cleanup phase..
Sat Jul  ::  - [info]
Sat Jul  ::  - [info]  192.168.0.60: Resetting slave info succeeded.
Sat Jul  ::  - [info] Switching master to 192.168.0.60(192.168.0.60:) completed successfully.

The log above is quoted from: https://www.cnblogs.com/polestar/p/5737121.html

Switching in GTID mode and in non-GTID mode

Turning the original master into a new slave and repointing an existing slave to the new master both go through the change_master_and_start_slave method in /MHA/ServerManager.pm:

  if ( $self->is_gtid_auto_pos_enabled() && !$target->{is_mariadb} ) {
    $dbhelper->change_master_gtid( $addr, $master->{port},
      $master->{repl_user}, $master->{repl_password} );
  }
  else {
    $dbhelper->change_master( $addr,
      $master->{port}, $master_log_file, $master_log_pos, $master->{repl_user},
      $master->{repl_password} );
  }

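The replication mode in use before the switch is kept afterwards. If GTID auto-positioning was in use, the slave is pointed at the new master with GTID replication. Otherwise, the binlog file and position are used. MariaDB targets always take the file/position branch.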
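The two branches differ only in how the starting point is given. The Python sketch below shows the statements they would roughly correspond to. The body of change_master_gtid is not shown in this article, so assuming it uses MySQL 5.6's MASTER_AUTO_POSITION=1 is an educated guess. `build_change_master` is a hypothetical name, and the placeholders assume a client that interpolates %s.

```python
from typing import Optional, Tuple


def build_change_master(host: str, port: int, user: str, password: str,
                        use_gtid_auto_pos: bool, is_mariadb: bool = False,
                        log_file: Optional[str] = None, log_pos: Optional[int] = None
                        ) -> Tuple[str, tuple]:
    """Return (sql, params) mirroring the branch in change_master_and_start_slave."""
    base = "CHANGE MASTER TO MASTER_HOST=%s, MASTER_PORT=%s, MASTER_USER=%s, MASTER_PASSWORD=%s"
    params = (host, port, user, password)
    if use_gtid_auto_pos and not is_mariadb:
        # GTID mode (assumed): the slave works out its own starting point from its executed GTIDs.
        return base + ", MASTER_AUTO_POSITION=1", params
    if log_file is None or log_pos is None:
        raise ValueError("file/position mode needs log_file and log_pos")
    return base + ", MASTER_LOG_FILE=%s, MASTER_LOG_POS=%s", params + (log_file, log_pos)
```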

To decide whether a server has caught up, MHA uses the master_pos_wait method in /MHA/DBHelper.pm:

use constant Master_Pos_Wait_NoTimeout_SQL => "SELECT MASTER_POS_WAIT(?,?,0) AS Result";
sub master_pos_wait($$$) {
  my $self        = shift;
  my $binlog_file = shift;
  my $binlog_pos  = shift;
  my $sth         = $self->{dbh}->prepare(Master_Pos_Wait_NoTimeout_SQL);
  $sth->execute( $binlog_file, $binlog_pos );
  my $href = $sth->fetchrow_hashref;
  return $href->{Result};
}

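MHA relies on MySQL's MASTER_POS_WAIT function to make sure every event from the original master's binlog has been applied. Executed_Gtid_Set is not compared at any point in this process.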
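For readers driving MySQL from Python, the same wait can be written against any DB-API connection. This sketch assumes PyMySQL, whose paramstyle is %s instead of DBI's ?, and a default tuple cursor. The function mirrors the Perl method but is not part of MHA.

```python
from typing import Any, Optional

MASTER_POS_WAIT_NO_TIMEOUT_SQL = "SELECT MASTER_POS_WAIT(%s, %s, 0) AS Result"


def master_pos_wait(conn: Any, binlog_file: str, binlog_pos: int) -> Optional[int]:
    """Block until the replica's SQL thread reaches (binlog_file, binlog_pos).

    Returns MASTER_POS_WAIT's result, or None for SQL NULL.
    """
    cur = conn.cursor()
    try:
        cur.execute(MASTER_POS_WAIT_NO_TIMEOUT_SQL, (binlog_file, binlog_pos))
        row = cur.fetchone()
    finally:
        cur.close()
    return None if row is None else row[0]
```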

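The MASTER_POS_WAIT function

Syntax: SELECT MASTER_POS_WAIT(file, pos[, timeout])

file and pos give the master binlog coordinates to wait for. The function blocks until the current slave has executed up to that position. It then returns the number of log events it had to wait for.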

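timeout is optional. If it is omitted, the function waits indefinitely, and a timeout <= 0 behaves the same way (newer MySQL versions instead reject a negative timeout in strict SQL mode). If timeout is positive, the function waits at most that many seconds and returns -1 when the time runs out.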

Other return values: NULL if the slave SQL thread is not running or is stopped while waiting; 0 if the position had already been reached.
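When scripting a switch by hand, the return values above can be folded into one small helper. `describe_master_pos_wait` is a hypothetical name, and the messages are just labels for the documented cases.

```python
from typing import Optional


def describe_master_pos_wait(result: Optional[int]) -> str:
    """Map a MASTER_POS_WAIT result (None for SQL NULL) to a readable status."""
    if result is None:
        return "error: SQL thread not running, stopped while waiting, or bad arguments"
    if result == -1:
        return "timeout"
    if result == 0:
        return "position already reached"
    return f"caught up after waiting for {result} event(s)"
```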

References:

https://www.cnblogs.com/xiaoboluo768/p/5210820.html
