MHA setup taken from: http://www.cnblogs.com/xuanzhi201111/p/4231412.html

MHA online switching

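In many cases the existing master has to be migrated to another server, for example when the master has a hardware fault, the RAID controller needs to be rebuilt, or the master is being moved to a more powerful machine. Maintenance on the master degrades performance and leads to downtime during which, at the very least, writes are not possible. In addition, blocking or killing currently running sessions can leave data inconsistent between the masters. MHA provides a fast switch with graceful write blocking. The switch takes only 0.5-2s, and writes are blocked during that time. In many cases 0.5-2s of blocked writes is acceptable, so switching the master does not require scheduling a maintenance window.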

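The rough flow of an MHA online switch:
(1) Check the replication setup and identify the current master
(2) Determine the new master
(3) Block writes on the current master
(4) Wait for all slaves to catch up with replication
(5) Allow writes on the new master
(6) Re-point the slaves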
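
The six steps above can be walked through on a toy in-memory topology. This is only a sketch to make the ordering concrete, not how MHA is implemented: the Server class, its fields, and online_switch are hypothetical names made up here, and "catching up" is modelled by simply copying a position number. Re-pointing the old master as a slave corresponds to running the switch with --orig_master_is_new_slave.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Server:
    host: str
    read_only: bool = True
    master: Optional[str] = None  # host this server replicates from (None = it is the master)
    applied_pos: int = 0          # toy stand-in for the executed binlog position


def online_switch(servers: Dict[str, Server], new_master: str) -> str:
    """Walk through steps (1)-(6) on an in-memory topology; returns the old master's host."""
    # (1) Check the replication setup and find the current master.
    masters = [s for s in servers.values() if s.master is None]
    if len(masters) != 1:
        raise RuntimeError("expected exactly one current master")
    old = masters[0]
    # (2) Determine the new master: it must be a slave of the current master.
    new = servers[new_master]
    if new.master != old.host:
        raise RuntimeError(f"{new_master} does not replicate from {old.host}")
    # (3) Block writes on the current master.
    old.read_only = True
    # (4) Wait until every slave has caught up (the toy model just copies the position).
    for s in servers.values():
        if s.master == old.host:
            s.applied_pos = old.applied_pos
    # (5) Allow writes on the new master.
    new.master = None
    new.read_only = False
    # (6) Re-point every other server, including the old master, at the new master.
    for s in servers.values():
        if s is not new:
            s.master = new.host
    return old.host


if __name__ == "__main__":
    topology = {
        "192.168.2.128": Server("192.168.2.128", read_only=False, applied_pos=107),
        "192.168.2.129": Server("192.168.2.129", master="192.168.2.128"),
        "192.168.2.130": Server("192.168.2.130", master="192.168.2.128"),
    }
    online_switch(topology, "192.168.2.129")
    for s in topology.values():
        print(s)
```

Writes are blocked in step (3) before anything else changes, which is what keeps the window without a writable master short while still letting the slaves reach the same position.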

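Note that the application architecture has to take the following two issues into account for online switching:

1. Automatically identifying the master and the slaves (the master host may change). Using a VIP largely solves this problem.

2. Load balancing: you can define an approximate read/write ratio and the share of load each machine can carry, and this has to be reconsidered when a machine leaves the cluster.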
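
As a rough illustration of the load-balancing point, here is one way the read share could be recomputed when a machine leaves the cluster. The function name and the weights are made up for this example, and proportional redistribution is just one possible policy; nothing here is part of MHA.

```python
from typing import Dict


def rebalance_read_weights(weights: Dict[str, float], removed_host: str) -> Dict[str, float]:
    """Drop removed_host and rescale the remaining weights so they sum to 1 again."""
    remaining = {host: w for host, w in weights.items() if host != removed_host}
    total = sum(remaining.values())
    if total <= 0:
        raise ValueError("no remaining host can take read traffic")
    return {host: w / total for host, w in remaining.items()}


if __name__ == "__main__":
    # Hypothetical read shares for the three MySQL hosts used in this post.
    shares = {"192.168.2.128": 0.2, "192.168.2.129": 0.4, "192.168.2.130": 0.4}
    print(rebalance_read_weights(shares, "192.168.2.130"))
```

Whether the surviving machines can actually absorb the extra share is a capacity question that this calculation does not answer.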

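To keep the data fully consistent and finish the switch as quickly as possible, an MHA online switch only succeeds when all of the following conditions are met; otherwise the switch fails.

(1) The IO thread is running on every slave

(2) The SQL thread is running on every slave

(3) Seconds_Behind_Master in the SHOW SLAVE STATUS output of every slave is less than or equal to running_updates_limit seconds. If running_updates_limit is not specified for the switch, it defaults to 1 second.

(4) On the master, no update in the SHOW PROCESSLIST output has been running for longer than running_updates_limit seconds.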
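
The four conditions can be checked by hand before running the switch. Below is a minimal sketch that takes already-fetched rows as plain dictionaries keyed by the SHOW SLAVE STATUS and SHOW PROCESSLIST column names. The function name is hypothetical, and treating a statement as an "update" when it starts with INSERT, UPDATE, DELETE or REPLACE is a simplification chosen for this sketch, not MHA's own rule.

```python
from typing import Dict, List

# Statement prefixes treated as "updates"; a simplification chosen for this sketch.
UPDATE_PREFIXES = ("INSERT", "UPDATE", "DELETE", "REPLACE")


def switch_blockers(
    slave_status: Dict[str, dict],
    master_processlist: List[dict],
    running_updates_limit: int = 1,
) -> List[str]:
    """Return the reasons an online switch should not start; an empty list means all four conditions hold."""
    reasons = []
    for host, status in slave_status.items():
        # Conditions (1) and (2): both replication threads must be running.
        if status.get("Slave_IO_Running") != "Yes":
            reasons.append(f"{host}: IO thread is not running")
        if status.get("Slave_SQL_Running") != "Yes":
            reasons.append(f"{host}: SQL thread is not running")
        # Condition (3): replication lag must not exceed the limit (NULL counts as failing).
        lag = status.get("Seconds_Behind_Master")
        if lag is None or lag > running_updates_limit:
            reasons.append(
                f"{host}: Seconds_Behind_Master={lag} is NULL or above {running_updates_limit}"
            )
    # Condition (4): no update on the master may run longer than the limit.
    for row in master_processlist:
        info = (row.get("Info") or "").lstrip().upper()
        if info.startswith(UPDATE_PREFIXES) and row.get("Time", 0) > running_updates_limit:
            reasons.append(f"master: update running for {row['Time']}s")
    return reasons
```

With --running_updates_limit=10000, as used in the switch command in this post, conditions (3) and (4) are effectively relaxed, so a lagging slave would no longer block the switch.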

The online switch steps are as follows:

Work on the MHA Manager server 192.168.2.131. First, stop MHA monitoring:

192.168.2.131 [root ~]$ masterha_stop --conf=/etc/masterha/app1.cnf
Stopped app1 successfully.
[1]+ Exit 1 nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/masterha/app1/manager.log 2>&1 (wd: /usr/local/bin)
(wd now: ~)
192.168.2.131 [root ~]$

Run the online switch command (the error below was reported by version 0.53 of the manager and node packages):

Starting master switch from 192.168.2.128(192.168.2.128:3306) to 192.168.2.129(192.168.2.129:3306)? (yes/NO): yes
Sun Jan 18 20:06:17 2015 - [info] Checking whether 192.168.2.129(192.168.2.129:3306) is ok for the new master..
Sun Jan 18 20:06:17 2015 - [info] ok.
Sun Jan 18 20:06:17 2015 - [info] 192.168.2.128(192.168.2.128:3306): SHOW SLAVE STATUS returned empty result. To check replication filtering rules, temporarily executing CHANGE MASTER to a dummy host.
Sun Jan 18 20:06:17 2015 - [info] 192.168.2.128(192.168.2.128:3306): Resetting slave pointing to the dummy host.
Sun Jan 18 20:06:17 2015 - [info] ** Phase 1: Configuration Check Phase completed.
Sun Jan 18 20:06:17 2015 - [info]
Sun Jan 18 20:06:17 2015 - [info] * Phase 2: Rejecting updates Phase..
Sun Jan 18 20:06:17 2015 - [info]
Sun Jan 18 20:06:17 2015 - [info] Executing master ip online change script to disable write on the current master:
Sun Jan 18 20:06:17 2015 - [info] /usr/local/bin/master_ip_online_change --command=stop --orig_master_host=192.168.2.128 --orig_master_ip=192.168.2.128 --orig_master_port=3306 --new_master_host=192.168.2.129 --new_master_ip=192.168.2.129 --new_master_port=3306
Got Error: DBI connect(';host=192.168.2.129;port=3306;mysql_connect_timeout=4','',...) failed: Access denied for user 'root'@'192.168.2.131' (using password: NO) at /usr/local/share/perl5/MHA/DBHelper.pm line 181
at /usr/local/bin/master_ip_online_change line 138
Sun Jan 18 20:06:17 2015 - [error][/usr/local/share/perl5/MHA/ManagerUtil.pm, ln178] Got ERROR: at /usr/local/bin/masterha_master_switch line 53

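The cause is that the master_ip_online_change script is incomplete and has to be modified by hand. The script could not obtain a value for the new_master_password variable, so the online switch failed (the error above shows the connection attempt being made with no password). As a workaround I hardcoded the MySQL root password into new_master_password. With mha4mysql-manager-0.56 and mha4mysql-node-0.56 the password no longer has to be hardcoded, because the script picks it up on its own; earlier versions never seemed to get a value when reading new_master_password (I don't know Perl well, so improvements are welcome).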

Now let's look at how version 0.56 runs:

192.168.2.131 [root bin]$ masterha_master_switch --conf=/etc/masterha/app1.cnf --master_state=alive --new_master_host=192.168.2.129 --new_master_port=3306 --orig_master_is_new_slave --running_updates_limit=10000
Mon Jan 19 01:51:39 2015 - [info] MHA::MasterRotate version 0.56.
Mon Jan 19 01:51:39 2015 - [info] Starting online master switch..
Mon Jan 19 01:51:39 2015 - [info]
Mon Jan 19 01:51:39 2015 - [info] * Phase 1: Configuration Check Phase..
Mon Jan 19 01:51:39 2015 - [info]
Mon Jan 19 01:51:39 2015 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon Jan 19 01:51:39 2015 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Mon Jan 19 01:51:39 2015 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Mon Jan 19 01:51:39 2015 - [info] GTID failover mode = 0
Mon Jan 19 01:51:39 2015 - [info] Current Alive Master: 192.168.2.128(192.168.2.128:3306)
Mon Jan 19 01:51:39 2015 - [info] Alive Slaves:
Mon Jan 19 01:51:39 2015 - [info] 192.168.2.129(192.168.2.129:3306) Version=5.5.30-log (oldest major version between slaves) log-bin:enabled
Mon Jan 19 01:51:39 2015 - [info] Replicating from 192.168.2.128(192.168.2.128:3306)
Mon Jan 19 01:51:39 2015 - [info] Primary candidate for the new Master (candidate_master is set)
Mon Jan 19 01:51:39 2015 - [info] 192.168.2.130(192.168.2.130:3306) Version=5.5.25-log (oldest major version between slaves) log-bin:enabled
Mon Jan 19 01:51:39 2015 - [info] Replicating from 192.168.2.128(192.168.2.128:3306)

It is better to execute FLUSH NO_WRITE_TO_BINLOG TABLES on the master before switching. Is it ok to execute on 192.168.2.128(192.168.2.128:3306)? (YES/no): yes
Mon Jan 19 01:51:46 2015 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time..
Mon Jan 19 01:51:46 2015 - [info] ok.
Mon Jan 19 01:51:46 2015 - [info] Checking MHA is not monitoring or doing failover..
Mon Jan 19 01:51:46 2015 - [info] Checking replication health on 192.168.2.129..
Mon Jan 19 01:51:46 2015 - [info] ok.
Mon Jan 19 01:51:46 2015 - [info] Checking replication health on 192.168.2.130..
Mon Jan 19 01:51:46 2015 - [info] ok.
Mon Jan 19 01:51:46 2015 - [info] 192.168.2.129 can be new master.
Mon Jan 19 01:51:46 2015 - [info]
From:
192.168.2.128(192.168.2.128:3306) (current master)
+--192.168.2.129(192.168.2.129:3306)
+--192.168.2.130(192.168.2.130:3306)

To:
192.168.2.129(192.168.2.129:3306) (new master)
+--192.168.2.130(192.168.2.130:3306)
+--192.168.2.128(192.168.2.128:3306)

Starting master switch from 192.168.2.128(192.168.2.128:3306) to 192.168.2.129(192.168.2.129:3306)? (yes/NO): yes
Mon Jan 19 01:51:50 2015 - [info] Checking whether 192.168.2.129(192.168.2.129:3306) is ok for the new master..
Mon Jan 19 01:51:50 2015 - [info] ok.
Mon Jan 19 01:51:50 2015 - [info] 192.168.2.128(192.168.2.128:3306): SHOW SLAVE STATUS returned empty result. To check replication filtering rules, temporarily executing CHANGE MASTER to a dummy host.
Mon Jan 19 01:51:50 2015 - [info] 192.168.2.128(192.168.2.128:3306): Resetting slave pointing to the dummy host.
Mon Jan 19 01:51:50 2015 - [info] ** Phase 1: Configuration Check Phase completed.
Mon Jan 19 01:51:50 2015 - [info]
Mon Jan 19 01:51:50 2015 - [info] * Phase 2: Rejecting updates Phase..
Mon Jan 19 01:51:50 2015 - [info]
Mon Jan 19 01:51:50 2015 - [info] Executing master ip online change script to disable write on the current master:
Mon Jan 19 01:51:50 2015 - [info] /usr/local/bin/master_ip_online_change --command=stop --orig_master_host=192.168.2.128 --orig_master_ip=192.168.2.128 --orig_master_port=3306 --orig_master_user='root' --orig_master_password='123456' --new_master_host=192.168.2.129 --new_master_ip=192.168.2.129 --new_master_port=3306 --new_master_user='root' --new_master_password='123456' --orig_master_ssh_user=root --new_master_ssh_user=root --orig_master_is_new_slave
Mon Jan 19 01:51:50 2015 173112 Set read_only on the new master.. ok.
Mon Jan 19 01:51:50 2015 178943 Drpping app user on the orig master..
Mon Jan 19 01:51:50 2015 180438 Set read_only=1 on the orig master.. ok.
Mon Jan 19 01:51:50 2015 183258 Killing all application threads..
Mon Jan 19 01:51:50 2015 183387 done.
Mon Jan 19 01:51:50 2015 - [info] ok.
Mon Jan 19 01:51:50 2015 - [info] Locking all tables on the orig master to reject updates from everybody (including root):
Mon Jan 19 01:51:50 2015 - [info] Executing FLUSH TABLES WITH READ LOCK..
Mon Jan 19 01:51:50 2015 - [info] ok.
Mon Jan 19 01:51:50 2015 - [info] Orig master binlog:pos is mysql-bin.000017:107.
Mon Jan 19 01:51:50 2015 - [info] Waiting to execute all relay logs on 192.168.2.129(192.168.2.129:3306)..
Mon Jan 19 01:51:50 2015 - [info] master_pos_wait(mysql-bin.000017:107) completed on 192.168.2.129(192.168.2.129:3306). Executed 0 events.
Mon Jan 19 01:51:50 2015 - [info] done.
Mon Jan 19 01:51:50 2015 - [info] Getting new master's binlog name and position..
Mon Jan 19 01:51:50 2015 - [info] mysql-bin.000005:61791
Mon Jan 19 01:51:50 2015 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.2.129', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000005', MASTER_LOG_POS=61791, MASTER_USER='repl', MASTER_PASSWORD='xxx';
Mon Jan 19 01:51:50 2015 - [info] Executing master ip online change script to allow write on the new master:
Mon Jan 19 01:51:50 2015 - [info] /usr/local/bin/master_ip_online_change --command=start --orig_master_host=192.168.2.128 --orig_master_ip=192.168.2.128 --orig_master_port=3306 --orig_master_user='root' --orig_master_password='123456' --new_master_host=192.168.2.129 --new_master_ip=192.168.2.129 --new_master_port=3306 --new_master_user='root' --new_master_password='123456' --orig_master_ssh_user=root --new_master_ssh_user=root --orig_master_is_new_slave
Mon Jan 19 01:51:50 2015 443208 Set read_only=0 on the new master.
Mon Jan 19 01:51:50 2015 444741 Creating app user on the new master..
Mon Jan 19 01:51:50 2015 - [info] ok.
Mon Jan 19 01:51:50 2015 - [info]
Mon Jan 19 01:51:50 2015 - [info] * Switching slaves in parallel..
Mon Jan 19 01:51:50 2015 - [info]
Mon Jan 19 01:51:50 2015 - [info] -- Slave switch on host 192.168.2.130(192.168.2.130:3306) started, pid: 23040
Mon Jan 19 01:51:50 2015 - [info]
Mon Jan 19 01:51:50 2015 - [info] Log messages from 192.168.2.130 ...
Mon Jan 19 01:51:50 2015 - [info]
Mon Jan 19 01:51:50 2015 - [info] Waiting to execute all relay logs on 192.168.2.130(192.168.2.130:3306)..
Mon Jan 19 01:51:50 2015 - [info] master_pos_wait(mysql-bin.000017:107) completed on 192.168.2.130(192.168.2.130:3306). Executed 0 events.
Mon Jan 19 01:51:50 2015 - [info] done.
Mon Jan 19 01:51:50 2015 - [info] Resetting slave 192.168.2.130(192.168.2.130:3306) and starting replication from the new master 192.168.2.129(192.168.2.129:3306)..
Mon Jan 19 01:51:50 2015 - [info] Executed CHANGE MASTER.
Mon Jan 19 01:51:50 2015 - [info] Slave started.
Mon Jan 19 01:51:50 2015 - [info] End of log messages from 192.168.2.130 ...
Mon Jan 19 01:51:50 2015 - [info]
Mon Jan 19 01:51:50 2015 - [info] -- Slave switch on host 192.168.2.130(192.168.2.130:3306) succeeded.
Mon Jan 19 01:51:50 2015 - [info] Unlocking all tables on the orig master:
Mon Jan 19 01:51:50 2015 - [info] Executing UNLOCK TABLES..
Mon Jan 19 01:51:50 2015 - [info] ok.
Mon Jan 19 01:51:50 2015 - [info] Starting orig master as a new slave..
Mon Jan 19 01:51:50 2015 - [info] Resetting slave 192.168.2.128(192.168.2.128:3306) and starting replication from the new master 192.168.2.129(192.168.2.129:3306)..
Mon Jan 19 01:51:50 2015 - [info] Executed CHANGE MASTER.
Mon Jan 19 01:51:50 2015 - [info] Slave started.
Mon Jan 19 01:51:50 2015 - [info] All new slave servers switched successfully.
Mon Jan 19 01:51:50 2015 - [info]
Mon Jan 19 01:51:50 2015 - [info] * Phase 5: New master cleanup phase..
Mon Jan 19 01:51:50 2015 - [info]
Mon Jan 19 01:51:50 2015 - [info] 192.168.2.129: Resetting slave info succeeded.
Mon Jan 19 01:51:50 2015 - [info] Switching master to 192.168.2.129(192.168.2.129:3306) completed successfully.
192.168.2.131 [root bin]$
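
If the switch output is saved to a file, the two facts that usually matter afterwards, whether the switch completed and from which binlog coordinates the slaves were told to replicate, can be pulled out with a couple of regular expressions. This is a small sketch that assumes the log lines look like the ones above; the function names are made up here and the message format may differ in other MHA versions.

```python
import re
from typing import Optional, Tuple

# Patterns modelled on the 0.56 log lines shown above (an assumption about the format).
COORD_RE = re.compile(r"MASTER_LOG_FILE='([^']+)', MASTER_LOG_POS=(\d+)")
SUCCESS_RE = re.compile(r"Switching master to \S+ completed successfully\.")


def new_master_coordinates(log_text: str) -> Optional[Tuple[str, int]]:
    """Return (binlog file, position) from the suggested CHANGE MASTER statement, or None."""
    match = COORD_RE.search(log_text)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


def switch_succeeded(log_text: str) -> bool:
    """True if the log contains the final success message."""
    return SUCCESS_RE.search(log_text) is not None
```

For the run above, the coordinates line is the one starting with "All other slaves should start replication from here", and the success message is the last log line before the prompt returns.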

Mon Jan 19 01:51:50 2015 - [info] Waiting to execute all relay logs on 192.168.2.130(192.168.2.130:3306)..
Mon Jan 19 01:51:50 2015 - [info] master_pos_wait(mysql-bin.000017:107) completed on 192.168.2.130(192.168.2.130:3306). Executed 0 events.
Mon Jan 19 01:51:50 2015 - [info] done.
Mon Jan 19 01:51:50 2015 - [info] Resetting slave 192.168.2.130(192.168.2.130:3306) and starting replication from the new master 192.168.2.129(192.168.2.129:3306)..
Mon Jan 19 01:51:50 2015 - [info] Executed CHANGE MASTER.
Mon Jan 19 01:51:50 2015 - [info] Slave started.
Mon Jan 19 01:51:50 2015 - [info] End of log messages from 192.168.2.130 ...
Mon Jan 19 01:51:50 2015 - [info]
Mon Jan 19 01:51:50 2015 - [info] -- Slave switch on host 192.168.2.130(192.168.2.130:3306) succeeded.
Mon Jan 19 01:51:50 2015 - [info] Unlocking all tables on the orig master:
Mon Jan 19 01:51:50 2015 - [info] Executing UNLOCK TABLES..
Mon Jan 19 01:51:50 2015 - [info] ok.
Mon Jan 19 01:51:50 2015 - [info] Starting orig master as a new slave..
Mon Jan 19 01:51:50 2015 - [info] Resetting slave 192.168.2.128(192.168.2.128:3306) and starting replication from the new master 192.168.2.129(192.168.2.129:3306)..
Mon Jan 19 01:51:50 2015 - [info] Executed CHANGE MASTER.
Mon Jan 19 01:51:50 2015 - [info] Slave started.
Mon Jan 19 01:51:50 2015 - [info] All new slave servers switched successfully.
Mon Jan 19 01:51:50 2015 - [info]
Mon Jan 19 01:51:50 2015 - [info] * Phase 5: New master cleanup phase..
Mon Jan 19 01:51:50 2015 - [info]
Mon Jan 19 01:51:50 2015 - [info] 192.168.2.129: Resetting slave info succeeded.
Mon Jan 19 01:51:50 2015 - [info] Switching master to 192.168.2.129(192.168.2.129:3306) completed successfully.
192.168.2.131 [root bin]$

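Parameter notes:

--orig_master_is_new_slave: with this option, the original master is turned into a slave of the new master once the switch completes. Without it, the original master is not started as a slave and stays out of the new replication topology.

--running_updates_limit=10000: during the switch, if the candidate master is lagging behind the current master, the MHA switch fails. With this option, the switch is allowed as long as the delay stays within the given limit (in seconds). How long the switch actually takes still depends on the size of the relay logs that have to be applied during recovery.

The master_ip_online_change script is as follows: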
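The effect of --running_updates_limit can be pictured with a small Python sketch. This is a simplified model, not MHA's own code: the function name `switch_allowed` and its inputs are hypothetical, the default of 1 second is an assumption, and the real checks are performed by masterha_master_switch against the live servers.

```python
def switch_allowed(slave_lag_seconds, master_update_seconds, running_updates_limit=1):
    """Simplified model of the pre-switch delay check (hypothetical helper).

    slave_lag_seconds:     dict of host -> replication delay in seconds
    master_update_seconds: run times (seconds) of write queries on the master
    running_updates_limit: tolerated delay in seconds (default of 1 is assumed)
    Returns (ok, reasons).
    """
    reasons = []
    for host, lag in sorted(slave_lag_seconds.items()):
        if lag > running_updates_limit:
            reasons.append(f"{host} is {lag}s behind the master")
    for secs in master_update_seconds:
        if secs > running_updates_limit:
            reasons.append(f"a write query has been running for {secs}s on the master")
    return (not reasons, reasons)


# Example values only: one slave in sync, one slave 30 seconds behind.
lag = {"192.168.2.129": 0, "192.168.2.130": 30}
print(switch_allowed(lag, [], running_updates_limit=1))      # rejected
print(switch_allowed(lag, [], running_updates_limit=10000))  # allowed
```

A very large limit such as 10000 effectively disables the check, which is convenient in a test environment but means the switch can take as long as the lagging slave needs to catch up.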

#!/usr/bin/env perl

# Copyright (C) 2011 DeNA Co.,Ltd.
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc.,
# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA

## Note: This is a sample script and is not complete. Modify the script based on your environment.

use strict;
use warnings FATAL => 'all';

use Getopt::Long;
use MHA::DBHelper;
use MHA::NodeUtil;
use Time::HiRes qw( sleep gettimeofday tv_interval );
use Data::Dumper;

my $_tstart;
my $_running_interval = 0.1;
my (
  $command,              $orig_master_is_new_slave, $orig_master_host,
  $orig_master_ip,       $orig_master_port,         $orig_master_user,
  $orig_master_password, $orig_master_ssh_user,     $new_master_host,
  $new_master_ip,        $new_master_port,          $new_master_user,
  $new_master_password,  $new_master_ssh_user
);
my $vip = '192.168.2.88/24';
my $key = '1';
my $ssh_start_vip = "/sbin/ifconfig eth0:$key $vip";
my $ssh_stop_vip = "/sbin/ifconfig eth0:$key down";
my $orig_master_ssh_port = 22;
my $new_master_ssh_port = 22;
GetOptions(
  'command=s'                => \$command,
  'orig_master_is_new_slave' => \$orig_master_is_new_slave,
  'orig_master_host=s'       => \$orig_master_host,
  'orig_master_ip=s'         => \$orig_master_ip,
  'orig_master_port=i'       => \$orig_master_port,
  'orig_master_user=s'       => \$orig_master_user,
  'orig_master_password=s'   => \$orig_master_password,
  'orig_master_ssh_user=s'   => \$orig_master_ssh_user,
  'new_master_host=s'        => \$new_master_host,
  'new_master_ip=s'          => \$new_master_ip,
  'new_master_port=i'        => \$new_master_port,
  'new_master_user=s'        => \$new_master_user,
  'new_master_password=s'    => \$new_master_password,
  'new_master_ssh_user=s'    => \$new_master_ssh_user,
  'orig_master_ssh_port=i'   => \$orig_master_ssh_port,
  'new_master_ssh_port=i'    => \$new_master_ssh_port,
);

exit &main();

sub current_time_us {
  my ( $sec, $microsec ) = gettimeofday();
  my $curdate = localtime($sec);
  return $curdate . " " . sprintf( "%06d", $microsec );
}

sub sleep_until {
  my $elapsed = tv_interval($_tstart);
  if ( $_running_interval > $elapsed ) {
    sleep( $_running_interval - $elapsed );
  }
}

sub get_threads_util {
  my $dbh                    = shift;
  my $my_connection_id       = shift;
  my $running_time_threshold = shift;
  my $type                   = shift;
  $running_time_threshold = 0 unless ($running_time_threshold);
  $type                   = 0 unless ($type);
  my @threads;

  my $sth = $dbh->prepare("SHOW PROCESSLIST");
  $sth->execute();

  while ( my $ref = $sth->fetchrow_hashref() ) {
    my $id         = $ref->{Id};
    my $user       = $ref->{User};
    my $host       = $ref->{Host};
    my $command    = $ref->{Command};
    my $state      = $ref->{State};
    my $query_time = $ref->{Time};
    my $info       = $ref->{Info};
    $info =~ s/^\s*(.*?)\s*$/$1/ if defined($info);
    next if ( $my_connection_id == $id );
    next if ( defined($query_time) && $query_time < $running_time_threshold );
    next if ( defined($command)    && $command eq "Binlog Dump" );
    next if ( defined($user)       && $user eq "system user" );
    next
      if ( defined($command)
      && $command eq "Sleep"
      && defined($query_time)
      && $query_time >= 1 );

    if ( $type >= 1 ) {
      next if ( defined($command) && $command eq "Sleep" );
      next if ( defined($command) && $command eq "Connect" );
    }

    if ( $type >= 2 ) {
      next if ( defined($info) && $info =~ m/^select/i );
      next if ( defined($info) && $info =~ m/^show/i );
    }

    push @threads, $ref;
  }
  return @threads;
}

sub main {
  if ( $command eq "stop" ) {
    ## Gracefully killing connections on the current master
    # 1. Set read_only= 1 on the new master
    # 2. DROP USER so that no app user can establish new connections
    # 3. Set read_only= 1 on the current master
    # 4. Kill current queries
    # * Any database access failure will result in script die.
    my $exit_code = 1;
    eval {
      ## Setting read_only=1 on the new master (to avoid accident)
      my $new_master_handler = new MHA::DBHelper();

      # args: hostname, port, user, password, raise_error(die_on_error)_or_not
      $new_master_handler->connect( $new_master_ip, $new_master_port,
        $new_master_user, $new_master_password, 1 );
      print current_time_us() . " Set read_only on the new master.. ";
      $new_master_handler->enable_read_only();
      if ( $new_master_handler->is_read_only() ) {
        print "ok.\n";
      }
      else {
        die "Failed!\n";
      }
      $new_master_handler->disconnect();

      # Connecting to the orig master, die if any database error happens
      my $orig_master_handler = new MHA::DBHelper();
      $orig_master_handler->connect( $orig_master_ip, $orig_master_port,
        $orig_master_user, $orig_master_password, 1 );

      ## Drop application user so that nobody can connect. Disabling per-session binlog beforehand
      $orig_master_handler->disable_log_bin_local();
      print current_time_us() . " Drpping app user on the orig master..\n";
      #FIXME_xxx_drop_app_user($orig_master_handler);

      ## Waiting for N * 100 milliseconds so that current connections can exit
      my $time_until_read_only = 15;
      $_tstart = [gettimeofday];
      my @threads = get_threads_util( $orig_master_handler->{dbh},
        $orig_master_handler->{connection_id} );
      while ( $time_until_read_only > 0 && $#threads >= 0 ) {
        if ( $time_until_read_only % 5 == 0 ) {
          printf
            "%s Waiting all running %d threads are disconnected.. (max %d milliseconds)\n",
            current_time_us(), $#threads + 1, $time_until_read_only * 100;
          if ( $#threads < 5 ) {
            print Data::Dumper->new( [$_] )->Indent(0)->Terse(1)->Dump . "\n"
              foreach (@threads);
          }
        }
        sleep_until();
        $_tstart = [gettimeofday];
        $time_until_read_only--;
        @threads = get_threads_util( $orig_master_handler->{dbh},
          $orig_master_handler->{connection_id} );
      }

      ## Setting read_only=1 on the current master so that nobody(except SUPER) can write
      print current_time_us() . " Set read_only=1 on the orig master.. ";
      $orig_master_handler->enable_read_only();
      if ( $orig_master_handler->is_read_only() ) {
        print "ok.\n";
      }
      else {
        die "Failed!\n";
      }

      ## Waiting for M * 100 milliseconds so that current update queries can complete
      my $time_until_kill_threads = 5;
      @threads = get_threads_util( $orig_master_handler->{dbh},
        $orig_master_handler->{connection_id} );
      while ( $time_until_kill_threads > 0 && $#threads >= 0 ) {
        if ( $time_until_kill_threads % 5 == 0 ) {
          printf
            "%s Waiting all running %d queries are disconnected.. (max %d milliseconds)\n",
            current_time_us(), $#threads + 1, $time_until_kill_threads * 100;
          if ( $#threads < 5 ) {
            print Data::Dumper->new( [$_] )->Indent(0)->Terse(1)->Dump . "\n"
              foreach (@threads);
          }
        }
        sleep_until();
        $_tstart = [gettimeofday];
        $time_until_kill_threads--;
        @threads = get_threads_util( $orig_master_handler->{dbh},
          $orig_master_handler->{connection_id} );
      }

      ## Terminating all threads
      print current_time_us() . " Killing all application threads..\n";
      $orig_master_handler->kill_threads(@threads) if ( $#threads >= 0 );
      print current_time_us() . " done.\n";
      $orig_master_handler->enable_log_bin_local();
      $orig_master_handler->disconnect();

      ## After finishing the script, MHA executes FLUSH TABLES WITH READ LOCK
      eval {
        `ssh -p$orig_master_ssh_port $orig_master_ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
      };
      if ($@) {
        warn $@;
      }
      $exit_code = 0;
    };
    if ($@) {
      warn "Got Error: $@\n";
      exit $exit_code;
    }
    exit $exit_code;
  }
  elsif ( $command eq "start" ) {
    ## Activating master ip on the new master
    # 1. Create app user with write privileges
    # 2. Moving backup script if needed
    # 3. Register new master's ip to the catalog database

    # We don't return error even though activating updatable accounts/ip failed so that we don't interrupt slaves' recovery.
    # If exit code is 0 or 10, MHA does not abort
    my $exit_code = 10;
    eval {
      my $new_master_handler = new MHA::DBHelper();

      # args: hostname, port, user, password, raise_error_or_not
      $new_master_handler->connect( $new_master_ip, $new_master_port,
        $new_master_user, $new_master_password, 1 );

      ## Set read_only=0 on the new master
      $new_master_handler->disable_log_bin_local();
      print current_time_us() . " Set read_only=0 on the new master.\n";
      $new_master_handler->disable_read_only();

      ## Creating an app user on the new master
      print current_time_us() . " Creating app user on the new master..\n";
      #FIXME_xxx_create_app_user($new_master_handler);
      $new_master_handler->enable_log_bin_local();
      $new_master_handler->disconnect();

      ## Update master ip on the catalog database, etc
      `ssh -p$new_master_ssh_port $new_master_ssh_user\@$new_master_host \" $ssh_start_vip \"`;
      $exit_code = 0;
    };
    if ($@) {
      warn "Got Error: $@\n";
      exit $exit_code;
    }
    exit $exit_code;
  }
  elsif ( $command eq "status" ) {

    # do nothing
    exit 0;
  }
  else {
    &usage();
    exit 1;
  }
}

sub usage {
  print
    "Usage: master_ip_online_change --command=start|stop|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
  die;
}

For details, see the official wiki: https://code.google.com/p/mysql-master-ha/wiki/Parameters#master_ip_online_change_script (you may need a VPN to reach it).
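The thread-filtering rules in get_threads_util decide which connections the script waits for and finally kills. They can be restated as a short Python sketch. This is only an approximate illustration of the Perl logic above, not part of MHA: the function name `filter_threads` and the sample rows are hypothetical, and the rows stand in for the result of SHOW PROCESSLIST.

```python
import re


def filter_threads(rows, my_connection_id, running_time_threshold=0, level=0):
    """Approximate Python restatement of the filtering in get_threads_util.

    rows: list of dicts using the SHOW PROCESSLIST column names
          (Id, User, Host, Command, State, Time, Info).
    level corresponds to the Perl $type argument.
    """
    kept = []
    for row in rows:
        command = row.get("Command")
        seconds = row.get("Time")
        info = row.get("Info")
        if info is not None:
            info = info.strip()
        if row["Id"] == my_connection_id:
            continue  # the script's own connection
        if seconds is not None and seconds < running_time_threshold:
            continue  # has not been running long enough to matter
        if command == "Binlog Dump":
            continue  # dump thread that feeds a slave
        if row.get("User") == "system user":
            continue  # replication threads
        if command == "Sleep" and seconds is not None and seconds >= 1:
            continue  # connection idle for one second or more
        if level >= 1 and command in ("Sleep", "Connect"):
            continue
        if level >= 2 and info is not None and re.match(r"(select|show)", info, re.IGNORECASE):
            continue  # read-only statements
        kept.append(row)
    return kept


# Hypothetical processlist rows, for illustration only.
rows = [
    {"Id": 1, "User": "root", "Command": "Query", "Time": 0, "Info": "SHOW PROCESSLIST"},
    {"Id": 2, "User": "repl", "Command": "Binlog Dump", "Time": 500, "Info": None},
    {"Id": 3, "User": "app", "Command": "Sleep", "Time": 20, "Info": None},
    {"Id": 4, "User": "app", "Command": "Query", "Time": 0, "Info": " UPDATE t SET c=1"},
    {"Id": 5, "User": "app", "Command": "Query", "Time": 0, "Info": "select 1"},
    {"Id": 6, "User": "app", "Command": "Sleep", "Time": 0, "Info": None},
]
for level in (0, 1, 2):
    print(level, [r["Id"] for r in filter_threads(rows, my_connection_id=1, level=level)])
```

The script above always calls get_threads_util with the default level, so replication threads and long-idle connections are ignored, while freshly idle connections and running queries (including reads) count as application threads.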

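2. Repairing a crashed master

After an automatic failover, the original master is usually treated as discarded. Once the original master host has been repaired, and provided its data is intact, you may want to bring it back as a slave of the new master. The MHA log written at the moment of the switch contains what is needed to do this. Below is the command to extract the relevant log entry:

From the output above we can see: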

All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.2.129', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000005', MASTER_LOG_POS=61791, MASTER_USER='repl', MASTER_PASSWORD='xxx';

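In other words, once the original master host has been repaired, you can run this CHANGE MASTER statement on it so that it joins the topology as a new slave. MHA masks the password in the log as 'xxx', so replace it with the real replication password before running the statement.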
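Pulling the statement out of the manager log can also be scripted. The following Python sketch is one way to do it, under stated assumptions: the helper name `extract_change_master` is hypothetical, the log text is passed in as a string (where the manager log lives depends on your MHA configuration), and the regular expression relies only on the "Statement should be:" wording shown in the log above.

```python
import re

# Matches the statement that MHA prints after "Statement should be:".
CHANGE_MASTER_RE = re.compile(r"Statement should be:\s*(CHANGE MASTER TO .*?;)")


def extract_change_master(log_text, repl_password=None):
    """Return the last CHANGE MASTER statement found in an MHA log, or None.

    If repl_password is given, the masked MASTER_PASSWORD='xxx' is replaced
    with it so the statement can be run on the repaired master.
    """
    matches = CHANGE_MASTER_RE.findall(log_text)
    if not matches:
        return None
    statement = matches[-1]
    if repl_password is not None:
        statement = statement.replace(
            "MASTER_PASSWORD='xxx'", f"MASTER_PASSWORD='{repl_password}'"
        )
    return statement


# The sample line is copied from the switch log shown earlier.
sample = (
    "Mon Jan 19 01:51:50 2015 - [info] All other slaves should start replication "
    "from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.2.129', "
    "MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000005', MASTER_LOG_POS=61791, "
    "MASTER_USER='repl', MASTER_PASSWORD='xxx';"
)
print(extract_change_master(sample))
```

The last match is returned on purpose: a log that covers several switches contains several such lines, and the most recent one describes the current master.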

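Current high-availability solutions can deliver database high availability to a certain degree, for example MMM, heartbeat+DRBD, and Cluster, which were covered in earlier articles, as well as Percona's Galera Cluster. Each of these has its own strengths and weaknesses. Which one to choose depends mainly on the business and on its data-consistency requirements. Finally, given the requirements for both high availability and data consistency, the MHA architecture is recommended.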

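Summary:

1. MHA Manager was not running during the online switch.

2. The cluster was running normally.

3. After the switch, the old master becomes the second master, swapping roles with the former second master. (This was observed with a very small data volume and with master and standby in sync. This kind of switch needs the two to be closely synchronized; they cannot be far apart.)

4. The same operation can be run again to switch the roles back.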
