一、简介

MHA的逻辑是,为了保证其MySQL的高可用,会有一个StandBy状态的master.在mysql故障切换的过程中,MHA 能做到在 0~30 秒内自动完成数据库的故障切换操作,并且在进行故障切换的过程中,MHA 能最大程度的保证数据的一致性,以达到相对意义上的高可用。

1.1MHA角色

如下图,整个 MHA 架构分为

  • MHA Manager 节点
  • MHA Node 节点

    其中 MHA Manager 节点是单点部署,MHA Node 节点是部署在每个需要监控的 MySQL 集群节点上的。MHA Manager 会定时探测集群中的 Master 节点,当 Master 出现故障时,它可以自动将最新数据的 Standby Master 或 Slave 提升为新的 Master,然后将其他的 Slave 重新指向新的 Master。

二、MHA的工具

Manager节点:

  • masterha_check_ssh:MHA 依赖的 ssh 环境监测工具;
  • masterha_check_repl:MYSQL 复制环境检测工具;
  • masterga_manager:MHA 服务主程序;
  • masterha_check_status:MHA 运行状态探测工具;
  • masterha_master_monitor:MYSQL master 节点可用性监测工具;
  • masterha_master_swith:master:节点切换工具;
  • masterha_conf_host:添加或删除配置的节点;
  • masterha_stop:关闭 MHA 服务的工具。

Node节点:(这些工具通常由MHA Manager的脚本触发,无需人为操作)

  • save_binary_logs:保存和复制 master 的二进制日志;
  • apply_diff_relay_logs:识别差异的中继日志事件并应用于其他 slave;
  • purge_relay_logs:清除中继日志(不会阻塞 SQL 线程);

自定义扩展:

  • secondary_check_script:通过多条网络路由检测master的可用性;
  • master_ip_failover_script:更新application使用的masterip;
  • report_script:发送报告;
  • init_conf_load_script:加载初始配置参数;
  • master_ip_online_change_script;更新master节点ip地址。

三、MHA部署过程

3.1.1 配置

MHA 对 MYSQL 复制环境有特殊要求,例如各节点都要开启二进制日志及中继日志,各从节点必须显示启用其read-only属性,并关闭relay_log_purge功能等,这里对配置做事先说明。

3.1.2 环境规划

机器名 IP 角色 备注
manager 172.30.200.100 manager控制器 用于管理和故障切换
master 172.30.200.101 数据库主服务器 开启binlog,relay-log。关闭relay_log_purge
slave1 172.30.200.102 数据库从服务器 开启binlog,relay-log。关闭relay_log_purge
slave2 172.30.200.103 数据库从服务器 开启binlog,relay-log。关闭relay_log_purge

在各个节点的/etc/hosts文件配置内容添加如下:

[root@localhost ~]# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
172.30.200.100 arpmgr
172.30.200.101 arpmaster
172.30.200.102 arpslave1
172.30.200.103 arpslave2

创建binlog的目录

mkdir -p /data/mysqldata/binlog
chown -R mysql:mysql /data/mysqldata/binlog

101节点配置:

server-id = 200101

log-bin = /data/mysqldata/binlog/mysql-bin
binlog_format= row
max_binlog_size= 512m
relay-log = /data/mysqldata/binlog/relay-bin
expire-logs-days = 14 lower_case_table_names = 1
character-set-server = utf8
log_slave_updates = 1

102节点配置:

server-id = 200102

log-bin = /data/mysqldata/binlog/mysql-bin
binlog_format= row
max_binlog_size= 512m
relay-log = /data/mysqldata/binlog/relay-bin
expire-logs-days = 14 read_only = ON
relay_log_purge = 0 lower_case_table_names = 1
character-set-server = utf8
log_slave_updates = 1

103节点配置:


server-id = 200103 log-bin = /data/mysqldata/binlog/mysql-bin
binlog_format= row
max_binlog_size= 512m
relay-log = /data/mysqldata/binlog/relay-bin
read_only = ON
relay_log_purge = 0
expire-logs-days = 14 lower_case_table_names = 1
character-set-server = utf8
log_slave_updates = 1

3.1.3 配置一主多从

master节点配置:

MariaDB [(none)]>grant replication slave,replication client on *.* to 'repl'@'172.30.200.%' identified by 'repl7101';
MariaDB [(none)]> show master status;
+------------------+----------+--------------+------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+------------------+----------+--------------+------------------+
| mysql-bin.000001 | 548 | | |
+------------------+----------+--------------+------------------+
1 row in set (0.000 sec)

slave节点配置:

grant replication slave,replication client on *.* to 'repl'@'172.30.200.%' identified by 'repl7101';

change master to master_host='172.30.200.101',
master_user='repl',
master_password='repl7101',
master_log_file='mysql-bin.000001',
master_log_pos=548; start slave;
show slave status\G;

至此,一主多从配置完毕。

3.2 MHA配置

3.2.1 master权限授予

可以在所有节点上面配置,其拥有管理权限,目前只需在master结点上设置权限:

grant all on *.* to 'mhaadmin'@'172.30.%.%' identified by 'mha7101';
grant all on *.* to 'mhaadmin'@'arpmgr' identified by 'mha7101';

3.2.2 ssh互信

四个节点都执行如下语句:

ssh-keygen -t rsa
ssh-copy-id -i .ssh/id_rsa.pub root@arpmgr

然后在arpmgr结点上面,可以看到authorized_keys文件的信息内容如下:

[root@localhost .ssh]# cat authorized_keys
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDY3yFhR5uzcFEpd+q+1Uw/cRF9ZRraygms7OZoefLFzY/ydSi6yYuCighG8WquvRep7XDNjFI71HAUagSoXiyPoCe1lqEnzpxSc+fQpIeQqEhUmLJ2bk+R83EskzwRGh+S/D4yp/swWz1vRgUGoTWevLCs33q7ZrsM8i+jB0uwZmzOV+CyQAPW9vLkRjZa4y1sx65lbR0HbdTQWQYZ4IyZauoU8XQjAIOs/CdLw2nBt8dPO53jT7NS7Ywx6eu/Wj9k/sYVVZT3jTb+pBIVs+Du5+tdUDX5aLKzxINpLlqNhorNevoC9iE0Ame1qvYonQfyWQ52Ae0y+58vFfG6PyV3 root@localhost.localdomain
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC/ZPihYSC6ArawKRU75aQRVSFsQ5S89SrYHGWdzyluB4spj+UDUmWH1kLGYr715/HD5hh22KdLmIs7R4jviOeao1HK52fpMvklYaNtYRHuV63Zkg5sOLvLfhrHdta9wuHlW1NyWx75+wIl2LvKBRtnSddwf5ZvitJ/kChf2gpNhHAWidyjGsPoJdr0OBCNHvz1y6oON6cnMb07ExaIjptRnkbCOU0QSVjFq4+Jmh8zTTbJC2up50s15gSfWXH0+WLXmJXJGkvgHdSYqw4vJt/l25f5qAKKZsfnyfC0iyct4GyHPF6trpvQ/c2lqr/Rg4xLWgdxlyt4aBJYl5adIRK/ root@localhost.localdomain
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDba26wV0KwQNTb4pKuiFDCcVMNRLGMXSiJC8ucN4/KIqzoOYJ747QL8GL5F8ePnRaZ1rtOwdjnlTiC0a4Tcg4JLs+JSnJgzvepuixmGgSJfLbJ36iN1WFh6fP2GZEDdR7Qum4sBUpQyYJ20Kf9rKfQQv2wq6csK5IlFk/OoO+zTySauLnYvRxvKY2avVDXPPFJvpqimKXn59MIAoJr6YEKvncbYyqvrSgUy7klZDys9IIjYcWfO7VKjQ5bwbHrrKtNbedME+KPQld7e8ZVL66Omik4Z6ip7DQEHRKWMmuBIpL99AgOOjPLbzJFWLUPOwvy3DtmEBnZ+0NVf/1obC11 root@localhost.localdomain
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCwrgZtGC31EgixeY4SVl4h64m1r8LdL3hM4Be2/I+6Xw7hCzZyKKTAFgz9W/ukfx6WmZwoqp1VO/7Jp6KO1FhYOi5u0q6J1KIObFNp+3E6cB2P0q39WqmZpQ9cNPYrbs9U2Ej0L0JwUtf/xLh334PaSlv/LcNy+p1dWya2OqsBeraiXZ4MgEBzcb+0twkpfpD327VgT/mRHPmA6fPRJOOJti1u4isHeotE4i13YIqQYfBfmbfiLdXKAvgI8FuTf0i91Re/FUBOgBfBcJbqIQNR0Nh5wZ/LvNxkstDQvypZIZwiK+wN+aZZOQ7jF/+997Z9QQleC9OOoHOJR7+fisLb root@localhost.localdomain

正好有四个形如ssh-rsa 相关 的密钥信息。

把如上的公钥信息,拷贝到其余四台服务器上面:

scp authorized_keys root@arpmaster:~/.ssh/
scp authorized_keys root@arpslave1:~/.ssh/
scp authorized_keys root@arpslave2:~/.ssh/

测试ssh是否可用

[root@localhost .ssh]# ssh arpmaster
[root@localhost ~]# ssh arpslave1
[root@localhost ~]# ssh arpslave2
[root@localhost ~]# ssh arpmgr

3.2.3 安装mha包

mha安装包分为两个,一个是node,另外一个是manager

四个节点安装:mha4mysql-node-0.57-0.el7.centos.noarch.rpm

管理节点安装:mha4mysql-manager-0.57-0.el7.centos.noarch.rpm

在安装``mha4mysql-node-0.57-0.el7.centos.noarch.rpm过程中,有对perl-DBD-mysql,perl-DBI`前置依赖,安装步骤如下:

yum install perl-DBD-mysql perl-DBI

在安装``mha4mysql-manager-0.57-0.el7.centos.noarch.rpm过程中,有对perl`前置依赖,安装步骤如下:

安装yum 扩展包
yum install https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm yum install perl-ExtUtils-CBuilder perl-ExtUtils-MakeMaker perl-Config-Tiny perl-Log-Dispatch-* perl-Parallel-ForkManager

然后安装信息,都成功,如下:

[root@localhost ~]# rpm -ivh mha4mysql-node-0.57-0.el7.centos.noarch.rpm
准备中... ################################# [100%]
正在升级/安装...
1:mha4mysql-node-0.58-0.el7.centos ################################# [100%]
[root@localhost ~]# rpm -ivh mha4mysql-manager-0.57-0.el7.centos.noarch.rpm
准备中... ################################# [100%]
正在升级/安装...
1:mha4mysql-manager-0.58-0.el7.cent################################# [100%]

0.58中有一个super_read_only不可用在mariadb,所以使用0.57版本。

3.2.4 MHA管理节点配置

[root@localhost ~]# cd /etc/mha_master/
[root@localhost mha_master]# vi /etc/mha_master/mha.cnf

配置文件内容如下:

[server default]
user=mhaadmin
password=mha7101
manager_workdir=/etc/mha_master/app1
manager_log=/etc/mha_master/manager.log
remote_workdir=/data/mha_master/app1
repl_user=repl
repl_password=repl7101
ping_interval=1
[server1]
hostname=172.30.200.101
ssh_port=22
[server2]
hostname=172.30.200.102
ssh_port=22
candidate_master=1
[server3]
hostname=172.30.200.103
ssh_port=22
no_master=1

3.2.5 MHA节点检测

  1. 在管理节点检测ssh连通性如下:

    [root@localhost ~]# masterha_check_ssh -conf=/etc/mha_master/mha.cnf
    
    

    有如下日志,代表正常:

    Thu Jan  9 14:43:09 2020 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
    Thu Jan 9 14:43:09 2020 - [info] Reading application default configuration from /etc/mha_master/mha.cnf..
    root@172.30.200.103(172.30.200.103:22) to root@172.30.200.101(172.30.200.101:22)..
    Thu Jan 9 14:43:11 2020 - [debug] ok.
    Thu Jan 9 14:43:11 2020 - [debug] Connecting via SSH from root@172.30.200.103(172.30.200.103:22) to root@172.30.200.102(172.30.200.102:22)..
    Thu Jan 9 14:43:11 2020 - [debug] ok.
    Thu Jan 9 14:43:12 2020 - [info] All SSH connection tests passed successfully.
  2. 检测MySQL replication是否正常

    masterha_check_repl --conf=/etc/mha_master/mha.cnf

    有如下日志,说明正常:

    Thu Jan  9 14:44:54 2020 - [info] Slaves settings check done.
    Thu Jan 9 14:44:54 2020 - [info]
    172.30.200.101(172.30.200.101:3306) (current master)
    +--172.30.200.102(172.30.200.102:3306)
    +--172.30.200.103(172.30.200.103:3306) Thu Jan 9 14:44:54 2020 - [info] Checking replication health on 172.30.200.102..
    Thu Jan 9 14:44:54 2020 - [info] ok.
    Thu Jan 9 14:44:54 2020 - [info] Checking replication health on 172.30.200.103..
    Thu Jan 9 14:44:54 2020 - [info] ok.
    Thu Jan 9 14:44:54 2020 - [warning] master_ip_failover_script is not defined.
    Thu Jan 9 14:44:54 2020 - [warning] shutdown_script is not defined.
    Thu Jan 9 14:44:54 2020 - [info] Got exit code 0 (Not master dead). MySQL Replication Health is OK.

3.2.6 MHA启动

  1. 启动mha manager:

    nohup masterha_manager --conf=/etc/mha_master/mha.cnf &> /etc/mha_master/manager.log &
  2. 检测master节点状态:

    [root@localhost ~]# masterha_check_status --conf=/etc/mha_master/mha.cnf
    mha (pid:31709) is running(0:PING_OK), master:172.30.200.101

    说明主数据库172.30.200.101启动正常。

  3. 关闭mha manager:

    masterha_stop -conf=/etc/mha_master/mha.cnf

3.2.7 MHA模拟故障

  1. master直接kill mysql节点

    [root@localhost ~]# ps -ef |grep mysql
    root 19864 1 0 08:51 ? 00:00:00 /bin/sh /usr/local/mysql/bin/mysqld_safe --datadir=/data/mysqldata --pid-file=/data/mysqldata/localhost.localdomain.pid
    mysql 19976 19864 0 08:51 ? 00:00:13 /usr/local/mysql/bin/mysqld --basedir=/usr/local/mysql --datadir=/data/mysqldata --plugin-dir=/usr/local/mysql/lib/plugin --user=mysql --log-error=/data/mysqldata/mysqld.log --pid-file=/data/mysqldata/localhost.localdomain.pid --socket=/tmp/mysql.sock
    root 22166 21525 0 14:55 pts/0 00:00:00 grep --color=auto mysql
    [root@localhost ~]# kill -9 19864 19976
  2. MHA转移日志。

    [root@localhost ~]# tail -f /etc/mha_master/manager.log
    
    From:
    172.30.200.101(172.30.200.101:3306) (current master)
    +--172.30.200.102(172.30.200.102:3306)
    +--172.30.200.103(172.30.200.103:3306) To:
    172.30.200.102(172.30.200.102:3306) (new master)
    +--172.30.200.103(172.30.200.103:3306) Master 172.30.200.101(172.30.200.101:3306) is down! Check MHA Manager logs at localhost.localdomain:/etc/mha_master/manager.log for details. Started automated(non-interactive) failover.
    The latest slave 172.30.200.102(172.30.200.102:3306) has all relay logs for recovery.
    Selected 172.30.200.102(172.30.200.102:3306) as a new master.
    172.30.200.102(172.30.200.102:3306): OK: Applying all logs succeeded.
    172.30.200.103(172.30.200.103:3306): This host has the latest relay log events.
    Generating relay diff files from the latest slave succeeded.
    172.30.200.103(172.30.200.103:3306): OK: Applying all logs succeeded. Slave started, replicating from 172.30.200.102(172.30.200.102:3306)
    172.30.200.102(172.30.200.102:3306): Resetting slave info succeeded.
    Master failover to 172.30.200.102(172.30.200.102:3306) completed successfully.

    从上述日志来看,172.30.200.102已经成为了新的master,而172.30.200.103还是slave数据库。

3.2.8 修复已损坏的节点加入MHA中

由于这里是实验环境,可以不到处mysqldump的备份。如果是生产环境恢复,可以停掉slave的SQL thread,记住对应的pos的位置,然后备份出数据,保证数据一致性之后,同步数据,恢复损坏的结点。

change master to master_host='172.30.200.102',
master_user='repl',
master_password='repl7101',
master_log_file='mysql-bin.000003',
master_log_pos=401;

查看slave状态:

MariaDB [(none)]> start slave;

MariaDB [(none)]> show slave status\G;
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 172.30.200.102
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000003
Read_Master_Log_Pos: 401
Relay_Log_File: relay-bin.000002
Relay_Log_Pos: 555
Relay_Master_Log_File: mysql-bin.000003
Slave_IO_Running: Yes
Slave_SQL_Running: Yes

再次启动,如下:

[root@localhost ~]# nohup masterha_manager --conf=/etc/mha_master/mha.cnf &> /etc/mha_master/manager.log &
[root@localhost ~]# masterha_check_status --conf=/etc/mha_master/mha.cnf
mha (pid:31905) is running(0:PING_OK), master:172.30.200.101

至此,MHA实验完毕。由于生产环境会用到VIP,后续会继续编写。

MHA 问题集锦

错误一

日志错误:

Thu Jan  9 11:31:36 2020 - [info]   Connecting to root@172.30.200.102(172.30.200.102:22)..
Can't exec "mysqlbinlog": 没有那个文件或目录 at /usr/share/perl5/vendor_perl/MHA/BinlogManager.pm line 106.
mysqlbinlog version command failed with rc 1:0, please verify PATH, LD_LIBRARY_PATH, and client options

解决方法:

ln -s /usr/local/mysql/bin/mysqlbinlog /usr/bin/mysqlbinlog

错误二

日志错误:

Checking if super_read_only is defined and turned on..DBI connect(';host=172.30.200.102;port=3306','mhaadmin',...) failed: Access denied for user 'mhaadmin'@'arpslave1' (using password: YES) at /usr/share/perl5/vendor_perl/MHA/SlaveUtil.pm line 239

解决方法:

manager节点,执行:

grant all on *.* to 'mhaadmin'@'arpmgr' identified by 'mha7101';
grant all on *.* to 'mhaadmin'@'arpmaster' identified by 'mha7101';
grant all on *.* to 'mhaadmin'@'arpslave1' identified by 'mha7101';
grant all on *.* to 'mhaadmin'@'arpslave2' identified by 'mha7101';

错误三

日志如下:

    Testing mysql connection and privileges..sh: mysql: 未找到命令
mysql command failed with rc 127:0!

解决方法:

ln -s /usr/local/mysql/bin/mysql /usr/local/bin/mysql

[MariaDB]MHA高可用部署-实验的更多相关文章

  1. Rancher Server HA的高可用部署实验-学习笔记

    转载于https://blog.csdn.net/csdn_duomaomao/article/details/78771731 Rancher Server HA的高可用部署实验-学习笔记 一.机器 ...

  2. MySQL MHA 高可用集群部署及故障切换

    MySQL MHA 高可用集群部署及故障切换 1.概念 2.搭建MySQL + MHA 1.概念: a)MHA概念 : MHA(MasterHigh Availability)是一套优秀的MySQL高 ...

  3. MHA 高可用集群搭建(二)

    MHA 高可用集群搭建安装scp远程控制http://www.cnblogs.com/kevingrace/p/5662839.html yum install openssh-clients mys ...

  4. MHA高可用架构与Atlas读写分离

    1.1 MHA简介 1.1.1 MHA软件介绍 MHA(Master High Availability)目前在MySQL高可用方面是一个相对成熟的解决方案,它由日本DeNA公司youshimaton ...

  5. Mysql MHA高可用集群架构

    ** 记得之前发过一篇文章,名字叫<浅析MySQL高可用架构>,之后一直有很多小伙伴在公众号后台或其它渠道问我,何时有相关的深入配置管理文章出来,因此,民工哥,也将对前面的各类架构逐一进行 ...

  6. MHA高可用 MHA+Keepalive

    MHA高可用 MHA简介 MHA(Master High Availability)目前在MySQL高可用方面是一个相对成熟的解决方案,它由日本DeNA公司youshimaton(现就职于Facebo ...

  7. Keepalived+LVS实现LNMP网站的高可用部署

    Keepalived+LVS实现LNMP网站的高可用部署   项目需求   当我们访问某个网站的时候可以在浏览器中输入IP或者域名链接到Web Server进行访问,如果这个Web Server挂了, ...

  8. MHA高可用配置及故障切换

    MHA高可用配置及故障切换 目录 MHA高可用配置及故障切换 一.案例概述 二.案例前置知识点 1. MHA概述 2. MHA的组成 (1)MHA Manager(管理节点) (2)MHA Node( ...

  9. kubernetes 1.15.1 高可用部署 -- 从零开始

    这是一本书!!! 一本写我在容器生态圈的所学!!! 重点先知: 1. centos 7.6安装优化 2. k8s 1.15.1 高可用部署 3. 网络插件calico 4. dashboard 插件 ...

随机推荐

  1. Node.js 安装第三方模块包(npm),通过 package.json配置信息安装项目依赖的模块

    npm下载安装的第三方模块包官网(提供包名和使用方法):https://www.npmjs.com/ 淘宝镜像(国内,比较快):https://npm.taobao.org/ commonjs01.j ...

  2. C#面向对象--练习题

    public class Program { //用户自定义:我们自己定义这种数据类型 public struct jiegouti// 结构体前面加上public, 可供外界访问 { public ...

  3. uni-app 快速认识

    uni-app 是一个使用 Vue.js 开发所有前端应用的框架,开发者编写一套代码,可发布到iOS.Android.H5.以及各种小程序(微信/阿里/百度/头条/QQ)等多个平台. 即使不跨端,un ...

  4. HDU 1286

    欧拉函数 φ函数的值 通式:φ(x)=x(1-1/p1)(1-1/p2)(1-1/p3)(1-1/p4)…..(1-1/pn),其中p1, p2……pn为x的所有质因数,x是不为0的整数.φ(1)=1 ...

  5. H3C RIP基本配置举例

  6. linux加载和卸载模块

    模块建立之后, 下一步是加载到内核. 如我们已指出的, insmod 为你完成这个工作. 这个 程序加载模块的代码段和数据段到内核, 接着, 执行一个类似 ld 的函数, 它连接模块中 任何未解决的符 ...

  7. ASP.NET Core 开启后台任务

    本文告诉大家如何通过 Microsoft.Extensions.Hosting.BackgroundService 开启后台任务 实现 BackManagerService 类继承 Backgroun ...

  8. [转载]sublime用法精华

    Sublime Text 全程指南 九月 03.2015. 暂无评论 永远站 作者:Lucida 原文链接:http://lucida.me/blog/sublime-text-complete-gu ...

  9. JQ表单选择器和CSS3表单选择器

    JQ表单选择器和CSS3表单选择器 JQ表单选择器 为了使用户能够更加灵活地操作表单,jQuery中加入了表单选择器,利用这个选择器能极其方便的获取到表单的某个或者某类型的元素.表单选择器的介绍如图: ...

  10. hdu 6579 Operation (在线线性基)

    传送门 •题意 一个数组a有n个数 m个操作 操作① 询问$[l,r]$区间的异或值 操作② 在数组末尾追加一个数x,数组长度变为$n+1$ 其中$l,r$不直接给出,其中$l=l%n+1,r=r%n ...