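MySQL Group Replication: recovering a failed primary node
I. MySQL Group Replication has had to face two problems from the start:
1. How to recover when the primary node goes down.
2. How the remaining nodes keep serving traffic when a majority of the nodes are offline.
This article covers only the first problem: once the primary node has gone down, how do we add it back into the high-availability cluster? The problem splits into
two cases:
1. Mild damage: the primary node's data is intact, and the binlogs written by the other nodes during the outage are still available.
In this case, restarting MySQL Group Replication on the node is enough to fix the problem.
2. Destructive damage: the primary node's data is gone.
In this case, restore the failed node from a backup of one of the remaining nodes, then restart MySQL Group Replication.
The detailed repair steps are shown in the examples that follow.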
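The choice between the two cases can be sketched as a small shell helper. This is only an illustration: the function name `choose_recovery_path` is hypothetical, and it assumes that "the data is still there" can be approximated by a non-empty data directory. It does not check whether the other nodes still hold the binlogs from the outage window, which the mild case also requires.

```shell
# Hypothetical helper: suggest a recovery path for a failed member.
# Assumption: a non-empty data directory means the data survived.
# $1: path to the MySQL data directory on the failed node
choose_recovery_path() {
    datadir="$1"
    if [ -d "$datadir" ] && [ -n "$(ls -A "$datadir" 2>/dev/null)" ]; then
        echo "restart"    # mild case: start mysqld, then START GROUP_REPLICATION
    else
        echo "restore"    # destructive case: restore from a backup first
    fi
}
```

For example, `choose_recovery_path /database/mysql/data/3306` could be run on the failed node before deciding which of the two procedures to follow.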
II. Environment:
Environment overview
Hostname  IP address    MGR role
mtls17    10.186.19.17  primary
mtls18    10.186.19.18  secondary
mtls19    10.186.19.19  secondary
Cluster status:
mysql> select * from replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| group_replication_applier | 12b6f8d9-d655-11e7-936a-9a17854b700d | mtls17 | 3306 | ONLINE |
| group_replication_applier | 12bfe200-d655-11e7-a264-1e1b3511358e | mtsl18 | 3306 | ONLINE |
| group_replication_applier | 1453bcac-d655-11e7-a503-8a7c439b72d9 | mtls19 | 3306 | ONLINE |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
3 rows in set (0.00 sec)

mysql> show global status like 'group_replication_primary_member';
+----------------------------------+--------------------------------------+
| Variable_name | Value |
+----------------------------------+--------------------------------------+
| group_replication_primary_member | 12b6f8d9-d655-11e7-936a-9a17854b700d |
+----------------------------------+--------------------------------------+
1 row in set (0.00 sec)
Note:
The output above shows that the MySQL instance on mtls17 is the cluster's current primary and that every node is in a normal state.
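Matching the primary's UUID against the member list by eye is easy to get wrong, so the two outputs can be combined. The sketch below is not part of the original procedure; `label_members` is a hypothetical name, and it assumes the member rows arrive as tab-separated MEMBER_ID, MEMBER_HOST, MEMBER_STATE (the shape `mysql -N -B` prints).

```shell
# Hypothetical helper: label each group member as PRIMARY or SECONDARY.
# $1:    value of the group_replication_primary_member status variable
# stdin: tab-separated rows of MEMBER_ID, MEMBER_HOST, MEMBER_STATE
label_members() {
    awk -F '\t' -v primary="$1" '{
        role = ($1 == primary) ? "PRIMARY" : "SECONDARY"
        print $2, $3, role
    }'
}
```

One possible way to feed it, assuming a configured client login: `mysql -N -B -e "SELECT MEMBER_ID, MEMBER_HOST, MEMBER_STATE FROM performance_schema.replication_group_members" | label_members "$primary_uuid"`, where `$primary_uuid` holds the value shown by `show global status like 'group_replication_primary_member'`.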
III. Case 1: fault simulation and fix:
1. Simulate a crash of the mtls17 node
[root@mtls17 data]# ps -ef | grep mysql
mysql : ? :: /usr/local/mysql/bin/mysqld --defaults-file=/etc/my.cnf
root : pts/ :: grep --color=auto mysql
[root@mtls17 data]# kill -
[root@mtls17 data]# ps -ef | grep mysql
root : pts/ :: grep --color=auto mysql
2. Check the state of the two remaining nodes
mysql> select * from replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| group_replication_applier | 12bfe200-d655-11e7-a264-1e1b3511358e | mtsl18 | 3306 | ONLINE |
| group_replication_applier | 1453bcac-d655-11e7-a503-8a7c439b72d9 | mtls19 | 3306 | ONLINE |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
2 rows in set (0.00 sec)

mysql> show global status like 'group_replication_primary_member';
+----------------------------------+--------------------------------------+
| Variable_name | Value |
+----------------------------------+--------------------------------------+
| group_replication_primary_member | 12bfe200-d655-11e7-a264-1e1b3511358e |
+----------------------------------+--------------------------------------+
1 row in set (0.00 sec)
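As shown above, after MySQL on mtls17 was killed, the two remaining nodes formed a new cluster, and the MySQL instance on mtls18 (listed as mtsl18 in the output) became the primary.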
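The group could carry on here because two of its three members were still reachable, which is a majority. A minimal sketch of that arithmetic follows; `has_majority` is a hypothetical name, and the member counts are supplied by the caller rather than read from the server.

```shell
# Hypothetical helper: succeed if the online members form a strict majority.
# $1: total number of members in the group
# $2: number of members still online
has_majority() {
    total="$1"
    online="$2"
    [ $((online * 2)) -gt "$total" ]
}
```

With the three-node group in this article, `has_majority 3 2` succeeds (the situation above), while `has_majority 3 1` fails, which corresponds to the majority-loss problem this article does not cover.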
3. Recover the crashed primary
[root@mtls17 data]# systemctl start mysql
[root@mtls17 data]# mysql -uroot -pmtls0352
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is
Server version: 5.7.-log MySQL Community Server (GPL)

Copyright (c) , , Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> start group_replication;
Query OK, rows affected (4.03 sec)

mysql>
4. Verify that the problem is resolved
mysql> select * from replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| group_replication_applier | 12b6f8d9-d655-11e7-936a-9a17854b700d | mtls17 | 3306 | ONLINE |
| group_replication_applier | 12bfe200-d655-11e7-a264-1e1b3511358e | mtsl18 | 3306 | ONLINE |
| group_replication_applier | 1453bcac-d655-11e7-a503-8a7c439b72d9 | mtls19 | 3306 | ONLINE |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
3 rows in set (0.00 sec)

mysql> show global status like 'group_replication_primary_member';
+----------------------------------+--------------------------------------+
| Variable_name | Value |
+----------------------------------+--------------------------------------+
| group_replication_primary_member | 12bfe200-d655-11e7-a264-1e1b3511358e |
+----------------------------------+--------------------------------------+
1 row in set (0.00 sec)
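Conclusion: after the former primary crashed, restarting the MySQL service and then restarting MySQL Group Replication on it brought it back into the group. It rejoins as a secondary; mtls18 remains the primary.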
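A rejoining member usually reports RECOVERING for a while before it reaches ONLINE, so the check in step 4 may need to be repeated. The polling loop below is a sketch under assumptions: `wait_for_online` is a hypothetical name, and the command that prints the member's current state is passed in by the caller, so nothing here depends on a particular client setup.

```shell
# Hypothetical helper: poll until a member reports ONLINE.
# $1: name of a command or function that prints the member's MEMBER_STATE
# $2: maximum number of attempts
# $3: seconds to sleep between attempts
wait_for_online() {
    state_cmd="$1"
    tries="$2"
    delay="$3"
    i=0
    while [ "$i" -lt "$tries" ]; do
        state=$($state_cmd)
        if [ "$state" = "ONLINE" ]; then
            return 0      # member has finished rejoining
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1              # gave up; inspect the error log on the member
}
```

A caller might define a small function that runs `SELECT MEMBER_STATE FROM performance_schema.replication_group_members WHERE MEMBER_HOST='mtls17'` through `mysql -N -B` and pass that function's name as the first argument.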
IV. Case 2: the primary's data has been lost; how to recover the node:
1. Stop the service and delete the data
[root@mtsl18 ~]# ps -ef | grep mysql
mysql : ? :: /usr/local/mysql/bin/mysqld --defaults-file=/etc/my.cnf
root : pts/ :: grep --color=auto mysql
[root@mtsl18 ~]# kill -
[root@mtsl18 ~]# rm -rf /database/mysql/data/
[root@mtsl18 ~]# ps -ef | grep mysql
root : pts/ :: grep --color=auto mysql
This experiment continues from Case 1, so the primary is now on mtls18; that is why the service is stopped and the data deleted on mtls18.
2. Check the cluster status:
mysql> select * from replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| group_replication_applier | 12b6f8d9-d655-11e7-936a-9a17854b700d | mtls17 | 3306 | ONLINE |
| group_replication_applier | 1453bcac-d655-11e7-a503-8a7c439b72d9 | mtls19 | 3306 | ONLINE |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
2 rows in set (0.00 sec)

mysql> show global status like 'group_replication_primary_member';
+----------------------------------+--------------------------------------+
| Variable_name | Value |
+----------------------------------+--------------------------------------+
| group_replication_primary_member | 12b6f8d9-d655-11e7-936a-9a17854b700d |
+----------------------------------+--------------------------------------+
1 row in set (0.01 sec)
Note: after mtls18 went down, the primary role moved from mtls18 to mtls17.
3. Back up mtls19 with MEB (MySQL Enterprise Backup) to restore the crashed mtls18
mysqlbackup --defaults-file=/etc/my.cnf --with-timestamp \
--host=localhost --user=root --password=mtls0352 \
--backup-dir=/tmp/ --backup-image=/tmp/2017-12-01T12:30:00.mbi --no-history-logging \
backup-to-image
MySQL Enterprise Backup version 4.1. Linux-2.6.-400.215..el5uek-x86_64 [//]
Copyright (c) , , Oracle and/or its affiliates. All Rights Reserved.

:: MAIN INFO: A thread created with Id ''
:: MAIN INFO: Starting with following command line ...
mysqlbackup --defaults-file=/etc/my.cnf --with-timestamp --host=localhost
--user=root --password=xxxxxxxx --backup-dir=/tmp/
--backup-image=/tmp/--01T12::.mbi --no-history-logging
backup-to-image

:: MAIN INFO:
:: MAIN INFO: MySQL server version is '5.7.20-log'
.......
........
:: MAIN INFO: Full Image Backup operation completed successfully.
:: MAIN INFO: Backup image created successfully.
:: MAIN INFO: Image Path = /tmp/--01T12::.mbi
:: MAIN INFO: MySQL binlog position: filename mysql-bin., position

-------------------------------------------------------------
Parameters Summary
-------------------------------------------------------------
Start LSN :
End LSN :
-------------------------------------------------------------

mysqlbackup completed OK!
4. Transfer the backup to mtls18
scp /tmp/2017-12-01T12:30:00.mbi mtls18:/tmp/
5. Restore the backup
mysqlbackup --defaults-file=/etc/my.cnf --backup-image=/tmp/2017-12-01T12:30:00.mbi \
> --backup-dir=/tmp/ --datadir=/database/mysql/data/3306/ \
> copy-back-and-apply-log
MySQL Enterprise Backup version 4.1. Linux-2.6.-400.215..el5uek-x86_64 [//]
Copyright (c) , , Oracle and/or its affiliates. All Rights Reserved.

:: MAIN INFO: A thread created with Id ''
:: MAIN INFO: Starting with following command line ...
mysqlbackup --defaults-file=/etc/my.cnf
--backup-image=/tmp/--01T12::.mbi --backup-dir=/tmp/
--datadir=/database/mysql/data// copy-back-and-apply-log

:: MAIN INFO:
IMPORTANT: Please check that mysqlbackup run completes successfully.
.....
.....
:: PCR1 INFO: The first data file is '/database/mysql/data/3306/ibdata1'
and the new created log files are at '/database/mysql/data/3306/'
:: MAIN INFO: MySQL server version is '5.7.20-log'
:: MAIN INFO: Restoring ...5.7.-log version
:: MAIN INFO: Apply-log operation completed successfully.
:: MAIN INFO: Full Backup has been restored successfully.

mysqlbackup completed OK!
6. Restart MySQL on mtls18
[root@mtsl18 tmp]# chown -R mysql:mysql /database/mysql/data/
[root@mtsl18 tmp]# systemctl start mysql
[root@mtsl18 tmp]# ps -ef | grep mysql
mysql : ? :: /usr/local/mysql/bin/mysqld --defaults-file=/etc/my.cnf
root : pts/ :: grep --color=auto mysql
7. Restart MySQL Group Replication
[root@mtsl18 tmp]# mysql -uroot -pmtls0352
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 4
Server version: 5.7.20-log MySQL Community Server (GPL)

Copyright (c) 2000, 2017, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> reset master;
Query OK, 0 rows affected (0.10 sec)

mysql> reset slave;
Query OK, 0 rows affected (0.00 sec)

mysql> set sql_log_bin=0;
Query OK, 0 rows affected (0.00 sec)

mysql> source /database/mysql/data/3306/backup_gtid_executed.sql ;
Query OK, 0 rows affected (0.10 sec)

mysql> set sql_log_bin=1;
Query OK, 0 rows affected (0.00 sec)

mysql> change master to
-> master_user='mgr_usr',
-> master_password='mgr10352'
-> for channel 'group_replication_recovery';
Query OK, 0 rows affected, 2 warnings (0.21 sec)

mysql> start group_replication;
Query OK, 0 rows affected (3.46 sec)
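The order of the statements in step 7 matters: `reset master` clears the restored server's own GTID history so that the GTID set recorded by the backup can be loaded from `backup_gtid_executed.sql`, after which the recovery channel only has to fetch the transactions the backup is missing. The sketch below prints the same sequence so it can be reviewed before use. `emit_rejoin_sql` is a hypothetical name, and the password is deliberately left as the literal placeholder `<password>` to be filled in by the operator.

```shell
# Hypothetical helper: print the rejoin statements from step 7 in order.
# $1: path to backup_gtid_executed.sql inside the restored data directory
# $2: user name for the group_replication_recovery channel
# The password is left as a placeholder on purpose.
emit_rejoin_sql() {
    gtid_file="$1"
    user="$2"
    cat <<EOF
RESET MASTER;
RESET SLAVE;
SET sql_log_bin = 0;
SOURCE $gtid_file;
SET sql_log_bin = 1;
CHANGE MASTER TO MASTER_USER='$user', MASTER_PASSWORD='<password>'
  FOR CHANNEL 'group_replication_recovery';
START GROUP_REPLICATION;
EOF
}
```

For the environment in this article the call would be `emit_rejoin_sql /database/mysql/data/3306/backup_gtid_executed.sql mgr_usr`; the printed statements can then be checked and run in a mysql client session on the restored node.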
8. Check that the cluster status is back to normal
mysql> select * from replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| group_replication_applier | 12b6f8d9-d655-11e7-936a-9a17854b700d | mtls17 | 3306 | ONLINE |
| group_replication_applier | 1453bcac-d655-11e7-a503-8a7c439b72d9 | mtls19 | 3306 | ONLINE |
| group_replication_applier | 85f82fce-d65e-11e7-9e92-1e1b3511358e | mtsl18 | 3306 | ONLINE |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
3 rows in set (0.01 sec)

mysql> show global status like 'group_replication_primary_member';
+----------------------------------+--------------------------------------+
| Variable_name | Value |
+----------------------------------+--------------------------------------+
| group_replication_primary_member | 12b6f8d9-d655-11e7-936a-9a17854b700d |
+----------------------------------+--------------------------------------+
1 row in set (0.01 sec)
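V. Summary:
Recovering from the two kinds of primary failure:
1. If neither the data nor the binlogs were lost, simply restart MySQL Group Replication on the node; the group repairs it automatically.
2. If the data was lost, restore from a backup first, then restart MySQL Group Replication.
On the operational complexity of MySQL Group Replication:
Overall, MySQL Group Replication is fairly DBA-friendly: a handful of small operations is enough to bring a failed cluster back.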
VI. My other articles on MySQL Group Replication
1. MySQL Group Replication installation and configuration in detail: http://www.cnblogs.com/JiangLe/p/6727281.html#3849996
2. MySQL Group Replication availability report on mysql-5.7.20: http://www.cnblogs.com/JiangLe/p/7809229.html
3. MySQL Group Replication primary node crash recovery (this article)
4. MySQL Group Replication recovery when a majority of nodes are lost
5. My open-source tool for fully automated installation of mysql-group-replication: https://github.com/Neeky/mysqltools