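I. MySQL Group Replication has to face two problems from the outset:

  1. How to recover when the primary node goes down.

  2. How the remaining nodes keep serving traffic when a majority of the nodes are offline.

  This article covers only the first problem: once the primary node has gone down, how do we bring it back into the high-availability cluster? The problem splits into two cases:

    1. Mild damage: the primary node's data is intact, and the other nodes in the cluster still hold the binlogs written while it was down.

          In this case, restarting MySQL Group Replication on the node is enough to fix the problem.

    2. Destructive damage: the primary node's data is gone.

          In this case, restore the failed node from a backup of one of the remaining nodes, then restart MySQL Group Replication.

  The detailed steps are shown in the examples that follow.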
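
  The two cases above amount to a small decision rule. The Python sketch below only illustrates that rule; the function name and the step labels are hypothetical, and the two flags are facts the operator has to establish by hand. Treating "data intact but binlogs gone" like the destructive case is an assumption of this sketch.

```python
from typing import List


def recovery_steps(data_intact: bool, peers_have_binlogs: bool) -> List[str]:
    """Return the ordered recovery steps for a failed group member.

    Nothing here inspects a real server; both flags are supplied by the caller.
    """
    if data_intact and peers_have_binlogs:
        # Mild damage: the node can catch up from the binlogs kept by its peers.
        return ["start mysqld", "START GROUP_REPLICATION"]
    # Destructive damage (assumed to include missing binlogs): rebuild first.
    return [
        "back up a healthy member",
        "restore the backup onto the failed node",
        "start mysqld",
        "START GROUP_REPLICATION",
    ]
```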

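II. Environment:

  Overview

Hostname      IP address        MGR role

mtls17        10.186.19.17      primary

mtls18        10.186.19.18      secondary

mtls19        10.186.19.19      secondary

  (The second host shows up as mtsl18 in the shell prompts and in the MEMBER_HOST column below.)

  Cluster status: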

mysql> select * from replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| group_replication_applier | 12b6f8d9-d655-11e7-936a-9a17854b700d | mtls17      |        3306 | ONLINE       |
| group_replication_applier | 12bfe200-d655-11e7-a264-1e1b3511358e | mtsl18      |        3306 | ONLINE       |
| group_replication_applier | 1453bcac-d655-11e7-a503-8a7c439b72d9 | mtls19      |        3306 | ONLINE       |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
3 rows in set (0.00 sec)

mysql> show global status like 'group_replication_primary_member';
+----------------------------------+--------------------------------------+
| Variable_name                    | Value                                |
+----------------------------------+--------------------------------------+
| group_replication_primary_member | 12b6f8d9-d655-11e7-936a-9a17854b700d |
+----------------------------------+--------------------------------------+
1 row in set (0.00 sec)

  Notes:

  The output above shows that MySQL on mtls17 is the cluster's current primary node and that every node in the cluster is in a healthy state.
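
  Matching the primary's MEMBER_ID against the member list by eye gets tedious when the check is repeated after every step. As a rough sketch, the boxed output of the mysql client can be parsed in Python; parse_mysql_table and primary_host are hypothetical helper names, and the parser assumes no cell value contains a "|" character.

```python
def parse_mysql_table(text):
    """Parse the boxed output of the mysql client into a list of dicts."""
    rows = []
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip borders, prompts and the "N rows in set" footer
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # the first boxed line holds the column names
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def primary_host(members, primary_id):
    """Return MEMBER_HOST of the member whose MEMBER_ID equals primary_id."""
    for member in members:
        if member["MEMBER_ID"] == primary_id:
            return member["MEMBER_HOST"]
    return None  # the primary is not in the member list
```

  Feeding it the first result set above and the value of group_replication_primary_member yields the host name of the primary.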

III. Case 1: simulating the failure and fixing it:

  1. Simulate a crash of the mtls17 node

[root@mtls17 data]# ps -ef | grep mysql
mysql : ? :: /usr/local/mysql/bin/mysqld --defaults-file=/etc/my.cnf
root : pts/ :: grep --color=auto mysql
[root@mtls17 data]# kill -
[root@mtls17 data]# ps -ef | grep mysql
root : pts/ :: grep --color=auto mysql

  

  2. Check the state of the remaining two nodes

mysql> select * from replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| group_replication_applier | 12bfe200-d655-11e7-a264-1e1b3511358e | mtsl18      |        3306 | ONLINE       |
| group_replication_applier | 1453bcac-d655-11e7-a503-8a7c439b72d9 | mtls19      |        3306 | ONLINE       |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
2 rows in set (0.00 sec)

mysql> show global status like 'group_replication_primary_member';
+----------------------------------+--------------------------------------+
| Variable_name                    | Value                                |
+----------------------------------+--------------------------------------+
| group_replication_primary_member | 12bfe200-d655-11e7-a264-1e1b3511358e |
+----------------------------------+--------------------------------------+
1 row in set (0.00 sec)

  The output above shows that after MySQL on mtls17 was killed, the remaining two nodes formed a new group and MySQL on mtls18 became the primary.
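
  The failover succeeds because two of three members are still a majority, which Group Replication needs in order to agree on a new membership. The arithmetic can be sketched in Python as follows; the function names are made up for illustration, and group_size means the number of members in the group's current view.

```python
def majority(group_size: int) -> int:
    """Smallest number of members that forms a majority of the group."""
    return group_size // 2 + 1


def tolerated_failures(group_size: int) -> int:
    """How many members can fail at once while a majority remains."""
    return group_size - majority(group_size)


def has_quorum(reachable: int, group_size: int) -> bool:
    """True if the reachable members still form a majority."""
    return reachable >= majority(group_size)
```

  For the three-node group used here, tolerated_failures(3) is 1, so losing mtls17 alone leaves the group writable, while losing a second node at the same time would not.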

  

  3. Recover the failed primary

[root@mtls17 data]# systemctl start mysql
[root@mtls17 data]# mysql -uroot -pmtls0352
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is
Server version: 5.7.20-log MySQL Community Server (GPL)

Copyright (c) 2000, 2017, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> start group_replication;
Query OK, 0 rows affected (4.03 sec)

  4. Verify that the problem is resolved

mysql> select * from replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| group_replication_applier | 12b6f8d9-d655-11e7-936a-9a17854b700d | mtls17      |        3306 | ONLINE       |
| group_replication_applier | 12bfe200-d655-11e7-a264-1e1b3511358e | mtsl18      |        3306 | ONLINE       |
| group_replication_applier | 1453bcac-d655-11e7-a503-8a7c439b72d9 | mtls19      |        3306 | ONLINE       |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
3 rows in set (0.00 sec)

mysql> show global status like 'group_replication_primary_member';
+----------------------------------+--------------------------------------+
| Variable_name                    | Value                                |
+----------------------------------+--------------------------------------+
| group_replication_primary_member | 12bfe200-d655-11e7-a264-1e1b3511358e |
+----------------------------------+--------------------------------------+
1 row in set (0.00 sec)

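  Conclusion: after the former primary went down, restarting the MySQL service and then restarting MySQL Group Replication on it was enough to bring it back. It rejoined as a secondary: group_replication_primary_member still points at the MEMBER_ID of mtls18.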
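
  A returning member does not necessarily show ONLINE the moment start group_replication returns; it can sit in RECOVERING while it catches up. The Python sketch below polls until the node reports ONLINE. It is only an outline: get_state is a caller-supplied function (hypothetical here) that returns the node's MEMBER_STATE, for example by querying replication_group_members, and the timeout values are arbitrary.

```python
import time


def wait_until_online(get_state, timeout=60.0, interval=1.0, sleep=time.sleep):
    """Poll get_state() until it returns 'ONLINE'.

    Returns True once the member is ONLINE, False if it reports 'ERROR'
    or if roughly `timeout` seconds of waiting have passed.
    """
    waited = 0.0
    while True:
        state = get_state()
        if state == "ONLINE":
            return True
        if state == "ERROR":
            return False  # recovery failed; waiting longer will not help
        if waited >= timeout:
            return False
        sleep(interval)
        waited += interval
```

  The sleep parameter exists so the loop can be exercised without real delays.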

IV. Case 2: recovering the node after the data on the primary has been lost:

  1. Stop the service and delete the data

[root@mtsl18 ~]# ps -ef | grep mysql
mysql : ? :: /usr/local/mysql/bin/mysqld --defaults-file=/etc/my.cnf
root : pts/ :: grep --color=auto mysql
[root@mtsl18 ~]# kill -
[root@mtsl18 ~]# rm -rf /database/mysql/data/
[root@mtsl18 ~]# ps -ef | grep mysql
root : pts/ :: grep --color=auto mysql

  This experiment continues from case 1, so the primary is now on mtls18; that is why the service is stopped and the data deleted on mtls18.

  2. Check the cluster status:

mysql> select * from replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| group_replication_applier | 12b6f8d9-d655-11e7-936a-9a17854b700d | mtls17      |        3306 | ONLINE       |
| group_replication_applier | 1453bcac-d655-11e7-a503-8a7c439b72d9 | mtls19      |        3306 | ONLINE       |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
2 rows in set (0.00 sec)

mysql> show global status like 'group_replication_primary_member';
+----------------------------------+--------------------------------------+
| Variable_name                    | Value                                |
+----------------------------------+--------------------------------------+
| group_replication_primary_member | 12b6f8d9-d655-11e7-936a-9a17854b700d |
+----------------------------------+--------------------------------------+
1 row in set (0.01 sec)

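  Note: after mtls18 went down, the primary role moved from mtls18 to mtls17.

  3. Back up mtls19 with MEB (MySQL Enterprise Backup) so that the failed mtls18 can be restored from it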

mysqlbackup --defaults-file=/etc/my.cnf --with-timestamp \
--host=localhost --user=root --password=mtls0352 \
--backup-dir=/tmp/ --backup-image=/tmp/2017-12-01T12:30:00.mbi --no-history-logging \
backup-to-image
MySQL Enterprise Backup version 4.1. Linux-2.6.-400.215..el5uek-x86_64 [//]
Copyright (c) , , Oracle and/or its affiliates. All Rights Reserved.

:: MAIN INFO: A thread created with Id ''
:: MAIN INFO: Starting with following command line ...
mysqlbackup --defaults-file=/etc/my.cnf --with-timestamp --host=localhost
--user=root --password=xxxxxxxx --backup-dir=/tmp/
--backup-image=/tmp/2017-12-01T12:30:00.mbi --no-history-logging
backup-to-image

:: MAIN INFO:
:: MAIN INFO: MySQL server version is '5.7.20-log'
.......
........
:: MAIN INFO: Full Image Backup operation completed successfully.
:: MAIN INFO: Backup image created successfully.
:: MAIN INFO: Image Path = /tmp/2017-12-01T12:30:00.mbi
:: MAIN INFO: MySQL binlog position: filename mysql-bin., position

-------------------------------------------------------------
Parameters Summary
-------------------------------------------------------------
Start LSN :
End LSN :
-------------------------------------------------------------

mysqlbackup completed OK!
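
  If the backup is taken from a script rather than typed by hand, the same command line can be assembled as an argument list. The Python sketch below uses only the options that appear in the command above; backup_image_path and backup_argv are hypothetical helper names, and the function only builds the list, it does not run anything. As the mysql client warns elsewhere in this article, a password passed as an argument is visible to other users on the host.

```python
from datetime import datetime


def backup_image_path(now, directory="/tmp"):
    """Image name in the same timestamp style as the one used above."""
    return f"{directory}/{now.strftime('%Y-%m-%dT%H:%M:%S')}.mbi"


def backup_argv(image, user, password, defaults_file="/etc/my.cnf"):
    """Argument list for a full image backup, one list item per argument."""
    return [
        "mysqlbackup",
        f"--defaults-file={defaults_file}",
        "--with-timestamp",
        "--host=localhost",
        f"--user={user}",
        f"--password={password}",
        "--backup-dir=/tmp/",
        f"--backup-image={image}",
        "--no-history-logging",
        "backup-to-image",
    ]
```

  A list built this way can be handed to subprocess.run without going through a shell, which avoids quoting problems with the colons in the image name.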

  4. Copy the backup to mtls18

scp /tmp/2017-12-01T12:30:00.mbi mtls18:/tmp/

  5. Restore the backup

mysqlbackup --defaults-file=/etc/my.cnf --backup-image=/tmp/2017-12-01T12:30:00.mbi \
--backup-dir=/tmp/ --datadir=/database/mysql/data/3306/ \
copy-back-and-apply-log
MySQL Enterprise Backup version 4.1. Linux-2.6.-400.215..el5uek-x86_64 [//]
Copyright (c) , , Oracle and/or its affiliates. All Rights Reserved.

:: MAIN INFO: A thread created with Id ''
:: MAIN INFO: Starting with following command line ...
mysqlbackup --defaults-file=/etc/my.cnf
--backup-image=/tmp/2017-12-01T12:30:00.mbi --backup-dir=/tmp/
--datadir=/database/mysql/data/3306/ copy-back-and-apply-log

:: MAIN INFO:
IMPORTANT: Please check that mysqlbackup run completes successfully.
.....
.....
:: PCR1 INFO: The first data file is '/database/mysql/data/3306/ibdata1'
and the new created log files are at '/database/mysql/data/3306/'
:: MAIN INFO: MySQL server version is '5.7.20-log'
:: MAIN INFO: Restoring ...5.7.20-log version
:: MAIN INFO: Apply-log operation completed successfully.
:: MAIN INFO: Full Backup has been restored successfully.

mysqlbackup completed OK!

  6. Start MySQL on mtls18 again

[root@mtsl18 tmp]# chown -R mysql:mysql /database/mysql/data/
[root@mtsl18 tmp]# systemctl start mysql
[root@mtsl18 tmp]# ps -ef | grep mysql
mysql : ? :: /usr/local/mysql/bin/mysqld --defaults-file=/etc/my.cnf
root : pts/ :: grep --color=auto mysql

  7. Restart MySQL Group Replication

[root@mtsl18 tmp]# mysql -uroot -pmtls0352
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 4
Server version: 5.7.20-log MySQL Community Server (GPL)

Copyright (c) 2000, 2017, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> reset master;
Query OK, 0 rows affected (0.10 sec)

mysql> reset slave;
Query OK, 0 rows affected (0.00 sec)

mysql> set sql_log_bin=0;
Query OK, 0 rows affected (0.00 sec)

mysql> source /database/mysql/data/3306/backup_gtid_executed.sql ;
Query OK, 0 rows affected (0.10 sec)

mysql> set sql_log_bin=1;
Query OK, 0 rows affected (0.00 sec)

mysql> change master to
    -> master_user='mgr_usr',
    -> master_password='mgr10352'
    -> for channel 'group_replication_recovery';
Query OK, 0 rows affected, 2 warnings (0.21 sec)

mysql> start group_replication;
Query OK, 0 rows affected (3.46 sec)
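
  The order of the statements above matters. As far as I understand it, reset master empties the restored node's GTID state so that backup_gtid_executed.sql (written by MEB into the restored data directory) can set GTID_PURGED to what the backup already contains, and only then are the recovery channel credentials set and the node started. The Python sketch below generates the same sequence so it can be reviewed or piped into the mysql client; rejoin_script is a hypothetical helper, and it does no quoting, so it assumes the user name and password contain no single quotes.

```python
def rejoin_script(gtid_file, recovery_user, recovery_password):
    """Statements to feed to the mysql client on the restored node, in order."""
    return [
        "RESET MASTER;",
        "RESET SLAVE;",
        "SET sql_log_bin=0;",
        # "source" is a mysql client command, not a server-side statement.
        f"source {gtid_file}",
        "SET sql_log_bin=1;",
        "CHANGE MASTER TO "
        f"MASTER_USER='{recovery_user}', MASTER_PASSWORD='{recovery_password}' "
        "FOR CHANNEL 'group_replication_recovery';",
        "START GROUP_REPLICATION;",
    ]
```

  Joining the list with newlines gives a script equivalent to the interactive session shown above.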

  8. Check that the cluster is healthy

mysql> select * from replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| group_replication_applier | 12b6f8d9-d655-11e7-936a-9a17854b700d | mtls17      |        3306 | ONLINE       |
| group_replication_applier | 1453bcac-d655-11e7-a503-8a7c439b72d9 | mtls19      |        3306 | ONLINE       |
| group_replication_applier | 85f82fce-d65e-11e7-9e92-1e1b3511358e | mtsl18      |        3306 | ONLINE       |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
3 rows in set (0.01 sec)

mysql> show global status like 'group_replication_primary_member';
+----------------------------------+--------------------------------------+
| Variable_name                    | Value                                |
+----------------------------------+--------------------------------------+
| group_replication_primary_member | 12b6f8d9-d655-11e7-936a-9a17854b700d |
+----------------------------------+--------------------------------------+
1 row in set (0.01 sec)
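
  One detail in this output is easy to miss: mtsl18 came back with a new MEMBER_ID (85f82fce-... instead of 12bfe200-...). The most likely reason is that the server UUID is stored in auto.cnf inside the data directory, which was deleted in step 1, so mysqld generated a fresh one on startup. Anything that identifies members by MEMBER_ID has to expect this after a rebuild. A small Python sketch for spotting such changes between two snapshots, with a hypothetical function name and plain host-to-ID dictionaries as input:

```python
def changed_member_ids(before, after):
    """Hosts present in both snapshots whose MEMBER_ID differs.

    Both arguments map MEMBER_HOST to MEMBER_ID; the result maps each
    affected host to an (old_id, new_id) pair.
    """
    return {
        host: (before[host], after[host])
        for host in before.keys() & after.keys()
        if before[host] != after[host]
    }
```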

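V. Summary:

  Fixing the two kinds of primary failure:

    1. Data and binlogs both intact: simply restart MySQL Group Replication on the node and it repairs itself automatically.

    2. Data lost: restore the node from a backup first, then restart MySQL Group Replication.

  On the operational complexity of MySQL Group Replication:

    Overall, MySQL Group Replication is fairly friendly to DBAs: a few small steps are enough to repair a failed cluster.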

VI. My other articles on MySQL Group Replication

  1. MySQL Group Replication installation and configuration in detail: http://www.cnblogs.com/JiangLe/p/6727281.html#3849996

  2. MySQL Group Replication availability report on mysql-5.7.20: http://www.cnblogs.com/JiangLe/p/7809229.html

  3. MySQL Group Replication primary node crash recovery: https://i.cnblogs.com/EditPosts.aspx?postid=7941929

  4. MySQL Group Replication recovery after losing a majority of the nodes

  5. My open-source tool for fully automated installation of mysql-group-replication: https://github.com/Neeky/mysqltools
