MySQL Master-Master Replication Failure



Incident description

Cause: the rack's PDU had aged and failed, cutting power to the entire rack.
Failure time: 20160923-10:09
Discovery time: 20160929-13:56

Architecture

Tomcat, Memcache, Keepalived, MySQL master-master replication

Nodes

No.  Node     IP address  Error message
1    aipprd1  10.66.1.52  Got fatal error 1236 from master when reading data from binary log: 'binlog truncated in the middle of event; consider out of disk space on master; the first event 'mysql-bin.000084' at 91941417, the last event read from '/aip/mysql/data/log/mysql-bin.000084' at 91941783, the last byte read from '/aip/mysql/data/log/mysql-bin.000084' at 91942912.'
2    aipprd2  10.66.1.51  Got fatal error 1236 from master when reading data from binary log: 'binlog truncated in the middle of event; consider out of disk space on master; the first event 'mysql-bin.000082' at 6369026, the last event read from '/aip/mysql/data/log/mysql-bin.000082' at 6369026, the last byte read from '/aip/mysql/data/log/mysql-bin.000082' at 6369280.'

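Analysis

  • Because of how the Zabbix MySQL monitoring script worked, no event was triggered, so the failure went unnoticed until the Zabbix logs were reviewed on 20160929. At that time the Keepalived VIP was on aipprd1, so the database on that node was the one serving clients;
  • So aipprd1 is treated as the master first, and the slave on aipprd2 is resynchronized from it;
  • Once the slave on aipprd2 has caught up, the slave on aipprd1 is resynchronized.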
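The missed alert suggests that a monitoring check should report a problem whenever either replication thread is not running. Below is a minimal sketch of such a check, assuming the script can capture the text printed by show slave status\G. The names parse_slave_status and replication_problems are hypothetical helpers for illustration; they are not part of Zabbix or of the original monitoring script.

```python
def parse_slave_status(text):
    """Parse the vertical (\\G) output of SHOW SLAVE STATUS into a dict."""
    status = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip the "*** 1. row ***" separator and lines without a key.
        if not line or line.startswith("*") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        status[key.strip()] = value.strip()
    return status


def replication_problems(status):
    """Return a list of problem descriptions; an empty list means healthy."""
    problems = []
    # Both threads must report "Yes" for replication to be working.
    for thread in ("Slave_IO_Running", "Slave_SQL_Running"):
        if status.get(thread) != "Yes":
            problems.append(f"{thread} is {status.get(thread, 'missing')!r}")
    # A non-empty error field is reported even if the thread was restarted.
    for field in ("Last_IO_Error", "Last_SQL_Error"):
        if status.get(field):
            problems.append(f"{field}: {status[field]}")
    return problems
```

A monitoring item could raise an alert whenever replication_problems returns a non-empty list, which would have flagged the Slave_IO_Running: No state shown later in this post.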

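Resynchronizing the slave on AIPPRD2

Check the slave status on aipprd2
According to the output below, Last_Errno is 1062 (duplicate entry), and the replication position has to be changed manually.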

[root@aipprd2 ~]# mysql -uzabbixmoniter -ppassw0rd -hlocalhost -e "show slave status\G;"
*************************** . row ***************************
Slave_IO_State:
Master_Host: 10.66.1.52
Master_User: root
Master_Port:
Connect_Retry:
Master_Log_File: mysql-bin.
Read_Master_Log_Pos:
Relay_Log_File: mysql-relay-bin.
Relay_Log_Pos:
Relay_Master_Log_File: mysql-bin.
Slave_IO_Running: No
Slave_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno:
Last_Error: Error 'Duplicate entry '93FF91EF92866D23E80E4A57D55ED538-n1.tomcat604' for key 'PRIMARY'' on query. Default database: 'aipprd'. Query: 'INSERT INTO eahttpsession ( sessionid, username, account, createtime, loginip,userid,explorer,userDomain,computerName,computerUserName) VALUES ('93FF91EF92866D23E80E4A57D55ED538-n1.tomcat604', '李花', 'XS003_4200', '-- ::', , '','MSIE 7.0','','','')'
Skip_Counter:
Exec_Master_Log_Pos:
Relay_Log_Space:
Until_Condition: None
Until_Log_File:
Until_Log_Pos:
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno:
Last_IO_Error: Got fatal error from master when reading data from binary log: 'binlog truncated in the middle of event; consider out of disk space on master; the first event 'mysql-bin.' at 6369026, the last event read from '/aip/mysql/data/log/mysql-bin.' at 6369026, the last byte read from '/aip/mysql/data/log/mysql-bin.' at 6369280.'
Last_SQL_Errno:
Last_SQL_Error: Error 'Duplicate entry '93FF91EF92866D23E80E4A57D55ED538-n1.tomcat604' for key 'PRIMARY'' on query. Default database: 'aipprd'. Query: 'INSERT INTO eahttpsession ( sessionid, username, account, createtime, loginip,userid,explorer,userDomain,computerName,computerUserName) VALUES ('93FF91EF92866D23E80E4A57D55ED538-n1.tomcat604', '李花', 'XS003_4200', '-- ::', , '','MSIE 7.0','','','')'
Replicate_Ignore_Server_Ids:
Master_Server_Id:

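On aipprd2, change to the binlog file and position given in Last_IO_Error
After switching to the binlog file and position suggested by the error message, the error persisted.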

mysql> slave stop;
mysql> CHANGE MASTER TO master_host='10.66.1.52', master_port=, master_user='root', master_password='passw0rd', master_log_file='mysql-bin.000082', master_log_pos=;
mysql> slave start;

Check the MySQL log on aipprd2
The MySQL log records when crash recovery started and gives a suggestion: Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.000082' position 6367510

 :: [Note] Starting crash recovery...
:: [Note] Crash recovery finished.
:: [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.000082' position

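On aipprd2, change to the binlog file and position suggested in the log
After switching to the binlog file and position suggested in the log, the error persisted.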

mysql> slave stop;
mysql> CHANGE MASTER TO master_host='10.66.1.52', master_port=, master_user='root', master_password='passw0rd', master_log_file='mysql-bin.000082', master_log_pos=;
mysql> slave start;

Check the binlog file on aipprd1
First check the position given by show slave status\G; it turns out not to exist in the binlog at all.

[root@aipprd1 log]# mysqlbinlog --no-defaults --start-position= mysql-bin.
/*!40019 SET @@session.max_insert_delayed_threads=0*/;
/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
DELIMITER /*!*/;
# at
# :: server id end_log_pos Start: binlog v , server v 5.5.-log created :: at startup
ROLLBACK/*!*/;
BINLOG '
47HkVw8BAAAAZwAAAGsAAAAAAAQANS41LjI0LWxvZwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAADjseRXEzgNAAgAEgAEBAQEEgAAVAAEGggAAAAICAgCAA==
'/*!*/;
ERROR: Error in Log_event::read_log_event(): 'read error', data_len: , event_type:
ERROR: Could not read entry at offset : Error in log format or read error.
DELIMITER ;
# End of log file
ROLLBACK /* added by mysqlbinlog */;
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;

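Next, check the position suggested in the MySQL log: that record does exist in the binlog file. The last position in this binlog file is 6368660, whereas the position reported by show slave status\G; is 6369026, which clearly does not exist in the file.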

[root@aipprd1 log]# mysqlbinlog --no-defaults  --start-position= mysql-bin.
/*!40019 SET @@session.max_insert_delayed_threads=0*/;
/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
DELIMITER /*!*/;
# at
# :: server id end_log_pos Start: binlog v , server v 5.5.-log created :: at startup
# Warning: this binlog is either in use or was not closed properly.
ROLLBACK/*!*/;
BINLOG '
05XkVw8BAAAAZwAAAGsAAAABAAQANS41LjI0LWxvZwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAADTleRXEzgNAAgAEgAEBAQEEgAAVAAEGggAAAAICAgCAA==
'/*!*/;
# at
# at
# at
# at
# at
# at
# at
# at
# at
# at
# at
# End of log file
ROLLBACK /* added by mysqlbinlog */;
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;

Finally, look in mysql-bin.000083 for position 6369026 reported by show slave status\G; it does not exist there either.

[root@aipprd1 log]# mysqlbinlog --no-defaults  --start-position= mysql-bin. 
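The comparison made above (does the position requested by the slave actually start an event in the master's binlog?) can be sketched as follows. This is an illustration under assumptions: it works on text already captured from mysqlbinlog and from Last_IO_Error, and the function names event_offsets and first_event_position are hypothetical.

```python
import re


def event_offsets(mysqlbinlog_output):
    """Collect the event start offsets that mysqlbinlog prints as '# at N'."""
    pattern = re.compile(r"^# at (\d+)[ \t]*$", re.MULTILINE)
    return [int(m.group(1)) for m in pattern.finditer(mysqlbinlog_output)]


def first_event_position(io_error):
    """Extract (binlog file, position) from an error 1236 message.

    Returns None if the message does not contain a 'first event' clause.
    Both straight and curly quotes are accepted around the file name.
    """
    m = re.search(r"the first event ['\u2018]([^'\u2019]+)['\u2019] at (\d+)", io_error)
    if m is None:
        return None
    return m.group(1), int(m.group(2))
```

With the values from this incident, first_event_position applied to the Last_IO_Error text yields mysql-bin.000082 and 6369026; if that number is absent from event_offsets for the same file while 6367510 is present, the slave is asking for a position past the last complete event, which is consistent with a binlog truncated by the power loss.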

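Reissue the binlog file and position change on aipprd2
Judging from the binlogs on aipprd1: position 6369026 does not exist at the end of mysql-bin.000082, and mysql-bin.000083 is the next binlog, so reissue the change with a new binlog file and position.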

[root@aipprd1 log]# ll
total
-rw-rw---- mysql mysql Sep : mysql-bin.
-rw-rw---- mysql mysql Sep : mysql-bin.
-rw-rw---- mysql mysql Sep : mysql-bin.
-rw-rw---- mysql mysql Sep : mysql-bin.
-rw-rw---- mysql mysql Sep : mysql-bin.
-rw-rw---- mysql mysql Sep : mysql-bin.
-rw-rw---- mysql mysql Sep : mysql-bin.
-rw-rw---- mysql mysql Sep : mysql-bin.
-rw-rw---- mysql mysql Sep : mysql-bin.log.index
-rw-rw---- mysql mysql Sep : mysql-relay-bin.
-rw-rw---- mysql mysql Sep : mysql-relay-bin.index

Set the binlog file to mysql-bin.000083 and the position to 0, then start the slave; replication now works normally.

mysql> slave stop;
mysql> CHANGE MASTER TO master_host='10.66.1.52', master_port=, master_user='root', master_password='passw0rd', master_log_file='mysql-bin.000083', master_log_pos=;
mysql> slave start;
mysql> show slave status\G;
*************************** . row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 10.66.1.52
Master_User: root
Master_Port:
Connect_Retry:
Master_Log_File: mysql-bin.
Read_Master_Log_Pos:
Relay_Log_File: mysql-relay-bin.
Relay_Log_Pos:
Relay_Master_Log_File: mysql-bin.
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno:
Last_Error:
Skip_Counter:
Exec_Master_Log_Pos:
Relay_Log_Space:
Until_Condition: None
Until_Log_File:
Until_Log_Pos:
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master:
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno:
Last_IO_Error:
Last_SQL_Errno:
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id:
row in set (0.00 sec)
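The step taken above, moving the slave to the start of the next binlog file, can be sketched as a small helper. This is only an illustration: next_binlog and change_master_sql are hypothetical names, and the default position of 4 is an assumption based on the binlog format (the first event begins right after the 4-byte magic header), whereas the text above uses 0. Jumping to the next file skips whatever could not be read from the truncated one, so the data should be verified afterwards.

```python
def next_binlog(name):
    """Return the binlog file that follows `name`.

    Example: mysql-bin.000082 -> mysql-bin.000083
    """
    base, _, seq = name.rpartition(".")
    if not base or not seq.isdigit():
        raise ValueError(f"not a binlog file name: {name!r}")
    # Keep the zero-padded width of the original sequence number.
    return f"{base}.{int(seq) + 1:0{len(seq)}d}"


def change_master_sql(log_file, log_pos=4):
    """Build a CHANGE MASTER TO statement that only moves the coordinates.

    Options that are not listed (host, user, password, port) keep their
    current values on the slave.
    """
    return (
        f"CHANGE MASTER TO MASTER_LOG_FILE='{log_file}', "
        f"MASTER_LOG_POS={int(log_pos)};"
    )
```

For this incident, change_master_sql(next_binlog("mysql-bin.000082")) produces a statement pointing at mysql-bin.000083, to be run between stopping and starting the slave.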
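  • During synchronization, several errors of the form Last_SQL_Error: Error 'Duplicate entry ...' (SQL error 1062) appeared. They occur because a duplicate primary key stops the slave. The following steps resolve it (if there are several duplicate keys, repeat the steps as many times as needed):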

    mysql> slave stop;
    mysql> set GLOBAL SQL_SLAVE_SKIP_COUNTER=;
    mysql> slave start;
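  • Another option is to edit the MySQL configuration file /etc/my.cnf and add the line slave_skip_errors = 1062 under [mysqld]. After saving the file and restarting MySQL, the slave can replicate normally.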
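The repeated skip procedure in the first option can be sketched as a decision helper that only proposes a skip when the SQL thread stopped on a duplicate-key error. This is a hedged illustration: skip_statements is a hypothetical name, the input is assumed to be a dict of show slave status\G fields, and the counter value of 1 (skip one event) is an assumption, since the value is missing from the transcript above. Skipping events can leave the two nodes with different data, so it should be used deliberately.

```python
SKIPPABLE_ERRNOS = {1062}  # 1062 = duplicate entry for a key


def skip_statements(status, skippable=SKIPPABLE_ERRNOS):
    """Return the statements to skip one event, or [] if skipping is not indicated.

    `status` maps SHOW SLAVE STATUS field names to their string values.
    """
    errno = int(status.get("Last_SQL_Errno") or 0)
    if status.get("Slave_SQL_Running") == "Yes" or errno not in skippable:
        return []
    return [
        "STOP SLAVE;",
        "SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1;",  # assumed value: skip one event
        "START SLAVE;",
    ]
```

Calling the helper after each restart, until it returns an empty list, mirrors the "run it multiple times" advice while refusing to skip any error other than 1062.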


Resynchronizing the slave on AIPPRD1

  • Check the slave status on aipprd1

    mysql> show slave status\G;
    *************************** . row ***************************
    Slave_IO_State:
    Master_Host: 10.66.1.51
    Master_User: root
    Master_Port:
    Connect_Retry:
    Master_Log_File: mysql-bin.
    Read_Master_Log_Pos:
    Relay_Log_File: mysql-relay-bin.
    Relay_Log_Pos:
    Relay_Master_Log_File: mysql-bin.
    Slave_IO_Running: No
    Slave_SQL_Running: Yes
    Replicate_Do_DB:
    Replicate_Ignore_DB:
    Replicate_Do_Table:
    Replicate_Ignore_Table:
    Replicate_Wild_Do_Table:
    Replicate_Wild_Ignore_Table:
    Last_Errno:
    Last_Error:
    Skip_Counter:
    Exec_Master_Log_Pos:
    Relay_Log_Space:
    Until_Condition: None
    Until_Log_File:
    Until_Log_Pos:
    Master_SSL_Allowed: No
    Master_SSL_CA_File:
    Master_SSL_CA_Path:
    Master_SSL_Cert:
    Master_SSL_Cipher:
    Master_SSL_Key:
    Seconds_Behind_Master: NULL
    Master_SSL_Verify_Server_Cert: No
    Last_IO_Errno:
    Last_IO_Error: Got fatal error from master when reading data from binary log: 'binlog truncated in the middle of event; consider out of disk space on master; the first event 'mysql-bin.' at 91941417, the last event read from '/aip/mysql/data/log/mysql-bin.' at 91941783, the last byte read from '/aip/mysql/data/log/mysql-bin.' at 91942912.'
    Last_SQL_Errno:
    Last_SQL_Error:
    Replicate_Ignore_Server_Ids:
    Master_Server_Id:
    row in set (0.00 sec)
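  • Check the log files on aipprd2
    The output of show slave status\G; on aipprd1 suggests setting the binlog file to mysql-bin.000084 and the position to 91942912. However, now that aipprd2 has finished synchronizing, the data it received came from aipprd1 and therefore already exists on aipprd1.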

    [root@aipprd2 log]# ll
    total
    -rw-rw---- mysql mysql Sep : mysql-bin.
    -rw-rw---- mysql mysql Sep : mysql-bin.
    -rw-rw---- mysql mysql Sep : mysql-bin.
    -rw-rw---- mysql mysql Sep : mysql-bin.
    -rw-rw---- mysql mysql Sep : mysql-bin.
    -rw-rw---- mysql mysql Sep : mysql-bin.
    -rw-rw---- mysql mysql Sep : mysql-bin.
    -rw-rw---- mysql mysql Sep : mysql-bin.log.index
    -rw-rw---- mysql mysql Sep : mysql-relay-bin.
    -rw-rw---- mysql mysql Sep : mysql-relay-bin.
    -rw-rw---- mysql mysql Sep : mysql-relay-bin.
    -rw-rw---- mysql mysql Sep : mysql-relay-bin.
    -rw-rw---- mysql mysql Sep : mysql-relay-bin.
    -rw-rw---- mysql mysql Sep : mysql-relay-bin.index
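  • Run show master status; on aipprd2 and note the binlog file, mysql-bin.000089. Since aipprd1 holds the latest data and aipprd2 has already finished synchronizing from aipprd1 (judged by Seconds_Behind_Master: in show slave status\G;: if the value is small, synchronization should be complete), the data on the two sides should be nearly identical.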

    mysql> show master status;
    +------------------+-----------+--------------+------------------+
    | File | Position | Binlog_Do_DB | Binlog_Ignore_DB |
    +------------------+-----------+--------------+------------------+
    | mysql-bin. | | | |
    +------------------+-----------+--------------+------------------+
    row in set (0.00 sec)
  • Reissue the binlog file and position change on aipprd1
    So use binlog file mysql-bin.000089 with position 0 here; after the slave is started, synchronization begins.

    mysql> slave stop;
    Query OK, rows affected (0.11 sec)
    mysql> CHANGE MASTER TO master_host='10.66.1.51', master_port=, master_user='root', master_password='passw0rd', master_log_file='mysql-bin.000089', master_log_pos=;
    Query OK, rows affected (0.06 sec)
    mysql> slave start;
    Query OK, rows affected (0.00 sec)
    mysql> show slave status\G;
    *************************** . row ***************************
    Slave_IO_State: Waiting for master to send event
    Master_Host: 10.66.1.51
    Master_User: root
    Master_Port:
    Connect_Retry:
    Master_Log_File: mysql-bin.
    Read_Master_Log_Pos:
    Relay_Log_File: mysql-relay-bin.
    Relay_Log_Pos:
    Relay_Master_Log_File: mysql-bin.
    Slave_IO_Running: Yes
    Slave_SQL_Running: Yes
    Replicate_Do_DB:
    Replicate_Ignore_DB:
    Replicate_Do_Table:
    Replicate_Ignore_Table:
    Replicate_Wild_Do_Table:
    Replicate_Wild_Ignore_Table:
    Last_Errno:
    Last_Error:
    Skip_Counter:
    Exec_Master_Log_Pos:
    Relay_Log_Space:
    Until_Condition: None
    Until_Log_File:
    Until_Log_Pos:
    Master_SSL_Allowed: No
    Master_SSL_CA_File:
    Master_SSL_CA_Path:
    Master_SSL_Cert:
    Master_SSL_Cipher:
    Master_SSL_Key:
    Seconds_Behind_Master:
    Master_SSL_Verify_Server_Cert: No
    Last_IO_Errno:
    Last_IO_Error:
    Last_SQL_Errno:
    Last_SQL_Error:
    Replicate_Ignore_Server_Ids:
    Master_Server_Id:
    row in set (0.00 sec)
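The "has aipprd2 caught up?" judgment used before repointing aipprd1 can be made explicit. The sketch below assumes a dict of show slave status\G fields as input; is_caught_up is a hypothetical name and the 5-second threshold is an arbitrary illustrative choice, since the text only says the value should be small.

```python
def is_caught_up(status, max_lag_seconds=5):
    """Return True if both replication threads run and the lag is small.

    `status` maps SHOW SLAVE STATUS field names to their string values.
    Seconds_Behind_Master is "NULL" when replication is not running.
    """
    if status.get("Slave_IO_Running") != "Yes":
        return False
    if status.get("Slave_SQL_Running") != "Yes":
        return False
    lag = status.get("Seconds_Behind_Master", "NULL")
    return lag.isdigit() and int(lag) <= max_lag_seconds
```

A small lag only shows that the slave has applied what it received. It does not prove the two nodes hold identical data, which matches the cautious wording in the text ("should be nearly identical").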

Reposted from

MySQL master-master replication failure - bluetom520's blog - CSDN Blog
http://blog.csdn.net/bluetom520/article/details/54893183
