Preface
 
    In my last test of pt-heartbeat,both of master and slave were out of disk.And the mysql client was hang.In order to resolve the issue,I've tryed to fix the replicaiton environment without using mysqldump to reconfigure the slave.Let's see the details.
 
Procedure
 
I dropped test tables in database "sysbench" to release the disk space on master.
 [root@zlm2 :: /data/mysql/mysql3306/logs]
#sysbench oltp_read_write.lua --mysql-host=192.168.1.101 --mysql-port= --mysql-user=zlm --mysql-password=zlmzlm --mysql-db=sysbench --tables= --table-size= --mysql-storage-engine=innodb cleanup
sysbench 1.0. (using bundled LuaJIT 2.1.-beta2) Dropping table 'sbtest1'... (zlm@192.168.1.101 )[(none)]>use sysbench;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A Database changed
(zlm@192.168.1.101 )[sysbench]>show tables;
+--------------------+
| Tables_in_sysbench |
+--------------------+
| hb |
| sbtest2 |
| sbtest3 |
| sbtest4 |
| sbtest5 |
+--------------------+
rows in set (0.00 sec) //Only sbtest1 was deleted.It's not enough. [root@zlm2 :: ~/sysbench-1.0/src/lua]
#sysbench oltp_read_write.lua --mysql-host=192.168.1.101 --mysql-port= --mysql-user=zlm --mysql-password=zlmzlm --mysql-db=sysbench --tables= --table-size= --mysql-storage-engine=innodb cleanup
sysbench 1.0. (using bundled LuaJIT 2.1.-beta2) Dropping table 'sbtest1'...
Dropping table 'sbtest2'...
Dropping table 'sbtest3'...
Dropping table 'sbtest4'...
Dropping table 'sbtest5'... [root@zlm2 :: ~/sysbench-1.0/src/lua]
#df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/centos-root .4G .9G .5G % / //I'd got 27% free space.
devtmpfs 488M 488M % /dev
tmpfs 497M 497M % /dev/shm
tmpfs 497M 6.6M 491M % /run
tmpfs 497M 497M % /sys/fs/cgroup
/dev/sda1 497M 118M 379M % /boot
none 87G 80G .6G % /vagrant (zlm@192.168.1.101 )[(none)]>drop database sysbench;
Query OK, row affected (0.04 sec) //Further more,I dropped the "sysbench".
The slave hung still and disk space was full.
 [root@zlm3 :: ~]
#df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/centos-root .4G .4G 20K % /
devtmpfs 488M 488M % /dev
tmpfs 497M 497M % /dev/shm
tmpfs 497M 6.5M 491M % /run
tmpfs 497M 497M % /sys/fs/cgroup
/dev/sda1 497M 118M 379M % /boot
none 87G 80G .6G % /vagrant (zlm@192.168.1.102 )[(none)]>show slave status\G
^C^C -- query aborted ^Z
[]+ Stopped mysql [root@zlm3 :: ~]
#pkill mysqld [root@zlm3 :: ~]
#./mysqld.sh [root@zlm3 :: ~]
#mysql
ERROR (HY000): Can't connect to MySQL server on '192.168.1.102' (111) [root@zlm3 :: ~]
#cd /data/mysql/mysql3306/data [root@zlm3 :: /data/mysql/mysql3306/data]
#cat error.log |tail -n
--19T08::02.581937+: [Note] InnoDB: Log scan progressed past the checkpoint lsn
--19T08::02.581958+: [Note] InnoDB: Doing recovery: scanned up to log sequence number
--19T08::02.581963+: [Note] InnoDB: Database was not shutdown normally!
--19T08::02.581965+: [Note] InnoDB: Starting crash recovery.
--19T08::02.696292+: [Note] InnoDB: Transaction was in the XA prepared state.
--19T08::02.700688+: [Note] InnoDB: Transaction was in the XA prepared state.
--19T08::02.700814+: [Note] InnoDB: transaction(s) which must be rolled back or cleaned up in total row operations to undo
--19T08::02.700821+: [Note] InnoDB: Trx id counter is
--19T08::02.701719+: [Note] InnoDB: Last MySQL binlog file position , file name mysql-bin.
--19T08::02.805965+: [Note] InnoDB: Ignoring tablespace `zlm`.`sbtest2` because the DISCARD flag is set .
--19T08::02.806462+: [Note] InnoDB: Creating shared tablespace for temporary tables
--19T08::02.807316+: [Note] InnoDB: Setting file './ibtmp1' size to MB. Physically writing the file full; Please wait ...
--19T08::02.807568+: [Note] InnoDB: Starting in background the rollback of uncommitted transactions
--19T08::02.807594+: [Note] InnoDB: Rollback of non-prepared transactions completed
--19T08::02.871396+: [Warning] InnoDB: Retry attempts for writing partial data failed.
--19T08::02.871423+: [ERROR] InnoDB: Write to file ./ibtmp1failed at offset , bytes should have been written, only were written. Operating system error number . Check that your OS and file system support files of this size. Check also that the disk is not full or a disk quota exceeded.
--19T08::02.871441+: [ERROR] InnoDB: Error number means 'No space left on device'
--19T08::02.871446+: [Note] InnoDB: Some operating system error numbers are described at http://dev.mysql.com/doc/refman/5.7/en/operating-system-error-codes.html
--19T08::02.871451+: [ERROR] InnoDB: Could not set the file size of './ibtmp1'. Probably out of disk space
--19T08::02.871456+: [ERROR] InnoDB: Unable to create the shared innodb_temporary
--19T08::02.871459+: [ERROR] InnoDB: Plugin initialization aborted with error Generic error
--19T08::03.273011+: [Note] InnoDB: Removed temporary tablespace data file: "ibtmp1"
--19T08::03.273029+: [ERROR] Plugin 'InnoDB' init function returned error.
--19T08::03.273033+: [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
--19T08::03.273037+: [ERROR] Failed to initialize builtin plugins.
--19T08::03.273040+: [ERROR] Aborting --19T08::03.273046+: [Note] Binlog end
--19T08::03.273389+: [Note] mysqld: Shutdown complete //The mysqld process could not run again because of no free disk space.
I decided to drop all the binlogs on slave to release the disk space.
 [root@zlm3 :: /data/mysql/mysql3306]
#cd logs [root@zlm3 :: /data/mysql/mysql3306/logs]
#ls -l
total
-rw-r----- mysql mysql Jul : mysql-bin.
-rw-r----- mysql mysql Jul : mysql-bin.
-rw-r----- mysql mysql Jul : mysql-bin.
-rw-r----- mysql mysql Jul : mysql-bin.
-rw-r----- mysql mysql Jul : mysql-bin.
-rw-r----- mysql mysql Jul : mysql-bin.
-rw-r----- mysql mysql Jul : mysql-bin.
-rw-r----- mysql mysql Jul : mysql-bin.
-rw-r----- mysql mysql Jul : mysql-bin.
-rw-r----- mysql mysql Jul : mysql-bin.
-rw-r----- mysql mysql Jul : mysql-bin.
-rw-r----- mysql mysql Jul : mysql-bin.
-rw-r----- mysql mysql Jul : mysql-bin.
-rw-r----- mysql mysql Jul : mysql-bin.
-rw-r----- mysql mysql Jul : mysql-bin.
-rw-r----- mysql mysql Jul : mysql-bin.
-rw-r----- mysql mysql Jul : mysql-bin.index [root@zlm3 :: /data/mysql/mysql3306/logs]
#rm -f * [root@zlm3 :: /data/mysql/mysql3306/logs]
#ls -l
total [root@zlm3 :: ~]
#df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/centos-root .4G .5G .0G % / //The free disk space had been reduced to 47%.
devtmpfs 488M 488M % /dev
tmpfs 497M 497M % /dev/shm
tmpfs 497M 6.5M 491M % /run
tmpfs 497M 497M % /sys/fs/cgroup
/dev/sda1 497M 118M 379M % /boot
none 87G 80G .6G % /vagrant
Ran the mysqld again and dropped the database "sysbench" on slave.
 [root@zlm3 :: /data/mysql/mysql3306/logs]
#sh /root/mysqld.sh [root@zlm3 :: /data/mysql/mysql3306/logs]
#ps aux|grep mysqld
mysql 7.0 17.8 pts/ Sl : : mysqld --defaults-file=/data/mysql/mysql3306/my.cnf
root 0.0 0.0 pts/ R+ : : grep --color=auto mysqld [root@zlm3 :: /data/mysql/mysql3306/logs]
#mysql
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is
Server version: 5.7.-log MySQL Community Server (GPL) Copyright (c) , , Oracle and/or its affiliates. All rights reserved. Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners. Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. (zlm@192.168.1.102 )[(none)]>drop database sysbench;
Query OK, rows affected (0.11 sec)

Started the replication threads of slave.

 (zlm@192.168.1.102 )[(none)]>start slave;
Query OK, rows affected (0.00 sec) (zlm@192.168.1.102 )[(none)]>show slave status\G
*************************** . row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.1.101
Master_User: repl
Master_Port:
Connect_Retry:
Master_Log_File: mysql-bin.
Read_Master_Log_Pos:
Relay_Log_File: relay-bin.
Relay_Log_Pos:
Relay_Master_Log_File: mysql-bin.
Slave_IO_Running: Yes
Slave_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno:
Last_Error: Error executing row event: 'Table 'sysbench.sbtest1' doesn't exist'
Skip_Counter:
Exec_Master_Log_Pos:
Relay_Log_Space:
Until_Condition: None
Until_Log_File:
Until_Log_Pos:
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno:
Last_IO_Error:
Last_SQL_Errno:
Last_SQL_Error: Error executing row event: 'Table 'sysbench.sbtest1' doesn't exist' //Since the database had been droppted.This error was notable.
Replicate_Ignore_Server_Ids:
Master_Server_Id:
Master_UUID: 1b7181ee-6eaf-11e8-998e-080027de0e0e
Master_Info_File: mysql.slave_master_info
SQL_Delay:
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State:
Master_Retry_Count:
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp: ::
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set: 1b7181ee-6eaf-11e8-998e-080027de0e0e:- //It was stuck on transaction 3714549(which contained error).
Executed_Gtid_Set: 1b7181ee-6eaf-11e8-998e-080027de0e0e:-,
5c77c31b-4add-11e8-81e2-080027de0e0e:-
Auto_Position:
Replicate_Rewrite_DB:
Channel_Name:
Master_TLS_Version:
row in set (0.00 sec) [root@zlm3 :: ~]
#perror
MySQL error code (ER_NO_SUCH_TABLE): Table '%-.192s.%-.192s' doesn't exist //Error 1146 indicated the absence of table "sbtest1" in "sysbench" database.
//Obviously,the slave was replaying the operations relevant to this table on master.The table even the database had been dropped.
//How could I do next step?Do I have to generate a new mysqldump file and reconfigure the slave again?
//There's One thing I'm rather sure that there were no other transactions generated in the whole course except the operations on "sysbench" database.
//Since I'd drop "sysbentch" database on both master and slave.Maybe I can fix the issue easily.

Checked the Executed_Gtid_Set on master.

 (zlm@192.168.1.101 )[(none)]>show master status;
+------------------+-----------+--------------+------------------+------------------------------------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+-----------+--------------+------------------+------------------------------------------------+
| mysql-bin. | | | | 1b7181ee-6eaf-11e8-998e-080027de0e0e:- |
+------------------+-----------+--------------+------------------+------------------------------------------------+
row in set (0.00 sec) //The executed gtid was upto "3730021".

Tryed to fix the replica of master.

 (zlm@192.168.1.102 )[(none)]>show slave status\G
*************************** . row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.1.101
Master_User: repl
Master_Port:
Connect_Retry:
Master_Log_File: mysql-bin.
Read_Master_Log_Pos:
Relay_Log_File: relay-bin.
Relay_Log_Pos:
Relay_Master_Log_File: mysql-bin.
Slave_IO_Running: Yes
Slave_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno:
Last_Error: Error executing row event: 'Table 'sysbench.sbtest1' doesn't exist'
Skip_Counter:
Exec_Master_Log_Pos:
Relay_Log_Space:
Until_Condition: None
Until_Log_File:
Until_Log_Pos:
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno:
Last_IO_Error:
Last_SQL_Errno:
Last_SQL_Error: Error executing row event: 'Table 'sysbench.sbtest1' doesn't exist'
Replicate_Ignore_Server_Ids:
Master_Server_Id:
Master_UUID: 1b7181ee-6eaf-11e8-998e-080027de0e0e
Master_Info_File: mysql.slave_master_info
SQL_Delay:
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State:
Master_Retry_Count:
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp: ::
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set: 1b7181ee-6eaf-11e8-998e-080027de0e0e:-
Executed_Gtid_Set:
Auto_Position:
Replicate_Rewrite_DB:
Channel_Name:
Master_TLS_Version:
row in set (0.00 sec) (zlm@192.168.1.102 )[(none)]>reset master;
Query OK, rows affected (0.02 sec) (zlm@192.168.1.102 )[(none)]>set @@global.gtid_purged='1b7181ee-6eaf-11e8-998e-080027de0e0e:1-3730021';
Query OK, rows affected (0.00 sec) //On account of surely knowing there were no other transactions at all.I set the "gtid_purged" variable to the value of "gtid_executed" on master.
//It means I guised that all the transactions generated on master had been replayed on slave already.The slave could retrieve new GTID at the moment. (zlm@192.168.1.102 )[(none)]>start slave sql_thread;
Query OK, rows affected (0.02 sec) (zlm@192.168.1.102 )[(none)]>show slave status\G
*************************** . row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.1.101
Master_User: repl
Master_Port:
Connect_Retry:
Master_Log_File: mysql-bin.
Read_Master_Log_Pos:
Relay_Log_File: relay-bin.
Relay_Log_Pos:
Relay_Master_Log_File: mysql-bin.
Slave_IO_Running: Yes
Slave_SQL_Running: Yes //The sql_thread became "Yes".
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno:
Last_Error:
Skip_Counter:
Exec_Master_Log_Pos:
Relay_Log_Space:
Until_Condition: None
Until_Log_File:
Until_Log_Pos:
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master:
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno:
Last_IO_Error:
Last_SQL_Errno:
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id:
Master_UUID: 1b7181ee-6eaf-11e8-998e-080027de0e0e
Master_Info_File: mysql.slave_master_info
SQL_Delay:
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
Master_Retry_Count:
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set: 1b7181ee-6eaf-11e8-998e-080027de0e0e:-
Executed_Gtid_Set: 1b7181ee-6eaf-11e8-998e-080027de0e0e:- //The slave had skipped those GTID(which contained error 1146) of master and waited for newer GTID.The replica had been fixed up.
Auto_Position:
Replicate_Rewrite_DB:
Channel_Name:
Master_TLS_Version:
row in set (0.00 sec)

Summary

  • The variable "gtid_purged" cannot be set if "gtid_executed" is not empty.
  • Caution,"reset master" can only be used on slave.Keep in mind that don't do it on master anytime.
  • This case can be followed only in test environment 'cause you cannot guarantee whether all the transactions are really replayed on slave.

GTID环境中手动修复主从故障一例(Error 1146)的更多相关文章

  1. GTID环境中手动修复主从故障一例(Error 1236/Error 1396)

      Preface       I got an replication error 1236 when I modified the password of a user without start ...

  2. SqlServer 禁止架构更改的复制中手动修复使发布和订阅中分别增加的字段同步

    原文:SqlServer 禁止架构更改的复制中手动修复使发布和订阅中分别增加的字段同步 由于之前的需要,禁止了复制架构更改,以至在发布中添加一个字段,并不会同步到订阅中,而现在又在订阅中添加了一个同名 ...

  3. 企业运维 | MySQL关系型数据库在Docker与Kubernetes容器环境中快速搭建部署主从实践

    [点击 关注「 WeiyiGeek」公众号 ] 设为「️ 星标」每天带你玩转网络安全运维.应用开发.物联网IOT学习! 希望各位看友[关注.点赞.评论.收藏.投币],助力每一个梦想. 本章目录 目录 ...

  4. 业务零影响!如何在Online环境中巧用MySQL传统复制技术【转】

    业务零影响!如何在Online环境中巧用MySQL传统复制技术 这篇文章我并不会介绍如何部署一个MySQL复制环境或keepalived+双主环境,因为此类安装搭建的文章已经很多,大家也很熟悉.在这篇 ...

  5. .NET 环境中使用RabbitMQ RabbitMQ与Redis队列对比 RabbitMQ入门与使用篇

    .NET 环境中使用RabbitMQ   在企业应用系统领域,会面对不同系统之间的通信.集成与整合,尤其当面临异构系统时,这种分布式的调用与通信变得越发重要.其次,系统中一般会有很多对实时性要求不高的 ...

  6. 5.7 并行复制配置 基于GTID 搭建中从 基于GTID的备份与恢复,同步中断处理

    5.7 并行复制配置 基于GTID 搭建中从 基于GTID的备份与恢复,同步中断处理 这个文章包含三个部分 1:gtid的多线程复制2:同步中断处理3:GTID的备份与恢复 下面文字相关的东西 大部分 ...

  7. 生产环境中的kubernetes 优先级与抢占

    kubernetes 中的抢占功能是调度器比较重要的feature,但是真正使用起来还是比较危险,否则很容易把低优先级的pod给无辜kill.为了提高GPU集群的资源利用率,决定勇于尝试一番该feat ...

  8. Redis 哨兵模式实现主从故障互切换

    200 ? "200px" : this.width)!important;} --> 介绍 Redis Sentinel 是一个分布式系统, 你可以在一个架构中运行多个 S ...

  9. 生产环境中nginx既做web服务又做反向代理

    一.写对于初入博客园的感想 众所周知,nginx是一个高性能的HTTP和反向代理服务器,在以前工作中要么实现http要么做反向代理或者负载均衡.尚未在同一台nginx或者集群上同时既实现HTTP又实现 ...

随机推荐

  1. 微信的 rpx

    微信小程序新单位rpx与自适应布局   rpx是微信小程序新推出的一个单位,按官方的定义,rpx可以根据屏幕宽度进行自适应,在rpx出现之前,web页面的自适应布局已经有了多种解决方案,为什么微信还捣 ...

  2. 在vue中使用animate库

    <style> @keyframes bounce-in { 0% { transform: scale(0); } 50% { transform: scale(1.5) } 100% ...

  3. 虚方法(virsual method)

    虚方法(virsual method)挺起来玄乎其玄,向从未听说过这个概念的人解释清楚是一件相当困难的事情. 因为这是一个很不容易理解的概念,但它在比较抽象的代码里边是不可少的. 那么既然用枯燥的文字 ...

  4. 几位it 前辈的博客

    赵劼 http://blog.zhaojie.me/?page=2 陈硕 http://www.cnblogs.com/Solstice/ 轮子哥 http://www.cnblogs.com/gen ...

  5. java的四个元注解 @Retention @Target @Document @Inherited

    1.  @Retention  :注解的保留位置 @Retention(RetentionPolicy.SOURCE)  //注解仅存在于源码中,在class字节码文件中不包含 @Retention( ...

  6. javaweb基础(35)_jdbc处理oracl大数据

    一.Oracle中大数据处理 在Oracle中,LOB(Large Object,大型对象)类型的字段现在用得越来越多了.因为这种类型的字段,容量大(最多能容纳4GB的数据),且一个表中可以有多个这种 ...

  7. C# 命名空间与语句

    C#采用命名空间(namespace)来组织程序.命名空间可以嵌套.using指示符可以用来简化命名空间类型的引用.using指示符有两种用法."using System;"语句可 ...

  8. Awt & Swing

    AWT 是抽象窗口组件工具包,是 java 最早的用于编写图形节目应用程序的开发包. Swing 是为了解决 AWT 存在的问题而新开发的包,它以 AWT 为基础的. 具体的说就是: AWT 是Abs ...

  9. ios xmppFramework框架的导入步骤和介绍

    一个将要开发xmpp的项目,建议在项目刚创建就导入框架,这样可以避免一些自己操作失误造成不必要的损失. xmpp中最常用的框架就是 xmppFrameWork 第一种方法直接拖 1> 拖入文件夹 ...

  10. JS浏览器的三种弹框:

    1.alert:使用alert弹框提示信息,最后都会被转化为字符串输出(因为调用了toString这个方法).比如alert(1+1)弹出的结果应该是字符串形式的“2”. 2.Confirm:在ale ...