pt-table-checksum is an online data-consistency verification tool for MySQL master-slave replication. It runs on the master and checksums the replicated tables on both the master and the slave side to determine whether the data is consistent. When the check finishes, the tool reports the objects that differ from the master.

1. Scenarios that lead to master-slave inconsistency

Nondeterministic statements executed on the master (e.g. CURRENT_USER(), UUID()); see the sketch after this list
Incorrect failover procedures
Mistaken operations, or DML issued directly against the slave
Rolling upgrades
Mixing transactional and non-transactional storage engines
Skipped replication events (SET GLOBAL SQL_SLAVE_SKIP_COUNTER = N)
Temporary tables
Replication filters
UPDATE/DELETE statements with a LIMIT clause but no ORDER BY
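A minimal sketch of the first scenario: under statement-based replication (binlog_format = STATEMENT) the slave re-executes the SQL text, so a nondeterministic function such as UUID() is evaluated again on the slave and can store a different value there. The demo.nondet table below is made up purely for illustration.

-- On the master, with binlog_format = STATEMENT
CREATE DATABASE IF NOT EXISTS demo;
CREATE TABLE demo.nondet (id INT AUTO_INCREMENT PRIMARY KEY, token CHAR(36));
INSERT INTO demo.nondet (token) VALUES (UUID());  -- UUID() is re-evaluated when the slave replays the statement

-- Compare the stored value on the master and on the slave; they will normally differ
SELECT token FROM demo.nondet WHERE id = 1;

This is exactly the kind of silent drift that pt-table-checksum is designed to detect.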

2. pt-table-checksum features

pt-table-checksum connects to the server you specify, and finds databases and tables that match the filters you specify (if any). It works one table at a time, so it does not accumulate large amounts of memory or do a lot of work before beginning to checksum. This makes it usable on very large servers. We have used it on servers with hundreds of thousands of databases and tables, and trillions of rows. No matter how large the server is, pt-table-checksum works equally well.

One reason it can work on very large tables is that it divides each table into chunks of rows, and checksums each chunk with a single REPLACE..SELECT query. It varies the chunk size to make the checksum queries run in the desired amount of time. The goal of chunking the tables, instead of doing each table with a single big query, is to ensure that checksums are unintrusive and don't cause too much replication lag or load on the server. That's why the target time for each chunk is 0.5 seconds by default.

The tool keeps track of how quickly the server is able to execute the queries, and adjusts the chunks as it learns more about the server's performance. It uses an exponentially decaying weighted average to keep the chunk size stable, yet remain responsive if the server's performance changes during checksumming for any reason. This means that the tool will quickly throttle itself if your server becomes heavily loaded during a traffic spike or a background task, for example.

After pt-table-checksum finishes checksumming all of the chunks in a table, it pauses and waits for all detected replicas to finish executing the checksum queries. Once that is finished, it checks all of the replicas to see if they have the same data as the master, and then prints a line of output with the results.
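The REPLACE..SELECT mentioned above writes one row per chunk into the table named by --replicate (percona.checksums by default; test.checksum in the demo below). Normally the tool creates that table itself, but a rough sketch of its layout, based on the tool's documentation, makes the later queries easier to follow (exact column types and indexes may vary between versions):

CREATE TABLE IF NOT EXISTS test.checksum (
  db             CHAR(64)     NOT NULL,
  tbl            CHAR(64)     NOT NULL,
  chunk          INT          NOT NULL,
  chunk_time     FLOAT,                    -- how long the chunk took where the query executed
  chunk_index    VARCHAR(200),             -- index used to chunk the table
  lower_boundary TEXT,                     -- first row of the chunk
  upper_boundary TEXT,                     -- last row of the chunk
  this_crc       CHAR(40)     NOT NULL,    -- checksum computed on the server executing the query
  this_cnt       INT          NOT NULL,    -- row count computed on the server executing the query
  master_crc     CHAR(40),                 -- checksum as computed on the master
  master_cnt     INT,                      -- row count as computed on the master
  ts             TIMESTAMP    NOT NULL,
  PRIMARY KEY (db, tbl, chunk)
) ENGINE = InnoDB;

In outline: because the REPLACE..SELECT replicates to the slaves and is re-executed there, this_crc/this_cnt end up holding each server's own values, while master_crc/master_cnt are filled in with the master's values. Comparing the two columns on a replica is what reveals differences.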

3. Demonstrating pt-table-checksum

-- Environment: Master 192.168.1.8, Slave 192.168.1.12; replication is already set up
-- In this demo the mysql prompt is shown as: user@hostname[database]
-- e.g. master@localhost[test] means the master user on the master server; a slave prompt means the user is connected to the slave
-- The replication filters are as follows:
[root@vdbsrv4 ~]# mysql -uroot -p -e "show slave status\G"|grep "Replicate"
Enter password:
Replicate_Do_DB: sakila,test
Replicate_Ignore_DB: mysql
a. Environment preparation
-- Grant privileges to the user that will run the checksums. Note: if replication of the mysql system database is not enabled, create the same user on the slave as well.
master@localhost[test]> grant select, process, super, replication slave on *.* to
    -> 'checksums'@'192.168.1.%' identified by 'xxx';
Query OK, 0 rows affected (0.00 sec)

-- Create a table on the master and insert a few rows
master@localhost[test]> create table t(id tinyint primary key auto_increment,ename varchar(20));
Query OK, 0 rows affected (0.01 sec)

master@localhost[test]> insert into t(ename) values('Leshami'),('Henry'),('Jack');
Query OK, 3 rows affected (0.01 sec)
Records: 3  Duplicates: 0  Warnings: 0

-- Query the result on the slave
slave@localhost[test]> select * from t;
+----+---------+
| id | ename   |
+----+---------+
|  1 | Leshami |
|  2 | Henry   |
|  3 | Jack    |
+----+---------+

-- Simulate a data inconsistency: delete a record on the slave
slave@localhost[test]> delete from t where id=2;

b. Single-table check
-- Run pt-table-checksum
[root@vdbsrv3 ~]# pt-table-checksum h='192.168.1.8',u='checksums',p='xxx',P=3306 \
> -dtest -tt --nocheck-replication-filters \
> --no-check-binlog-format --replicate=test.checksum
TS             ERRORS DIFFS ROWS CHUNKS SKIPPED TIME  TABLE
08-06T10:14:32      0     1    3      1       0 0.031 test.t

TS      : the time the check of the table completed.
ERRORS  : the number of errors and warnings that occurred during the check.
DIFFS   : 0 means consistent, 1 means inconsistent. With --no-replicate-check it is always 0;
          with --replicate-check-only different information is displayed.
ROWS    : the number of rows in the table.
CHUNKS  : the number of chunks the table was divided into.
SKIPPED : the number of chunks skipped because of errors, warnings, or being too large.
TIME    : the execution time.
TABLE   : the name of the table that was checked.
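You can also look at the raw rows the tool recorded for this run by querying the --replicate table directly. A simple sketch (most useful on a replica, where this_crc/this_cnt reflect the replica's own data while master_crc/master_cnt carry the values computed on the master):

SELECT db, tbl, chunk, this_cnt, this_crc, master_cnt, master_crc
FROM test.checksum
WHERE db = 'test' AND tbl = 't';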
-- Check the checksum result from the slave side with a SQL script
slave@localhost[test]> system more check_sync_stat.sql;
SELECT
    db, tbl, SUM(this_cnt) AS total_rows, COUNT(*) AS chunks
FROM
    test.checksum
WHERE
    (master_cnt <> this_cnt
    OR master_crc <> this_crc
    OR ISNULL(master_crc) <> ISNULL(this_crc))
GROUP BY db, tbl;

slave@localhost[test]> source check_sync_stat.sql;
+------+-----+------------+--------+
| db   | tbl | total_rows | chunks |
+------+-----+------------+--------+
| test | t   |          2 |      1 |
+------+-----+------------+--------+
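To see which specific chunks differ, rather than only the per-table totals, the same table can also be queried at chunk level. A sketch that uses only the columns the tool itself writes:

SELECT db, tbl, chunk, lower_boundary, upper_boundary,
       this_cnt, master_cnt, this_crc, master_crc
FROM test.checksum
WHERE master_cnt <> this_cnt
   OR master_crc <> this_crc
   OR ISNULL(master_crc) <> ISNULL(this_crc);

For a small single-chunk table such as test.t this returns the same information, but on a large chunked table it narrows the difference down to a row range.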
-- Insert a record on the slave
slave@localhost[test]> insert into t(ename) values('Robin');
Query OK, 1 row affected (0.00 sec)

slave@localhost[test]> select * from t;
+----+---------+
| id | ename   |
+----+---------+
|  1 | Leshami |
|  3 | Jack    |
|  4 | Robin   |
+----+---------+

-- Run pt-table-checksum on the master again (omitted here), then check the result
slave@localhost[test]> source check_sync_stat.sql;
+------+-----+------------+--------+
| db   | tbl | total_rows | chunks |
+------+-----+------------+--------+
| test | t   |          3 |      1 |
+------+-----+------------+--------+

c. How pt-table-checksum works
-- The --explain option does not execute the checksums; it lists the SQL statements that would actually be run.
-- From the documentation:
--   Show, but do not execute, checksum queries (disables --[no]empty-replicate-table). If specified
--   twice, the tool actually iterates through the chunking algorithm, printing the upper and lower
--   boundary values for each chunk, but not executing the checksum queries.

[root@vdbsrv3 ~]# pt-table-checksum h='192.168.1.8',u='checksums',p='xxx',P=3306 \
> -dtest -tt --nocheck-replication-filters \
> --no-check-binlog-format --replicate=test.checksum --explain
--
-- test.t
-- REPLACE INTO `test`.`checksum` (db, tbl, chunk, chunk_index, lower_boundary,
upper_boundary, this_cnt, this_crc) SELECT ?, ?, ?, ?, ?, ?, COUNT(*) AS cnt,
COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS('#', `id`, `ename`,
CONCAT(ISNULL(`ename`)))) AS UNSIGNED)), 10, 16)), 0) AS crc FROM `test`.`t`
/*checksum table*/
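The placeholders (?) are bound to the chunk metadata; the interesting part is the aggregate itself. To compare a chunk by hand you can run just that aggregate on the master and on a slave and compare the two results; a sketch derived directly from the query above:

SELECT COUNT(*) AS cnt,
       COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS('#', `id`, `ename`,
                CONCAT(ISNULL(`ename`)))) AS UNSIGNED)), 10, 16)), 0) AS crc
FROM `test`.`t`;

Identical cnt and crc values on both servers mean the (single-chunk) table matches; this is essentially what pt-table-checksum automates chunk by chunk, adding WHERE clauses on the chunk boundaries for larger tables.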
d. Database-level check
[root@vdbsrv3 ~]# pt-table-checksum h='192.168.1.8',u='checksums',p='xxx',P=3306 \
> --databases=sakila --nocheck-replication-filters --no-check-binlog-format \
> --replicate=test.checksum
TS ERRORS DIFFS ROWS CHUNKS SKIPPED TIME TABLE
08-06T13:52:17 0 0 200 1 0 0.083 sakila.actor
08-06T13:52:17 0 0 603 1 0 0.024 sakila.address
08-06T13:52:17 0 0 16 1 0 0.012 sakila.category
08-06T13:52:17 0 0 600 1 0 0.025 sakila.city
08-06T13:52:17 0 0 109 1 0 0.019 sakila.country
08-06T13:52:17 0 0 599 1 0 0.019 sakila.customer
08-06T13:52:17 0 0 1000 1 0 0.035 sakila.film
08-06T13:52:17 0 0 5462 1 0 0.295 sakila.film_actor
08-06T13:52:17 0 0 1000 1 0 0.019 sakila.film_category
08-06T13:52:17 0 0 1000 1 0 0.015 sakila.film_text
08-06T13:52:17 0 0 4581 1 0 0.041 sakila.inventory
08-06T13:52:17 0 0 6 1 0 0.012 sakila.language
08-06T13:52:18 0 0 16049 1 0 0.367 sakila.payment
08-06T13:52:18 0 0 16044 1 0 0.357 sakila.rental
08-06T13:52:18 0 0 2 1 0 0.013 sakila.staff
08-06T13:52:18 0 0 2 1 0 0.012 sakila.store

-- Drop a table on the slave
slave@localhost[test]> drop table sakila.payment;
Query OK, 0 rows affected (0.01 sec)

-- Run pt-table-checksum again; it now produces the following message
08-06T13:56:42 Skipping table sakila.payment because it has problems on these replicas:
Table sakila.payment does not exist on replica vdbsrv4
This can break replication. If you understand the risks, specify --no-check-slave-tables to disable this check.
08-06T13:56:42 Error checksumming table sakila.payment: DBD::mysql::db selectrow_hashref failed:
Table 'sakila.payment' doesn't exist
[for Statement "EXPLAIN SELECT * FROM `sakila`.`payment` WHERE 1=1"] at /usr/bin/pt-table-checksum line 6530.

e. Multi-slave check
-- The following demonstrates a master-slave consistency check with multiple slaves
-- By default, replicas are discovered via the --recursion-method parameter; type: array; default: processlist,hosts
-- Preferred recursion method for discovering replicas.
-- pt-table-checksum performs several replica checks before and while running.

master@localhost[(none)]> show slave hosts;
+-----------+------+------+-----------+--------------------------------------+
| Server_id | Host | Port | Master_id | Slave_UUID                           |
+-----------+------+------+-----------+--------------------------------------+
|        11 |      | 3307 |      1002 | 69fc46b6-3c06-11e5-94f0-000c29a05f26 |
|         1 |      | 3306 |      1002 | f2824060-e2cb-11e4-8f18-000c2926f457 |
+-----------+------+------+-----------+--------------------------------------+

-- On the second slave (port 3307), confirm the port and delete a row to create a difference
root@localhost[(none)]> show variables like 'port';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| port          | 3307  |
+---------------+-------+

root@localhost[(none)]> delete from test.t where id=1;
Query OK, 1 row affected (0.00 sec)

[root@vdbsrv3 ~]# pt-table-checksum h='192.168.1.8',u='checksums',p='xxx',P=3306 -dtest \
> -tt --nocheck-replication-filters --no-check-binlog-format --replicate=test.checksum \
> --recursion-method=hosts
# A software update is available:
# * The current version for Percona::Toolkit is 2.2.14.

TS             ERRORS DIFFS ROWS CHUNKS SKIPPED TIME  TABLE
08-06T16:12:52      0     1    3      1       0 0.034 test.t

4. Parameter descriptions

--nocheck-replication-filters
  Do not check for replication filters. Recommended when filters are configured; use --databases to limit which databases are checked.
--no-check-binlog-format
  Do not check the replication binlog format; without this option the tool raises an error when the binlog format is ROW.
--replicate-check-only
  Only display information about data that is out of sync.
--replicate=
  Write the checksum information to the specified table; it is advisable to write it directly into the database being checked.
--databases=
  Specify the databases to check; separate multiple databases with commas.
--tables=
  Specify the tables to check; separate multiple tables with commas.

The DSN options on the command line identify the master connection:
  h=127.0.0.1 : the master's address
  u=root      : user name
  p=123456    : password
  P=3306      : port

5. Common problems

-- Problem 1: binlog format check
[root@vdbsrv3 ~]# pt-table-checksum h='192.168.1.8',u='checksums',p='xxx',P=3306 -d mysql \
> --nocheck-replication-filters --replicate=test.checksums
Replica vdbsrv4 has binlog_format MIXED which could cause pt-table-checksum to break replication.
Please read "Replicas using row-based replication" in the LIMITATIONS section of the tool's documentation.
If you understand the risks, specify --no-check-binlog-format to disable this check.

The message above describes the problem that occurs when the MIXED binlog format is in use.
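Before deciding whether to pass --no-check-binlog-format, it helps to confirm what each server is actually using; a trivial check to run on the master and on every replica:

SELECT @@global.binlog_format;
-- or equivalently
SHOW GLOBAL VARIABLES LIKE 'binlog_format';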
-- Problem 2: the --replicate database does not exist and the user cannot create it
[root@vdbsrv3 ~]# pt-table-checksum h='192.168.1.8',u='checksums',p='xxx',P=3306 -d mysql \
> --nocheck-replication-filters --no-check-binlog-format
DBD::mysql::db do failed: Access denied for user 'checksums'@'192.168.1.%' to database 'percona'
[for Statement "CREATE DATABASE IF NOT EXISTS `percona` /* pt-table-checksum */"]
at /usr/bin/pt-table-checksum line 10743.
07-29T08:42:03 --replicate database percona does not exist and it cannot be created automatically.
You need to create the database.
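One possible way out, sketched under the assumption that you keep the default percona schema for --replicate: create the database on the master and give the checksum user enough privileges on it (the exact privilege list required can vary by tool version).

CREATE DATABASE IF NOT EXISTS percona;
GRANT SELECT, INSERT, UPDATE, DELETE, CREATE ON percona.* TO 'checksums'@'192.168.1.%';

Note that in this article's setup Replicate_Do_DB only covers sakila and test, so the percona schema would not replicate; pointing --replicate at test.checksum, as the other examples do, avoids the issue entirely.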
-- Problem 3: no slaves found with the default discovery (processlist/hosts)
[root@vdbsrv3 ~]# pt-table-checksum h='192.168.1.8',u='checksums',p='xxx',P=3306 -dtest -tt \
> --nocheck-replication-filters --no-check-binlog-format --replicate=test.checksum
Cannot connect to P=3306,h=vdbsrv4,p=...,u=checksums
Diffs cannot be detected because no slaves were found.
Please read the --recursion-method documentation for information.
TS             ERRORS DIFFS ROWS CHUNKS SKIPPED TIME  TABLE
08-06T10:03:10      0     0    3      1       0 0.023 test.t
-- Problem 4: with --recursion-method=hosts, slaves are discovered but their Host field is empty
[root@vdbsrv3 ~]# pt-table-checksum h='192.168.1.8',u='checksums',p='xxx',P=3306 -dtest -tt \
> --nocheck-replication-filters --no-check-binlog-format \
> --replicate=test.checksum --recursion-method=hosts
Cannot connect to P=3306,h=,p=...,u=checksums
Cannot connect to P=3307,h=,p=...,u=checksums
Diffs cannot be detected because no slaves were found.
Please read the --recursion-method documentation for information.
TS             ERRORS DIFFS ROWS CHUNKS SKIPPED TIME  TABLE
08-06T16:02:27      0     0    3      1       0 0.016 test.t
master@localhost[(none)]> show slave hosts;
+-----------+------+------+-----------+--------------------------------------+
| Server_id | Host | Port | Master_id | Slave_UUID                           |
+-----------+------+------+-----------+--------------------------------------+
|         1 |      | 3306 |      1002 | f2824060-e2cb-11e4-8f18-000c2926f457 |
|        11 |      | 3307 |      1002 | 69fc46b6-3c06-11e5-94f0-000c29a05f26 |
+-----------+------+------+-----------+--------------------------------------+

-- Add the report_host parameter on the slaves and restart them
[root@vdbsrv4 ~]# grep report_host /etc/my.cnf
report_host='192.168.1.12'

master@localhost[(none)]> show slave hosts;
+-----------+--------------+------+-----------+--------------------------------------+
| Server_id | Host         | Port | Master_id | Slave_UUID                           |
+-----------+--------------+------+-----------+--------------------------------------+
|        11 | 192.168.1.12 | 3307 |      1002 | 69fc46b6-3c06-11e5-94f0-000c29a05f26 |
|         1 | 192.168.1.12 | 3306 |      1002 | f2824060-e2cb-11e4-8f18-000c2926f457 |
+-----------+--------------+------+-----------+--------------------------------------+
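If report_host cannot be set, the --recursion-method documentation also describes a DSN-table method: list the replicas explicitly in a table and point the tool at it with --recursion-method=dsn=D=percona,t=dsns. A hedged sketch of that table for this article's two slaves (the column layout follows the tool's documentation, but verify it against the version you run, and adjust the schema name to one that exists on your master):

CREATE DATABASE IF NOT EXISTS percona;
CREATE TABLE IF NOT EXISTS percona.dsns (
  id        INT          NOT NULL AUTO_INCREMENT PRIMARY KEY,
  parent_id INT          DEFAULT NULL,
  dsn       VARCHAR(255) NOT NULL
);
-- One row per replica; host and port are enough, credentials come from the command line
INSERT INTO percona.dsns (dsn) VALUES
  ('h=192.168.1.12,P=3306'),
  ('h=192.168.1.12,P=3307');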
Reposted from:
使用pt-table-checksum校验MySQL主从复制 - CSDN博客
https://blog.csdn.net/leshami/article/details/78377444
