There are several possibilities to end up with inconsistent MySQL replication, This could be accidental or intentional. In this blog I would like to discuss on how to identify the inconsistent slaves with master and fix them. I used here pt-table-checksum (to find the difference between master and slave) and pt-table-sync (to sync. between MySQL master and slave) from Percona Toolkit, The detailed documentation of Percona toolkit is available here for your reference. I expect / recommend you to be careful (as I mentioned above, sometimes records are inserted / deleted on MySQL slave intentionally) before using pt-table-checksum to sync. slave with master because rollbacking this task is even more expensive. The objective of this blog is to show you how to find differences between master and slave in an MySQL replication infrastructure, If you have decided to sync. slave with master then please proceed with pt-table-sync tool. Both pt-table-checksum and pt-table-sync are highly customizable tools, I have used very simple form of them in the examples below:
Master – 192.168.56.3
Slave – 192.168.56.4
Percona Toolkit Version – 3.0.12
MySQL Version -MySQL GA 5.7.23
I have created a user in the master to check and repair:
GRANT ALL ON *.* to chksum@'%' identified by 'Password@2018';
In case you have non default ports (3306) for MySQL, Please set the following variables on the slaves:
# my.cnf [mysqld] report_host = slave report_port = 3307
Confirm data inconsistencies in Master (192.168.56.3) and Slave (192.168.56.4):
Master
mysql> select count(1) from titles; +----------+ | count(1) | +----------+ | 443308 | +----------+ 1 row in set (0.09 sec)
Slave
mysql> select count(1) from titles; +----------+ | count(1) | +----------+ | 443311 | +----------+ 1 row in set (0.09 sec)
Check data inconsistencies using pt-table-checksum:
Check for data inconsistencies by executing the following command** on the Master:
** command I have executed below :- pt-table-checksum h=192.168.56.3,u=chksum,p=Password@2018,P=3306 –set-vars innodb_lock_wait_timeout=30 –no-check-binlog-format –databases=employees –tables=titles
[root@localhost ~]# pt-table-checksum h=192.168.56.3,u=chksum,p=Password@2018,P=3306 --set-vars innodb_lock_wait_timeout=30 --no-check-binlog-format --databases=employees --tables=titles Checking if all tables can be checksummed ... Starting checksum ... TS ERRORS DIFFS ROWS DIFF_ROWS CHUNKS SKIPPED TIME TABLE 09-20T22:53:02 0 2 443308 5 6 0 1.319 employees.titles
Fixing data inconsistencies using pt-table-checksum
We are checking data inconsistency from Master (192.168.56.3) to Slave (192.168.56.3) by executing the following command** on the Master:
** command I have executed below :- pt-table-sync –sync-to-master h=192.168.56.4,u=chksum,p=Password@2018,P=3306 –databases=employees –tables=titles –print
[root@localhost ~]# pt-table-sync --sync-to-master h=192.168.56.4,u=chksum,p=Password@2018,P=3306 --databases=employees --tables=titles --print REPLACE INTO `employees`.`titles`(`emp_no`, `title`, `from_date`, `to_date`) VALUES ('10144', 'Senior Staff', '1992-10-14', '1993-08-10') /*percona-toolkit src_db:employees src_tbl:titles src_dsn:P=3306,h=192.168.56.3,p=...,u=chksum dst_db:employees dst_tbl:titles dst_dsn:P=3306,h=192.168.56.4,p=...,u=chksum lock:1 transaction:1 changing_src:1 replicate:0 bidirectional:0 pid:3789 user:root host:localhost.localdomain*/; REPLACE INTO `employees`.`titles`(`emp_no`, `title`, `from_date`, `to_date`) VALUES ('10144', 'Staff', '1985-10-14', '1992-10-14') /*percona-toolkit src_db:employees src_tbl:titles src_dsn:P=3306,h=192.168.56.3,p=...,u=chksum dst_db:employees dst_tbl:titles dst_dsn:P=3306,h=192.168.56.4,p=...,u=chksum lock:1 transaction:1 changing_src:1 replicate:0 bidirectional:0 pid:3789 user:root host:localhost.localdomain*/; DELETE FROM `employees`.`titles` WHERE `emp_no`='87000' AND `title`='Staff Engineer' AND `from_date`='1990-01-01' LIMIT 1 /*percona-toolkit src_db:employees src_tbl:titles src_dsn:P=3306,h=192.168.56.3,p=...,u=chksum dst_db:employees dst_tbl:titles dst_dsn:P=3306,h=192.168.56.4,p=...,u=chksum lock:1 transaction:1 changing_src:1 replicate:0 bidirectional:0 pid:3789 user:root host:localhost.localdomain*/; DELETE FROM `employees`.`titles` WHERE `emp_no`='97000' AND `title`='Database Engineer' AND `from_date`='1991-01-01' LIMIT 1 /*percona-toolkit src_db:employees src_tbl:titles src_dsn:P=3306,h=192.168.56.3,p=...,u=chksum dst_db:employees dst_tbl:titles dst_dsn:P=3306,h=192.168.56.4,p=...,u=chksum lock:1 transaction:1 changing_src:1 replicate:0 bidirectional:0 pid:3789 user:root host:localhost.localdomain*/; DELETE FROM `employees`.`titles` WHERE `emp_no`='97500' AND `title`='Project Manager' AND `from_date`='1983-04-11' LIMIT 1 /*percona-toolkit src_db:employees src_tbl:titles src_dsn:P=3306,h=192.168.56.3,p=...,u=chksum dst_db:employees dst_tbl:titles dst_dsn:P=3306,h=192.168.56.4,p=...,u=chksum lock:1 transaction:1 changing_src:1 replicate:0 bidirectional:0 pid:3789 user:root host:localhost.localdomain*/; DELETE FROM `employees`.`titles` WHERE `emp_no`='97501' AND `title`='Project Manager' AND `from_date`='1983-04-11' LIMIT 1 /*percona-toolkit src_db:employees src_tbl:titles src_dsn:P=3306,h=192.168.56.3,p=...,u=chksum dst_db:employees dst_tbl:titles dst_dsn:P=3306,h=192.168.56.4,p=...,u=chksum lock:1 transaction:1 changing_src:1 replicate:0 bidirectional:0 pid:3789 user:root host:localhost.localdomain*/; DELETE FROM `employees`.`titles` WHERE `emp_no`='97502' AND `title`='Project Engineer' AND `from_date`='1993-04-11' LIMIT 1 /*percona-toolkit src_db:employees src_tbl:titles src_dsn:P=3306,h=192.168.56.3,p=...,u=chksum dst_db:employees dst_tbl:titles dst_dsn:P=3306,h=192.168.56.4,p=...,u=chksum lock:1 transaction:1 changing_src:1 replicate:0 bidirectional:0 pid:3789 user:root host:localhost.localdomain*/; [root@localhost ~]#
To fix inconsistencies on MySQL Master against the Slave execute the following command on the Master:
[root@localhost ~]# pt-table-sync --sync-to-master h=192.168.56.4,u=chksum,p=Password@2018,P=3306 --databases=employees --tables=titles --execute
Confirm the data inconsistencies in Master (192.168.56.3) and Slave (192.168.56.4) are fixed:
Master
mysql> select count(1) from titles; +----------+ | count(1) | +----------+ | 443308 | +----------+ 1 row in set (0.09 sec)
Slave
mysql> select count(1) from titles; +----------+ | count(1) | +----------+ | 443308 | +----------+ 1 row in set (0.09 sec)
Conclusion
I recommend / encourage my customers to perform pt-table-checksum exercise regularly in their MySQL replication infrastructure to avoid unpleasant experiences due to data consistency issues.