Before explaining how you can improve the speed for performing point-in-time recovery, let’s recall what is Point-In-Time Recovery and how it’s usually performed.
Point-in-Time Recovery, PTR
Point-In-Time recovery is a technique of restoring your data until a certain point (usually until an event that you would like that has never happened).
For example, a user did a mistake and you would like to recover your data up to that mistake to revert its effects like a drop table or a massive delete.
The usual technique consists to restore the last backup and replay the binary logs up to that unfortunate “event”.
So, as you might have already realized, backups and binary logs are required đ
The main spread technique to replace those binary logs event is to use the `mysqlbinlog` command. However, depending on your workload, this process can be quick or slow, depending on how much data there is to process. Moreover, `mysqlbinlog` parses and dumps binary logs in a single thread, therefore sequentially. Imagine you do a daily backup at midnight and one of your user inconveniently deletes some records at 23.59… you have almost a full day of binary logs to process to be able to perform the Point-in-Time Recovery.
Boost binary log processing
Instead of using the `mysqlbinlog` utility to process our MySQL events, in 5.6 and above we have the possibility to use the MySQL server to perform this operation.
In fact, we will use the slave SQL_thread… and as some of you might have realized it already… we could then process those binary logs in parallel using multiple worker threads !
Example
We have a single server running and it’s configured to generate binary logs.
Sysbench is running oltp on 8 tables using 8 threads while we will play on another table not touched by sysbench to make the example easier to follow.
mysql> create table myusers (id int auto_increment primary key, name varchar(20)); Query OK, 0 rows affected (0.47 sec) mysql> insert into myusers values (0,'lefred'),(0,'kennito'),(0,'dim0'); Query OK, 3 rows affected (0.36 sec) Records: 3 Duplicates: 0 Warnings: 0 mysql> insert into myusers values (0,'flyer'),(0,'luis'),(0,'nunno'); Query OK, 3 rows affected (0.13 sec) Records: 3 Duplicates: 0 Warnings: 0 mysql> select * from myusers; +----+---------+ | id | name | +----+---------+ | 1 | lefred | | 2 | kennito | | 3 | dim0 | | 4 | flyer | | 5 | luis | | 6 | nunno | +----+---------+ 6 rows in set (0.05 sec)
Time for a backup ! Let’s use MEB:
[root@mysql1 mysql]# /opt/mysql/meb-4.1/bin/mysqlbackup --host=127.0.0.1 \ --backup-dir=/tmp/backup --user=root backup-and-apply-log
Backup is done, let’s go back to our table (sysbench is running):
mysql> insert into myusers values (0,'alfranio'); Query OK, 1 row affected (0.33 sec) mysql> insert into myusers values (0,'vitor'); Query OK, 1 row affected (0.09 sec)
Then, oups…
delete from my users…. without where clause !
mysql> delete from myusers; Query OK, 8 rows affected (0.23 sec)
and we don’t realize it directly, so we continue…
mysql> insert into myusers values (0,'pedro'); Query OK, 1 row affected (0.19 sec) mysql> insert into myusers values (0,'thiago'); Query OK, 1 row affected (0.16 sec) mysql> select * from myusers; +----+--------+ | id | name | +----+--------+ | 9 | pedro | | 10 | thiago | +----+--------+ 2 rows in set (0.12 sec)
Ouch ! Let’s find what was the problem…
mysql> show master status; +-------------------+-----------+--------------+------------------+----------------------------------------------+ | File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set | +-------------------+-----------+--------------+------------------+----------------------------------------------+ | mysql1-bin.000002 | 232930764 | | | 7766037d-4d1e-11e7-8a51-08002718d305:1-46525 | +-------------------+-----------+--------------+------------------+----------------------------------------------+ 1 row in set (0.12 sec) mysql> pager grep -A 1 -B 2 'sbtest.myusers' | grep -B 4 Delete PAGER set to 'grep -A 1 -B 2 'sbtest.myusers' | grep -B 4 Delete' mysql> show binlog events in 'mysql1-bin.000002'; -- | mysql1-bin.000002 | 195697832 | Gtid | 1 | 195697904 | SET @@SESSION.GTID_NEXT= '7766037d-4d1e-11e7-8a51-08002718d305:25076' | | mysql1-bin.000002 | 195697904 | Query | 1 | 195697978 | BEGIN | | mysql1-bin.000002 | 195697978 | Table_map | 1 | 195698041 | table_id: 203 (sbtest.myusers) | | mysql1-bin.000002 | 195698041 | Delete_rows | 1 | 195698168 | table_id: 203 flags: STMT_END_F | 528101 rows in set (1.97 sec)
OK, we know which GTID we should avoid (`7766037d-4d1e-11e7-8a51-08002718d305:25076`).
Time to stop MySQL, copy our binary logs somewhere (I recommend to also stream binary logs to save a live copy) and restore the backup !
[root@mysql1 ~]# systemctl stop mysqld
[root@mysql1 mysql]# mkdir /tmp/binlogs/ [root@mysql1 mysql]# cp mysql1-bin.* /tmp/binlogs/
We are still in /var/lib/mysql đ
[root@mysql1 mysql]# rm -rf * [root@mysql1 mysql]# /opt/mysql/meb-4.1/bin/mysqlbackup --backup-dir=/tmp/backup copy-back [root@mysql1 mysql]# chown -R mysql. *
It’s time to add some required settings in `my.cnf`:
replicate-same-server-id=1
skip-slave-start
We can now restart MySQL and start the PTR:
[root@mysql1 mysql]# systemctl start mysqld ... mysql> select * from sbtest.myusers; +----+---------+ | id | name | +----+---------+ | 1 | lefred | | 2 | kennito | | 3 | dim0 | | 4 | flyer | | 5 | luis | | 6 | nunno | +----+---------+ 6 rows in set (0.10 sec)
OK we are back at the backup, so it’s time to perform the PTR:
mysql> SET @@GLOBAL.GTID_PURGED='7766037d-4d1e-11e7-8a51-08002718d305:25076';
It’s time to use our binary logs as relay logs, so the first thing to do is to copy those saved earlier and rename them according:
[root@mysql1 mysql]# for i in $(ls /tmp/binlogs/*.0*) do ext=$(echo $i | cut -d'.' -f2); cp $i mysql1-relay-bin.$ext; done
Make sure that all the new files are referenced in `mysql1-relay-bin.index`:
[root@mysql1 mysql]# ls ./mysql1-relay-bin.0* >mysql1-relay-bin.index [root@mysql1 mysql]# chown mysql. *relay*
mysql> CHANGE MASTER TO RELAY_LOG_FILE='mysql1-relay-bin.000001', RELAY_LOG_POS=1, MASTER_HOST='dummy'; Query OK, 0 rows affected (4.98 sec)
Performance
Now to benefit from replication’s internals, we will use parallel appliers.
If you don’t have your workload distributed in multiple databases, since 5.7, it’s better to use a different slave parallel type than the default value before starting the `SQL_THREAD` :
mysql> SET GLOBAL SLAVE_PARALLEL_TYPE='LOGICAL_CLOCK'; mysql> SET GLOBAL SLAVE_PARALLEL_WORKERS=8;
Now you can start the replication using the new relay logs:
mysql> START SLAVE SQL_THREAD;
It’s possible to monitor the parallel applying using the following query in performance_schema:
mysql> select * from performance_schema.replication_applier_status_by_worker\G
We can now check that we could rebuild our complete table just ignoring the bad transaction that was a mistake:
mysql> select * from sbtest.myusers; +----+----------+ | id | name | +----+----------+ | 1 | lefred | | 2 | kennito | | 3 | dim0 | | 4 | flyer | | 5 | luis | | 6 | nunno | | 7 | alfranio | | 8 | vitor | | 9 | pedro | | 10 | thiago | +----+----------+
If for any reason you only want to recover until the wrong transaction and nothing after, it’s also possible, this is how to proceed after the backup’s restore (as it’s the same until then).
We need to start mysqld and copy again the binary logs as relay logs. But this time, no need to set any value to GTID_PURGED.
We setup replication as above but this time we start it differently using the keyword UNTIL:
mysql> CHANGE MASTER TO RELAY_LOG_FILE='mysql1-relay-bin.000001', RELAY_LOG_POS=1, MASTER_HOST='dummy'; mysql> set global slave_parallel_type='LOGICAL_CLOCK'; mysql> SET GLOBAL SLAVE_PARALLEL_WORKERS=8; mysql> START SLAVE SQL_THREAD UNTIL SQL_BEFORE_GTIDS = '7766037d-4d1e-11e7-8a51-08002718d305:25076';
This time, we will replicate until that GTID and then stop the SQL_THREAD.
In both cases, don’t forget after having performed the PTR, to reset all slave information:
mysq> RESET SLAVE ALL;
Conclusion
Of course this is not (yet?) the standard way of doing PTR. Usually, people use mysqlbinlog and replay it through a MySQL client. But this is a nice hack that in some cases may save a lot of time.
Hi SirïŒ
I just repeated your step in MySQL 5.7.18ïŒbut it is not working.
According to my understandingïŒfor the method No 1ïŒif you set gtid_purged like you mentioned, the records of id 7 and 8 will be lostă
for the method No 2, if i just set “START SLAVE SQL_THREAD UNTIL SQL_BEFORE_GTIDS = ‘7766037d-4d1e-11e7-8a51-08002718d305:25076′;”
all the previous statments would be replayïŒincluded the create table,which would definitely break the SQL Thread.
so before that,i set the gtid_purged first,ignording all the dmls when i backup,then i start SQL Thread,but is not cannot replicate normally.
here is the output in my test,
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State:
Master_Host: dummy
Master_User:
Master_Port: 3306
Connect_Retry: 60
Master_Log_File:
Read_Master_Log_Pos: 4
Relay_Log_File: node3-relay-bin.000002
Relay_Log_Pos: 708
Relay_Master_Log_File:
Slave_IO_Running: No
Slave_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 708
Relay_Log_Space: 2859
Until_Condition: SQL_BEFORE_GTIDS
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 0
Master_UUID:
Master_Info_File: /var/lib/mysql/master.info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State:
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set: 71d0123f-41e7-11e7-8f92-000c299a5715:1-8
Executed_Gtid_Set: 71d0123f-41e7-11e7-8f92-000c299a5715:3
Auto_Position: 0
Replicate_Rewrite_DB:
Channel_Name:
Master_TLS_Version:
1 row in set (0.00 sec)
71d0123f-41e7-11e7-8f92-000c299a5715:4 mapped to insert into myusers values (0,’alfranio’)
71d0123f-41e7-11e7-8f92-000c299a5715:5 mapped to insert into myusers values (0,’vitor’)
Here is my configurationïŒ
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0
server_id=2
log-bin=mysql-server
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid
gtid-mode=on
enforce-gtid-consistency=1
skip-slave-start=1
Hi Vitor,
Thank you for your comment.
I’m sorry that you are not able to reproduce this solution. So let’s me reply to your 3 questions below:
1. using GTID_PURGED:
In fact your understanding is not correct. I remade the example and I will try to explain it again. In this new example the transaction we want to skip has 9d49bc58-5268-11e7-a2a0-08002718d305:327 as GTID.
As you can see after the restore, gtid_executed is already set (so the DDL won’t be executed),
Then we tell that the one we want to skip has been purged:
This means that :325 and :326 still need to be executed:
We do all the work on moving the binlogs as relay logs…
…
Then we start replication:
And as you can see they all have been executed as expected.
For the second question, it’s again working as expected because executed gtids are already set, this is the process:
As you can see there is no issue and it runs as expected. Of course replication (SQL thread) stops just before the GTID we want to skip.
And finally, it seems you are missing one important setting in your my.cnf:
Cheers,
Hi Vitor and Ali,
If you are using xtrabackup, you will find the info in xtrabackup_binlog_info, therefor you will need to set GTID_PURGED to the content of that file and the value of the transaction you want to skip, example:
note: here the trx we want to skip is just the next one so SET @@GLOBAL.GTID_PURGED=’9d49bc58-5268-11e7-a2a0-08002718d305:1-327′ should be used…. but you got how to make it work đ
Cheers
OKïŒthanks for your help,I will try again.
Hi Lefred,
Thanks for you reply.
I tried again,but still not working.
The biggest difference is here,
After the restore, there is not value for gtid_executed,
mysql> show global variables like ‘%gtid%’;
+———————————-+——-+
| Variable_name | Value |
+———————————-+——-+
| binlog_gtid_simple_recovery | ON |
| enforce_gtid_consistency | ON |
| gtid_executed | |
| gtid_executed_compression_period | 1000 |
| gtid_mode | ON |
| gtid_owned | |
| gtid_purged | |
| session_track_gtids | OFF |
+———————————-+——-+
8 rows in set (0.02 sec)
what was wrong with that?
By the way,i use Xtrabackup for the backup and restore.
Was it the root cause?
And i test that GTID_PURGED can only be set if the GTID_EXECUTED is empty,
so how can you set GTID_PURGED after the restore?
mysql> show global variables like ‘%gtid%’;
+———————————-+—————————————-+
| Variable_name | Value |
+———————————-+—————————————-+
| binlog_gtid_simple_recovery | ON |
| enforce_gtid_consistency | ON |
| gtid_executed | 71d0123f-41e7-11e7-8f92-000c299a5715:6 |
| gtid_executed_compression_period | 1000 |
| gtid_mode | ON |
| gtid_owned | |
| gtid_purged | 71d0123f-41e7-11e7-8f92-000c299a5715:6 |
| session_track_gtids | OFF |
+———————————-+—————————————-+
8 rows in set (0.01 sec)
mysql> SET @@GLOBAL.GTID_PURGED= ’71d0123f-41e7-11e7-8f92-000c299a5715:7′;
ERROR 1840 (HY000): @@GLOBAL.GTID_PURGED can only be set when @@GLOBAL.GTID_EXECUTED is empty.
I got the same problemïŒ have you solved it?
not yet
Have you seen my reply above ?
HiïŒafter you have restored from the Xtrabackup, you need first issue “reset master”
it will purge the gtid_executed variable and also truncate the mysql.gtid_executed table.
After that you can set your gtid_purged.
[…] Howto make MySQL point-in-time recovery faster ? […]
[…] to the above steps, there is a similar approach that eliminates the usage of a fake master (check here). Both approaches allow us (among other […]
[…] to the above steps, there is a similar approach that eliminates the usage of a fake master (check here). Both approaches allow us (among other […]
[…] http://lefred.be/content/howto-make-mysql-point-in-time-recovery-faster/ […]
Excellent article. Hit the following roadblocks. Can you please shed some light here :
This solution expects GTID to be enabled on MySql. Did the same :
—
gtid_mode=ON
enforce_gtid_consistency=true
log_bin=/var/lib/mysql/bin_log
log_slave_updates=true
—
However, when I add “replicate-same-server-id and skip-slave-start” to my.cnf, mysql refuse to start saying :
2018-04-11 15:53:50 26953 [ERROR] using –replicate-same-server-id in conjunction with –log-slave-updates is impossible, it would lead to infinite loops in this server.
2018-04-11 15:53:50 26953 [ERROR] Aborting”
And when I disable log-slave-updates it throws below error:
“2018-04-11 15:54:57 27667 [ERROR] –gtid-mode=ON or UPGRADE_STEP_1 or UPGRADE_STEP_2 requires –log-bin and –log-slave-updates
2018-04-11 15:54:57 27667 [ERROR] Aborting”
Finally removed “skip-slave-start” from my.cnf to start database.
Please advise.
Hi, there is a bug (#89375) related to this. The easiest way is to change server_id=2 (for example) in your my.cnf.
see https://www.slideshare.net/lefred.descamps/fosdem-mysql-friends-devroomffebruary-2018-mysql-pointintime-recovery-like-a-rockstar
[…] to the above steps, there is a similar approach that eliminates the usage of a fake master (check here). Both approaches allow us (among other […]
[…] http://lefred.be/content/howto-make-mysql-point-in-time-recovery-faster/ […]
I would like to ask you a question
Just leave a comment like the previous one đ
very intersting point in time recovery in Mysql
It seems that binlog and relay log have different formats.
relay log MBP-relay-bin.000774
#210320 22:35:10 server id 10 end_log_pos 123 CRC32 0xef4ddc35 Start: binlog v 4, server v 5.7.25-log created 210320 22:35:10 at startup
# This Format_description_event appears in a relay log and was generated by the slave thread.
binlog cp mysql-bin.0000x MBP-relay-bin.000773
#210320 21:54:44 server id 10 end_log_pos 123 CRC32 0x15ed57fc Start: binlog v 4, server v 5.7.25-log created 210320 21:54:44
# Warning: this binlog is either in use or was not closed properly.
BINLOG ‘
…
CHANGE MASTER TO RELAY_LOG_FILE=’MBP-relay-bin.000773′, RELAY_LOG_POS=1, MASTER_HOST=’dummy’;
ERROR 1380 (HY000): Failed initializing relay log position: Could not find target log file mentioned in relay log info in the index file ‘./MBP-relay-bin.index’ during relay log initialization
I wrote this blog some times ago, I need to check
How’s it going?
From your error, it seems that this is not related to any file format, you are just missing the file in the MBP-relay-bin.index file.
Did you perform these operations?
[root@mysql1 mysql]# ls ./mysql1-relay-bin.0* >mysql1-relay-bin.index
[root@mysql1 mysql]# chown mysql. *relay*
hongbin@MBP ~/s/msb_5_7_25> ll data/
total 432544
-rw-r—– 1 hongbin staff 1.2K 3 20 22:33 MBP-relay-bin.000773
-rw-r—– 1 hongbin staff 177B 3 20 22:35 MBP-relay-bin.000774
-rw-r—– 1 hongbin staff 177B 3 23 12:02 MBP-relay-bin.000775
-rw-r—– 1 hongbin staff 177B 3 23 12:07 MBP-relay-bin.000776
-rw-r—– 1 hongbin staff 177B 3 23 12:17 MBP-relay-bin.000777
-rw-r—– 1 hongbin staff 177B 3 23 12:20 MBP-relay-bin.000778
-rw-r—– 1 hongbin staff 177B 3 23 12:22 MBP-relay-bin.000779
-rw-r—– 1 hongbin staff 177B 3 23 12:22 MBP-relay-bin.000780
-rw-r—– 1 hongbin staff 177B 3 23 12:23 MBP-relay-bin.000781
-rw-r—– 1 hongbin staff 177B 3 23 12:24 MBP-relay-bin.000782
-rw-r—– 1 hongbin staff 154B 3 23 12:24 MBP-relay-bin.000783
-rw-r—– 1 hongbin staff 230B 3 23 12:24 MBP-relay-bin.index
..
hongbin@MBP ~/s/msb_5_7_25 [1]> cat data/MBP-relay-bin.index
./MBP-relay-bin.000773
./MBP-relay-bin.000774
./MBP-relay-bin.000776
./MBP-relay-bin.000777
./MBP-relay-bin.000778
./MBP-relay-bin.000779
./MBP-relay-bin.000780
./MBP-relay-bin.000781
./MBP-relay-bin.000782
./MBP-relay-bin.000783
look at the ownership of the files… they should belong to MySQL user.
or you are running it with your user (dbdeployer?)
yes
or your file 000773 is corrupted… the relay and binlog format didn’t change
Thank you for the great article. I have this working, but using a replica server. Everything is fine except when the master is restarted, then the repica gets:
Cannot execute the current event group in the parallel mode. Encountered event Anonymous_Gtid, … which prevents execution of this event group in parallel mode. Reason: The master event is logically timestamped incorrectly..
I see some bugs related to this (one filed by you). Do you know of any workarounds?
https://bugs.mysql.com/bug.php?id=102945
https://bugs.mysql.com/bug.php?id=89375
Thank you
Hi Greg,
Is your replication source (aka master, but we try to banish that offensive term), using GTIDs ?
Thank you for the quick reply!
The master is not using GTIDs. We are developing a recovery strategy based on your technique. We would consider enabling them if there are no drawbacks.
The only use for replication in our situation is to recover faster to an alternate server.
-Greg
Hello lefred, thank you for your article. I’ve been looking for a way to parallelize point-in-time-recovery (the way read-replicas do it during regular replication), but your article is the only lead I’ve found.
Your article dates from 2017. Are you aware of any MySQL/MariaDB/… initiatives to provide support for parallel point-in-time-recovery out of the box? Thank you.
Hello, the technique illustrated in this article is still valid.
[…] excellent article on fast point-in-time recovery came to mind. We already established that binlogs transfer fast using scp and mysqlbinlog. If we […]
Hi Lefred,
Thanks for the post. I am testing out your method but run into an issue.
When I replay the binlogs I get this error at least once per binlog being recovered.
ERROR] [MY-010411] [Repl] Transactionâs sequence number is inconsistent with that of a preceding one: sequence_number (1) <= previous sequence_number (xxxxxxx)
Any ideas on what would cause an LSN to be out of order?
Thanks!
Which version of MySQL are you using?
Did you try to set replica_preserve_commit_order=ON?