Howto make MySQL point-in-time recovery faster ?

on
Share on FacebookShare on Google+Tweet about this on TwitterShare on LinkedIn

Before explaining how you can improve the speed for performing point-in-time recovery, let’s recall what is Point-In-Time Recovery and how it’s usually performed.

Point-in-Time Recovery, PTR

Point-In-Time recovery is a technique of restoring your data until a certain point (usually until an event that you would like that has never happened).

For example, a user did a mistake and you would like to recover your data up to that mistake to revert its effects like a drop table or a massive delete.

The usual technique consists to restore the last backup and replay the binary logs up to that unfortunate “event”.

So, as you might have already realized, backups and binary logs are required 😉

The main spread technique to replace those binary logs event is to use the `mysqlbinlog` command. However, depending on your workload, this process can be quick or slow, depending on how much data there is to process. Moreover, `mysqlbinlog` parses and dumps binary logs in a single thread, therefore sequentially. Imagine you do a daily backup at midnight and one of your user inconveniently deletes  some records at 23.59… you have almost a full day of binary logs to process to be able to perform the Point-in-Time Recovery.

Boost binary log processing

Instead of using the `mysqlbinlog` utility to process our MySQL events, in 5.6 and above we have the possibility to use the MySQL server to perform this operation.
In fact, we will use the slave SQL_thread… and as some of you might have realized it already… we could then process those binary logs in parallel using multiple worker threads !

Example

We have a single server running and it’s configured to generate binary logs.

Sysbench is running oltp on 8 tables using 8 threads while we will play on another table not touched by sysbench to make the example easier to follow.

mysql> create table myusers (id int auto_increment primary key, name varchar(20));
Query OK, 0 rows affected (0.47 sec)

mysql> insert into myusers values (0,'lefred'),(0,'kennito'),(0,'dim0');
Query OK, 3 rows affected (0.36 sec)
Records: 3 Duplicates: 0 Warnings: 0

mysql> insert into myusers values (0,'flyer'),(0,'luis'),(0,'nunno');
Query OK, 3 rows affected (0.13 sec)
Records: 3 Duplicates: 0 Warnings: 0

mysql> select * from myusers;
+----+---------+
| id | name    |
+----+---------+
| 1 | lefred   |
| 2 | kennito  |
| 3 | dim0     |
| 4 | flyer    |
| 5 | luis     |
| 6 | nunno    |
+----+---------+
6 rows in set (0.05 sec)

Time for a backup ! Let’s use MEB:

[root@mysql1 mysql]# /opt/mysql/meb-4.1/bin/mysqlbackup --host=127.0.0.1 \
                  --backup-dir=/tmp/backup --user=root backup-and-apply-log

Backup is done, let’s go back to our table (sysbench is running):

mysql> insert into myusers values (0,'alfranio');
Query OK, 1 row affected (0.33 sec)

mysql> insert into myusers values (0,'vitor');
Query OK, 1 row affected (0.09 sec)

Then, oups…

delete from my users…. without where clause !

mysql> delete from myusers;
Query OK, 8 rows affected (0.23 sec)

and we don’t realize it directly, so we continue…

mysql> insert into myusers values (0,'pedro');
Query OK, 1 row affected (0.19 sec)

mysql> insert into myusers values (0,'thiago');
Query OK, 1 row affected (0.16 sec)

mysql> select * from myusers;
+----+--------+
| id | name   |
+----+--------+
| 9  | pedro  |
| 10 | thiago |
+----+--------+
2 rows in set (0.12 sec)

Ouch ! Let’s find what was the problem…

mysql> show master status;
+-------------------+-----------+--------------+------------------+----------------------------------------------+
| File              | Position  | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set                            |
+-------------------+-----------+--------------+------------------+----------------------------------------------+
| mysql1-bin.000002 | 232930764 |              |                  | 7766037d-4d1e-11e7-8a51-08002718d305:1-46525 |
+-------------------+-----------+--------------+------------------+----------------------------------------------+
1 row in set (0.12 sec)

mysql> pager grep -A 1 -B 2 'sbtest.myusers' | grep -B 4 Delete
PAGER set to 'grep -A 1 -B 2 'sbtest.myusers' | grep -B 4 Delete'
mysql> show binlog events in 'mysql1-bin.000002';
--
| mysql1-bin.000002 | 195697832 | Gtid        | 1 | 195697904 | SET @@SESSION.GTID_NEXT= '7766037d-4d1e-11e7-8a51-08002718d305:25076' |
| mysql1-bin.000002 | 195697904 | Query       | 1 | 195697978 | BEGIN                                                                 |
| mysql1-bin.000002 | 195697978 | Table_map   | 1 | 195698041 | table_id: 203 (sbtest.myusers)                                        |
| mysql1-bin.000002 | 195698041 | Delete_rows | 1 | 195698168 | table_id: 203 flags: STMT_END_F                                       |
528101 rows in set (1.97 sec)

OK, we know which GTID we should avoid (`7766037d-4d1e-11e7-8a51-08002718d305:25076`).

Time to stop MySQL, copy our binary logs somewhere (I recommend to also stream binary logs to save a live copy) and restore the backup !

[root@mysql1 ~]# systemctl stop mysqld
[root@mysql1 mysql]# mkdir /tmp/binlogs/
[root@mysql1 mysql]# cp mysql1-bin.* /tmp/binlogs/

We are still in /var/lib/mysql 😉

[root@mysql1 mysql]# rm -rf *
[root@mysql1 mysql]# /opt/mysql/meb-4.1/bin/mysqlbackup --backup-dir=/tmp/backup copy-back
[root@mysql1 mysql]# chown -R mysql. *

It’s time to add some required settings in `my.cnf`:

replicate-same-server-id=1
skip-slave-start

We can now restart MySQL and start the PTR:

[root@mysql1 mysql]# systemctl start mysqld
...
mysql> select * from sbtest.myusers;
+----+---------+
| id | name    |
+----+---------+
|  1 | lefred  |
|  2 | kennito |
|  3 | dim0    |
|  4 | flyer   |
|  5 | luis    |
|  6 | nunno   |
+----+---------+
6 rows in set (0.10 sec)

OK we are back at the backup, so it’s time to perform the PTR:

mysql> SET @@GLOBAL.GTID_PURGED='7766037d-4d1e-11e7-8a51-08002718d305:25076';

It’s time to use our binary logs as relay logs, so the first thing to do is to copy those saved earlier and rename them according:

[root@mysql1 mysql]# for i in $(ls /tmp/binlogs/*.0*) 
do  
  ext=$(echo $i | cut -d'.' -f2); 
 cp $i mysql1-relay-bin.$ext; 
done

Make sure that all the new files are referenced in `mysql1-relay-bin.index`:

[root@mysql1 mysql]# ls ./mysql1-relay-bin.0* >mysql1-relay-bin.index
[root@mysql1 mysql]# chown mysql. *relay*
mysql> CHANGE MASTER TO RELAY_LOG_FILE='mysql1-relay-bin.000001', 
       RELAY_LOG_POS=1, MASTER_HOST='dummy';
Query OK, 0 rows affected (4.98 sec)

Performance

Now to benefit from replication’s internals, we will use parallel appliers.

If you don’t have your workload distributed in multiple databases, since 5.7, it’s better to use a different slave parallel type than the default value before starting the `SQL_THREAD` :

mysql> SET GLOBAL SLAVE_PARALLEL_TYPE='LOGICAL_CLOCK';
mysql> SET GLOBAL SLAVE_PARALLEL_WORKERS=8;

Now you can start the replication using the new relay logs:

mysql> START SLAVE SQL_THREAD;

It’s possible to monitor the parallel applying using the following query in performance_schema:

mysql> select * from performance_schema.replication_applier_status_by_worker\G

We can now check that we could rebuild our complete table just ignoring the bad transaction that was a mistake:

mysql> select * from sbtest.myusers;
+----+----------+
| id | name     |
+----+----------+
|  1 | lefred   |
|  2 | kennito  |
|  3 | dim0     |
|  4 | flyer    |
|  5 | luis     |
|  6 | nunno    |
|  7 | alfranio |
|  8 | vitor    |
|  9 | pedro    |
| 10 | thiago   |
+----+----------+

If for any reason you only want to recover until the wrong transaction and nothing after, it’s also possible, this is how to proceed after the backup’s restore (as it’s the same until then).

We need to start mysqld and copy again the binary logs as relay logs. But this time, no need to set any value to GTID_PURGED.

We setup replication as above but this time we start it differently using the keyword UNTIL:

mysql> CHANGE MASTER TO RELAY_LOG_FILE='mysql1-relay-bin.000001', 
       RELAY_LOG_POS=1, MASTER_HOST='dummy';
mysql> set global slave_parallel_type='LOGICAL_CLOCK';
mysql> SET GLOBAL SLAVE_PARALLEL_WORKERS=8;
mysql> START SLAVE SQL_THREAD UNTIL 
       SQL_BEFORE_GTIDS = '7766037d-4d1e-11e7-8a51-08002718d305:25076';

This time, we will replicate until that GTID and then stop the SQL_THREAD.

In both cases, don’t forget after having performed the PTR, to reset all slave information:

mysq> RESET SLAVE ALL;

Conclusion

Of course this is not (yet?) the standard way of doing PTR. Usually, people use mysqlbinlog and replay it through a MySQL client. But this is a nice hack that in some cases may save a lot of time.

Share on FacebookShare on Google+Tweet about this on TwitterShare on LinkedIn

10 thoughts on “Howto make MySQL point-in-time recovery faster ?

  1. Hi Sir,

    I just repeated your step in MySQL 5.7.18,but it is not working.

    According to my understanding,for the method No 1,if you set gtid_purged like you mentioned, the records of id 7 and 8 will be lost。

    for the method No 2, if i just set “START SLAVE SQL_THREAD UNTIL SQL_BEFORE_GTIDS = ‘7766037d-4d1e-11e7-8a51-08002718d305:25076′;”
    all the previous statments would be replay,included the create table,which would definitely break the SQL Thread.

    so before that,i set the gtid_purged first,ignording all the dmls when i backup,then i start SQL Thread,but is not cannot replicate normally.

    here is the output in my test,

    mysql> show slave status\G
    *************************** 1. row ***************************
    Slave_IO_State:
    Master_Host: dummy
    Master_User:
    Master_Port: 3306
    Connect_Retry: 60
    Master_Log_File:
    Read_Master_Log_Pos: 4
    Relay_Log_File: node3-relay-bin.000002
    Relay_Log_Pos: 708
    Relay_Master_Log_File:
    Slave_IO_Running: No
    Slave_SQL_Running: No
    Replicate_Do_DB:
    Replicate_Ignore_DB:
    Replicate_Do_Table:
    Replicate_Ignore_Table:
    Replicate_Wild_Do_Table:
    Replicate_Wild_Ignore_Table:
    Last_Errno: 0
    Last_Error:
    Skip_Counter: 0
    Exec_Master_Log_Pos: 708
    Relay_Log_Space: 2859
    Until_Condition: SQL_BEFORE_GTIDS
    Until_Log_File:
    Until_Log_Pos: 0
    Master_SSL_Allowed: No
    Master_SSL_CA_File:
    Master_SSL_CA_Path:
    Master_SSL_Cert:
    Master_SSL_Cipher:
    Master_SSL_Key:
    Seconds_Behind_Master: NULL
    Master_SSL_Verify_Server_Cert: No
    Last_IO_Errno: 0
    Last_IO_Error:
    Last_SQL_Errno: 0
    Last_SQL_Error:
    Replicate_Ignore_Server_Ids:
    Master_Server_Id: 0
    Master_UUID:
    Master_Info_File: /var/lib/mysql/master.info
    SQL_Delay: 0
    SQL_Remaining_Delay: NULL
    Slave_SQL_Running_State:
    Master_Retry_Count: 86400
    Master_Bind:
    Last_IO_Error_Timestamp:
    Last_SQL_Error_Timestamp:
    Master_SSL_Crl:
    Master_SSL_Crlpath:
    Retrieved_Gtid_Set: 71d0123f-41e7-11e7-8f92-000c299a5715:1-8
    Executed_Gtid_Set: 71d0123f-41e7-11e7-8f92-000c299a5715:3
    Auto_Position: 0
    Replicate_Rewrite_DB:
    Channel_Name:
    Master_TLS_Version:
    1 row in set (0.00 sec)

    71d0123f-41e7-11e7-8f92-000c299a5715:4 mapped to insert into myusers values (0,’alfranio’)
    71d0123f-41e7-11e7-8f92-000c299a5715:5 mapped to insert into myusers values (0,’vitor’)

    Here is my configuration:

    datadir=/var/lib/mysql
    socket=/var/lib/mysql/mysql.sock

    # Disabling symbolic-links is recommended to prevent assorted security risks
    symbolic-links=0
    server_id=2
    log-bin=mysql-server
    log-error=/var/log/mysqld.log
    pid-file=/var/run/mysqld/mysqld.pid
    gtid-mode=on
    enforce-gtid-consistency=1
    skip-slave-start=1

  2. Hi Vitor,
    Thank you for your comment.

    I’m sorry that you are not able to reproduce this solution. So let’s me reply to your 3 questions below:

    1. using GTID_PURGED:

    In fact your understanding is not correct. I remade the example and I will try to explain it again. In this new example the transaction we want to skip has 9d49bc58-5268-11e7-a2a0-08002718d305:327 as GTID.

    mysql> select * from sbtest.myusers; 
    +----+---------+
    | id | name    |
    +----+---------+
    |  1 | lefred  |
    |  2 | kennito |
    |  3 | dim0    |
    |  4 | flyer   |
    |  5 | luis    |
    |  6 | nunno   |
    +----+---------+
    6 rows in set (0.10 sec)
    
    mysql> show global variables like 'gtid%';
    +----------------------------------+--------------------------------------------+
    | Variable_name                    | Value                                      |
    +----------------------------------+--------------------------------------------+
    | gtid_executed                    | 9d49bc58-5268-11e7-a2a0-08002718d305:1-324 |
    | gtid_executed_compression_period | 1000                                       |
    | gtid_mode                        | ON                                         |
    | gtid_owned                       |                                            |
    | gtid_purged                      |                                            |
    +----------------------------------+--------------------------------------------+
    5 rows in set (0.34 sec)
    

    As you can see after the restore, gtid_executed is already set (so the DDL won’t be executed),

    Then we tell that the one we want to skip has been purged:

    mysql> SET @@GLOBAL.GTID_PURGED='9d49bc58-5268-11e7-a2a0-08002718d305:327';
    Query OK, 0 rows affected (0.14 sec)
    

    This means that :325 and :326 still need to be executed:

    mysql> show global variables like 'gtid%';
    +----------------------------------+------------------------------------------------+
    | Variable_name                    | Value                                          |
    +----------------------------------+------------------------------------------------+
    | gtid_executed                    | 9d49bc58-5268-11e7-a2a0-08002718d305:1-324:327 |
    | gtid_executed_compression_period | 1000                                           |
    | gtid_mode                        | ON                                             |
    | gtid_owned                       |                                                |
    | gtid_purged                      | 9d49bc58-5268-11e7-a2a0-08002718d305:327       |
    +----------------------------------+------------------------------------------------+
    5 rows in set (0.08 sec)
    

    We do all the work on moving the binlogs as relay logs…

    Then we start replication:

    mysql> CHANGE MASTER TO RELAY_LOG_FILE='mysql1-relay-bin.000001', 
        ->        RELAY_LOG_POS=1, MASTER_HOST='dummy';
    Query OK, 0 rows affected (1.57 sec)
    
    mysql> START SLAVE SQL_THREAD;
    Query OK, 0 rows affected (0.05 sec)
    
    mysql> select * from sbtest.myusers;
    +----+----------+
    | id | name     |
    +----+----------+
    |  1 | lefred   |
    |  2 | kennito  |
    |  3 | dim0     |
    |  4 | flyer    |
    |  5 | luis     |
    |  6 | nunno    |
    |  7 | alfranio |
    |  8 | vitor    |
    |  9 | pedro    |
    | 10 | thiago   |
    +----+----------+
    10 rows in set (0.05 sec)
    
    mysql> show global variables like 'gtid%';
    +----------------------------------+----------------------------------------------+
    | Variable_name                    | Value                                        |
    +----------------------------------+----------------------------------------------+
    | gtid_executed                    | 9d49bc58-5268-11e7-a2a0-08002718d305:1-329   |
    | gtid_executed_compression_period | 1000                                         |
    | gtid_mode                        | ON                                           |
    | gtid_owned                       |                                              |
    | gtid_purged                      | 9d49bc58-5268-11e7-a2a0-08002718d305:325-329 |
    +----------------------------------+----------------------------------------------+
    5 rows in set (0.05 sec)
    

    And as you can see they all have been executed as expected.

    For the second question, it’s again working as expected because executed gtids are already set, this is the process:

    mysql> select * from sbtest.myusers;
    +----+---------+
    | id | name    |
    +----+---------+
    |  1 | lefred  |
    |  2 | kennito |
    |  3 | dim0    |
    |  4 | flyer   |
    |  5 | luis    |
    |  6 | nunno   |
    +----+---------+
    6 rows in set (0.09 sec)
    
    mysql> CHANGE MASTER TO RELAY_LOG_FILE='mysql1-relay-bin.000001', 
        ->        RELAY_LOG_POS=1, MASTER_HOST='dummy';
    Query OK, 0 rows affected (1.83 sec)
    
    mysql> set global slave_parallel_type='LOGICAL_CLOCK';
    Query OK, 0 rows affected (0.06 sec)
    
    mysql> SET GLOBAL SLAVE_PARALLEL_WORKERS=8;
    Query OK, 0 rows affected (0.03 sec)
    
    mysql> START SLAVE SQL_THREAD UNTIL 
        ->        SQL_BEFORE_GTIDS = '9d49bc58-5268-11e7-a2a0-08002718d305:327';
    Query OK, 0 rows affected (0.36 sec)
    
    mysql> show global variables like 'gtid%';
    +----------------------------------+----------------------------------------------+
    | Variable_name                    | Value                                        |
    +----------------------------------+----------------------------------------------+
    | gtid_executed                    | 9d49bc58-5268-11e7-a2a0-08002718d305:1-326   |
    | gtid_executed_compression_period | 1000                                         |
    | gtid_mode                        | ON                                           |
    | gtid_owned                       |                                              |
    | gtid_purged                      | 9d49bc58-5268-11e7-a2a0-08002718d305:325-326 |
    +----------------------------------+----------------------------------------------+
    5 rows in set (0.00 sec)
    
    mysql> show slave status\G
    *************************** 1. row ***************************
                   Slave_IO_State: 
                      Master_Host: dummy
                      Master_User: 
                      Master_Port: 3306
                    Connect_Retry: 60
                  Master_Log_File: 
              Read_Master_Log_Pos: 4
                   Relay_Log_File: mysql1-relay-bin.000002
                    Relay_Log_Pos: 152733570
            Relay_Master_Log_File: 
                 Slave_IO_Running: No
                Slave_SQL_Running: No
                  Replicate_Do_DB: 
              Replicate_Ignore_DB: 
               Replicate_Do_Table: 
           Replicate_Ignore_Table: 
          Replicate_Wild_Do_Table: 
      Replicate_Wild_Ignore_Table: 
                       Last_Errno: 0
                       Last_Error: 
                     Skip_Counter: 0
              Exec_Master_Log_Pos: 152733570
                  Relay_Log_Space: 152734904
                  Until_Condition: SQL_BEFORE_GTIDS
                   Until_Log_File: 
                    Until_Log_Pos: 0
               Master_SSL_Allowed: No
               Master_SSL_CA_File: 
               Master_SSL_CA_Path: 
                  Master_SSL_Cert: 
                Master_SSL_Cipher: 
                   Master_SSL_Key: 
            Seconds_Behind_Master: NULL
    Master_SSL_Verify_Server_Cert: No
                    Last_IO_Errno: 0
                    Last_IO_Error: 
                   Last_SQL_Errno: 0
                   Last_SQL_Error: 
      Replicate_Ignore_Server_Ids: 
                 Master_Server_Id: 0
                      Master_UUID: 
                 Master_Info_File: /var/lib/mysql/master.info
                        SQL_Delay: 0
              SQL_Remaining_Delay: NULL
          Slave_SQL_Running_State: 
               Master_Retry_Count: 86400
                      Master_Bind: 
          Last_IO_Error_Timestamp: 
         Last_SQL_Error_Timestamp: 
                   Master_SSL_Crl: 
               Master_SSL_Crlpath: 
               Retrieved_Gtid_Set: 9d49bc58-5268-11e7-a2a0-08002718d305:1-329
                Executed_Gtid_Set: 9d49bc58-5268-11e7-a2a0-08002718d305:1-326
                    Auto_Position: 0
             Replicate_Rewrite_DB: 
                     Channel_Name: 
               Master_TLS_Version: 
    1 row in set (0.00 sec)
    

    As you can see there is no issue and it runs as expected. Of course replication (SQL thread) stops just before the GTID we want to skip.

    And finally, it seems you are missing one important setting in your my.cnf:

    replicate-same-server-id=1

    Cheers,

    1. Hi Vitor and Ali,

      If you are using xtrabackup, you will find the info in xtrabackup_binlog_info, therefor you will need to set GTID_PURGED to the content of that file and the value of the transaction you want to skip, example:

      cat xtrabackup_binlog_info
      mysql-bin.000002    1232        9d49bc58-5268-11e7-a2a0-08002718d305:1-326
      
      mysql> SET @@GLOBAL.GTID_PURGED='9d49bc58-5268-11e7-a2a0-08002718d305:1-326:327';
      

      note: here the trx we want to skip is just the next one so SET @@GLOBAL.GTID_PURGED=’9d49bc58-5268-11e7-a2a0-08002718d305:1-327′ should be used…. but you got how to make it work 😉

      Cheers

  3. Hi Lefred,
    Thanks for you reply.

    I tried again,but still not working.

    The biggest difference is here,

    After the restore, there is not value for gtid_executed,

    mysql> show global variables like ‘%gtid%’;
    +———————————-+——-+
    | Variable_name | Value |
    +———————————-+——-+
    | binlog_gtid_simple_recovery | ON |
    | enforce_gtid_consistency | ON |
    | gtid_executed | |
    | gtid_executed_compression_period | 1000 |
    | gtid_mode | ON |
    | gtid_owned | |
    | gtid_purged | |
    | session_track_gtids | OFF |
    +———————————-+——-+
    8 rows in set (0.02 sec)

    what was wrong with that?
    By the way,i use Xtrabackup for the backup and restore.
    Was it the root cause?

    And i test that GTID_PURGED can only be set if the GTID_EXECUTED is empty,
    so how can you set GTID_PURGED after the restore?

    mysql> show global variables like ‘%gtid%’;
    +———————————-+—————————————-+
    | Variable_name | Value |
    +———————————-+—————————————-+
    | binlog_gtid_simple_recovery | ON |
    | enforce_gtid_consistency | ON |
    | gtid_executed | 71d0123f-41e7-11e7-8f92-000c299a5715:6 |
    | gtid_executed_compression_period | 1000 |
    | gtid_mode | ON |
    | gtid_owned | |
    | gtid_purged | 71d0123f-41e7-11e7-8f92-000c299a5715:6 |
    | session_track_gtids | OFF |
    +———————————-+—————————————-+
    8 rows in set (0.01 sec)

    mysql> SET @@GLOBAL.GTID_PURGED= ’71d0123f-41e7-11e7-8f92-000c299a5715:7′;
    ERROR 1840 (HY000): @@GLOBAL.GTID_PURGED can only be set when @@GLOBAL.GTID_EXECUTED is empty.

    1. Hi,after you have restored from the Xtrabackup, you need first issue “reset master”
      it will purge the gtid_executed variable and also truncate the mysql.gtid_executed table.
      After that you can set your gtid_purged.

Leave a Reply

Your email address will not be published. Required fields are marked *

recent

Last Tweets Last Tweets

Locations of visitors to this page
categories