Point-in-Time Recovery in OCI MDS with Object Storage – part 2

In part 1 of the series about Point-in-Time Recovery in OCI MDS, we saw how to stream the binary log to Object Storage.

In this blog post, we will see how we can restore the data up to certain point.

The high level process is simple:

  1. we need to restore the last backup just before the point we want to return to
  2. find the last GTID that was executed in that backup (not 100% mandatory but can accelerate the process)
  3. choose what is the objective:
    1. just skip one transaction (following transaction could depend on it)
    2. recover up to that transaction and skip all the following ones
  4. replay the binary logs taking in consideration the chosen option of point 3

The scenario

We have an application that constantly write data (sysbench) and at the same time, we have another table with our most valuable users (a very important table !!).

This is the content of our table:

 SQL > select * from vip;
+----+--------+---------------------+
| id | name   | inserted            |
+----+--------+---------------------+
|  1 | kenny  | 2021-08-26 11:46:36 |
|  2 | airton | 2021-08-26 11:46:36 |
|  3 | dave   | 2021-08-26 11:46:36 |
|  4 | miguel | 2021-08-26 11:46:36 |
|  5 | luis   | 2021-08-26 11:46:36 |
|  6 | mark   | 2021-08-26 13:33:12 |
+----+--------+---------------------+

Then, one of the operator entered the following statements:

SQL > insert into vip (name) values ('mr mike');
SQL > insert into vip (name) values ('lenka');
SQL > insert into vip (name) values ('dimo');

The operator realized he made a mistake… and wanted to update that last record:

SQL > update vip set name = "dim0";
Query OK, 9 rows affected (0.0043 sec)

The operator didn’t realize his mistake, and added another record:

SQL > insert into vip (name) values ('lefred');

The day after, another operator got some complains… and when he checked the table:

 SQL > select * from vip;
+----+--------+---------------------+
| id | name   | inserted            |
+----+--------+---------------------+
|  1 | dim0   | 2021-08-26 11:46:36 |
|  2 | dim0   | 2021-08-26 11:46:36 |
|  3 | dim0   | 2021-08-26 11:46:36 |
|  4 | dim0   | 2021-08-26 11:46:36 |
|  5 | dim0   | 2021-08-26 11:46:36 |
|  6 | dim0   | 2021-08-26 13:33:12 |
|  7 | dim0   | 2021-08-26 21:01:23 |
|  8 | dim0   | 2021-08-26 21:01:42 |
|  9 | dim0   | 2021-08-26 21:02:34 |
| 10 | lefred | 2021-08-27 14:48:13 |
+----+--------+---------------------+
10 rows in set (0.0009 sec)

Oups…

Who were the other VIPs ? Of course meanwhile many other writes happened to the database that we don’t want to loose !

So the first thing is to stop all the applications using this table (the vip one). Others can still run while we find the transaction we need to avoid…

Finding the transaction

This is of course the tricky part… we need to find the ID of the transaction that made this error. We need to try to get as much as info as possible to find it out.

In our example, we know that somebody changed all the value of the name column of table vip to ‘dim0’. It also seems that 9 rows where modified. And this seems to have happened yesterday.

Before parsing all the binary logs using mysqlbinlog, we can already try to identify a potential one having the ‘dim0’ word in it. So on the Object Storage mount point (see part 1), we do a simple grep:

[root@mysql-shell-lefred my-mds]# grep dim0 *
Binary file binary-log.000140 matches

Great, we have a candidate, let’s verify:

[root@mysql-shell-lefred my-mds]# mysqlbinlog -v \
                                  --base64-output=DECODE-ROWS \
                                  binary-log.000140 | less

We have identified the binary log file and the GTID (3fecfeaf-0651-11ec-a7e9-0200170484c8:537264). We can move forward and restore the last backup we have before 2021-08-26 21:03:12 GMT.

Backup Restore

After we have identified the backup we can restore it:

When the restore is done, it’s the finally the time to put all the applications on maintenance (unless you only want to recover the needed data, dump it and re-import it on production).

On the current production, after having put the applications on maintenance (or simply stopped), we can check the GTID set:

 SQL > select @@gtid_executed;
+-----------------------------------------------+
| @@gtid_executed                               |
+-----------------------------------------------+
| 3fecfeaf-0651-11ec-a7e9-0200170484c8:1-605971 |
+-----------------------------------------------+
1 row in set (0.0007 sec)

Point-in-Time Recovery

We need to connect to our new restored instance from our compute instance using MySQL Shell:

We can verify the value of the executed GTIDs and the content of the table we want to get back the initial data at restore time:

 SQL > select @@gtid_executed;
+-----------------------------------------------+
| @@gtid_executed                               |
+-----------------------------------------------+
| 3fecfeaf-0651-11ec-a7e9-0200170484c8:1-527817 |
+-----------------------------------------------+
1 row in set (0.0003 sec)

 SQL > select * from mydb.vip;
+----+--------+---------------------+
| id | name   | inserted            |
+----+--------+---------------------+
|  1 | kenny  | 2021-08-26 11:46:36 |
|  2 | airton | 2021-08-26 11:46:36 |
|  3 | dave   | 2021-08-26 11:46:36 |
|  4 | miguel | 2021-08-26 11:46:36 |
|  5 | luis   | 2021-08-26 11:46:36 |
|  6 | mark   | 2021-08-26 13:33:12 |
+----+--------+---------------------+
6 rows in set (0.0044 sec)

We can finally see some of the VIPs… but we are missing some of course… that all the concept of point-in-time recovery !

Now the important is also to to answer the question in point 3:

choose what is the objective:

  1. just skip one transaction (following transaction could depend on it)
  2. recover up to that transaction and skip all the following ones

In case of choice number 1, we need to add the GTID of the transaction we want to skip and then replay all the binary logs from one having the last GTID in gtid_executed on the restore (in our example: 3fecfeaf-0651-11ec-a7e9-0200170484c8:527817).

In case of option 2, we also need to replay all the binary logs from one having the last GTID in gtid_executed on the restore and stop just before the GTID we want to skip.

For both options we need to start feeding the MySQL instance with events from the same binary log, the one having the last committed transaction.

On the restored system, we need to get the first binary log file:

 SQL > show binary logs;
+-------------------+-----------+-----------+
| Log_name          | File_size | Encrypted |
+-------------------+-----------+-----------+
| binary-log.000138 |   1937572 | No        |
| binary-log.000139 |       219 | No        |
| binary-log.000140 |       244 | No        |
| binary-log.000141 |       244 | No        |
+-------------------+-----------+-----------+

And we verify is the first one contains the last transaction as expected:

[root@mysql-shell-lefred my-mds]# mysqlbinlog -v \
     --base64-output=DECODE-ROWS binary-log.000138 \
     | grep '3fecfeaf-0651-11ec-a7e9-0200170484c8:527817'
SET @@SESSION.GTID_NEXT= '3fecfeaf-0651-11ec-a7e9-0200170484c8:527817'/*!*/;

Perfect, we know we need to start from binary-log.000138.

Option 1: just skip one transaction

Now we will tell MySQL to ignore the transaction we want to skip (3fecfeaf-0651-11ec-a7e9-0200170484c8:537264). To do so we first need to retrieve the value of gtid_purged and add the one we want to avoid:

 SQL > select @@gtid_purged;
+-----------------------------------------------+
| @@gtid_purged                                 |
+-----------------------------------------------+
| 3fecfeaf-0651-11ec-a7e9-0200170484c8:1-525206 |
+-----------------------------------------------+
1 row in set (0.0005 sec)
 
 SQL > call sys.set_gtid_purged("3fecfeaf-0651-11ec-a7e9-0200170484c8:1-525206:537264")

And it’s time so send all binary logs to our instance (the new restore one):

[root@mysql-shell-lefred my-mds]# mysqlbinlog --require-row-format  \
binary-log.000138 binary-log.000139 binary-log.0001[4-9]* \
binary-log.000[2-3]* | mysql -u admin -h 10.0.1.118 -p

We started with binary-log.000138 and the last one was binary-log.000384. It’s recommended to load them all at once and not one by one as described in the manual.

If you don’t use –require-row-format in MDS you will get the following error:

ERROR 1227 (42000) at line 28: Access denied; you need (at least one of) the SUPER, SYSTEM_VARIABLES_ADMIN or SESSION_VARIABLES_ADMIN privilege(s) for this operation

This is because the binary log contains the following statements that is not allowed:

SET @@session.pseudo_thread_id=75/*!*/;

When the operation is finished, we are done ! We can verify:

 SQL > select * from vip;
+----+---------+---------------------+
| id | name    | inserted            |
+----+---------+---------------------+
|  1 | kenny   | 2021-08-26 11:46:36 |
|  2 | airton  | 2021-08-26 11:46:36 |
|  3 | dave    | 2021-08-26 11:46:36 |
|  4 | miguel  | 2021-08-26 11:46:36 |
|  5 | luis    | 2021-08-26 11:46:36 |
|  6 | mark    | 2021-08-26 13:33:12 |
|  7 | mr mike | 2021-08-26 21:01:23 |
|  8 | lenka   | 2021-08-26 21:01:42 |
|  9 | dimo    | 2021-08-26 21:02:34 |
| 10 | lefred  | 2021-08-27 14:48:13 |
+----+---------+---------------------+
10 rows in set (0.0006 sec)

We did an excellent job !

We can also compare the GTID executed on both servers:

my-mds > SQL > select @@gtid_executed, @@gtid_purged\G
*************************** 1. row ***************************
@@gtid_executed: 3fecfeaf-0651-11ec-a7e9-0200170484c8:1-605971
@@gtid_purged: 3fecfeaf-0651-11ec-a7e9-0200170484c8:1-605971

my-mds-restore > SQL > select @@gtid_executed, @@gtid_purged\G
*************************** 1. row ***************************
@@gtid_executed: 3fecfeaf-0651-11ec-a7e9-0200170484c8:1-605971
@@gtid_purged: 3fecfeaf-0651-11ec-a7e9-0200170484c8:1-527817:537264

Option 2: recover to that point

In case we wanted to restore the data exactly before the wrong statement and stop at that exact point, there is no need to change the value of gtid_purged and mysqlbinlog command would have be:

[root@mysql-shell-lefred my-mds]# mysqlbinlog --require-row-format  \
binary-log.000138 binary-log.000139 binary-log.000140 \
--include-gtids='3fecfeaf-0651-11ec-a7e9-0200170484c8:527817-537263' \
binary-log.000[2-3]* | mysql -u admin -h 10.0.1.118 -p

Where we include all the binary logs we need (from 000138 to 000140) and the GTID set from the start to the point we need.

Back in Production

Now we can point our applications to the new instance and restart them. We also need to change the binary log streaming by creating a new configuration and starting a new service.

Finally we can terminate and delete the previous production server.

Conclusion

You now have the step-by-step guide for Point-in-Time Recovery (PITR) … and as usual, enjoy MySQL and MySQL Database Service !

Subscribe to Blog via Email

Enter your email address to subscribe to this blog and receive notifications of new posts by email.

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.