MySQL InnoDB Cluster with 2 Data Centers for Disaster Recovery: howto – part 2


In the first part of this howto, I illustrated how to set up two MySQL InnoDB Clusters linked by asynchronous replication.

In that solution, I didn’t use any replication filters to ignore the replication of the InnoDB Cluster’s metadata (mysql_innodb_cluster_metadata); instead, the same metadata tables contained both clusters.

The benefit of that approach is that you can back up everything from any node in either data center, it also works in MySQL 5.7, and there is no risk of messing up the replication filters.

In this blog I will show how to use replication filters to link two different clusters. This doesn’t work on MySQL 5.7, because there you cannot define filters for a specific channel: if you filter the metadata globally, changes such as topology changes won’t be replicated inside the group, and you will run into issues with MySQL Shell and MySQL Router. So this solution works only with MySQL 8.0 (since 8.0.1, which introduced per-channel replication filters).

Let’s consider we have again the exact same setup: 2 DCs and 6 nodes (see the previous post).

On DC1 we have a cluster like this one:

JS> cluster.status();
{
    "clusterName": "clusterDC1", 
    "defaultReplicaSet": {
        "name": "default", 
        "primary": "mysql1:3306", 
        "ssl": "REQUIRED", 
        "status": "OK", 
        "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.", 
        "topology": {
            "mysql1:3306": {
                "address": "mysql1:3306", 
                "mode": "R/W", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE"
            }, 
            "mysql2:3306": {
                "address": "mysql2:3306", 
                "mode": "R/O", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE"
            }, 
            "mysql3:3306": {
                "address": "mysql3:3306", 
                "mode": "R/O", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE"
            }
        }
    }, 
    "groupInformationSourceMember": "mysql://clusteradmin@mysql1:3306"
}

Now we will take a backup of this running cluster (in the first post, we used the backup of a single instance that I called the “production server”).

We choose one of the 3 members and run the following command (this uses MySQL Enterprise Backup):

/opt/meb/bin/mysqlbackup \
  --host 127.0.0.1 \
  --backup-dir /vagrant/backup/ \
    backup-and-apply-log

And we restore it on the 3 machines that will be part of the new InnoDB Cluster in DC2:

# systemctl stop mysqld
# rm -rf /var/lib/mysql/*
# /opt/meb/bin/mysqlbackup --backup-dir=/vagrant/backup/ copy-back
# chown -R mysql. /var/lib/mysql
# systemctl start mysqld
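
Before going further, you can sanity-check that each restored instance came up with the GTID set from the backup (this verification step is my addition; the actual GTID values will of course differ in your environment):

SQL> SELECT @@global.gtid_executed;

The value should be identical on the 3 restored servers, since they all start from the same backup.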

We can also take this opportunity to configure the instances using MySQL Shell:

JS> dba.configureInstance('clusteradmin@mysql4')
JS> dba.configureInstance('clusteradmin@mysql5')
JS> dba.configureInstance('clusteradmin@mysql6')

When done, because the backup was performed on a machine that was already part of a Group, and because each node uses a different name for its relay logs (the default behavior), I need to run the following SQL statement on all 3 new servers:

SQL> RESET SLAVE ALL FOR CHANNEL "group_replication_applier";
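
To confirm that the old channel’s metadata is really gone, you can query performance_schema (an extra verification step, not strictly required):

SQL> SELECT channel_name, service_state 
     FROM performance_schema.replication_connection_status;

Before the new cluster is created, this should return no row for the group_replication_applier channel.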

It’s time to create the second cluster:

JS> \c clusteradmin@mysql4
JS> cluster2=dba.createCluster('clusterDC2')
JS> cluster2.addInstance('clusteradmin@mysql5')
JS> cluster2.addInstance('clusteradmin@mysql6')

And we have now our second cluster:

JS> cluster2.status()
{
    "clusterName": "clusterDC2", 
    "defaultReplicaSet": {
        "name": "default", 
        "primary": "mysql4:3306", 
        "ssl": "REQUIRED", 
        "status": "OK", 
        "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.", 
        "topology": {
            "mysql4:3306": {
                "address": "mysql4:3306", 
                "mode": "R/W", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE"
            }, 
            "mysql5:3306": {
                "address": "mysql5:3306", 
                "mode": "R/O", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE"
            }, 
            "mysql6:3306": {
                "address": "mysql6:3306", 
                "mode": "R/O", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE"
            }
        }
    }, 
    "groupInformationSourceMember": "mysql://clusteradmin@mysql4:3306"
}

As recommended, I will create a dedicated user for replication. On clusterDC1’s Primary-Master, I enter the following SQL statements:

SQL> create user 'repl'@'%' identified by 'replication' REQUIRE SSL;
SQL> grant replication slave on *.* to 'repl'@'%';
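
As an extra check (my addition), you can verify the user was created as intended:

SQL> SHOW GRANTS FOR 'repl'@'%';

This should list the REPLICATION SLAVE privilege on *.*.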

And finally we set up asynchronous replication between the Primary-Master of DC2 and a member of DC1:

SQL> CHANGE MASTER TO MASTER_HOST='mysql1', MASTER_PORT=3306, MASTER_USER='repl', 
     MASTER_PASSWORD='replication', MASTER_AUTO_POSITION=1, MASTER_SSL=1 
     FOR CHANNEL 'asyncDC1'; 
       
SQL> CHANGE REPLICATION FILTER REPLICATE_IGNORE_DB=(mysql_innodb_cluster_metadata) 
     FOR CHANNEL 'asyncDC1';

SQL> START SLAVE FOR CHANNEL 'asyncDC1';

Of course we can see that it works as expected:

SQL> show slave status FOR CHANNEL 'asyncDC1'\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: mysql1
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: binlog.000003
          Read_Master_Log_Pos: 2092043
               Relay_Log_File: mysql5-relay-bin-asyncdc1.000002
                Relay_Log_Pos: 57195
        Relay_Master_Log_File: binlog.000003
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: mysql_innodb_cluster_metadata
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 2092043
              Relay_Log_Space: 57405
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: Yes
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 14177781
                  Master_UUID: 21b458bd-f298-11e8-9bf0-08002718d305
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
           Master_Retry_Count: 86400
                  Master_Bind: 
      Last_IO_Error_Timestamp: 
     Last_SQL_Error_Timestamp: 
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
           Retrieved_Gtid_Set: 463c9331-f29e-11e8-9f64-08002718d305:44-77
            Executed_Gtid_Set: 21b458bd-f298-11e8-9bf0-08002718d305:1-12,
4358f6f3-f2a0-11e8-8243-08002718d305:1-14,
463c9331-f29e-11e8-9f64-08002718d305:1-77,
4c63779c-f29f-11e8-918e-08002718d305:1-4
                Auto_Position: 1
         Replicate_Rewrite_DB: 
                 Channel_Name: asyncdc1
           Master_TLS_Version: 
       Master_public_key_path: 
        Get_master_public_key: 0

We can also verify all this in performance_schema:

SQL> select * from replication_applier_filters\G
*************************** 1. row ***************************
 CHANNEL_NAME: asyncdc1
  FILTER_NAME: REPLICATE_IGNORE_DB
  FILTER_RULE: mysql_innodb_cluster_metadata
CONFIGURED_BY: CHANGE_REPLICATION_FILTER_FOR_CHANNEL
 ACTIVE_SINCE: 2018-11-28 00:13:09.089575
      COUNTER: 0

SQL> select * from replication_applier_status_by_worker 
     where CHANNEL_NAME='asyncdc1'\G
*************************** 1. row ***************************
                                           CHANNEL_NAME: asyncdc1
                                              WORKER_ID: 0
                                              THREAD_ID: 78
                                          SERVICE_STATE: ON
                                      LAST_ERROR_NUMBER: 0
                                     LAST_ERROR_MESSAGE: 
                                   LAST_ERROR_TIMESTAMP: 0000-00-00 00:00:00.000000
                               LAST_APPLIED_TRANSACTION: 463c9331-f29e-11e8-9f64-08002718d305:77
     LAST_APPLIED_TRANSACTION_ORIGINAL_COMMIT_TIMESTAMP: 2018-11-28 00:14:36.764934
    LAST_APPLIED_TRANSACTION_IMMEDIATE_COMMIT_TIMESTAMP: 2018-11-28 00:14:36.794974
         LAST_APPLIED_TRANSACTION_START_APPLY_TIMESTAMP: 2018-11-28 00:14:36.775163
           LAST_APPLIED_TRANSACTION_END_APPLY_TIMESTAMP: 2018-11-28 00:14:36.875206
                                   APPLYING_TRANSACTION: 
         APPLYING_TRANSACTION_ORIGINAL_COMMIT_TIMESTAMP: 0000-00-00 00:00:00.000000
        APPLYING_TRANSACTION_IMMEDIATE_COMMIT_TIMESTAMP: 0000-00-00 00:00:00.000000
             APPLYING_TRANSACTION_START_APPLY_TIMESTAMP: 0000-00-00 00:00:00.000000
                 LAST_APPLIED_TRANSACTION_RETRIES_COUNT: 0
   LAST_APPLIED_TRANSACTION_LAST_TRANSIENT_ERROR_NUMBER: 0
  LAST_APPLIED_TRANSACTION_LAST_TRANSIENT_ERROR_MESSAGE: 
LAST_APPLIED_TRANSACTION_LAST_TRANSIENT_ERROR_TIMESTAMP: 0000-00-00 00:00:00.000000
                     APPLYING_TRANSACTION_RETRIES_COUNT: 0
       APPLYING_TRANSACTION_LAST_TRANSIENT_ERROR_NUMBER: 0
      APPLYING_TRANSACTION_LAST_TRANSIENT_ERROR_MESSAGE: 
    APPLYING_TRANSACTION_LAST_TRANSIENT_ERROR_TIMESTAMP: 0000-00-00 00:00:00.000000

And now we can do the same on the Primary-Master of DC1 to point to the Primary-Master of DC2 (or use MySQL Router as explained in part 1). Don’t forget that this is a Disaster Recovery solution and that you should not write to both DCs at the same time!
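
For completeness, here is a sketch of that reverse configuration, to run on clusterDC1’s Primary-Master (mysql1). The channel name 'asyncDC2' is my choice, and this assumes the 'repl' user also exists on clusterDC2 (it does if it was replicated through the async channel):

SQL> CHANGE MASTER TO MASTER_HOST='mysql4', MASTER_PORT=3306, MASTER_USER='repl', 
     MASTER_PASSWORD='replication', MASTER_AUTO_POSITION=1, MASTER_SSL=1 
     FOR CHANNEL 'asyncDC2';

SQL> CHANGE REPLICATION FILTER REPLICATE_IGNORE_DB=(mysql_innodb_cluster_metadata) 
     FOR CHANNEL 'asyncDC2';

SQL> START SLAVE FOR CHANNEL 'asyncDC2';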
