MySQL HA Architecture #1 : InnoDB Cluster & Consul


I received many requests about MySQL High Availability architectures. There are a lot of tools available, but how can we use them to achieve MySQL HA without over-engineering everything?

There is no single generic architecture that answers such a demand: of course some solutions are more popular than others, but putting them together can result in many different designs. So I will start a series of articles on that topic. Feel free, as usual, to comment, and also to recommend other tools that could be used in a future post.

So today’s post is related to MySQL InnoDB Cluster with HashiCorp’s Consul.

Architecture

This is the overview of the deployed architecture:

As you can see, we have 3 data centers, although in fact only two are real DCs on premises; the third one is in the cloud. A large number of requests are related to having only 2 DCs, or at least capacity for only two DCs (storage and CPU). But we already know that to achieve High Availability we need at least 3 of everything (3 nodes, 3 DCs, …). So in our design, DC1 and DC2 each run an independent MySQL InnoDB Cluster in Single-Primary Mode. We also have asynchronous replication going from one member of dc1 to the Primary Master of dc2, and the reverse way too.

In this post, I won’t cover the asynchronous replication failover that moves the asynchronous slave task to another member of the same group when the Primary role is moved to another member.

Then we have one Consul member in DC3 that is there only to provide quorum and to answer DNS requests.
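
How that third agent is started is not covered in this post; the sketch below shows what a minimal Consul server on the dc3 node could look like, assuming it answers DNS directly on port 53 with the domain set to mysql (as the dig test further down suggests) and is WAN-joined to agents in dc1 and dc2. All addresses and flags here are illustrative assumptions, not the exact setup used in this architecture.

# hypothetical arbiter/DNS agent on monitordc3 (illustrative values)
consul agent -server -bootstrap-expect=1 \
       -datacenter=dc3 -node=monitordc3 \
       -data-dir=/var/lib/consul \
       -client=0.0.0.0 -dns-port=53 -domain=mysql \
       -retry-join-wan=192.168.50.2 -retry-join-wan=192.168.70.2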

mysqlha service

In Consul, we create the mysqlha service like this:

{
    "services": [
        {
            "name": "mysqlha",
            "tags": [
                "writes",
                "reads"
            ],
            "port": 3306,
            "checks": [

                {
                    "script": "/etc/consul.d/scripts/check_mysql.sh",
                    "interval": "5s"
                }
            ]
        }
    ]
}
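
The service definition simply goes in the agent’s configuration directory, /etc/consul.d/ in this setup (as the check’s path suggests), and the agent then needs to be reloaded. For example (the file name is arbitrary):

# drop the service definition in place and reload the local agent
cp mysqlha.json /etc/consul.d/
consul reload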

The check_mysql.sh script is just a simple bash script that connects to the read/write port of the local MySQL Router running on each member of the InnoDB Cluster (Group) and checks whether there is a Primary, non-R/O member behind it.
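
The original script is not included in the post, but a minimal sketch of such a check could look like this (port 6446 is MySQL Router’s default R/W port, the credentials are placeholders, and any exit code above 1 is reported as critical by Consul):

#!/bin/bash
# hypothetical sketch of check_mysql.sh: pass only when the local MySQL Router
# R/W port reaches a writable (non super_read_only) Primary member
ROUTER_PORT=6446            # MySQL Router default R/W port
MYSQL_USER=consul_check     # placeholder monitoring account
MYSQL_PASS=secret

READ_ONLY=$(mysql -h 127.0.0.1 -P $ROUTER_PORT -u $MYSQL_USER -p$MYSQL_PASS \
                  -N -B -e "SELECT @@global.super_read_only" 2>/dev/null)

if [ "$READ_ONLY" = "0" ]; then
    echo "Primary reachable and writable through the local router"
    exit 0
fi
echo "no writable Primary behind the local router"
exit 2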

This is what we can see in consul’s UI:

Failover rule

Now we will add a rule (query) for the failover:

curl --request POST --data '{
"Name": "mysqlha",
"Service": {
   "Service": "mysqlha",
   "Failover": {
     "Datacenters": ["dc1","dc2"]
   }, "OnlyPassing": true
  }
}' http://127.0.0.1:8500/v1/query

We need to perform this operation on one member of each DC (dc1, dc2 and dc3).
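
Besides DNS, the prepared query can also be executed through Consul’s HTTP API, which is convenient to verify from any node which addresses would be returned (jq is only used here to keep the output readable):

# execute the prepared query on the local agent and list the returned addresses
curl -s http://127.0.0.1:8500/v1/query/mysqlha/execute | jq -r '.Nodes[].Node.Address'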

Once this is done, we can test the query on dc3 (where no mysqlha service is running):

[root@monitordc3 ~]# dig @127.0.0.1 -p 53 mysqlha.query.mysql

; <<>> DiG 9.9.4-RedHat-9.9.4-29.el7_2.4 <<>> @127.0.0.1 -p 53 mysqlha.query.mysql
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 28736
;; flags: qr aa rd; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;mysqlha.query.mysql.		IN	A

;; ANSWER SECTION:
mysqlha.query.mysql.	0	IN	A	192.168.50.4
mysqlha.query.mysql.	0	IN	A	192.168.50.3
mysqlha.query.mysql.	0	IN	A	192.168.50.2

We can see that the DNS returns all 3 nodes of DC1 (192.168.50.x), perfect! We can of course connect to any of them using the router ports (6446 and 6447).
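
For example, an application (or a quick test from a shell) can simply use the service name and let the router on the resolved node do the final routing; the user below is of course a placeholder:

# writes go through the R/W port of the router on the resolved node
mysql -h mysqlha.query.mysql -P 6446 -u app_user -p
# reads can use the R/O port
mysql -h mysqlha.query.mysql -P 6447 -u app_user -p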

The next step is related to the application servers. We have two options here: either we run a Consul client on each application server to handle the DNS resolution, or we simply configure DNS resolution on the servers themselves. In the latter case, the resolvers must point to the node in dc3 and to one node in the other DC (or we could create a fake extra DC used only for DNS resolution). The reason is that in dc2 the service is running anyway, but we want to be redirected to the primary DC as long as it is up. We could query mysqlha.query.dc1.mysql, but that only works as long as a Consul server in dc1 is reachable, which is not the case if the whole DC goes down.

So if we have an application server in dc2, its name resolution setup should be:

search lan localdomain
nameserver 192.168.70.2
nameserver 192.168.50.2

Asynchronous Slave

As the latency between our two data centers can be high, we opted for asynchronous replication between the two DCs (remember, dc2 is used as a GEO DR site).

So now we need to set up asynchronous replication between both DCs. I opted for the Primary Master of each DC to act as both the async master and the async slave. Being the slave is mandatory for the Primary Master, as it’s the only member allowed to write data.

I created the replication grants on each Primary Master (in dc1 and dc2), and I set the purged GTIDs to start the asynchronous replication:

Traditional replication setup

I started with the user and the required privileges:

mysql> CREATE USER 'async_repl'@'%' IDENTIFIED BY 'asyncpwd';
Query OK, 0 rows affected (0.50 sec)

mysql> GRANT REPLICATION SLAVE ON *.* TO 'async_repl'@'%';
Query OK, 0 rows affected (0.11 sec)

Don’t forget to set GTID_PURGED according to what you see in the other Primary Master.

Example:

First I check on mysql4dc2 (the current Primary Master in DC2):

mysql4dc2> show global variables like 'GTID_EXECUTED';
+---------------+--------------------------------------------------------------------------------------+
| Variable_name | Value                                                                                |
+---------------+--------------------------------------------------------------------------------------+
| gtid_executed | 0de7c84a-118e-11e8-b11c-08002718d305:1-44,
a6ae95b5-118d-11e8-9ad8-08002718d305:1-27 |
+---------------+--------------------------------------------------------------------------------------+

And then, on the server in DC1, do:

mysql> SET @@GLOBAL.GTID_PURGED='0de7c84a-118e-11e8-b11c-08002718d305:1-44,
                                 a6ae95b5-118d-11e8-9ad8-08002718d305:1-27';

We do the same on all machines in dc1 with the GTID_EXECUTED from DC2 (and, for the reverse direction, on the machines in dc2 with the GTID_EXECUTED from DC1).
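
The replication channel itself is not shown in this post; a minimal sketch of what it could look like on the current Primary Master of DC1, pointing to mysql4dc2, is given below. The use of a named channel ('async_from_dc2') is my own choice, a plain CHANGE MASTER works just as well:

# hypothetical: run on the current Primary Master of DC1
mysql -u root -p <<'SQL'
CHANGE MASTER TO
    MASTER_HOST = 'mysql4dc2',
    MASTER_USER = 'async_repl',
    MASTER_PASSWORD = 'asyncpwd',
    MASTER_AUTO_POSITION = 1
    FOR CHANNEL 'async_from_dc2';
START SLAVE FOR CHANNEL 'async_from_dc2';
SQL

The same is then done on the Primary Master of DC2, pointing to the Primary Master of DC1.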

Now we will add a new service in Consul (on each MySQL node in DC1 and DC2):

        {
            "name": "mysqlasync",
            "tags": [
                "slave"
            ],
            "port": 3306,
            "checks": [
                {
                    "script": "/etc/consul.d/scripts/check_async_mysql.sh",
                    "interval": "5s"
                }
            ]
        }

And don’t forget to copy the check_async_mysql.sh script into /etc/consul.d/scripts/ (this script is not made for production and may contain bugs, it’s just an example). You also need to edit this script to set SLAVEOFDC to the opposite DC’s name.

This script also uses Consul’s key/value store to record which node is the Primary Master in each DC:

[root@mysql1dc1 consul.d]# curl -s "http://localhost:8500/v1/kv/primary?raw&dc=dc1"
mysql1dc1
[root@mysql1dc1 consul.d]# curl -s "http://localhost:8500/v1/kv/primary?raw&dc=dc2"
mysql4dc2
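
As the real check_async_mysql.sh is not reproduced here either, this is a rough, illustrative sketch of what such a check could do: publish the local Primary Master’s name in the KV store and report passing only on the node that is both the local Primary and a running asynchronous slave (credentials and exact logic are assumptions):

#!/bin/bash
# hypothetical sketch of check_async_mysql.sh (not the original script)
SLAVEOFDC=dc2                            # set to the opposite DC's name
MYSQL="mysql -N -B -u root -psecret"     # placeholder credentials

# only the Primary (writable) member of the local group is a candidate
RO=$($MYSQL -e "SELECT @@global.super_read_only" 2>/dev/null)
if [ "$RO" != "0" ]; then
    echo "not the local Primary Master"
    exit 2
fi

# record the local Primary Master's name in Consul's KV store
curl -s -X PUT -d "$(hostname -s)" http://localhost:8500/v1/kv/primary > /dev/null

# the Primary must also be a running asynchronous slave of the other DC
STATUS=$($MYSQL -e "SHOW SLAVE STATUS\G" 2>/dev/null \
         | awk '/Slave_IO_Running:|Slave_SQL_Running:/ {print $2}' | sort -u)
if [ "$STATUS" = "Yes" ]; then
    echo "asynchronous slave of $SLAVEOFDC is running"
    exit 0
fi
echo "asynchronous slave of $SLAVEOFDC is not running"
exit 2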

After reloading Consul, you can see something like this in the UI on the Primary Master that is also the asynchronous slave:

And on another node:


In Action

Let’s see it in action:

In the video we can see that when we kill the remaining non-Primary member in DC1, the Group loses quorum. Of course the size of the group at that moment is 2 of 2, and when it gets killed even if the Primary one is still active, it has a quorum of 1/2 which is not bigger than 50%. Therefore it stops being available to avoid any split-brain situation.

Then, when we restart the nodes we killed in DC1, we have to force the quorum on the single remaining node to be able to rejoin the other instances after they reboot. This is how:
Force Quorum:

 MySQL  mysql2dc1:3306 ssl  JS > cluster.forceQuorumUsingPartitionOf('root@mysql2dc1:3306')
Restoring replicaset 'default' from loss of quorum, by using the partition 
composed of [mysql2dc1:3306]

Please provide the password for 'root@mysql2dc1:3306': 
Restoring the InnoDB cluster ...

The InnoDB cluster was successfully restored using the partition 
from the instance 'root@mysql2dc1:3306'.

WARNING: To avoid a split-brain scenario, ensure that all other members of the replicaset 
         are removed or joined back to the group that was restored.

Rejoin the first instance:

MySQL mysql2dc1:3306 ssl JS > cluster.rejoinInstance('root@mysql3dc1:3306')
...
The instance 'root@mysql3dc1:3306' was successfully rejoined on the cluster.

The instance 'mysql3dc1:3306' was successfully added to the MySQL Cluster.

And rejoin the last one:

 MySQL  mysql2dc1:3306 ssl  JS > cluster.rejoinInstance('root@mysql1dc1:3306')
...
Rejoining instance to the cluster ...

The instance 'root@mysql1dc1:3306' was successfully rejoined on the cluster.

The instance 'mysql1dc1:3306' was successfully added to the MySQL Cluster.

And finally we can see that all members are back online:

 MySQL  mysql2dc1:3306 ssl  JS > cluster.status();
{
    "clusterName": "MyGroupDC1", 
    "defaultReplicaSet": {
        "name": "default", 
        "primary": "mysql2dc1:3306", 
        "ssl": "REQUIRED", 
        "status": "OK", 
        "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.", 
        "topology": {
            "mysql1dc1:3306": {
                "address": "mysql1dc1:3306", 
                "mode": "R/O", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE"
            }, 
            "mysql2dc1:3306": {
                "address": "mysql2dc1:3306", 
                "mode": "R/W", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE"
            }, 
            "mysql3dc1:3306": {
                "address": "mysql3dc1:3306", 
                "mode": "R/O", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE"
            }
        }
    }, 
    "groupInformationSourceMember": "mysql://root@mysql2dc1:3306"

Conclusion

As you noticed, MySQL Group Replication and MySQL InnoDB Cluster can be the base of a full MySQL High Availability architecture spanning multiple data centers. MySQL InnoDB Cluster can of course be used with a multitude of other solutions like Orchestrator, ProxySQL, … I’ll try to illustrate these solutions when I have some time, stay tuned!


12 Comments

  1. >We also have an asynchronous replication going from one member of
    >dc1 to the primary master of dc2 and reverse way too.

    Thank you for nice blog posts.
    Just confirmation about replication topology of
    the asynchronous replication between dc1 and dc2.

    Is it best for users to implement this configuration after the MySQL 8.0 GA?
    It includes new features like “WL#7361: MSR: per-channel replication filters”.

    • Hi Shinya.

      This works also on 5.7. I think you are asking this to filter out the metadata (mysql_innodb_cluster_metadata) schema. I’m not a big fan of filtering out stuff usually, but in this case it might help.
      The metadata is already aware of having multiple clusters and instances. The router is the one not being happy with multiple clusters for bootstrap. But you can set it up manually.

      My metadata looks like this:

       MySQL  mysql1dc1:3306   mysql_innodb_cluster_metadata  SQL  select * from clusters\G
      *************************** 1. row ***************************
               cluster_id: 1
             cluster_name: MyGroupDC1
       default_replicaset: 1
              description: Default Cluster
      mysql_user_accounts: NULL
                  options: null
               attributes: {"default": true}
      *************************** 2. row ***************************
               cluster_id: 2
             cluster_name: MyGroupDC2
       default_replicaset: 2
              description: Default Cluster
      mysql_user_accounts: NULL
                  options: null
               attributes: {"default": true}
      2 rows in set (0.00 sec)
      

      and my instances:

       MySQL  mysql1dc1:3306   mysql_innodb_cluster_metadata  SQL  select instance_id, host_id, replicaset_id, instance_name from instances;
      +-------------+---------+---------------+----------------+
      | instance_id | host_id | replicaset_id | instance_name  |
      +-------------+---------+---------------+----------------+
      |           1 |       1 |             1 | mysql1dc1:3306 |
      |           2 |       2 |             1 | mysql2dc1:3306 |
      |           3 |       3 |             1 | mysql3dc1:3306 |
      |           7 |       4 |             2 | mysql4dc2:3306 |
      |          12 |       5 |             2 | mysql5dc2:3306 |
      |          19 |       6 |             2 | mysql6dc2:3306 |
      +-------------+---------+---------------+----------------+
      6 rows in set (0.00 sec)
  2. > Of course the size of the group at that moment is 2 of 2, and when it gets killed even if the Primary one is still active, it has a quorum of 1/2 which is not bigger than 50%. Therefore it stops being available to avoid any split-brain situation.

    The group is still expecting 3 nodes. The sentence should be fixed to:

    Of course the size of the group at that moment is 2 of 3, and when it gets killed even if the Primary one is still active, it has a vote of 1/3 which is not bigger than 50%. Therefore it stops being available to avoid any split-brain situation.

    • Hi Shlomi,
      Thank you for your comment, but in fact, at the moment I kill the 2nd node, the group is not 2/3 but 2/2… because the size of the Group is always updated.

      This means that at start the group is 3/3, then when node is killed, the view of the world is 2/3… then in GCS/XCOM they all agree that this node is gone and the view of the world is 2/2 (but in the metadata of InnoDB Cluster, the cluster.status() knows that there were 3 members… but not GCS/XCOM, Group Replication). This view of the world is always agreed by the consensus between the members in the group.

      I hope this clarifies it a bit more.

      • Hi Fred,
        Then this confuses me. According to the flow you describe, if we begin with 5 good nodes, one goes down, we’re at 4/4. And then another goes down, and we’re at 3/3. And then another goes down and we’re at 2/2 and supposedly there’s a quorum, where in fact 2/5 is not a quorum.
        Nodes that went down need to be considered as being down, or else we lose consensus guarantee.

        Also, if I understand correctly your description is inconsistent, because in order to be consistent, this:

        > Of course the size of the group at that moment is 2 of 2, and when it gets killed even if the Primary one is still active, it has a quorum of 1/2 which is not bigger than 50%. Therefore it stops being available to avoid any split-brain situation.

        should actually change to:

        [ Of course the size of the group at that moment is 2 of 2, and when it gets killed even if the Primary one is still active, it has a quorum of 1/1 which is 100%. ]

        What is the reason you update the group size from 3 to 2 when the first node dies, but not update from 2 to 1 when the second node dies? This is an inconsistency.

        • I may need to write a full blog post on this but indeed… also it depends how you stop a node. So if you have 3 nodes and you stop one (clean shutdown), you go from 3/3 to 2/2; if you properly stop another one, you go from 2/2 to 1/1 and your cluster/resource is still available. Now if you have a 5/5 group and you kill -9 one node you go from 5/5 to 4/5… but then it’s 4/4; then if you kill -9 another one you go from 4/4 to 3/4… and after it’s 3/3… (3/3 = tolerate up to ONE failure)… but if you have 5/5 and you kill -9 3 of them (or 1 by 1 but very quickly, so they don’t have time to agree on the view of the world), the 2 remaining will stop because they don’t have quorum as they go from 5/5 to 2/5.

  3. Hi Fred,

    Thank you for your advice.

    >This works also on 5.7. I think you are asking this to filter out the metadata >(mysql_innodb_cluster_metadata) schema.
    >I’m not a big fan of filtering out stuff usually but in this case it might help.

    Yes. I thought it was required for bidirectional replication, to protect mysql_innodb_cluster_metadata.

    >The metadata is already aware of having multiple clusters and instances.
    >The router is the one not being happy with mutliple clusters for bootstrap.
    >But you can set it up manually.

    I see. I will try it on both MySQL Router and ProxySQL.

    Thanks again.

  4. Hi,
    Thanks for this very useful article.
    Could you please tell us how the asynchronous replication failover, that moves the asynchronous slave task to another member of the same group when the Primary role is moved to another member, is done?

    Thanks
    Youcef HILEM

