MySQL InnoDB Cluster : avoid split-brain while forcing quorum

We saw yesterday that when an issue occurs (like a network split), it’s possible to end up with a partitioned cluster where none of the partitions has quorum (a majority of members). For more info, read how to manage a split-brain situation.

If you read the previous article, you noticed the red warning about forcing the quorum. As good advice can never be repeated too often, let me write it down again here: “Be careful: the best practice is to shut down the other nodes to avoid any kind of conflict if they reappear during the process of forcing quorum”.

But if a network problem is happening, it might not be possible to shut down those other nodes. Would that really be bad?

YES !

Split-Brain

Remember, we were in this situation:

We decided to force the quorum on one of the nodes (maybe the only one we could connect to):

But what could happen if, while we do this or just after, the network problem gets resolved?

In that case we end up with exactly the split-brain situation we want to avoid as much as possible.

Details

So what happens? And why?

When we ran cluster.forceQuorumUsingPartitionOf('clusteradmin@mysql1'), this is what we could read in the MySQL error log of that server:

[Warning] [MY-011498] [Repl] Plugin group_replication reported: 
'The member has resumed contact with a majority of the members in the group.
Regular operation is restored and transactions are unblocked.'
[Warning] [MY-011499] [Repl] Plugin group_replication reported:
'Members removed from the group: mysql2:3306, mysql3:3306'

The node ejected the other nodes from the cluster, and of course that decision was not communicated to these servers, as they were not reachable anyway.

Now, when the network situation was resolved, this is what we could read on mysql2:

[Warning] [MY-011494] [Repl] Plugin group_replication reported: 
'Member with address mysql3:3306 is reachable again.'
[Warning] [MY-011498] [Repl] Plugin group_replication reported: 'The
member has resumed contact with a majority of the members in the group.
Regular operation is restored and transactions are unblocked.'
[Warning] [MY-011499] [Repl] Plugin group_replication reported:
'Members removed from the group: mysql1:3306'

The same appears on mysql3. This means these two nodes reached a majority together and ejected mysql1 from “their” cluster.

On mysql1, we can see in performance_schema:

mysql> select * from performance_schema.replication_group_members\G
************************** 1. row **************************
CHANNEL_NAME: group_replication_applier
MEMBER_ID: fb819b30-5b90-11e9-bf8a-08002718d305
MEMBER_HOST: mysql4
MEMBER_PORT: 3306
MEMBER_STATE: ONLINE
MEMBER_ROLE: PRIMARY
MEMBER_VERSION: 8.0.16
1 row in set (0.0013 sec)

And on mysql2 and mysql3:

mysql> select * from performance_schema.replication_group_members\G
************************** 1. row **************************
CHANNEL_NAME: group_replication_applier
MEMBER_ID: 4ff0a33f-5c49-11e9-abc9-08002718d305
MEMBER_HOST: mysql6
MEMBER_PORT: 3306
MEMBER_STATE: ONLINE
MEMBER_ROLE: SECONDARY
MEMBER_VERSION: 8.0.16
************************** 2. row **************************
CHANNEL_NAME: group_replication_applier
MEMBER_ID: f8ac8d14-5b90-11e9-a22a-08002718d305
MEMBER_HOST: mysql5
MEMBER_PORT: 3306
MEMBER_STATE: ONLINE
MEMBER_ROLE: PRIMARY
MEMBER_VERSION: 8.0.16

This is of course the worst situation that can happen when dealing with a cluster.
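To make the diagnosis concrete, here is a minimal Python sketch of how the membership views reported by each node could be compared. It assumes you have already collected, for every node you can reach, the set of MEMBER_HOST values returned by performance_schema.replication_group_members. The function names and the sample data (the hostnames used in the text) are illustrative only and not part of any MySQL tool.

```python
from typing import Dict, FrozenSet, List, Set


def find_partitions(views: Dict[str, Set[str]]) -> List[FrozenSet[str]]:
    """Return the distinct membership views reported across the nodes.

    `views` maps a node name to the set of hosts that node lists as group
    members. Identical views are collapsed into a single entry.
    """
    distinct = {frozenset(members) for members in views.values()}
    # Sort for a stable, readable result.
    return sorted(distinct, key=lambda members: sorted(members))


def is_split_brain(views: Dict[str, Set[str]]) -> bool:
    """True when the nodes disagree about who belongs to the group."""
    return len(find_partitions(views)) > 1


# Hypothetical views after the network came back (assumed sample data).
views_after_heal = {
    "mysql1": {"mysql1"},
    "mysql2": {"mysql2", "mysql3"},
    "mysql3": {"mysql2", "mysql3"},
}

if is_split_brain(views_after_heal):
    for partition in find_partitions(views_after_heal):
        print("independent group:", ", ".join(sorted(partition)))
```

In a healthy cluster every node reports the same set of members, so the function returns a single view. Two or more views, each with ONLINE members accepting writes, is the split-brain described above.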

Solution

The solution is to prevent the nodes that are not part of the forced quorum partition from agreeing to form their own group when, together, they have a majority.

This can be achieved by setting two variables on a majority of nodes (on two servers if your InnoDB Cluster is made of 3 nodes, for example).

Once I had fixed my cluster and all members were online again, I changed these settings on mysql1 and mysql2:

set global group_replication_unreachable_majority_timeout=30;
set global group_replication_exit_state_action = 'ABORT_SERVER';

This means that if there is a problem and the node is not able to rejoin the majority within 30 seconds, it goes into ERROR state and then shuts down `mysqld`.

Note that 30 seconds is only an example. The timeout must be long enough for me to disable that timer on the node I want to use for forcing the quorum (mysql1 in the example), and I must also be sure that it has elapsed on the nodes I can’t access, so that they have removed themselves from the group (mysql2 in the example).
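The timing constraint can be sketched as a small Python helper. This is only an illustration of the reasoning, not an official procedure: the function name and the safety margin are assumptions of mine, and the margin is there because an unreachable node may need a little extra time to actually leave the group.

```python
def forcing_window(timeout_s: int, margin_s: int = 10) -> dict:
    """Describe the two deadlines implied by the unreachable-majority timeout.

    timeout_s: value of group_replication_unreachable_majority_timeout.
    margin_s:  hypothetical extra wait before forcing the quorum.
    """
    if timeout_s <= 0:
        # 0 is the server default and means "wait forever": no automatic
        # exit happens, so there is no window to compute.
        raise ValueError("the timeout must be a positive number of seconds")
    return {
        # On the node used to force the quorum, the timer has to be
        # disabled (set back to 0) before it fires.
        "disable_timer_before_s": timeout_s,
        # On the unreachable nodes, the timer must have fired before the
        # quorum is forced elsewhere.
        "force_quorum_after_s": timeout_s + margin_s,
    }


window = forcing_window(30)
print("disable the timer on the surviving node within",
      window["disable_timer_before_s"], "seconds")
print("force the quorum no earlier than",
      window["force_quorum_after_s"], "seconds after the partition")
```

A short timeout shrinks the time you have to react on the node you keep, while a long one delays the moment when forcing the quorum becomes safe.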

So, if we try again with our example, 30 seconds after the network problem starts we can see in mysql2’s error log that it works as expected:

[ERROR] [MY-011711] [Repl] Plugin group_replication reported: 'This member could not reach 
a majority of the members for more than 30 seconds. The member will now leave
the group as instructed by the group_replication_unreachable_majority_timeout
option.'
[ERROR] [MY-011712] [Repl] Plugin group_replication reported: 'The server was automatically
set into read only mode after an error was detected.'
[Warning] [MY-013373] [Repl] Plugin group_replication reported: 'Started
auto-rejoin procedure attempt 1 of 1'
[ERROR] [MY-011735] [Repl] Plugin group_replication reported:
'[GCS] Timeout while waiting for the group communication engine to exit!'
[ERROR] [MY-011735] [Repl] Plugin group_replication reported:
'[GCS] The member has failed to gracefully leave the group.'
[System] [MY-010597] [Repl] 'CHANGE MASTER TO FOR CHANNEL 'group_replication_applier'
executed'. Previous state master_host='', master_port= 0,
master_log_file='', master_log_pos= 798,
master_bind=''. New state master_host='', master_port= 0,
master_log_file='', master_log_pos= 4, master_bind=''.
[ERROR] [MY-011735] [Repl] Plugin group_replication reported:
'[GCS] Error connecting to the local group communication engine instance.'
[ERROR] [MY-011735] [Repl] Plugin group_replication reported:
'[GCS] The member was unable to join the group. Local port: 33061'
[Warning] [MY-013374] [Repl] Plugin group_replication reported:
'Timeout while waiting for a view change event during the auto-rejoin procedure'
[Warning] [MY-013375] [Repl] Plugin group_replication reported:
'Auto-rejoin procedure attempt 1 of 1 finished.
Member was not able to join the group.'
[ERROR] [MY-013173] [Repl] Plugin group_replication reported:
'The plugin encountered a critical error and will abort:
Could not rejoin the member to the group after 1 attempts'
[System] [MY-013172] [Server] Received SHUTDOWN from user .
Shutting down mysqld (Version: 8.0.16).
[Warning] [MY-010909] [Server] /usr/sbin/mysqld:
Forcing close of thread 10 user: 'clusteradmin'.
[Warning] [MY-010909] [Server] /usr/sbin/mysqld:
Forcing close of thread 35 user: 'root'.
[ERROR] [MY-011735] [Repl] Plugin group_replication reported:
'[GCS] The member is leaving a group without being on one.'
[System] [MY-010910] [Server] /usr/sbin/mysqld:
Shutdown complete (mysqld 8.0.16) MySQL Community Server - GPL.

And once the quorum has been forced on mysql1, none of the other nodes will rejoin the Group when the network issue is resolved. The DBA will have to use the Shell to perform cluster.rejoinInstance(instance), or restart mysqld on the instances that shut themselves down.

Conclusion

So as you can see, by default MySQL InnoDB Cluster and Group Replication are very protective against split-brain situations. This protection can even be reinforced to avoid problems when human intervention is needed.

The rule of thumb to avoid problems is to set group_replication_unreachable_majority_timeout to something you can deal with and group_replication_exit_state_action to ABORT_SERVER on (total number of members in the cluster / 2) + 1 nodes, rounded down to an integer 😉

If you have 3 nodes, set them on 2! Of course, it might be much simpler to set them on all nodes.
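For other cluster sizes, the same arithmetic can be written as a tiny Python function. The function name is mine, and the integer division is what implements the “as integer” part of the rule.

```python
def majority(total_members: int) -> int:
    """Smallest number of members that forms a strict majority."""
    if total_members < 1:
        raise ValueError("a group needs at least one member")
    # Integer division drops the fractional part: 3 // 2 == 1, so 3 -> 2.
    return total_members // 2 + 1


for n in (3, 5, 7, 9):
    print(f"{n} members: set the two variables on at least {majority(n)} nodes")
```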

Be aware that if you don’t react within the time frame defined by group_replication_unreachable_majority_timeout, all your servers will shut down and you will have to restart one.
