MySQL InnoDB Cluster – how to manage a split-brain situation

Everywhere I go to present MySQL InnoDB Cluster, during the demo of creating a cluster, many people don't understand why, when I have 2 members, my cluster is not yet tolerant to any failure.

Indeed when you create a MySQL InnoDB Cluster, as soon as you have added your second instance, you can see in the status:

    "status": "OK_NO_TOLERANCE",      
"statusText": "Cluster is NOT tolerant to any failures.",

Quorum

Why is that? It's because, to be part of the primary partition (the partition that holds the service, the one having a Primary-Master in Single Primary Mode, the default mode), a partition must hold a majority of the nodes (quorum). In MySQL InnoDB Cluster (and many other cluster solutions), to achieve quorum, the number of members in a partition must be greater than 50% of the total.

So when we have 2 nodes, if there is a network issue between the two servers, the cluster will be split into 2 partitions. And each of them will have 50% of the total number of members (1 of 2). Is 50% > 50%? No! That's why neither partition will reach quorum and, in the case of MySQL InnoDB Cluster, neither will allow queries.
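To make the majority rule concrete, here is a small Python sketch of the check (illustrative only; Group Replication performs this computation internally):

```python
# Quorum rule from the text: a partition holds quorum only when it contains
# strictly more than 50% of the group's total membership.

def has_quorum(partition_size: int, total_members: int) -> bool:
    """True when the partition holds a strict majority of the group."""
    return partition_size > total_members / 2

# Two-node cluster split in half: 1/2 = 50%, and 50% is not > 50%.
print(has_quorum(1, 2))  # False -- neither half may serve writes
# Three-node cluster with one node isolated: 2/3 = 66% > 50%.
print(has_quorum(2, 3))  # True -- the remaining pair keeps the service running
```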

Indeed, the first machine will see that it cannot reach the second machine anymore... but why? Did the second machine die? Does the first one have network interface issues? We don't know, so we cannot decide.

Let’s take a look at this cluster of 3 members (3/3 = 100%):


If we take a look at the cluster.status() output, we can see that with 3 nodes we can tolerate one failure:

    "status": "OK",      
"statusText": "Cluster is ONLINE and can tolerate up to ONE failure.",
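That tolerance figure follows directly from the majority rule: with n members, the smallest possible majority is n // 2 + 1, so the group survives losing n minus that many members. A quick sketch (illustrative only, not part of the adminAPI):

```python
def tolerated_failures(total_members: int) -> int:
    """Failures a group can absorb while a strict majority still survives."""
    smallest_majority = total_members // 2 + 1
    return total_members - smallest_majority

for n in (2, 3, 5, 7):
    print(f"{n} members tolerate {tolerated_failures(n)} failure(s)")
# 2 members tolerate 0 failures, 3 tolerate 1, 5 tolerate 2, 7 tolerate 3.
```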

Now let’s imagine we have a network issue that will isolate one of the members:

We can see in the cluster.status() output that the node is missing:

Our cluster will still be able to serve transactions as one partition still has quorum (2/3 = 66%, which is bigger than 50%).

        "mysql6:3306": {
"address": "mysql6:3306",
"mode": "n/a",
"readReplicas": {},
"role": "HA",
"status": "(MISSING)"
}

There is a very important concept I want to cover, as it is not always obvious: the cluster is seen differently by InnoDB Cluster and by Group Replication. Indeed, InnoDB Cluster relies on metadata created by the DBA using the MySQL Shell. That metadata describes how the cluster has been set up. Group Replication sees the cluster differently: it sees it as it was the last time it checked and as it is right now... and updates that view. This is commonly called the view of the world.

So in the example above, InnoDB Cluster sees 3 nodes: 2 online and 1 missing. For Group Replication, for a short moment, the partitioned node was UNREACHABLE and, a few seconds later, after being ejected from the Group by the majority (so only if there is still a majority), the node is no longer part of the cluster. The Group size is now 2 of 2 (2/2, not 2/3). This information is exposed via performance_schema.replication_group_members.

If our network issue had been more serious and had split our cluster in 3 like the picture below, the cluster would be “offline”, as none of the 3 partitions would have reached quorum (majority): 1/3 = 33% (<50%):

In this case the MySQL service won’t work properly until a human fixes the situation.

Fixing the situation

When there is no longer a primary partition in the cluster (as in the example above), the DBA needs to restore the service. And as usual, there is already some useful information in the MySQL error log:

2019-04-10T13:34:09.051391Z 0 [Warning] [MY-011493] [Repl] Plugin group_replication 
reported: 'Member with address mysql4:3306 has become unreachable.'
2019-04-10T13:34:09.065598Z 0 [Warning] [MY-011493] [Repl] Plugin group_replication
reported: 'Member with address mysql5:3306 has become unreachable.'
2019-04-10T13:34:09.065615Z 0 [ERROR] [MY-011495] [Repl] Plugin group_replication
reported: 'This server is not able to reach a majority of members in
the group. This server will now block all updates. The server will
remain blocked until contact with the majority is restored. It is
possible to use group_replication_force_members to force a new group
membership.'

From the message, we can see that this is exactly the situation we are describing here. We can see in the cluster.status() output that the cluster is “blocked”:

    "status": "NO_QUORUM",      
"statusText": "Cluster has no quorum as visible from 'mysql4:3306'
and cannot process write transactions.
2 members are not active",

We have two solutions to fix the problem:

  1. using SQL and Group Replication variables
  2. using the MySQL Shell’s adminAPI

Fixing using SQL and Group Replication variables

This process is explained in the manual (Group Replication: Network Partitioning).

On the node the DBA wants to use to restore the service, if there is only one node left, we can set the global variable group_replication_force_members to the GCS address of the server, which you can find in group_replication_local_address (if there are multiple servers online but not reaching the majority, all of them should be added to this variable):

SQL set global group_replication_force_members=@@group_replication_local_address;

Be careful: the best practice is to shut down the other nodes to avoid any kind of conflict if they reappear during the process of forcing quorum.

And the cluster will be again available. We can see in the error log that the situation has been resolved:

2019-04-10T14:41:15.232078Z 0 [Warning] [MY-011498] [Repl] Plugin group_replication 
reported:
'The member has resumed contact with a majority of the members in the group.
Regular operation is restored and transactions are unblocked.'

Don’t forget to remove the value of group_replication_force_members when you are back online:

SQL set global group_replication_force_members='';

When the network issues are resolved, the nodes will try to reconnect, but as we forced the membership, those nodes will be rejected. You will need to rejoin them to the Group by:

  • restarting mysqld
  • or restarting Group Replication again (stop group_replication; start group_replication)
  • or using MySQL Shell (cluster.rejoinInstance())

Using the MySQL Shell’s adminAPI

The other option is to use the adminAPI from the MySQL Shell. This is of course the preferred option! With the adminAPI you don't even need to know the port used for GCS to restore the quorum.

In the example below, we will use the server called mysql4 to re-activate our cluster:

JS cluster.forceQuorumUsingPartitionOf('clusteradmin@mysql4') 

And when the network issues are resolved, the Shell can also be used to rejoin the other instances (in this case mysql6):

JS cluster.rejoinInstance('clusteradmin@mysql6')

Conclusion

When for any reason you have lost quorum on your MySQL InnoDB Cluster, don't panic. Choose the node (or the list of nodes that can still communicate with each other) you want to keep and, if possible, shut down or stop mysqld on the other ones. Then MySQL Shell is again your friend: use the adminAPI to force the quorum and reactivate your cluster in one single command!

Bonus

If you want to know if your MySQL server is part of the primary partition (the one having the majority), you can run this command:

mysql> SELECT IF( MEMBER_STATE='ONLINE' AND ((
 SELECT COUNT(*) FROM performance_schema.replication_group_members 
 WHERE MEMBER_STATE NOT IN ('ONLINE', 'RECOVERING')) >= 
 ((SELECT COUNT(*) 
   FROM performance_schema.replication_group_members)/2) = 0), 'YES', 'NO' ) 
`in primary partition`
 FROM performance_schema.replication_group_members 
 JOIN performance_schema.replication_group_member_stats 
 USING(member_id) where member_id=@@global.server_uuid;
+----------------------+
| in primary partition |
+----------------------+
| NO                   |
+----------------------+
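The logic of the query above can be paraphrased as: a member is in the primary partition when its own state is ONLINE and fewer than half of the members it knows about are outside the ONLINE/RECOVERING states. A Python sketch of the same check (illustrative only, not part of the original query):

```python
def in_primary_partition(my_state: str, member_states: list) -> bool:
    """Mirror of the SQL check: ONLINE member, and the unhealthy members
    (not ONLINE/RECOVERING) are fewer than half of the known membership."""
    unhealthy = sum(s not in ('ONLINE', 'RECOVERING') for s in member_states)
    return my_state == 'ONLINE' and unhealthy < len(member_states) / 2

# The blocked node above still lists 3 members but sees 2 as UNREACHABLE,
# so the check answers NO:
print(in_primary_partition('ONLINE', ['ONLINE', 'UNREACHABLE', 'UNREACHABLE']))  # False
# A node in the majority partition sees only one peer missing:
print(in_primary_partition('ONLINE', ['ONLINE', 'ONLINE', 'UNREACHABLE']))  # True
```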

Or by using this addition to sys schema: addition_to_sys_GR.sql

SQL select gr_member_in_primary_partition();
+----------------------------------+
| gr_member_in_primary_partition() |
+----------------------------------+
| YES |
+----------------------------------+
1 row in set (0.0288 sec)


18 Comments

  1. […] We saw yesterday that when an issue (like network splitting), it’s possible to remain with a partitioned cluster where none of the partition have quorum (majority of members). For more info read how to manage a split-brain situation. […]


  3. Hello Lefred

    In which cases does a split-brain situation occur?

    • Hi Ali,

      Most of the time a split-brain situation happens when a network problem occurs and none of the nodes can see a majority.

      This is why running with a 2-node cluster is problematic. If N1 cannot see N2, neither of them reaches the majority and they cannot decide which “partition” can keep the leadership and run the cluster. In that case, each node will ask “Hey, I cannot see my buddy… Do I have a problem or does he??”

      Cheers,

      • Hello Lefred

        I am having a strange error, I can’t run the
        cluster.forceQuorumPartitionOf

        It is failing with error run time error 1231

        • What’s the error message ? Which version of MySQL are you using ?

          • It’s 5.7.31 Version,

            Error while running
            cluster.forceQuorumPartitionOf(“admin@db01:3306”)

            Restoring failed with error 1231 Runtime Error.
            =========

            Mysql log file contain below information.

            2021-01-26T12:45:11.499896Z 4202400 [ERROR] Plugin group_replication reported: 'Timeout on wait for view after setting group_replication_force_members value 'sihoprdochdb01:33061' into group communication interfaces'
            2021-01-26T12:49:19.353997Z 4203575 [Note] Unknown error 1156
            2021-01-26T12:49:58.768086Z 4203729 [Note] Plugin group_replication reported: '[GCS] Checking members to force a new configuration.'
            2021-01-26T12:49:58.768127Z 4203729 [Note] Plugin group_replication reported: '[GCS] ::modify_configuration():: Node[0]=sihoprdochdb01:33061 16106355403399440'
            2021-01-26T12:49:58.768333Z 4203729 [Note] Plugin group_replication reported: 'The group_replication_force_members value 'sihoprdochdb01:33061' was set in the group communication interfaces'
            2021-01-26T12:50:03.552140Z 0 [Note] Plugin group_replication reported: '[GCS] Removing members that have failed while processing new view.'
            2021-01-26T12:50:03.552367Z 0 [Note] Plugin group_replication reported: '[GCS] Removing members that have failed while processing new view.'
            2021-01-26T12:50:03.555441Z 4191854 [Note]
            2021-01-26T12:50:03.558999Z 4191860 [Note]
            2021-01-26T12:50:58.768438Z 4203729 [ERROR] Plugin group_replication reported: 'Timeout on wait for view after setting group_replication_force_members value 'sihoprdochdb01:33061' into group communication interfaces'

  4. We’ve 3 Node IDC

    2 nodes were showing “UNREACHABLE”
    1 node was showing “R/O”

    I stopped the 2 nodes and ran the command “forceQuorumPartitionOf”, which failed.

    • Hi Ali,
      The cluster object needs to be retrieved from the remaining node:
      n1 – unreachable
      n2 – unreachable
      n3 – r/o
      1) stop gr on n1 and n2
      2) connect to n3
      3) cluster = dba.getCluster()
      4) cluster.forceQuorum…(n3)

  5. Thank you ladies

    Please send me the steps involve here.

    • Ladies ?
      The steps are clear, you just need to connect to the remaining node to force quorum.

      • My mistake sorry for that.

        I mean stop group replication on N1 N2
        Force quorum on N3
        Join back N1 N2

        • if n1 and n2 are unreachable, you won’t be able to stop GR but if you can still connect to them, just run ‘stop group_replication’. on N3, do: cluster=dba.getCluster() and cluster.forceQuorumUsingPartitionOf('clusteradmin@N3')

          • [mysql@sihoprdochdb01 SQL-LOG]$ mysqlsh cladmin@sihoprdochdb01:3306 -- cluster status
            WARNING: Cluster has no quorum and cannot process write transactions: Group has no quorum
            {
              "clusterName": "prodCluster",
              "defaultReplicaSet": {
                "name": "default",
                "primary": "sihoprdochdb01:3306",
                "ssl": "DISABLED",
                "status": "NO_QUORUM",
                "statusText": "Cluster has no quorum as visible from 'sihoprdochdb01:3306' and cannot process write transactions. 2 members are not active",
                "topology": {
                  "sihoprdochdb01:3306": {
                    "address": "sihoprdochdb01:3306",
                    "mode": "R/O",
                    "readReplicas": {},
                    "role": "HA",
                    "status": "ONLINE"
                  },
                  "sihoprdochdb02:3306": {
                    "address": "sihoprdochdb02:3306",
                    "mode": "R/O",
                    "readReplicas": {},
                    "role": "HA",
                    "status": "UNREACHABLE"
                  },
                  "sihoprdochdb03:3306": {
                    "address": "sihoprdochdb03:3306",
                    "mode": "R/O",
                    "readReplicas": {},
                    "role": "HA",
                    "status": "UNREACHABLE"
                  }
                },
                "topologyMode": "Single-Primary"
              },
              "groupInformationSourceMember": "sihoprdochdb01:3306"
            }
            [mysql@sihoprdochdb01 SQL-LOG]$

          • and if you do cluster.forceQuorumUsingPartitionOf('cladmin@sihoprdochdb01')

  6. Hi Lefred

    The issue was resolved after restarting the MySQL daemon on the 3 nodes
    and starting the cluster from the major outage.

    Thanks for your quick reply,

    Need to understand why this issue happened, as we had killed some sessions in the database using a Router connection from Workbench.

  7. Hi Lefred,
    What happens with an application that connects via Router to the primary node when the primary node fails? Does, for example, node 2 become primary, and does the Router automatically transfer the connection to another server despite the connection string?

    • In case of failover, new connections will be routed to the new Primary. Existing connections will be closed and the application will have to reconnect. The running trx or sessions won’t be replayed; the application needs to handle that and, in case of failure, reconnect and replay the full trx that was not yet committed.
