Group Replication is GA with MySQL 5.7.17 – comparison with Galera

It’s wonderful news: we have released MySQL 5.7.17 with the Group Replication plugin (GA quality).

By definition, Group Replication is a multi-master, update-everywhere replication plugin for MySQL with built-in conflict detection and resolution, automatic distributed recovery, and group membership.

So we can indeed compare this solution with Galera from Codership, a replication plugin implementing the WSREP API. WSREP (Write Set Replication) extends the replication API to provide all the information and hooks required for true multi-master, “virtually synchronous” replication.

With Group Replication, MySQL implements all of this inside the plugin itself. Our engineers leveraged existing, standard MySQL infrastructure (GTIDs, Multi-Source Replication, the Multi-threaded Slave Applier, Binary Logs, …) and have been preparing InnoDB over several releases to provide the necessary features, for example High Priority Transactions in InnoDB since 5.7.6.

This means that Group Replication is based on well-known, trusted components, which makes integration and adoption easier.

Both solutions are based on Replicated Database State Machine theory.

What are the similarities between the two solutions?

Both MySQL Group Replication and Galera use write sets. A write set is the set of globally unique identifiers of every logical item changed by a transaction when it executed (an item may be a row, a table, a metadata object, …).
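To make the definition concrete, here is a minimal Python sketch of write-set extraction for row changes. It assumes each changed row is described by a hypothetical (schema, table, primary key) tuple and hashes it with SHA-256 from the standard library; the server itself uses the hash algorithm selected with transaction_write_set_extraction (for example XXHASH64), and a real write set can contain more than the primary key.

```python
import hashlib


def row_identifier(schema: str, table: str, primary_key: str) -> str:
    """Globally unique identifier of one changed row (illustrative encoding)."""
    # A NUL separator keeps ("ab", "c") and ("a", "bc") from colliding.
    raw = "\0".join((schema, table, primary_key)).encode("utf-8")
    return hashlib.sha256(raw).hexdigest()


def write_set(changed_rows) -> frozenset:
    """Write set of a transaction: one identifier per row it changed."""
    return frozenset(row_identifier(s, t, pk) for s, t, pk in changed_rows)


# Two transactions touching the same row share an identifier.
trx1 = write_set([("shop", "orders", "42"), ("shop", "orders", "43")])
trx2 = write_set([("shop", "orders", "43")])
trx3 = write_set([("shop", "orders", "44")])
print(bool(trx1 & trx2), bool(trx1 & trx3))  # True False
```

Overlapping write sets are exactly what the certification step below looks for.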

So, Group Replication and Galera both use ROW binary log events and, together with the transaction data, the write sets are streamed synchronously from the server that received the write (the master for that specific transaction) to the other members/nodes in the cluster.

Both solutions then use the write sets to check for conflicts between concurrent transactions executing on different replicas. This procedure is called certification: each member certifies the write set (transaction) locally and asynchronously queues the accepted changes to be applied.
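Certification can be sketched with a simplified model (not the actual GR or Galera code): every member receives transactions in the same total order, each transaction carries the sequence number of the last transaction committed when it started (its snapshot), and the certifier remembers, for each write-set item, the sequence number of the last certified transaction that wrote it. The class and method names are hypothetical. Because the order and the rule are identical everywhere, every member reaches the same commit/rollback decision without further coordination.

```python
class Certifier:
    """Toy certifier: first committer wins on overlapping write sets."""

    def __init__(self):
        self.last_writer = {}  # write-set item -> seqno of last certified writer
        self.seqno = 0  # seqno of the last certified transaction

    def certify(self, snapshot_seqno, write_set):
        """Return the new seqno if the transaction passes, or None on conflict."""
        for item in write_set:
            # Conflict: a transaction this one could not see already changed the item.
            if self.last_writer.get(item, 0) > snapshot_seqno:
                return None
        self.seqno += 1
        for item in write_set:
            self.last_writer[item] = self.seqno
        return self.seqno


# Two concurrent transactions (both started at snapshot 0) update row "a".
c = Certifier()
print(c.certify(0, {"a"}))  # 1    -> first one is certified
print(c.certify(0, {"a"}))  # None -> second one conflicts and is rolled back
print(c.certify(0, {"b"}))  # 2    -> disjoint write set, certified
```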

Both implementations use a group communication engine that manages quorums, membership, message passing, …

So what is different, then?

The biggest difference is that Group Replication (GR) is a plugin for MySQL, made by MySQL, and packaged and distributed with MySQL by default. GR is also available and supported on all MySQL platforms: Linux, Windows, Solaris, OSX, FreeBSD.

As mentioned before, GR also uses the same infrastructure people are already used to (binlogs, GTIDs, …). Beyond familiarity and trust, this makes it much easier to integrate a Group Replication cluster into more complex topologies that also involve asynchronous master/slave replication.

There are many implementation differences. I’ll group them into these categories:

  1. Group Communication
  2. InnoDB
  3. Binary Log & Snapshot
  4. GTID, Master-Master & Master-Slaves
  5. Monitoring

Group Communication

Galera is using a proprietary group communication system layer, which implements virtual synchrony QoS based on the Totem single-ring ordering protocol. MySQL Group Replication uses a Group Communication System (GCS) based on a variant of the popular Paxos algorithm.

This allows GR to achieve much better network performance, greatly reducing the overall latency within the distributed system (more information about this in Vitor’s blog post). In fact, the more nodes you add (GR currently supports up to 9 nodes per group), the more the commit time increases in Galera, whereas it stays almost stable with GR. This is due to GR using peer-to-peer style communication versus Galera’s token ring.
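With a Paxos-based engine, a transaction can proceed once a majority of the members have acknowledged it, so the relevant arithmetic is the size of a majority. Below is a minimal Python sketch of the standard majority rule for the supported group sizes of 1 to 9 members; it only illustrates quorum sizes and is not a latency model of either product.

```python
def majority(members: int) -> int:
    """Smallest number of members that forms a majority."""
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """Members that can fail while the rest still form a majority."""
    return members - majority(members)


for n in range(1, 10):
    print(f"{n} members: majority {majority(n)}, tolerates {tolerated_failures(n)} failure(s)")
```

Note that an even group size adds no failure tolerance over the odd size just below it: 4 members tolerate one failure, exactly like 3.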

InnoDB

Galera needs to patch MySQL and add an extra layer to be able to kill a local transaction when there are certification conflicts. Group Replication instead uses High Priority Transactions in InnoDB to ensure that conflicts are detected and handled properly.

Binary Log

Although it requires binlog_format=ROW, Galera doesn’t need the binary logs to be enabled. Enabling them is still recommended for point-in-time recovery, asynchronous replication to a slave outside the cluster, or forensic purposes. In other words, Galera doesn’t use the binary log to perform incremental synchronization between the nodes.

Galera uses an extra file called gcache (Galera Cache). This file was not resilient until the latest Galera release (3.19), and even there resilience is not guaranteed. The data stored in this file can’t be used for anything other than IST (Incremental State Transfer).

In Group Replication, we use the binary log files for that purpose. So if a node was out for a short period, it synchronizes from the binary logs of the member elected as donor. Galera calls this IST (served from the gcache when the data is still available); GR calls it Automated Distributed Recovery.
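Because recovery relies on GTIDs, the donor only has to send the transactions the joiner has not executed yet. Here is a minimal Python sketch of that set difference, assuming simplified GTID set strings such as uuid:1-5:7 and a placeholder UUID; inside MySQL, the GTID_SUBTRACT() SQL function computes the same result. The sketch expands intervals into Python sets, which is fine for a demo only.

```python
def parse_gtid_set(text: str) -> dict:
    """Parse 'uuid:1-5:7,uuid2:3' into {uuid: set of sequence numbers}."""
    result = {}
    for part in filter(None, (p.strip() for p in text.split(","))):
        uuid, *intervals = part.split(":")
        seqnos = result.setdefault(uuid.lower(), set())
        for interval in intervals:
            start, _, end = interval.partition("-")
            seqnos.update(range(int(start), int(end or start) + 1))
    return result


def missing_transactions(donor_executed: str, joiner_executed: str) -> dict:
    """GTIDs the donor has executed but the joiner has not."""
    donor = parse_gtid_set(donor_executed)
    joiner = parse_gtid_set(joiner_executed)
    missing = {}
    for uuid, seqnos in donor.items():
        todo = seqnos - joiner.get(uuid, set())
        if todo:
            missing[uuid] = sorted(todo)
    return missing


GROUP = "aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa"  # placeholder group UUID
print(missing_transactions(f"{GROUP}:1-10", f"{GROUP}:1-7"))
# {'aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa': [8, 9, 10]}
```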

Basing our solution on binary logs means the data is safely persisted (flushed and sync’d). It is also a well-known format and, as mentioned above, binary logs serve many other purposes (distributed recovery, asynchronous replication, point-in-time recovery, streaming or piping to other systems like Kafka, … and they can even be used to perform schema changes!).

The Galera Cache file stores the writesets as a circular buffer of predefined size. So IST may become impossible once the writesets a node needs have been overwritten, and then a full State Snapshot Transfer (SST) is required.
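The consequence of a fixed-size circular buffer can be shown with a toy model: IST is only possible while every writeset the joiner is missing is still in the buffer. This Python sketch assumes one slot per writeset and contiguous sequence numbers, and the class name is hypothetical; the real gcache is sized in bytes (the gcache.size provider option).

```python
from collections import deque


class ToyGCache:
    """Fixed-capacity ring buffer of (seqno, writeset) pairs."""

    def __init__(self, capacity: int):
        # deque(maxlen=...) silently drops the oldest entry when full.
        self.ring = deque(maxlen=capacity)

    def append(self, seqno: int, writeset) -> None:
        self.ring.append((seqno, writeset))

    def state_transfer_for(self, joiner_last_seqno: int):
        """Return ('IST', missing entries) or ('SST', None) for a joiner."""
        if not self.ring or joiner_last_seqno + 1 < self.ring[0][0]:
            # Nothing cached, or the oldest missing writeset was overwritten.
            return "SST", None
        return "IST", [e for e in self.ring if e[0] > joiner_last_seqno]


cache = ToyGCache(capacity=3)
for seqno in range(1, 6):  # writesets 1..5; only 3..5 remain cached
    cache.append(seqno, {f"row{seqno}"})
print(cache.state_transfer_for(4)[0])  # IST: only writeset 5 is needed
print(cache.state_transfer_for(1)[0])  # SST: writeset 2 is gone
```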

And this is perhaps one of Galera’s advantages for people with a lot of network or hardware problems: full data provisioning. It’s true that with Galera, a new node added to the cluster doesn’t need to be prepared in advance. This is very convenient for newcomers. We understand the need for a better solution: currently, with Group Replication, this process is pretty much the same as provisioning a slave with regular replication.

However, any experienced Galera DBA will also tell you that they try to avoid SST as much as possible.

GTID, Master-Master, Master-Slave

Like Galera, GR assigns a single UUID to the cluster. The difference is that, although all nodes in the same group share this UUID, in GR each node has its own sequence number range (defined by group_replication_gtid_assignment_block_size).
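To illustrate the per-member ranges, here is a toy, single-process Python model in which each member reserves blocks of consecutive sequence numbers under the shared group UUID. The block_size parameter plays the role of group_replication_gtid_assignment_block_size (default 1000000); the class, method and member names are hypothetical, and the real bookkeeping lives inside the plugin.

```python
class ToyGtidAllocator:
    """Hands out group GTIDs from per-member blocks of consecutive seqnos."""

    def __init__(self, group_uuid: str, block_size: int = 1_000_000):
        self.group_uuid = group_uuid
        self.block_size = block_size
        self.next_free = 1  # first seqno not reserved by any member yet
        self.blocks = {}  # member -> [next seqno to use, last seqno of block]

    def _reserve_block(self, member: str) -> None:
        start = self.next_free
        self.next_free += self.block_size
        self.blocks[member] = [start, start + self.block_size - 1]

    def next_gtid(self, member: str) -> str:
        block = self.blocks.get(member)
        if block is None or block[0] > block[1]:  # no block yet, or exhausted
            self._reserve_block(member)
            block = self.blocks[member]
        seqno = block[0]
        block[0] += 1
        return f"{self.group_uuid}:{seqno}"


alloc = ToyGtidAllocator("group-uuid", block_size=3)  # tiny block for the demo
print([alloc.next_gtid(m) for m in ("m1", "m2", "m1", "m1", "m1")])
# ['group-uuid:1', 'group-uuid:4', 'group-uuid:2', 'group-uuid:3', 'group-uuid:7']
```

Because the blocks never overlap, GTIDs generated on different members cannot collide even though they share one UUID.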

And, like Galera, if your workload allows it (more on this in a future post), you can use a multi-master cluster and write to all nodes at the same time. But as every write is still synchronized across the group, this won’t scale writes anyway. So, even if it’s not really advertised for Galera, with Group Replication we recommend writing to a single master at a time to reduce the probability of conflicts.

Writing to a single master also avoids likely issues when a schema change runs on one node while data is being modified on another node at the same time.

This is why, by default, your MySQL Group Replication cluster runs in Single Primary Mode (controlled by group_replication_single_primary_mode). This means the group itself automatically elects a leader (the primary) and keeps managing this role as the group changes, electing a new leader if the current one fails. Don’t forget that Group Replication is first and foremost a High Availability solution.
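As far as I know, the 5.7 election rule is simple: sort the server UUIDs of the remaining members lexicographically and pick the first one (later releases add group_replication_member_weight to influence the choice). Here is a minimal Python sketch of that rule, reusing the member IDs from the replication_group_members output in the Monitoring section below; the function name is hypothetical.

```python
def elect_primary(members):
    """Return the server UUID of the new primary, or None if nobody is ONLINE.

    members: iterable of (server_uuid, member_state) pairs.
    """
    candidates = sorted(uuid for uuid, state in members if state == "ONLINE")
    return candidates[0] if candidates else None


group = [
    ("e8fe7c39-ada4-11e6-8891-08002718d305", "ONLINE"),  # mysql3
    ("e920a7cf-ada4-11e6-8971-08002718d305", "ONLINE"),  # mysql2
    ("e92186b1-ada4-11e6-ba00-08002718d305", "ONLINE"),  # mysql1
]
print(elect_primary(group))  # e8fe7c39-ada4-11e6-8891-08002718d305 (mysql3)

# If mysql3 leaves the group, the election runs again on the new view.
print(elect_primary(group[1:]))  # e920a7cf-ada4-11e6-8971-08002718d305 (mysql2)
```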

Of course, even when running the cluster in Single Primary Mode, the Group Replication limitations and recommendations still apply (disabling binlog checksums, using only InnoDB tables, letting the group manage the auto_increment-related variables, …), though there are fewer of them.
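Letting the group manage auto_increment means, as I understand the 5.7 behaviour, that each member gets auto_increment_increment set to group_replication_auto_increment_increment (default 7) and auto_increment_offset set to its server_id, so members never generate the same value. Here is a minimal Python sketch of the resulting sequences, assuming server_id values 1 to 3 and an initially empty table; the function name is hypothetical.

```python
def auto_increment_values(offset: int, increment: int, count: int) -> list:
    """First `count` values a member generates for an empty table.

    Follows MySQL's rule that generated values are offset + k * increment
    (valid while offset <= increment; otherwise MySQL ignores the offset).
    """
    return [offset + k * increment for k in range(count)]


INCREMENT = 7  # group_replication_auto_increment_increment default
for server_id in (1, 2, 3):
    print(server_id, auto_increment_values(server_id, INCREMENT, 3))
# 1 [1, 8, 15]
# 2 [2, 9, 16]
# 3 [3, 10, 17]
```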

Monitoring

Unlike Galera, which (if I remember correctly) exposes information only through status variables, Group Replication uses Performance Schema. The Galera fork in Percona XtraDB Cluster also uses performance_schema in its 5.7 version.

For example, in Galera it’s not easy to find out, from any node, which other nodes are in the cluster and what their status is. With Group Replication, we expose all of that in performance_schema:

select * from performance_schema.replication_group_members\G
*************************** 1. row ***************************
CHANNEL_NAME: group_replication_applier
MEMBER_ID: e8fe7c39-ada4-11e6-8891-08002718d305
MEMBER_HOST: mysql3
MEMBER_PORT: 3306
MEMBER_STATE: ONLINE
*************************** 2. row ***************************
CHANNEL_NAME: group_replication_applier
MEMBER_ID: e920a7cf-ada4-11e6-8971-08002718d305
MEMBER_HOST: mysql2
MEMBER_PORT: 3306
MEMBER_STATE: ONLINE
*************************** 3. row ***************************
CHANNEL_NAME: group_replication_applier
MEMBER_ID: e92186b1-ada4-11e6-ba00-08002718d305
MEMBER_HOST: mysql1
MEMBER_PORT: 3306
MEMBER_STATE: ONLINE

As you can see, the Performance_Schema tables offer an easy and intuitive way to get information and stats on an individual node and the group as a whole.

If you use a solution that requires a health check to monitor the nodes and decide how to route the application to the right node(s), you can also base your script on the sys schema, which provides views with all the information you need to make the right routing decision.
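Whatever view or table the script queries, the routing decision itself can stay small. Here is a minimal Python sketch, assuming the rows have already been fetched as dictionaries with the replication_group_members column names shown above, and that the primary’s UUID was read separately (in 5.7 single-primary mode, from the group_replication_primary_member status variable). The function name and return shape are hypothetical, and mysql2 is marked RECOVERING for the example.

```python
def routing_targets(members, primary_uuid):
    """Return (writer_host, reader_hosts) from replication_group_members rows.

    Only ONLINE members receive traffic; the writer is the primary, if ONLINE.
    """
    online = [m for m in members if m["MEMBER_STATE"] == "ONLINE"]
    writer = next((m["MEMBER_HOST"] for m in online if m["MEMBER_ID"] == primary_uuid), None)
    readers = sorted(m["MEMBER_HOST"] for m in online)
    return writer, readers


rows = [
    {"MEMBER_ID": "e8fe7c39-ada4-11e6-8891-08002718d305", "MEMBER_HOST": "mysql3", "MEMBER_STATE": "ONLINE"},
    {"MEMBER_ID": "e920a7cf-ada4-11e6-8971-08002718d305", "MEMBER_HOST": "mysql2", "MEMBER_STATE": "RECOVERING"},
    {"MEMBER_ID": "e92186b1-ada4-11e6-ba00-08002718d305", "MEMBER_HOST": "mysql1", "MEMBER_STATE": "ONLINE"},
]
print(routing_targets(rows, "e8fe7c39-ada4-11e6-8891-08002718d305"))
# ('mysql3', ['mysql1', 'mysql3'])
```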

Conclusion

So it’s true that Galera benefits from many years of experience and still has many more features, some major like the arbitrator[1], others minor like node weight, sync wait, segments, …, but Group Replication is a solid contender, especially if you are looking for great performance.

If you think something is missing before you can adopt this technology, just drop me a comment explaining your need. Also, don’t hesitate to comment on this blog post if I missed something or if you disagree on some points; I can always revisit my thoughts.

 

[1] I was never a big fan of using an arbitrator in Galera: all the data needs to reach that node anyway and, given the price of storage these days, I consider it much safer to have a real cluster node where the data is also replicated. Three data copies are always better than two 😉


16 Comments

  1. The benefit of the arbitrator (witness, log-only node) is to make it possible to get consensus faster without having an extra replica for deployments that prefer at most one copy per geographic region and have geo regions that are far apart. The incremental cost of an arbitrator is much less than the cost of another replica. But maybe that doesn’t matter to most deployments.

    • Hi Mark,

      Thank you for your comment, but in Galera the arbitrator gets all the replication data like any other node. In fact, we could see it as a mysqld storing the data to /dev/null (except that there is no mysqld).

      For that reason, as the data already reaches the node and the latency is already added to the process, I think it’s better to store that data too.

      • Google had “witness replicas” in some of their Paxos implementations. We have something like witness replicas with our lossless semisync implementation. It is OK for us to agree that Group Replication is missing this feature. This is a feature and Group Replication isn’t better for not having it because without witness replicas the cost is either slower commit — because the replicas are farther apart on the network — or more hardware — because I added replicas that are only needed to make consensus faster.

        • Unfortunately, the Galera arbitrator (like the MongoDB one, for that matter) doesn’t store a transaction log either, so it is not useful for this purpose.

          • Hello Frédéric,

            Your article is very interesting; however, I agree with Mark: the witness/arbitrator is a must-have.
            Many companies run on only 2 DCs and just have witnesses in a third one, notably for VMware and other infrastructure technologies. Having a third copy of the data is still costly.

            All the more so since, on the open source front, many technologies make this possible (PostgreSQL Patroni, PostgreSQL RepMgr).

            Given your connections at MySQL, a quick word to the developers about adding this feature would be welcome 😀

            Have a nice day

            Pierre

          • Thanks for the comment.

            With Patroni, 3 DCs are also required according to the documentation (https://patroni.readthedocs.io/en/latest/ha_multi_dc.html).

            The problem with an arbitrator in a replication system like Group Replication (or Galera) is that such an arbitrator must also take part in certification and will receive all the replicated data anyway.

            Group Replication (as opposed to Galera) uses Paxos, which lets a transaction proceed with its commit once a majority of nodes have received the information… so what about a cluster with 2 nodes and an arbitrator? If the arbitrator answers first… well, we risk losing data. What matters most to us is the data.

            Now, if the goal is to use several DCs, there are other solutions such as MySQL InnoDB ClusterSet.

            Have a nice day

      • I believe it’s good that the arbiter gets the data, so it can act as a relay between two nodes that might not be able to communicate directly. Not sure how that happens within GR?

  2. “Galera is using a proprietary group communication system layer”

    I’m wondering, what exactly is proprietary there? I’m not a Galera expert, but IIRC the code is open source and based on a PhD thesis.

    ” MySQL Group Replication use a Group Communication System (GCS) based on a variant of the popular Paxos algorithm.”

    So, is that one less proprietary? Is Galera’s mechanism further from Paxos than GCS in some way?

  3. Hi Fred,

    Have you started using GR?
    I’d very much like to see feedback from the community.

    Thanks for this article.
    James


As MySQL Community Manager, I am an employee of Oracle and the views expressed on this blog are my own and do not necessarily reflect the views of Oracle.

You can find articles I wrote on Oracle’s blog.