From the definition, Group Replication is a multi-master update everywhere replication plugin for MySQL with built-in conflict detection and resolution, automatic distributed recovery, and group membership.
So we can indeed compare this solution with Galera from Codership which is a Replication Plugin implementing the WSREP API. WSREP, Write Set Replication, extends the replication API to provide all the information and hooks required for true multi-master, “virtually synchronous” replication.
With Group Replication, MySQL implemented all this information in the plugin itself. Our engineers leveraged existing standard MySQL infrastructure (GTIDs, Multi-Source Replication, Multi-threaded Slave Applier, Binary Logs,…) and prepared InnoDB since several releases to provide all the necessary features like High Priority Transaction in InnoDB since 5.7.6 for example.
This means that Group Replication is based on well known and trusted components and makes the integration and the adoption an easier process.
Both solutions are based on Replicated Database State Machine theory.
What are the similarities between both solutions ?
MySQL Group Replication and Galera use write sets. A write set is a set of globally unique identifiers of each
logical item changed by the transaction when it executed (item may be a row, a table, a metadata object, …).
So, Group Replication and Galera use ROW binary log events, and together with the transaction data, its writesets are streamed synchronously from the server that received the write (Master for that specific transaction) to the other members/nodes in the cluster.
Then they will certify the writeset (transaction) locally and asynchronously queue the accepted changes to be applied.
Then both solutions will make use of the write sets to check
for conflicts between concurrent transactions executing on
different replicas. This procedure is named certification. So they will certify the write set locally and asynchronously queue the accepted changes to be applied.
Both implementations use a group communication engine that manages quorums, membership, message passing, …
So what is different then ?
The biggest difference is that Group Replication (GR) is a plugin for MySQL, made by MySQL, packaged and distributed with MySQL by default. Also, GR is available and supported on all MySQL Platforms: Linux, Windows, Solaris, OSX, FreeBSD.
As said before, GR also uses all the same infrastructure that people are used to (binlogs, GTIDs, …). In addition to familiarity and trust, this makes it much easier to integrate a Group Replication cluster into more complex topologies where different asynchronous master/slaves are also involved.
There are many implementation differences. I’ll list them in those categories:
- Group Communication
- Binary Log & Snapshot
- GTID, Master-Master & Master-Slaves
Galera is using a proprietary group communication system layer, which implements a virtual synchrony QoS which is based on the Totem Single-ring Ordering protocol. MySQL Group Replication use a Group Communication System (GCS) based on a variant of the popular Paxos algorithm.
This allows GR to achieve much more optimal network performance, thus greatly reducing the overall latency within the distributed system (more information about this in Vitor’s blog post). In fact the more nodes you add (currently GR supports up to 9 nodes per group), more the commit time will increase in Galera where it will stay almost stable with GR. This is due to GR using a peer-to-peer style communication versus Galera’s token ring.
Compared to Galera that needs to patch MySQL and add an extra layer to be able to kill a local transaction when there are certification conflicts, Group Replication uses the High Priority Transactions in InnoDB, which allows Group Replication to ensure that conflicts are detected and handled properly.
Even if it requires
binlog_format=ROW, Galera doesn’t need to have the binary logs enabled. It’s anyway recommended to enable them for point-in-time recovery, asynchronous replication to a slave out of the cluster or for forensic purpose. So Galera doesn’t use the binary log to perform the incremental synchronization between the nodes.
Galera uses an extra file called gcache (Galera Cache). This file was not resilient since the last Galera release (3.19, and it’s not guaranteed). The data stored inside of this file can’t be used for anything else than IST (Incremental State Transfer).
In Group Replication, we keep using the binary log files for that purpose. So if a node was out for a short period, it will perform the synchronization from the binary logs of the node that has been elected as donor. This is called IST in Galera (from the gcache when data is available) and Automated Distributed Recovery in GR.
Basing our solution on binary logs allows us to have the data safely persisted (flushed and sync’d). Also this is a well known format and as mentioned above, binary logs server many purposes too (distributed recovery, asynchronous replication, Point-in-time recovery, streaming or piping to other system like Kafka… and can even be used to perform schema changes!).
The Galera Cache file is used to store the writesets in circular buffer style and has its size pre-defined. So it might happen that IST is impossible and that a full state transfer is required (SST).
And this is maybe one of the advantage of Galera for people having a lot of network or hardware problems: the full data provisioning. It’s true that with Galera, when a new node is added to the cluster, it’s possible to not prepare the new node in advance. This is very convenient for newbies. We understand the need for a better solutions. Currently this process is pretty much the same as provisioning a slave when using regular replication.
However, every Galera experienced DBAs can also tell you that they try to avoid SST as much as possible.
GTID, Master-Master, Master-Slave
Like Galera, GR has one attributed UUID for the cluster. The difference with Galera is that even if all nodes in the same Group share the same UUID, in GR they have their own sequence number range (defined by group_replication_gtid_assignment_block_size).
And like Galera if your workload allows it (more to come in a future post), you can use a multi-master cluster and write on all the nodes at the same time. But as this is some how synchronized, that won’t scale up writes anyway. So, even if it’s not really advertised in Galera, with Group Replication we recommend to use a single-master at the time to reduce the probability of conflicts.
Writing on one single master also allows to avoid probable issues when dealing with schema changes and modifying data on another node at the same time.
This is why by default, in MySQL Group Replication, your cluster runs in Single Primary Mode (controller by group_replication_single_primary_mode). This means the Group itself will automatically elect a leader and keep managing this task when the group changes (in case of failure of the leader). Don’t forget that Group Replication is first a High Availability solution.
Of course, even when using te cluster in Single Primary Mode, the limitations or recommendations related to Group Replication still apply (like disabling binlog checksum, using only InnoDB tables, let the Group manage the auto_increment related variables, …), but there are some less.
Unlike Galera that uses only status variables (if I remember correctly), Group Replication uses Performance Schema to expose information. The Galera fork present in Percona XtraDB Cluster also uses performance_schema in its 5.7 version.
For example, in Galera it’s not easy to find from any node which others nodes are in the cluster and what’s their status. With Group Replication we expose all that in performance_schema:
select * from replication_group_members\G *************************** 1. row *************************** CHANNEL_NAME: group_replication_applier MEMBER_ID: e8fe7c39-ada4-11e6-8891-08002718d305 MEMBER_HOST: mysql3 MEMBER_PORT: 3306 MEMBER_STATE: ONLINE *************************** 2. row *************************** CHANNEL_NAME: group_replication_applier MEMBER_ID: e920a7cf-ada4-11e6-8971-08002718d305 MEMBER_HOST: mysql2 MEMBER_PORT: 3306 MEMBER_STATE: ONLINE *************************** 3. row *************************** CHANNEL_NAME: group_replication_applier MEMBER_ID: e92186b1-ada4-11e6-ba00-08002718d305 MEMBER_HOST: mysql1 MEMBER_PORT: 3306 MEMBER_STATE: ONLINE
As you can see, the Performance_Schema tables offer an easy and intuitive way to get information and stats on an individual node and the group as a whole.
If you are using a solution that requires an health-check to monitor the nodes and decide of the routing from the application to the right node(s), you can also base your script on
sys schema that provides views with all the information you need to make the right routing decision.
So, it’s really true that Galera benefits from many years of experience and has still many more features, some major like the arbitrator, or minor like node weight, sync wait, segments, … but Group Replication is a solid contender, certainly if you are looking for great performance.
If you think that you are missing something to adopt this technology, just drop me a comment explaining your need. Also don’t hesitate to comment this blog post if I missed something or if you don’t agree on some points, I can always review my thoughts.
 I was never a big fan of the use of an arbitrator in Galera, as all data need to reach the node anyway, for the storage price those days, I consider that it’s much safer to have a real cluster node where data is also replicated. 3 data copies are always better than 2 😉