When using MySQL Group Replication, some members may lag behind the rest of the group because of load, hardware limitations, and so on. This lag can hurt certification performance and increase the rate of certification failures: the larger the applier queue, the higher the risk of conflicts with transactions that have not been applied yet (this is mainly a problem on Multi-Primary Groups).
Galera users are already familiar with this concept. MySQL Group Replication’s implementation differs in 2 main aspects:
- the Group is never totally stalled
- the node having issues doesn’t send flow control messages to the rest of the group asking it to slow down
In fact, every member of the Group sends statistics about its queues (applier queue and certification queue) to the other members. Each node then decides whether to slow down when it sees that one node has reached the threshold for one of the queues:
- group_replication_flow_control_applier_threshold (default is 25000)
- group_replication_flow_control_certifier_threshold (default is 25000)
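To make the decision concrete, here is a minimal Python sketch of the check each member could run against the statistics it receives. It is only an illustration, not the server's code: the `MemberStats` structure, the `should_throttle` function and the member names are hypothetical, and treating "threshold reached" as a strict "greater than" comparison is an assumption.

```python
from dataclasses import dataclass

# Defaults of the two thresholds, as listed above.
APPLIER_THRESHOLD = 25000    # group_replication_flow_control_applier_threshold
CERTIFIER_THRESHOLD = 25000  # group_replication_flow_control_certifier_threshold


@dataclass
class MemberStats:
    """Queue sizes reported by one member (hypothetical structure)."""
    member_id: str
    applier_queue: int
    certifier_queue: int


def should_throttle(stats, applier_threshold=APPLIER_THRESHOLD,
                    certifier_threshold=CERTIFIER_THRESHOLD):
    """Return True if any member reports a queue above the local thresholds.

    The thresholds are the ones configured on the node doing the check,
    not on the node that is lagging.
    """
    return any(
        s.applier_queue > applier_threshold
        or s.certifier_queue > certifier_threshold
        for s in stats
    )


group = [
    MemberStats("mysql1", applier_queue=120, certifier_queue=0),
    MemberStats("mysql2", applier_queue=31000, certifier_queue=15),
    MemberStats("mysql3", applier_queue=80, certifier_queue=3),
]
print(should_throttle(group))  # True: mysql2's applier queue is above 25000
```

Note that the function takes the thresholds as arguments: two members with different settings can reach different conclusions from the same statistics.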
When group_replication_flow_control_mode is set to QUOTA, a node that sees one of the other members of the cluster lagging behind (threshold reached) throttles its write operations to the minimum quota. This quota is calculated from the number of transactions applied in the last second, and is then reduced further by subtracting the “over the quota” messages from the last period.
This means that, contrary to Galera, where the decision is made on the slow node, in MySQL Group Replication the node writing a transaction checks its own flow control threshold values and compares them to the statistics from the other nodes to decide whether to throttle.
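As a rough illustration of the arithmetic described here, the quota for the next period can be sketched as follows. This is a deliberately simplified model, not the actual algorithm used by the server: the function name `next_quota` and the `min_quota` floor of 1 are assumptions made for the example.

```python
def next_quota(applied_last_second, over_quota_last_period, min_quota=1):
    """Simplified quota for the next period (hypothetical helper).

    Start from what was applied in the last second, subtract the messages
    that went over the quota in the last period, and never go below
    min_quota so that writes are slowed down but not stopped.
    """
    quota = applied_last_second - over_quota_last_period
    return max(quota, min_quota)


print(next_quota(1000, 200))  # 800
print(next_quota(100, 500))   # 1: the floor keeps the group from stalling
```

The floor in this sketch reflects the first difference with Galera listed earlier: the Group is never totally stalled.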
You can find more information about Group Replication Flow Control in Vitor’s article “Zooming-in on Group Replication Performance”.