MySQL Group Replication: understanding Flow Control


When using MySQL Group Replication, some members may lag behind the group because of load, hardware limitations, and so on. This lag makes it harder to keep certification performing well and to keep certification failures as low as possible. The larger the applier queue, the higher the risk of conflicts with transactions that have not been applied yet (this is especially problematic on Multi-Primary groups).

Galera users are already familiar with this concept. MySQL Group Replication’s implementation differs in two main aspects:

  • the Group is never totally stalled
  • the node having issues doesn’t send flow control messages to the rest of the group asking it to slow down

In fact, every member of the group sends statistics about its queues (applier queue and certification queue) to the other members. Each node then decides whether to slow down when it sees that a member has reached the threshold for one of the queues:

group_replication_flow_control_applier_threshold   (default is 25000)
group_replication_flow_control_certifier_threshold (default is 25000)

So when group_replication_flow_control_mode is set to QUOTA on a node that sees one of the other members of the cluster lagging behind (threshold reached), that node throttles its write operations to the minimum quota. This quota is calculated based on the number of transactions applied in the last second, and then it is reduced below that by subtracting the “over the quota” messages from the last period.
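The quota arithmetic described above can be illustrated with a small Python sketch. This is a simplified model and not the server's actual code: the function name `next_quota` and its parameters are hypothetical, and the lower bound is passed in explicitly as `min_quota` because the text does not spell out how the minimum is derived.

```python
def next_quota(applied_last_period: int,
               over_quota_last_period: int,
               min_quota: int) -> int:
    """Simplified model of the write quota for the next period.

    applied_last_period: transactions applied in the last second.
    over_quota_last_period: "over the quota" messages from the last period.
    min_quota: hypothetical floor, so that writes are never fully stalled.
    """
    # Start from what was applied, then subtract the excess that was
    # sent beyond the previous quota.
    reduced = applied_last_period - over_quota_last_period
    # Never go below the floor: the group slows down but does not stop.
    return max(min_quota, reduced)


if __name__ == "__main__":
    print(next_quota(1000, 200, 50))  # 800
    print(next_quota(100, 500, 50))   # 50
```

The second call shows why a floor matters: without it, a large excess would push the quota to zero or below and stall the group entirely, which is exactly what Group Replication avoids.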

This means that, contrary to Galera, where the decision is made on the node being slow, in MySQL Group Replication the node writing a transaction checks its flow control threshold values and compares them with the statistics from the other nodes to decide whether to throttle.
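As a rough illustration of that writer-side decision, here is a minimal Python sketch. The `MemberStats` class and the `should_throttle` function are hypothetical names invented for this example, and the exact comparison (`>=`, read as "threshold reached") is an assumption; only the two thresholds and their default of 25000 come from the variables listed earlier.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class MemberStats:
    """Queue sizes one member reports to the rest of the group (hypothetical)."""
    name: str
    applier_queue: int
    certifier_queue: int


def should_throttle(stats: Iterable[MemberStats],
                    applier_threshold: int = 25000,
                    certifier_threshold: int = 25000) -> bool:
    """Decide locally, on the writing node, whether to throttle.

    The thresholds are the local node's own settings; the queue sizes
    come from the statistics received from the other members.
    """
    return any(
        m.applier_queue >= applier_threshold
        or m.certifier_queue >= certifier_threshold
        for m in stats
    )


if __name__ == "__main__":
    group = [
        MemberStats("node2", applier_queue=120, certifier_queue=0),
        MemberStats("node3", applier_queue=30000, certifier_queue=10),
    ]
    print(should_throttle(group))      # True: node3 reached the applier threshold
    print(should_throttle(group[:1]))  # False: node2 is well below both thresholds
```

Because each node evaluates the reported statistics against its own thresholds, two writers with different settings could reach different decisions about the same lagging member.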

You can find more information about Group Replication Flow Control in Vitor’s article, Zooming-in on Group Replication Performance.

 


4 thoughts on “MySQL Group Replication: understanding Flow Control”

  1. Fred,

    What happens in MySQL Group Replication if a node drastically slows down? Would the quota be adjusted, or would such an overly slow node leave the cluster?

  2. Hi Peter,
    Thank you for your comment. In fact, the cluster will just continue to slow down.
    The group quota is calculated based on the number of transactions applied in the last second, and then it is reduced below that by subtracting the “over the quota” messages from the last period (with a 5% minimum). A stopped node would maintain that throughput indefinitely while the blocked node is not applying.

    So even if a node is not applying anything (its applier queue keeps growing), the node won’t leave the group. The decision to leave the cluster is based only on network reliability. So if the node is not able to apply but continues to receive the events, keeps certifying them, and inserts them into its relay log, it won’t be expelled from the group.
