MySQL Group Replication: understanding Flow Control

When using MySQL Group Replication, some members may lag behind the group because of load, hardware limitations, etc. This lag can become a problem for certification, both for performance and for keeping certification failures as rare as possible: the bigger the applier queue, the bigger the risk of conflicts with transactions that are not yet applied (this is a problem on Multi-Primary groups).
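To make that risk concrete, here is a toy model in Python. It is not Group Replication's actual certifier. A transaction written on a lagging member is counted as conflicting if its write set touches any row key modified by a remote transaction that is still waiting in that member's applier queue. The row keys and queue contents are invented for illustration.

```python
from typing import Iterable, List, Set


def conflicts_with_queue(local_write_set: Set[str],
                         applier_queue: Iterable[Set[str]]) -> bool:
    """Return True if the local write set overlaps any queued remote write set.

    Toy model: each queued item is the set of row keys that a remote
    transaction modified and that this member has not applied yet.
    """
    return any(local_write_set & remote for remote in applier_queue)


def conflict_rate(local_txns: List[Set[str]], applier_queue: List[Set[str]]) -> float:
    """Fraction of local transactions that would touch a not-yet-applied row."""
    if not local_txns:
        return 0.0
    hits = sum(conflicts_with_queue(t, applier_queue) for t in local_txns)
    return hits / len(local_txns)


def make_queue(length: int) -> List[Set[str]]:
    """Hypothetical queue: remote transaction i modified row pk=i of table t1."""
    return [{f"t1:pk={i}"} for i in range(length)]


# 10 local writes on rows pk=0, 10, 20, ..., 90
local = [{f"t1:pk={i}"} for i in range(0, 100, 10)]

print(conflict_rate(local, make_queue(5)))   # 0.1 (short queue)
print(conflict_rate(local, make_queue(50)))  # 0.5 (longer queue)
```

With the same local workload, a queue ten times longer multiplies the share of conflicting transactions by five in this example.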

Galera users are already familiar with this concept. MySQL Group Replication’s implementation differs in two main aspects:

the Group is never totally stalled
the node having issues doesn’t send flow control messages to the rest of the group asking them to slow down

In fact, every member of the Group sends some statistics about its queues (applier queue and certification queue) to the other members. Then every node decides whether to slow down if it realizes that one node has reached the threshold for one of the queues:

group_replication_flow_control_applier_threshold   (default is 25000)
group_replication_flow_control_certifier_threshold (default is 25000)
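A minimal Python sketch of the check each member performs on the statistics it receives. The `MemberStats` record and its field names are hypothetical; only the two thresholds and their default of 25000 come from the text above, and treating "reached" as "strictly greater than" is an assumption.

```python
from dataclasses import dataclass
from typing import List

APPLIER_THRESHOLD = 25000    # group_replication_flow_control_applier_threshold default
CERTIFIER_THRESHOLD = 25000  # group_replication_flow_control_certifier_threshold default


@dataclass
class MemberStats:
    """Hypothetical view of the queue statistics a member broadcasts."""
    name: str
    applier_queue: int
    certifier_queue: int


def lagging_members(stats: List[MemberStats],
                    applier_threshold: int = APPLIER_THRESHOLD,
                    certifier_threshold: int = CERTIFIER_THRESHOLD) -> List[str]:
    """Return the members whose applier or certifier queue exceeds its threshold."""
    return [
        s.name for s in stats
        if s.applier_queue > applier_threshold
        or s.certifier_queue > certifier_threshold
    ]


group = [
    MemberStats("node1", applier_queue=120, certifier_queue=10),
    MemberStats("node2", applier_queue=31000, certifier_queue=15),
    MemberStats("node3", applier_queue=900, certifier_queue=26000),
]
print(lagging_members(group))  # ['node2', 'node3']
```

Either queue is enough on its own: node3 is flagged only because of its certifier queue.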

So, when group_replication_flow_control_mode is set to QUOTA on a node that sees another member of the cluster lagging behind (threshold reached), that node throttles its write operations down to a quota. This quota is calculated from the number of transactions applied in the last second, and it is then reduced below that by subtracting the “over the quota” messages from the last period.
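The quota rule above can be written as a small function. This is a simplified model, not the server's code: `applied_last_second` stands for the transactions applied in the last second, `over_quota_last_period` for the messages sent beyond the previous quota, and the 5% floor is taken from the author's reply in the comments. How the real implementation combines these values may differ.

```python
def next_quota(applied_last_second: int,
               over_quota_last_period: int,
               min_percent: int = 5) -> int:
    """Simplified flow-control quota for the next period.

    Start from the transactions applied in the last second, subtract the
    messages sent over the previous quota, and never go below
    min_percent % of applied_last_second (assumed 5% floor).
    """
    if applied_last_second < 0 or over_quota_last_period < 0:
        raise ValueError("counts must be non-negative")
    floor = applied_last_second * min_percent // 100
    return max(applied_last_second - over_quota_last_period, floor)


print(next_quota(1000, 150))  # 850
print(next_quota(1000, 990))  # 50 (the 5% floor applies)
```

Integer arithmetic is used for the floor so that the quota stays a whole number of transactions.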

This means that, contrary to Galera, where the threshold is decided on the slow node, in MySQL Group Replication the node writing a transaction checks its own flow control threshold values and compares them to the statistics from the other nodes to decide whether to throttle.
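The difference from Galera is easiest to see by putting the decision on the writer's side. In this hypothetical sketch, each writing member applies its own flow control mode and threshold to the statistics it received from its peers. The member names, the numbers and the `should_throttle` helper are invented for illustration.

```python
from typing import Dict


def should_throttle(mode: str,
                    peer_applier_queues: Dict[str, int],
                    applier_threshold: int = 25000) -> bool:
    """Decide, on the writing member, whether to throttle its own writes.

    mode mirrors group_replication_flow_control_mode ("QUOTA" or "DISABLED").
    The lagging peer only reports its queue size; it takes no decision here.
    """
    if mode == "DISABLED":
        return False
    if mode != "QUOTA":
        raise ValueError(f"unknown mode: {mode}")
    return any(q > applier_threshold for q in peer_applier_queues.values())


# The same statistics received from the peers, different local settings.
stats = {"node2": 300, "node3": 40000}
print(should_throttle("QUOTA", stats))                           # True
print(should_throttle("QUOTA", stats, applier_threshold=50000))  # False
print(should_throttle("DISABLED", stats))                        # False
```

In this model, only the writer's own settings change the outcome; the lagging member's settings play no part in the decision.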

You can find more information about Group Replication Flow Control in Vitor’s article Zooming-in on Group Replication Performance.

10 Comments

Peter Zaitsev

July 4, 2017 / 14:12

Fred,

What happens in MySQL Group Replication if a node slows down drastically? Would the quota be adjusted, or would such an overly slow node leave the cluster?
lefred

July 14, 2017 / 14:22

Hi Peter,
Thank you for your comment. In fact, the cluster will just continue to slow down.
The group quota is calculated from the number of transactions applied in the last second, and it is then reduced below that by subtracting the “over the quota” messages from the last period (with a 5% minimum). With a stopped node, the group would keep running at that reduced throughput indefinitely while the blocked node is not applying.

So even if a node is not applying anything (its applier queue keeps growing), it won’t leave the group. The decision to leave the cluster is based only on network reliability: if the node is not able to apply but continues to receive the events, certify them and write them to its relay log, it won’t be expelled from the group.
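The point that a slow applier is never expelled can be captured in a toy membership check: only the time since a member was last heard from counts, and its queue size is ignored. The `expel_timeout_s` parameter and the record fields are hypothetical; the real failure detector is more involved.

```python
from dataclasses import dataclass


@dataclass
class MemberState:
    """Hypothetical per-member state as seen by the rest of the group."""
    seconds_since_last_message: float
    applier_queue: int


def should_expel(member: MemberState, expel_timeout_s: float = 5.0) -> bool:
    """Expel only on loss of network contact; a growing queue is not a reason."""
    return member.seconds_since_last_message > expel_timeout_s


stalled_applier = MemberState(seconds_since_last_message=0.2, applier_queue=1_000_000)
unreachable = MemberState(seconds_since_last_message=30.0, applier_queue=0)
print(should_expel(stalled_applier))  # False
print(should_expel(unreachable))      # True
```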
MySQL InnoDB Cluster: how to handle performance issue on one member ? – lefred's blog: tribulations of a MySQL Evangelist

November 9, 2017 / 12:10

[…] sustain the same throughput all over the cluster, Group Replication uses a flow control mechanism (see this post to understand how it works). In summary, when a node has an apply queue increasing and reaching a threshold, the other ones […]
jfxu

February 22, 2018 / 11:11

Fred,

I wonder if there is any way to avoid slave lag? Thank you.
- lefred
  
  February 22, 2018 / 11:21
  
  Hi jfxu,
  Do you mean in Group Replication ?
  Using multiple applier workers, LOGICAL_CLOCK as the parallel type, and small transactions already helps.
  Now, as applying writesets is asynchronous, you can’t be 100% sure that lag won’t happen, but you can check for it. For example, ProxySQL 2.0 (not yet GA) uses session_track_gtids (see http://lefred.be/content/mysql-group-replication-read-your-own-write-across-the-group/) to route queries to nodes that already have the data. See https://fosdem.org/2018/schedule/event/proxysql_gtid/
  Regards,
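The read-your-own-write idea mentioned in that reply can be sketched as follows: after a write, the client knows the GTID it produced, and a read is sent only to a member whose executed GTID set already contains it. This is a simplified model, not ProxySQL's code: each GTID set is reduced to one contiguous range per UUID, and the member names and UUID are made up. On a real server, the `GTID_SUBSET()` SQL function performs the containment test on full GTID sets.

```python
from typing import Dict, Optional, Tuple

# Simplified GTID set: UUID -> highest executed sequence number,
# assuming every set is a single contiguous range starting at 1.
GtidSet = Dict[str, int]


def contains(executed: GtidSet, gtid: Tuple[str, int]) -> bool:
    """True if the executed set already includes the given (uuid, seqno)."""
    uuid, seqno = gtid
    return executed.get(uuid, 0) >= seqno


def pick_reader(members: Dict[str, GtidSet], last_write: Tuple[str, int]) -> Optional[str]:
    """Return a member that has already applied the client's last write, if any."""
    for name, executed in sorted(members.items()):
        if contains(executed, last_write):
            return name
    return None


group_uuid = "aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa"  # hypothetical group UUID
members = {
    "node1": {group_uuid: 1500},
    "node2": {group_uuid: 1498},  # still applying
    "node3": {group_uuid: 1500},
}
print(pick_reader(members, (group_uuid, 1499)))  # node1
print(pick_reader(members, (group_uuid, 1501)))  # None
```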
Fuxkdb

January 8, 2020 / 07:52

We have a Multi-Primary MGR group with three nodes (GR1, GR2, GR3). GR1 is the only writer, and GR3 reaches the flow control threshold. If we execute “set global group_replication_flow_control_mode='DISABLED'” on GR3, will the flow control disappear?
Or should we execute “set global group_replication_flow_control_mode='DISABLED'” on all nodes in the group to stop flow control?
Snehal Bhavsar

April 8, 2021 / 20:36

I have three nodes in a cluster, let’s say node1, node2 and node3, and node1 is the master of this InnoDB Cluster. If the average load increases on the master (node1), then after the threshold is reached, the group will start flow control because of the slower nodes (say node2 or node3). In this case, some points of the Flow Control concept need clarification:

1. In this case, will the master be able to accept more RW operations from clients? Since flow control is triggered in the group and the certifier or applier queue has increased on the secondary nodes, will the master’s ability to serve the workload be affected?
2. Will it wait before sending the remaining transactions to the secondary servers?
3. Will it wait for the certification of the upcoming new transactions?
4. Will the performance of the master be impacted by flow control, given that flow control ensures there is only a minimal backlog difference between the primary and the secondaries?
- lefred
  
  April 9, 2021 / 14:16
  
  Hello,
  
  Yes, there is an impact. When group_replication_flow_control_mode is set to QUOTA on a node that sees another member of the cluster lagging behind (threshold reached), that node throttles its write operations to a quota. This quota is calculated from the number of transactions applied in the last second, and it is then reduced below that by subtracting the “over the quota” messages from the last period.
  See https://www.slideshare.net/lefred.descamps/dataopsbarcelona-2019-deep-dive-into-mysql-group-replication-the-magic-explained, starting at slide 171.
