In a previous post, I explained how you can collect and plot metrics using MySQL Shell.
This is a new series of article where I will explain how to read and understand some of the generated graphs.
Understanding your workload and seeing the evolution of it over time can help anticipating problems and work on solutions before the breakdown.
Let’s start the series with a concept that is not always well understood or at least not always considered at its true value: MySQL Checkpoint Age.
Before analyzing the graph, we need to understand what is MySQL InnoDB Checkpointing.
Each change to a data page in the InnoDB Buffer Pool are also written into the Write Ahead Logs.
In the literature they are sometimes called Transaction Logs but this term is not totally correct within InnoDB.
Those logs are used only in case of a crash and InnoDB needs to perform a recovery of all transactions that have been committed. This process guarantees the durability, the D in ACID. The transaction logs are also called redo logs.
However at some point, InnoDB will also write the changed page to disk in the tablespaces (data files). The process of writing the dirty pages (pages that have been modified) to the tablespaces is known as flushing.
The checkpoint represent the LSN value of the latest changes written to the data files.
InnoDB flushes small batches of those dirty pages from the buffer pool, this is why it’s called fuzzy checkpointing. MySQL does not flush them all at once to avoid heavy process that could disrupt the normal usage of MySQL.
By default, the redo log is composed of 2 files
ib_logfile1. Those file contain the changes that are made to the InnoDB pages but are not InnoDB pages.
The transaction log can be represented like a circular log like this:
We know that when a transaction that modifies data is committed, the change is written in the Write Ahead Log (flushed and synced to disk for durability). That write is done to the head of the redo log. This makes the head advance.
The flushing process that writes the dirty pages from the Buffer Pool to the tablespaces moves the tail forward the corresponding data changes in the write ahead log.. Then that space can be reused. The tail and the head can only move forward (clockwise on the illustration above).
The Checkpoint Age is the length in bytes between the tail and the head.
As it is a fixed size circular log, the head could reach the tail if the write rate exceeds to flush rate… and that would be a horrible problem !
InnoDB will never let that happen ! If that scenario would be possible, all writes would be blocked!
Async and Sync Flush Points
To avoid the previous chaotic scenario, InnoDB will take actions in case of reaching some threshold points:
- async flush point: write are allowed but page flushing will be increased to reach its maximum capacity. This will lead to a drop of performance.
- sync flush point: at this point all writes are stopped and InnoDB will only perform page flushing as much as it can. Terrible performance.
Async flush point is reached at 75% of the total log file size. Sync fluch point is at 87.5%.
Now you know the reason why it’s important to monitor the Checkpoint Age to avoid poor performance.
We can see here that after 5 minutes, the checkpoint age was very close of the async flush point and can be problematic. Again, 2 minutes later the that checkpoint age increased again.
In case you see that the async point is (almost) reached, you need to identify (and we will see that in future articles in this series) if you reached the I/O capacity of the system, or if you have the possibility to increase the flushing rate.
Another option of course is to increase the size of redo logs (
If you are interested in InnoDB flushing and checkpointing, I recommend you to read Efficient MySQL Performance from Daniel Nichter, O’Reilly 2022 (from page 205).
Enjoy MySQL and keep plotting your MySQL performance metrics !