Dynamic InnoDB Redo Log

Since 8.0.30, you have the possibility to modify the InnoDB Redo Log Capacity online. An undersized Redo Log Capacity is problematic and lead to performance issues.

However, it’s not recommended to oversize the Redo Log either. Redo Log files consume disk space and increases the recovery time in case of a restart (innodb_fast_shutdown=1) or a sudden crash. And it also slows down shutdown when innodb_fast_shutdown=0.

This means that now, you don’t need to restart MySQL if you want to increase or decrease the size of the InnoDB Redo Logs files. In fact, we don’t talk anymore about file size but about capacity ! The DBA doesn’t need to specify any file size and/or amount of files for Redo Logs anymore as it was the case previously.

New Redo Log Architecture

The new redo log size is defined by one single variable: innodb_redo_log_capacity (in bytes). The default is 100MB.

For example, you can change the Redo Log capacity to 200MB by using the following SQL statement:

set global innodb_redo_log_capacity=200*1024*1024;

InnoDB creates 32 redo log files in MySQL’s datadir inside a new dedicated folder: #innodb_redo.

Inside that directory, you will be able to find two types of files:

  • #ib_redoXXX (where XXX is the file_id, a sequence number): those are the active redo log files
  • #ib_redoXXX_tmp: those are spare redo log files
example of InnoDB redo log folder’s content

InnoDB tries to maintain approximately 32 files here, so that it doesn’t need to wait long before one of them becomes no longer needed as it would if you had just 2 big files. This way it can reclaim them one by one when you want to resize them.

This is how it could be represented:

  • checkpoint_lsn (Innodb_redo_log_checkpoint_lsn) : an LSN point up to which all changes to the pages are guaranteed to have already been written and fsynced back to tablespace files – basically, the still needed portion of redo log starts here.
  • current_lsn (Innodb_redo_log_current_lsn) : the last written position in the redo log. That write could still be buffered inside MySQL processes buffer.
  • flushed_to_disk_lsn (Innodb_redo_log_flushed_to_disk_lsn) : the last position in the redo log that InnoDB has been flushed to disk.

Now when we reach the end of the 31st file (90%), the log files governor will perform some cleanup and some active files that are not needed anymore will become the new spare ones:

When the background thread is not able to remove a log file from the left to put it to the right, the user transaction will get stuck waiting for REDO buffers to be written to disk. DBAs get warning in the error log notifying them to increase the InnoDB Redo Log Capacity:

 [Warning] [MY-013865] [InnoDB] Redo log writer is waiting for a new redo log file.
 Consider increasing innodb_redo_log_capacity.

Checkpointing

Let’s recap what is the role of the Redo Log. As you may already know, each time data is changed in InnoDB, the page(s) containing the data is modified in memory (in the InnoDB Buffer Pool). The page(s) is (are) noted as dirty. In case of a sudden crash, we cannot loose all those changes… but the data in memory is gone ! This is why diff data of the pages are also written (and by default flushed to disk) on the redo logs. The data in those logs will be read only in case of InnoDB Recovery. During that process the modified pages will be reconstructed with the modified data.

InnoDB flushes those dirty pages from the Buffer Pool (memory) to the table spaces (disk) in small batches, step by step. This operation is called Fuzzy Checkpointing.

Once the pages are written to the data files on disk (InnoDB tablespaces), the corresponding entries in the Redo Log are not required anymore. The position up to which InnoDB has written the data to the disk is the value of Innodb_redo_log_checkpoint_lsn.

InnoDB Checkpointing is adaptive. This means that considering the checkpoint age (log_lsn_checkpoint_age) InnoDB will decide to flush less or more aggressively.

The manual has a section about adaptive flushing.

For info, log_lsn_checkpoint_age and inndob_redo_log_logical_size are almost equivalent:

Back to the lsn checkpoint age and the redo log capacity thresholds:

  • soft limit for logical capacity: to avoid deadlocks InnoDB doesn’t let the user transactions to use up the whole innodb_redo_log_capacity – instead it keeps them below soft logical capacity which is roughly 30/32 of it. When this limitation is exceeded, all user threads are paused and a message is sent to the error_log
  • hard limit for logical capacity: this limitation is never exceeded. If space isn’t reclaimed after 1 second wait when the limit is reached, log are written as much as possible or crash InnoDB !
  • async flush point (log_max_modified_age_async): writes are allowed but page flushing will be gradually increased to reach the next threshold. This will lead to a drop of performance. In the code, async flush point can be called adaptive_flush_min_age. This is 7/8 of the soft logical capacity.
    However, in practice, it seems that the adaptive flushing already starts at innodb_adaptive_flushing_lwm (by default 10% of soft logical capacity), and reaches maximum allowed IO capacity already at 82% of the async flush point.
  • sync flush point (log_max_modified_age_sync): at this point the checkpointer will request page cleaners to flush as much of dirty pages to get the checkpoint age below this threshold and will wait for it synchronously. Terrible performance. This is also called adaptive_flush_max_age. This is 15/16 of the soft logical capacity.
  • aggressive_checkpoint_min_age: this represents 31/32 of soft logical capacity. When this point is reached, InnoDB already asked to InnoDB to flush dirty pages from the Buffer Pool at full speed. The checkpointer will not sleep for 1 second between attempting updating checkpoint lsn – instead it will request a sync checkpoint as often as possible and will also update checkpoint_lsn value to the redo log header as soon as possible afterwards. This is performed to be able to reclaim the space faster. As we are already at the top speed, this doesn’t add any more pressure to the page cleaners.

Instrumentation

The new Redo Log is instrumented in Performance_Schema in the table innodb_redo_log_files:

This means there are 5 active redo log files and 27 (32-5) spare ones (_tmp):

All the files are also instrumented in Performance_Schema‘s file instance tables (file_instances and file_summary_by_instance):

As usual, there are status variables and InnoDB Metrics entries providing information about the “flushpointing” operations:

global status variables

innodb metrics

The Snake Metaphor

The new Redo Log Architecture can be seen as a snake (the useful redo log information that is still needed to insure the Durability of InnoDB) laying across cages (the redo log files). Those cages are consecutively connected with each other so that snake can keep going forward. When the snake is reaching the end of the penultimate cage on the right, InnoDB takes the no longer needed cages from the left to place them on the right. The amount of cages is always 32 (unless very special circumstances).

The size of the snake can grow longer or shrink. When InnoDB flushes dirty pages from the Buffer Pool to the data files, the redo log information is not needed and snake’s tails (checkpoint_lsn) moves also to the right. The remaining cages on the left of the snake’s tail are containing the snake’s old skin.

Some Examples

Let’s have a look at some output and see how this can be illustrated.

Example 1 – no load

In the example above, we can see that the current_lsn, the checkpoint_lsn and the flushed_to_disk_lsn have the same value. They are all in the last active log (id 10844). In fact there are no transaction for the moment.

The snake is at its minimal size (Innodb_redo_log_logical_size) : 512 bytes. The checkpoint age is 0:

Example 2 – load

We see here that the snake is bigger, and still has plenty of room. In case it’s needed (when the snake will reach 90% of the penultimate cage, the cages containing the snake’s old skin will be re-used as new spare cages.

Redo Log Consumers

As I already explained, InnoDB Redo Logs are only read (consumed) during InnoDB recovery process. But this is not the truth. As explained in this blog post, some process can also read the redo log files.

This is the list of potential “consumers”:

  1. InnoDB Recovery.
  2. a thread that used the UDF innodb_redo_log_consumer_register(). This is used by MEB (MySQL Enterprise Backup) for example. As this can cause considerable performance issue when badly used, the BACKUP_ADMIN privilege is required.
  3. Clone’s archiver.

Those process can slow down the re-use of redo log files (conversion to spare redo log files). Meaning that the left-most cage can not be moved to the right because still in use.

These consumers must provide InnoDB with their progress by sharing their current lsn.

In case of InnoDB Recovery (1), we know what is the oldest needed lsn by looking at current checkpoint lsn. That means “the checkpointer thread” is one of consumers, and it is always registered.

For (2), the consumer needs to use the UDF inndob_redo_log_consumer_advance(lsn) to move the cursor forward manually. The session needs to be ended to de-register the consumer.

And finally, when clone (3) is in progress, we register clone’s consumer and clone knows how and then to advance its current lsn (using an API).

Some Useful Queries

These are some interesting queries with their output. Most of them use the InnoDB Metrics table.

To enable all you need to run the following SQL statement:

SET GLOBAL innodb_monitor_enable = all;

You can also specify those you want to enable. You must do it one by one like this:

SET GLOBAL innodb_monitor_enable= 'log_lsn_checkpoint_age';

Overview of the redo log active files

SELECT file_id, start_lsn, end_lsn, 
       if(is_full=1,'100%',
          concat(round((((
               select VARIABLE_VALUE 
                 from performance_schema.global_status 
                where VARIABLE_NAME='Innodb_redo_log_current_lsn'
               )-start_lsn)/(end_lsn-start_lsn)*100),2),'%')) full,
          concat(format_bytes(size_in_bytes)," / " ,
          format_bytes(@@innodb_redo_log_capacity) ) file_size, 
      (select VARIABLE_VALUE from performance_schema.global_status 
        where VARIABLE_NAME='Innodb_redo_log_checkpoint_lsn') checkpoint_lsn,
      (select VARIABLE_VALUE from performance_schema.global_status 
        where VARIABLE_NAME='Innodb_redo_log_current_lsn') current_lsn, 
      (select VARIABLE_VALUE from performance_schema.global_status 
        where VARIABLE_NAME='Innodb_redo_log_flushed_to_disk_lsn') flushed_to_disk_lsn,
      (select count from information_schema.INNODB_METRICS 
        where name like 'log_lsn_checkpoint_age') checkpoint_age 
FROM performance_schema.innodb_redo_log_files;

Getting the Checkpoint Age

The checkpoint age is the difference of the current LSN value and the value of the last checkpoint LSN. As explained before, the checkpoint age should be almost equivalent to the value of the InnoDB Redo Log Logical Size.

This is the query I use to display it (we already saw the query earlier):

select concat(count, " (", format_bytes(count), ")") log_lsn_checkpoint_age,
       concat(variable_value, " (", format_bytes(variable_value),")")
             innodb_redo_log_logical_size 
 from information_schema.innodb_metrics 
 join performance_schema.global_status 
where variable_name like 'innodb_redo_log_logical_size' 
      and name like 'log_lsn_checkpoint_age';

Please pay attention that InnoDB Metrics must be enabled to have a coherent result.

Without having the metrics enabled, you could end up with strange invalid results like this:
+------------------------+------------------------------+
| log_lsn_checkpoint_age | innodb_redo_log_logical_size |
+------------------------+------------------------------+
| 5976089 (5.70 MiB) | 512 ( 512 bytes) |
+------------------------+------------------------------+

This is just incorrect, log_lsn_checkpoint_age should be 0 ! You can verify with the following query that the required metric is enabled:

SELECT status FROM INFORMATION_SCHEMA.INNODB_METRICS
WHERE NAME like 'log_lsn_checkpoint_age';
+----------+
| status |
+----------+
| disabled |
+----------+

Calculating the optimal InnoDB Redo Log Capacity

During peak traffic time, you can get an estimation of the required amount for the Redo Log Capacity Size by running the query below (all in one single line):

select VARIABLE_VALUE from performance_schema.global_status 
 where VARIABLE_NAME='Innodb_redo_log_current_lsn' into @a;select sleep(60) into @garb ;select VARIABLE_VALUE from performance_schema.global_status 
 where VARIABLE_NAME='Innodb_redo_log_current_lsn' into @b;select 
 format_bytes(abs(@a - @b)) per_min, format_bytes(abs(@a - @b)*60) per_hour;

The rule of thumb is to make the Redo Log Capacity large enough to hold at least 1h of logs, so as not to force InnoDB to dump dirty pages too often in case the Redo Log Capacity is too small, and not to waste disk and take too long for recovery if it is too large.

Redo Log usage view

When the appropriate InnoDB Metrics are enabled, it’s also possible to get an overview of the Redo Log’s usage and see where we are in relation to the soft and hard redo log logical capacity:

select concat(variable_value, " (", 
              format_bytes(variable_value),")") innodb_redo_log_logical_size,
       concat(round(count*8/7), " (", 
              format_bytes(round(count*8/7)), ")") soft_logical_capacity,
       concat(round(@@innodb_redo_log_capacity*29.8/32), " (",
              format_bytes(round(@@innodb_redo_log_capacity*29.8/32)) ,")") hard_logical_capacity,
       concat(@@innodb_redo_log_capacity, " (",
              format_bytes(@@innodb_redo_log_capacity) ,")") redo_log_capacity,              
       concat(round(variable_value / (count*8/7)*100,2), "%") logical_used,
       concat(round(variable_value / (@@innodb_redo_log_capacity*29.8/32)*100,2), "%") hard_used  
  from performance_schema.global_status 
  join information_schema.innodb_metrics 
 where variable_name like 'innodb_redo_log_logical_size' 
   and name like 'log_max_modified_age_async';

A graph a day keeps the doctor away !

As I always say, trending your database can help you understand the workload and prevent issues.

I’ve updated my MySQL Shell plugins that collect and generate graphs for MySQL with the new MySQL 8.0.30 redo log. I explained in a previous post how to use it.

Here are some output of these new graphs:

If we check the graph above, we can see the different threshold limits. And we can see that we never had to trigger something special. This graph only is not enough to see if we hit some issue. At contrary, if we had already passed only one horizontal line, this would have been problematic.

This graph shows the difference between the current_lsn and the lsn_flushed_to_disk. If the difference stays always high then it could mean that the sync operation to disk is too slow.

However we can see something strange on the graph (pointed with the red arrow). The difference stayed the same for a long time. This means there was data on the MySQL processes buffers but that could not be sent to the redo log files on disk. The next graph will provide us the reason for this:

We can see that twice we reached 32 active InnoDB Redo Log files. This means no spare at all. This should not be possible… unless we have something else also consuming the Redo Log. In fact in another session, I was registering a consumer and didn’t advance the lsn.

We can see that for the same period, there was no new lsn, nothing could be written.

InnoDB was of course not able to flush any new page as checkpointing operations (ib_pg_flush_co & ib_log_checkpt) are blocked waiting for redo log fsync.

In sysbench’s output, we can see that no inserts are made (I’m using oltp_insert.lua):

qps: 0.00 (r/w/o: 0.00/0.00/0.00)

And in MySQL’s processlist we can see plenty of insert queries in waiting for handler commit state.

Pay attention that if you stay in that state for 600 seconds mysqld will intentionally crash !

Conclusion

This new InnoDB Redo Log architecture seems to scale very well and the possibility to resize it dynamically online is really something that makes the difference !

As usual, enjoy MySQL !

Thank you very much to Kuba Lopuszanski from the InnoDB Team for helping me and reviewing this article.

Subscribe to Blog via Email

Enter your email address to subscribe to this blog and receive notifications of new posts by email.

4 Comments

  1. Hi @lefred
    Very interesting article !
    As usual, you have demonstrated in a clear and understandable way the usefulness of this new feature of redo_log
    Thank you !

  2. I have to maintain an application where two mysql (8.0.31) servers (ubuntu 22.04.1) are in cluster and share data on an nfs server. mysqld cannot access innodb_redo files. only moving locally #innodb_redo dir I can start mysqld. but when I switch from one node to the other the databases are not aligned and a lot of errors (lsn in the future) are logged and var partition is always full. I tried a lot of work arounds (now I periodically scp the directory from the active node to the other and clean /var/log) but I’m searching for a ckean solution. please help

    • Hi Claudia,

      Running MySQL on NFS is not a recommended practice, certainly when you are looking for performance. Also, if I understood you correctly, you are trying to run two mysqld process on the SAME data stored on a NFS share ? STOP it ! This will never work !

      MySQL won’t allow that, if it works it’s just by chance. There is no guarantee that the pages will be the same and of course, the flushing etc… will just lead in data corruption.

      If you are looking for a cluster solution (High Availability), please check MySQL InnoDB Cluster, included in MySQL, it’s Open Source.

      Regards,

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

As MySQL Community Manager, I am an employee of Oracle and the views expressed on this blog are my own and do not necessarily reflect the views of Oracle.

You can find articles I wrote on Oracle’s blog.