Query and Transaction size in MySQL

Some times it’s important to know the size of a transaction, especially when you plan to migrate to a HA solution where by default transactions have a limited size to guarantee an optimal behavior of the cluster.

Today we will see the different possibilities to have an idea of the size of transactions.

First we need to split the transaction in two types:

  • those generating data (writes, like insert, delete and update, DML)
  • those only ready data (select, DQL)

To implement High Availability, only the first category is important.

Size of DML

To know the size of a DML transaction, the only possibility we have is to parse the binary log (or query the binlog event).

We need to check the binlog event from the binlog file and then calculate its size. To illustrate this, let’s try to find the transaction identified by a specific GTID: 17f6a975-e2b4-11ec-b714-c8cb9e32df8e:7541914

SQL > \pager grep 'Gtid\|COMMIT' ;
Pager has been set to 'grep 'Gtid\|COMMIT' ;'.
SQL > show BINLOG EVENTS in 'binlog.000064' ;
| binlog.000064 |     213 | Gtid           |         1 |         298 | SET @@SESSION.GTID_NEXT= '17f6a975-e2b4-11ec-b714-c8cb9e32df8e:7541914' |
| binlog.000064 | 53904723 | Xid            |         1 |    53904754 | COMMIT /* xid=75 */                                                     |
SQL > \pager
Pager has been disabled.
SQL > select format_bytes(53904754-213);
+----------------------------+
| format_bytes(53904754-213) |
+----------------------------+
| 51.41 MiB                  |
+----------------------------+
1 row in set (0.0005 sec)

We can see that this transaction generated 51MB of binlog event.

This method can be complicated, certainly when you need to parse multiple binlog files to find the desired transaction.

Hopefully, Performance_Schema can again make our life easier. Indeed, we can parse the table binary_log_transaction_compression_stats to have information about the size of a transaction. Even if we don’t use binary log compression:

select format_bytes(UNCOMPRESSED_BYTES_COUNTER/TRANSACTION_COUNTER) size,
       format_bytes(COMPRESSED_BYTES_COUNTER/TRANSACTION_COUNTER) compressed,
       TRANSACTION_COUNTER 
  from performance_schema.binary_log_transaction_compression_stats;
+-----------+------------+---------------------+
| size      | compressed | TRANSACTION_COUNTER |
+-----------+------------+---------------------+
| 51.38 MiB | 51.38 MiB  |                   1 |
+-----------+------------+---------------------+

The TRANSACTION_COUNTER column is very important as if it is bigger than 1, the values are an average.

So if you really need to know the exact size of one transaction, you need first to truncate that table before running your DML.

Let’s have a look at this example:

SQL> select format_bytes(UNCOMPRESSED_BYTES_COUNTER/TRANSACTION_COUNTER) size,
       format_bytes(COMPRESSED_BYTES_COUNTER/TRANSACTION_COUNTER) compressed,
       TRANSACTION_COUNTER 
  from performance_schema.binary_log_transaction_compression_stats;
+-----------+------------+---------------------+
| size      | compressed | TRANSACTION_COUNTER |
+-----------+------------+---------------------+
| 17.13 MiB | 17.13 MiB  |                   6 |
+-----------+------------+---------------------+
1 row in set (0.0004 sec)

SQL > truncate table performance_schema.binary_log_transaction_compression_stats;
Query OK, 0 rows affected (0.0018 sec)

SQL > update sbtest1 set k=k+4;
Query OK, 132188 rows affected (1.3213 sec)

Rows matched: 132188  Changed: 132188  Warnings: 0

SQL > select format_bytes(UNCOMPRESSED_BYTES_COUNTER/TRANSACTION_COUNTER) size,
       format_bytes(COMPRESSED_BYTES_COUNTER/TRANSACTION_COUNTER) compressed,
       TRANSACTION_COUNTER 
  from performance_schema.binary_log_transaction_compression_stats;
+-----------+------------+---------------------+
| size      | compressed | TRANSACTION_COUNTER |
+-----------+------------+---------------------+
| 51.38 MiB | 51.38 MiB  |                   1 |
+-----------+------------+---------------------+
1 row in set (0.0017 sec)

We still have the possibility to use a MySQL Shell Plugin that list all the transaction size from a binary log:

 JS > check.showTrxSizeSort()
Transactions in binary log binlog.000064 orderer by size (limit 10):
51 mb - 17f6a975-e2b4-11ec-b714-c8cb9e32df8e:7541926
51 mb - 17f6a975-e2b4-11ec-b714-c8cb9e32df8e:7541925
51 mb - 17f6a975-e2b4-11ec-b714-c8cb9e32df8e:7541921
51 mb - 17f6a975-e2b4-11ec-b714-c8cb9e32df8e:7541916
51 mb - 17f6a975-e2b4-11ec-b714-c8cb9e32df8e:7541915
51 mb - 17f6a975-e2b4-11ec-b714-c8cb9e32df8e:7541918
51 mb - 17f6a975-e2b4-11ec-b714-c8cb9e32df8e:7541917
51 mb - 17f6a975-e2b4-11ec-b714-c8cb9e32df8e:7541914
257 bytes - 17f6a975-e2b4-11ec-b714-c8cb9e32df8e:7541924
257 bytes - 17f6a975-e2b4-11ec-b714-c8cb9e32df8e:7541923

But how can I know the GTID of my transaction ?

MySQL has the possibility to return the GTID to the client if it supports that information returned by the server. MySQL Shell supports that feature !

To enable it, we use session_track_gtids:

SQL > set session_track_gtids='OWN_GTID';
Query OK, 0 rows affected (0.0001 sec)

SQL > update sbtest1 set k=k+1;
Query OK, 132183 rows affected (5.6854 sec)

Rows matched: 132183  Changed: 132183  Warnings: 0
GTIDs: 17f6a975-e2b4-11ec-b714-c8cb9e32df8e:7541914

As you can see, MySQL Shell returned the GTID of the transaction (update using auto_commit).

Size of DQL

But do we have the possibility to know the size of a SELECT ?

To determine the size of a SELECT, we can calculate the bytes sent by the server to the client like this:

SQL > select variable_value 
      from performance_schema.status_by_thread 
       join performance_schema.threads using(thread_id) 
      where processlist_id=CONNECTION_ID() 
        and variable_name='Bytes_sent' into @before;

SQL > select * from sbtest1;

SQL > select format_bytes(variable_value - @before) query_size 
        from performance_schema.status_by_thread 
        join performance_schema.threads using(thread_id) 
       where processlist_id=CONNECTION_ID() 
         and variable_name='Bytes_sent' ;
+------------+
| query_size |
+------------+
| 26.08 MiB  |
+------------+
1 row in set (0.0010 sec)

Summary

As you can see, MySQL Server provides a lot information via Performance_Schema and the binary logs. Parsing that information, you can retrieve the size of DML transaction or DQL.

Enjoy MySQL !

Subscribe to Blog via Email

Enter your email address to subscribe to this blog and receive notifications of new posts by email.

3 Comments

  1. Why is the value obtained by this method inconsistent with the value obtained in the slow query log?

    • There is always some overhead, I don’t think the metrics is taken exactly at the same moment, do you notice a lot of difference ? Because it should not of course.

Leave a Reply

Your email address will not be published. Required fields are marked *

As MySQL Community Manager, I am an employee of Oracle and the views expressed on this blog are my own and do not necessarily reflect the views of Oracle.

You can find articles I wrote on Oracle’s blog.