Sometimes it’s important to know the size of a transaction, especially when you plan to migrate to a HA solution where, by default, transactions are limited in size to guarantee optimal behavior of the cluster.
Today we will see the different ways to get an idea of the size of a transaction.
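For example, with MySQL Group Replication (assuming that is the HA solution you are targeting), the limit is controlled by the group_replication_transaction_size_limit variable, which you can check like this (when the plugin is installed):

SQL > select @@group_replication_transaction_size_limit;

Transactions generating more bytes than that value fail and are rolled back by the group.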
First we need to split transactions into two types:
- those generating data (writes like INSERT, DELETE and UPDATE: DML)
- those only reading data (SELECT: DQL)
To implement High Availability, only the first category is important.
Size of DML
To know the size of a DML transaction, the only possibility we have is to parse the binary log (or query the binlog events).
We need to find the transaction’s events in the binlog file and then calculate the size from their positions. To illustrate this, let’s try to find the transaction identified by a specific GTID: 17f6a975-e2b4-11ec-b714-c8cb9e32df8e:7541914
SQL > \pager grep 'Gtid\|COMMIT' ;
Pager has been set to 'grep 'Gtid\|COMMIT' ;'.
SQL > show BINLOG EVENTS in 'binlog.000064' ;
| binlog.000064 | 213 | Gtid | 1 | 298 | SET @@SESSION.GTID_NEXT= '17f6a975-e2b4-11ec-b714-c8cb9e32df8e:7541914' |
| binlog.000064 | 53904723 | Xid | 1 | 53904754 | COMMIT /* xid=75 */ |
SQL > \pager
Pager has been disabled.
SQL > select format_bytes(53904754-213);
+----------------------------+
| format_bytes(53904754-213) |
+----------------------------+
| 51.41 MiB |
+----------------------------+
1 row in set (0.0005 sec)
We can see that this transaction generated 51MB of binlog events: the difference between the end position (End_log_pos) of the COMMIT event and the start position (Pos) of the Gtid event.
This method can be cumbersome, especially when you need to parse multiple binlog files to find the desired transaction.
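If you are not sure which binlog file contains the transaction you are looking for, you can first list the available files and their sizes, and then inspect each candidate with show BINLOG EVENTS as above:

SQL > show BINARY LOGS ;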
Fortunately, Performance_Schema
can again make our life easier. Indeed, we can query the table binary_log_transaction_compression_stats
to get information about the size of a transaction, even if we don’t use binary log compression:
select format_bytes(UNCOMPRESSED_BYTES_COUNTER/TRANSACTION_COUNTER) size,
format_bytes(COMPRESSED_BYTES_COUNTER/TRANSACTION_COUNTER) compressed,
TRANSACTION_COUNTER
from performance_schema.binary_log_transaction_compression_stats;
+-----------+------------+---------------------+
| size | compressed | TRANSACTION_COUNTER |
+-----------+------------+---------------------+
| 51.38 MiB | 51.38 MiB | 1 |
+-----------+------------+---------------------+
The TRANSACTION_COUNTER
column is very important: if it is bigger than 1, the values are an average over that many transactions.
So if you really need to know the exact size of one transaction, you first need to truncate that table before running your DML.
Let’s have a look at this example:
SQL> select format_bytes(UNCOMPRESSED_BYTES_COUNTER/TRANSACTION_COUNTER) size,
format_bytes(COMPRESSED_BYTES_COUNTER/TRANSACTION_COUNTER) compressed,
TRANSACTION_COUNTER
from performance_schema.binary_log_transaction_compression_stats;
+-----------+------------+---------------------+
| size | compressed | TRANSACTION_COUNTER |
+-----------+------------+---------------------+
| 17.13 MiB | 17.13 MiB | 6 |
+-----------+------------+---------------------+
1 row in set (0.0004 sec)
SQL > truncate table performance_schema.binary_log_transaction_compression_stats;
Query OK, 0 rows affected (0.0018 sec)
SQL > update sbtest1 set k=k+4;
Query OK, 132188 rows affected (1.3213 sec)
Rows matched: 132188 Changed: 132188 Warnings: 0
SQL > select format_bytes(UNCOMPRESSED_BYTES_COUNTER/TRANSACTION_COUNTER) size,
format_bytes(COMPRESSED_BYTES_COUNTER/TRANSACTION_COUNTER) compressed,
TRANSACTION_COUNTER
from performance_schema.binary_log_transaction_compression_stats;
+-----------+------------+---------------------+
| size | compressed | TRANSACTION_COUNTER |
+-----------+------------+---------------------+
| 51.38 MiB | 51.38 MiB | 1 |
+-----------+------------+---------------------+
1 row in set (0.0017 sec)
We also have the possibility to use a MySQL Shell plugin that lists all the transaction sizes from a binary log:
JS > check.showTrxSizeSort()
Transactions in binary log binlog.000064 orderer by size (limit 10):
51 mb - 17f6a975-e2b4-11ec-b714-c8cb9e32df8e:7541926
51 mb - 17f6a975-e2b4-11ec-b714-c8cb9e32df8e:7541925
51 mb - 17f6a975-e2b4-11ec-b714-c8cb9e32df8e:7541921
51 mb - 17f6a975-e2b4-11ec-b714-c8cb9e32df8e:7541916
51 mb - 17f6a975-e2b4-11ec-b714-c8cb9e32df8e:7541915
51 mb - 17f6a975-e2b4-11ec-b714-c8cb9e32df8e:7541918
51 mb - 17f6a975-e2b4-11ec-b714-c8cb9e32df8e:7541917
51 mb - 17f6a975-e2b4-11ec-b714-c8cb9e32df8e:7541914
257 bytes - 17f6a975-e2b4-11ec-b714-c8cb9e32df8e:7541924
257 bytes - 17f6a975-e2b4-11ec-b714-c8cb9e32df8e:7541923
But how can we know the GTID of our transaction?
MySQL can return the GTID of a transaction to the client, if the client supports that information returned by the server. MySQL Shell supports that feature!
To enable it, we use session_track_gtids:
SQL > set session_track_gtids='OWN_GTID';
Query OK, 0 rows affected (0.0001 sec)
SQL > update sbtest1 set k=k+1;
Query OK, 132183 rows affected (5.6854 sec)
Rows matched: 132183 Changed: 132183 Warnings: 0
GTIDs: 17f6a975-e2b4-11ec-b714-c8cb9e32df8e:7541914
As you can see, MySQL Shell returned the GTID of the transaction (an update using auto_commit).
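Note that session_track_gtids only tells the server to send the GTID back with the result; the client still has to read that information. If you want this enabled for all new sessions, you can also set (and persist) the variable globally:

SQL > set persist session_track_gtids='OWN_GTID';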
Size of DQL
But do we have the possibility to know the size of a SELECT?
To determine the size of a SELECT, we can calculate the number of bytes sent by the server to the client, like this:
SQL > select variable_value
from performance_schema.status_by_thread
join performance_schema.threads using(thread_id)
where processlist_id=CONNECTION_ID()
and variable_name='Bytes_sent' into @before;
SQL > select * from sbtest1;
SQL > select format_bytes(variable_value - @before) query_size
from performance_schema.status_by_thread
join performance_schema.threads using(thread_id)
where processlist_id=CONNECTION_ID()
and variable_name='Bytes_sent' ;
+------------+
| query_size |
+------------+
| 26.08 MiB |
+------------+
1 row in set (0.0010 sec)
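A quicker but less precise variant is to read the session status with SHOW and subtract the two values manually; keep in mind that the SHOW statement itself also increments the counter by a few bytes:

SQL > show SESSION STATUS like 'Bytes_sent';
SQL > select * from sbtest1;
SQL > show SESSION STATUS like 'Bytes_sent';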
Summary
As you can see, MySQL Server provides a lot of information via Performance_Schema
and the binary logs. By parsing that information, you can retrieve the size of your DML transactions or of your DQL statements.
Enjoy MySQL !
Comment: Why is the value obtained by this method inconsistent with the value obtained in the slow query log?
Reply: There is always some overhead, and I don’t think the metrics are taken at exactly the same moment. Do you notice a lot of difference? Because there should not be much, of course.