MySQL & InnoDB Disk Space


Yesterday, Bhuvanesh published an article about how to verify the difference between the disk space allocated to a tablespace and the data stored in it.

I commented with a link to an old post explaining how to get similar information using only SQL, in case you don’t have filesystem access.

And finally, my friend Bill Karwin commented that this information is not always accurate. Which, of course, I agree with.

This is why I checked what information we have available and tried to find a better answer.

So first, please remember that information_schema statistics are cached by default:

mysql> show global variables like 'information_schema_stats_expiry';
+---------------------------------+-------+
| Variable_name                   | Value |
+---------------------------------+-------+
| information_schema_stats_expiry | 86400 |
+---------------------------------+-------+

And that for better results, it’s always advised to run ANALYZE TABLE ...

For the following examples, I set information_schema_stats_expiry to 0.
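This can be changed at runtime; note that SET GLOBAL does not survive a restart (SET PERSIST would):

```sql
-- disable caching of information_schema statistics for this test
SET GLOBAL information_schema_stats_expiry = 0;
```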

The New Query

The new query takes advantage of the column FILE_SIZE in information_schema.INNODB_TABLESPACES (dividing by 10 before the subtraction and multiplying back afterwards sidesteps the out-of-range error that subtracting BIGINT UNSIGNED values raises when the result would be negative):

> SELECT NAME, TABLE_ROWS, format_bytes(data_length) DATA_SIZE,
       format_bytes(index_length) INDEX_SIZE,
       format_bytes(data_length+index_length) TOTAL_SIZE,
       format_bytes(data_free) DATA_FREE,
       format_bytes(FILE_SIZE) FILE_SIZE,
       format_bytes((FILE_SIZE/10 - (data_length/10 + 
                           index_length/10))*10) WASTED_SIZE  
FROM information_schema.TABLES as t 
JOIN information_schema.INNODB_TABLESPACES as it 
  ON it.name = concat(table_schema,"/",table_name) 
ORDER BY (data_length + index_length) desc limit 5;
+-------------------+------------+------------+------------+------------+------------+------------+-------------+
| NAME              | TABLE_ROWS | DATA_SIZE  | INDEX_SIZE | TOTAL_SIZE | DATA_FREE  | FILE_SIZE  | WASTED_SIZE |
+-------------------+------------+------------+------------+------------+------------+------------+-------------+
| big/testing       |   10241204 | 647.98 MiB |    0 bytes | 647.98 MiB | 2.00 MiB   | 660.00 MiB | 12.02 MiB   |
| docstore/all_recs |      24353 | 17.56 MiB  |    0 bytes | 17.56 MiB  |    0 bytes | 25.00 MiB  | 7.44 MiB    |
| big/pktest        |     111649 | 11.55 MiB  |    0 bytes | 11.55 MiB  |    0 bytes | 19.00 MiB  | 7.45 MiB    |
| big/pktest_seq    |      81880 | 6.52 MiB   |    0 bytes | 6.52 MiB   |    0 bytes | 14.00 MiB  | 7.48 MiB    |
| library/books     |         39 | 384.00 KiB | 16.00 KiB  | 400.00 KiB |    0 bytes | 464.00 KiB | 64.00 KiB   |
+-------------------+------------+------------+------------+------------+------------+------------+-------------+

We can see that MySQL estimates the data size of my biggest table at 648MB, while 660MB are used on disk. The latter is very easy to verify:

$ sudo ls -lh /var/lib/mysql/big/testing.ibd
-rw-r----- 1 mysql mysql 660M Oct 22 00:19 /var/lib/mysql/big/testing.ibd
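The same join can also be aggregated, for instance to estimate the total space that could be reclaimed across all file-per-table tablespaces; a sketch, assuming MySQL 8.0 where format_bytes() is a native function:

```sql
-- rough total of bytes allocated on disk beyond the estimated data + index size
-- (SUM() returns DECIMAL, so the subtraction is safe even for unsigned columns)
SELECT format_bytes(SUM(FILE_SIZE) - SUM(data_length + index_length)) AS TOTAL_WASTED
FROM information_schema.TABLES AS t
JOIN information_schema.INNODB_TABLESPACES AS it
  ON it.name = CONCAT(t.table_schema, '/', t.table_name);
```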

As recommended above, it’s always good to run an ANALYZE TABLE:

> analyze table big.testing;
+-------------+---------+----------+----------+
| Table       | Op      | Msg_type | Msg_text |
+-------------+---------+----------+----------+
| big.testing | analyze | status   | OK       |
+-------------+---------+----------+----------+

And we can run our query again:

+-------------------+------------+------------+------------+------------+------------+------------+-------------+
| NAME              | TABLE_ROWS | DATA_SIZE  | INDEX_SIZE | TOTAL_SIZE | DATA_FREE  | FILE_SIZE  | WASTED_SIZE |
+-------------------+------------+------------+------------+------------+------------+------------+-------------+
| big/testing       |    9045529 | 582.42 MiB |    0 bytes | 582.42 MiB | 67.00 MiB  | 660.00 MiB | 77.58 MiB   |
| docstore/all_recs |      24353 | 17.56 MiB  |    0 bytes | 17.56 MiB  |    0 bytes | 25.00 MiB  | 7.44 MiB    |
| big/pktest        |     111649 | 11.55 MiB  |    0 bytes | 11.55 MiB  |    0 bytes | 19.00 MiB  | 7.45 MiB    |
| big/pktest_seq    |      81880 | 6.52 MiB   |    0 bytes | 6.52 MiB   |    0 bytes | 14.00 MiB  | 7.48 MiB    |
| library/books     |         39 | 384.00 KiB | 16.00 KiB  | 400.00 KiB |    0 bytes | 464.00 KiB | 64.00 KiB   |
+-------------------+------------+------------+------------+------------+------------+------------+-------------+

We can see now that the statistics have been updated: according to my previous post we are losing 67MB, but the new query, comparing with the size on disk, shows that we are wasting 77.5MB.

We could also look at how the table’s pages are used with innodb_ruby.

Recovering the disk space

Let’s see if we can recover some disk space:

> OPTIMIZE table big.testing;
+-------------+----------+----------+-------------------------------------------------------------------+
| Table       | Op       | Msg_type | Msg_text                                                          | 
+-------------+----------+----------+-------------------------------------------------------------------+
| big.testing | optimize | note     | Table does not support optimize, doing recreate + analyze instead |
| big.testing | optimize | status   | OK                                                                |
+-------------+----------+----------+-------------------------------------------------------------------+
 2 rows in set (1 min 4.8855 sec)

And we can check again:

+-------------------+------------+------------+------------+------------+------------+------------+-------------+
| NAME              | TABLE_ROWS | DATA_SIZE  | INDEX_SIZE | TOTAL_SIZE | DATA_FREE  | FILE_SIZE  | WASTED_SIZE |
+-------------------+------------+------------+------------+------------+------------+------------+-------------+
| big/testing       |    9045529 | 582.42 MiB |    0 bytes | 582.42 MiB | 67.00 MiB  | 584.00 MiB | 1.58 MiB    |
| docstore/all_recs |      24353 | 17.56 MiB  |    0 bytes | 17.56 MiB  |    0 bytes | 25.00 MiB  | 7.44 MiB    |
| big/pktest        |     111649 | 11.55 MiB  |    0 bytes | 11.55 MiB  |    0 bytes | 19.00 MiB  | 7.45 MiB    |
| big/pktest_seq    |      81880 | 6.52 MiB   |    0 bytes | 6.52 MiB   |    0 bytes | 14.00 MiB  | 7.48 MiB    |
| library/books     |         39 | 384.00 KiB | 16.00 KiB  | 400.00 KiB |    0 bytes | 464.00 KiB | 64.00 KiB   |
+-------------------+------------+------------+------------+------------+------------+------------+-------------+

We can see that we have now regained some disk space!

So even if it is not always 100% accurate, this method already gives you a view that is very close to reality of how your InnoDB tablespaces are using the disk, and of when you will benefit from rebuilding a tablespace.
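As the OPTIMIZE output above notes, InnoDB implements it as recreate + analyze; the same rebuild can be requested explicitly, and it can run with ALGORITHM=INPLACE, allowing concurrent DML:

```sql
-- rebuild the tablespace, reclaiming the free space inside the .ibd file
ALTER TABLE big.testing ENGINE=InnoDB;
```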

MySQL Shell Plugin

I’ve updated the innodb/fragmented MySQL Shell plugin on my GitHub with a new method.

14 thoughts on “MySQL & InnoDB Disk Space”

  1. It seems the format_bytes function exists in MySQL 5.7 and above only, not in MySQL 5.6. So be aware that the above will not work on MySQL 5.6.

  3. What happens if information_schema_stats_expiry is not changed and an OPTIMIZE is performed to reduce the size? And is there any impact on indexes or foreign key constraints from performing this operation?

  4. The number of table rows may not be accurate for large tables unless you set innodb_stats_persistent_sample_pages to a higher value than the default 20, but of course it would delay ANALYZE TABLE, because of increased I/O.
    I ran the query for a schema where OPTIMIZE TABLE is regularly run, because much data is inserted and then some deleted, but for some of the smaller tables WASTED_SIZE is negative. The *.ibd file size in the file system corresponds to the value in the FILE_SIZE column. One of the tables has row format COMPRESSED and the other two are DYNAMIC. I guess it’s because DATA_LENGTH and INDEX_LENGTH are approximated?
