Yesterday, Bhuvanesh published an article about how to verify the difference between the disk space allocated for a tablespace and the data in it.
I commented with a link to an old post explaining how to get similar information using only SQL, in case you don't have filesystem access.
And finally, my friend Bill Karwin commented that this information is not always accurate, which, of course, I agree with.
This is why I checked what information we have available and tried to find a better answer.
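For reference, that SQL-only approach boils down to looking at DATA_FREE in information_schema.TABLES. A minimal sketch (not the exact query from the old post) could look like this:

SELECT table_schema, table_name,
       format_bytes(data_length + index_length) TOTAL_SIZE,
       format_bytes(data_free) DATA_FREE
FROM information_schema.TABLES
WHERE engine = 'InnoDB'
ORDER BY data_free DESC LIMIT 5;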
So first, please remember that information_schema statistics are cached by default:
mysql> show global variables like 'information_schema_stats_expiry';
+---------------------------------+-------+
| Variable_name                   | Value |
+---------------------------------+-------+
| information_schema_stats_expiry | 86400 |
+---------------------------------+-------+
And remember that, for better results, it's always advised to run ANALYZE TABLE ...
For the following examples, I set information_schema_stats_expiry to 0.
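That can be done like this (the variable also has session scope):

mysql> SET GLOBAL information_schema_stats_expiry = 0;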
The New Query
The new query takes advantage of the FILE_SIZE column in information_schema.INNODB_TABLESPACES:
> SELECT NAME, TABLE_ROWS, format_bytes(data_length) DATA_SIZE,
format_bytes(index_length) INDEX_SIZE,
format_bytes(data_length+index_length) TOTAL_SIZE,
format_bytes(data_free) DATA_FREE,
format_bytes(FILE_SIZE) FILE_SIZE,
format_bytes((FILE_SIZE/10 - (data_length/10 +
index_length/10))*10) WASTED_SIZE
FROM information_schema.TABLES as t
JOIN information_schema.INNODB_TABLESPACES as it
ON it.name = concat(table_schema,"/",table_name)
ORDER BY (data_length + index_length) desc limit 5;
+-------------------+------------+------------+------------+------------+------------+------------+-------------+
| NAME              | TABLE_ROWS | DATA_SIZE  | INDEX_SIZE | TOTAL_SIZE | DATA_FREE  | FILE_SIZE  | WASTED_SIZE |
+-------------------+------------+------------+------------+------------+------------+------------+-------------+
| big/testing       |   10241204 | 647.98 MiB | 0 bytes    | 647.98 MiB | 2.00 MiB   | 660.00 MiB | 12.02 MiB   |
| docstore/all_recs |      24353 | 17.56 MiB  | 0 bytes    | 17.56 MiB  | 0 bytes    | 25.00 MiB  | 7.44 MiB    |
| big/pktest        |     111649 | 11.55 MiB  | 0 bytes    | 11.55 MiB  | 0 bytes    | 19.00 MiB  | 7.45 MiB    |
| big/pktest_seq    |      81880 | 6.52 MiB   | 0 bytes    | 6.52 MiB   | 0 bytes    | 14.00 MiB  | 7.48 MiB    |
| library/books     |         39 | 384.00 KiB | 16.00 KiB  | 400.00 KiB | 0 bytes    | 464.00 KiB | 64.00 KiB   |
+-------------------+------------+------------+------------+------------+------------+------------+-------------+
We can see that MySQL estimates the data size of my biggest table at 648MB and that 660MB are used on disk. The latter is very easy to verify:
$ sudo ls -lh /var/lib/mysql/big/testing.ibd
-rw-r----- 1 mysql mysql 660M Oct 22 00:19 /var/lib/mysql/big/testing.ibd
As I recommended, it's always good to run ANALYZE TABLE:
> analyze table big.testing;
+-------------+---------+----------+----------+
| Table       | Op      | Msg_type | Msg_text |
+-------------+---------+----------+----------+
| big.testing | analyze | status   | OK       |
+-------------+---------+----------+----------+
And we can run our query again:
+-------------------+------------+------------+------------+------------+------------+------------+-------------+
| NAME              | TABLE_ROWS | DATA_SIZE  | INDEX_SIZE | TOTAL_SIZE | DATA_FREE  | FILE_SIZE  | WASTED_SIZE |
+-------------------+------------+------------+------------+------------+------------+------------+-------------+
| big/testing       |    9045529 | 582.42 MiB | 0 bytes    | 582.42 MiB | 67.00 MiB  | 660.00 MiB | 77.58 MiB   |
| docstore/all_recs |      24353 | 17.56 MiB  | 0 bytes    | 17.56 MiB  | 0 bytes    | 25.00 MiB  | 7.44 MiB    |
| big/pktest        |     111649 | 11.55 MiB  | 0 bytes    | 11.55 MiB  | 0 bytes    | 19.00 MiB  | 7.45 MiB    |
| big/pktest_seq    |      81880 | 6.52 MiB   | 0 bytes    | 6.52 MiB   | 0 bytes    | 14.00 MiB  | 7.48 MiB    |
| library/books     |         39 | 384.00 KiB | 16.00 KiB  | 400.00 KiB | 0 bytes    | 464.00 KiB | 64.00 KiB   |
+-------------------+------------+------------+------------+------------+------------+------------+-------------+
We can now see that the statistics have been updated. According to my previous post's method, we are losing 67MB (the DATA_FREE value), but with the new query comparing against the actual file on disk, it seems we are wasting 77.58MB (660MB on disk minus 582.42MB of data).
Let's see how the table looks using innodb_ruby:
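The original post showed the innodb_space output here. Assuming innodb_ruby is installed and file-per-table tablespaces are used, a command along these lines illustrates which extents and pages are allocated and used in the file:

$ innodb_space -f /var/lib/mysql/big/testing.ibd space-extents-illustrate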
Recovering the disk space
Let’s see if we can recover some disk space:
> OPTIMIZE TABLE big.testing;
+-------------+----------+----------+-------------------------------------------------------------------+
| Table       | Op       | Msg_type | Msg_text                                                          |
+-------------+----------+----------+-------------------------------------------------------------------+
| big.testing | optimize | note     | Table does not support optimize, doing recreate + analyze instead |
| big.testing | optimize | status   | OK                                                                |
+-------------+----------+----------+-------------------------------------------------------------------+
2 rows in set (1 min 4.8855 sec)
And we can check again:
+-------------------+------------+------------+------------+------------+------------+------------+-------------+
| NAME              | TABLE_ROWS | DATA_SIZE  | INDEX_SIZE | TOTAL_SIZE | DATA_FREE  | FILE_SIZE  | WASTED_SIZE |
+-------------------+------------+------------+------------+------------+------------+------------+-------------+
| big/testing       |    9045529 | 582.42 MiB | 0 bytes    | 582.42 MiB | 67.00 MiB  | 584.00 MiB | 1.58 MiB    |
| docstore/all_recs |      24353 | 17.56 MiB  | 0 bytes    | 17.56 MiB  | 0 bytes    | 25.00 MiB  | 7.44 MiB    |
| big/pktest        |     111649 | 11.55 MiB  | 0 bytes    | 11.55 MiB  | 0 bytes    | 19.00 MiB  | 7.45 MiB    |
| big/pktest_seq    |      81880 | 6.52 MiB   | 0 bytes    | 6.52 MiB   | 0 bytes    | 14.00 MiB  | 7.48 MiB    |
| library/books     |         39 | 384.00 KiB | 16.00 KiB  | 400.00 KiB | 0 bytes    | 464.00 KiB | 64.00 KiB   |
+-------------------+------------+------------+------------+------------+------------+------------+-------------+
We can see that we have now regained some disk space!
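As the note in the OPTIMIZE output says, InnoDB implements OPTIMIZE TABLE as a table rebuild followed by an analyze; the explicit equivalent is a null ALTER that rebuilds the tablespace:

> ALTER TABLE big.testing ENGINE=InnoDB;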
So even if it's not always 100% accurate, this method already gives you a very close-to-reality view of how your InnoDB tablespaces are using the disk, and of when you will benefit from rebuilding a tablespace.
MySQL Shell Plugin
I've updated the innodb/fragmented MySQL Shell Plugin on my GitHub with a new method:
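The method itself was shown in a screenshot in the original post. Purely as an illustration (the method name below is hypothetical), calling a method from that plugin in MySQL Shell would look something like this:

JS > innodb.fragmented()  // hypothetical method name; see the plugin's README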
It seems the format_bytes function exists in MySQL 5.7 and above only, and not in MySQL 5.6. So be aware that the above will not work on MySQL 5.6.
Sure, it's really time to upgrade! 5.6 will soon be end of life.
We are using PXC 5.6, so it's still active.
https://www.percona.com/services/policies/percona-software-platform-lifecycle
Hehe, you should try and enjoy InnoDB Cluster 8.0 🙂
We are using PXC in a WAN environment, which InnoDB Cluster does not support 🙂
AFAIK InnoDB Cluster is not compatible with WAN, which is our case 🙂 We have 3 DCs.
What do you mean by not compatible? You can use it in a WAN too, if you agree to pay the eventual latency price. There are already optimizations available for WAN environments. The only missing piece actually is the concept of segments.
Well, indeed, the latency is the issue here 🙂
Then it’s exactly the same problem with PXC…
Haven't tried the latest InnoDB Cluster for a while; I'll try it when I have enough time, to see how its delay compares with PXC 🙂
What happens with information_schema_stats_expiry if it is not changed and we perform OPTIMIZE to reduce the size?
And is there any impact of performing this operation on indexes and foreign key constraints?
Hi Vijay,
Thanks for your comment. You can find all the details related to `information_schema_stats_expiry` in the manual: https://dev.mysql.com/doc/refman/8.0/en/server-system-variables.html#sysvar_information_schema_stats_expiry
As you can see, it doesn't affect everything, and it can be updated at any time by running ANALYZE TABLE.
Regards,
The number of table rows may not be accurate for large tables unless you set innodb_stats_persistent_sample_pages to a higher value than the default 20, but of course it would delay ANALYZE TABLE, because of increased I/O.
I ran the query for a schema where OPTIMIZE TABLE is run regularly, because much data is inserted and then some deleted, but for some of the smaller tables WASTED_SIZE is negative. The *.ibd file size in the filesystem corresponds to the value in the FILE_SIZE column. One of the tables has row format COMPRESSED and the other two are DYNAMIC. I guess it's because DATA_LENGTH and INDEX_LENGTH are approximated?
Yes, those are estimations; run ANALYZE TABLE and it should be better.
This is a great query for a simple physical schema.
If I understand correctly, it is not going to work if you have partitioned tables, use general tablespaces, or set innodb_file_per_table=0.
It would be great for actual disk use information to be more easily accessible 🙂
Why divide by 10 and multiply by 10? Is this to make the integer math work out correctly? Is this more efficient than CASTing to float or decimal?
I modified the ORDER BY to handle ties. 🙂
( T.data_length + T.index_length ) DESC,
IT.file_size DESC,
CONCAT( T.table_schema, "/", T.table_name ) ASC
Thanks, as we move to 8.0 this will be greatly helpful.
Eric