MySQL: size of your tables – tricks and tips

Many of you already know how to retrieve the size of your dataset, schemas and tables in MySQL.

To summarize, below are the different queries you can run:

Dataset Size

In the past, I was using something like this:
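Something along these lines, doing the unit conversion by hand (a sketch of the idea rather than the exact statement I used back then):

-- total data, index and overall size for the whole instance, converted to MB manually
SELECT ROUND(SUM(data_length)/1024/1024, 2) AS 'DATA (MB)',
       ROUND(SUM(index_length)/1024/1024, 2) AS 'INDEXES (MB)',
       ROUND(SUM(data_length + index_length)/1024/1024, 2) AS 'TOTAL SIZE (MB)'
FROM information_schema.TABLES;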

But now with sys schema being installed by default, I encourage you to use some of the formatting functions provided with it. The query to calculate the dataset is now:

SELECT sys.format_bytes(sum(data_length)) DATA,
       sys.format_bytes(sum(index_length)) INDEXES,
       sys.format_bytes(sum(data_length + index_length)) 'TOTAL SIZE'
FROM information_schema.TABLES;

Let’s see an example:

Engines Used and Size

For a list of all engines used:

SELECT count(*) AS '# TABLES',
       sys.format_bytes(sum(data_length)) DATA,
       sys.format_bytes(sum(index_length)) INDEXES,
       sys.format_bytes(sum(data_length + index_length)) 'TOTAL SIZE',
       engine `ENGINE`
FROM information_schema.TABLES
WHERE TABLE_SCHEMA NOT IN ('sys', 'mysql', 'information_schema',
                           'performance_schema', 'mysql_innodb_cluster_metadata')
GROUP BY engine;

Let’s see an example on the same database as above:

and on 5.7 with one MyISAM table (eeek):

Schemas Size

Now let’s find out which schemas are the largest:

SELECT TABLE_SCHEMA,
       CONCAT(ROUND(SUM(table_rows) / 1000000, 2), 'M') `ROWS`,
       sys.format_bytes(sum(data_length)) DATA,
       sys.format_bytes(sum(index_length)) IDX,
       sys.format_bytes(sum(data_length) + sum(index_length)) 'TOTAL SIZE',
       round(sum(index_length) / sum(data_length), 2) IDXFRAC
FROM information_schema.TABLES
GROUP BY table_schema
ORDER BY sum(data_length) DESC;

Top 10 Tables by Size

And finally a query to get the list of the 10 largest tables:

SELECT CONCAT(table_schema, '.', table_name) AS 'TABLE', ENGINE,
       CONCAT(ROUND(table_rows / 1000000, 2), 'M') `ROWS`,
       sys.format_bytes(data_length) DATA,
       sys.format_bytes(index_length) IDX,
       sys.format_bytes(data_length + index_length) 'TOTAL SIZE',
       round(index_length / data_length, 2) IDXFRAC
FROM information_schema.TABLES
ORDER BY data_length + index_length DESC
LIMIT 10;


You can of course modify the query to retrieve the size of any given table.
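For example, to check a single table, simply add a WHERE clause (mydb and mytable are placeholders here, replace them with your own schema and table):

-- size of one specific table
SELECT CONCAT(table_schema, '.', table_name) AS 'TABLE', ENGINE,
       sys.format_bytes(data_length) DATA,
       sys.format_bytes(index_length) IDX,
       sys.format_bytes(data_length + index_length) 'TOTAL SIZE'
FROM information_schema.TABLES
WHERE table_schema = 'mydb' AND table_name = 'mytable';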

That was the theory, and it’s always good to revisit those queries from time to time.

But…

But can we trust these results? In fact, this can sometimes be very tricky. Let’s check this example:

mysql> SELECT COUNT(*) AS TotalTableCount ,table_schema,
CONCAT(ROUND(SUM(table_rows)/1000000,2),'M') AS TotalRowCount,
CONCAT(ROUND(SUM(data_length)/(1024*1024*1024),2),'G') AS TotalTableSize,
CONCAT(ROUND(SUM(index_length)/(1024*1024*1024),2),'G') AS TotalTableIndex,
CONCAT(ROUND(SUM(data_length+index_length)/(1024*1024*1024),2),'G') TotalSize 
FROM information_schema.TABLES GROUP BY table_schema 
ORDER BY SUM(data_length+index_length)  DESC LIMIT 1;
+-----------------+--------------+---------------+----------------+-----------------+-----------+
| TotalTableCount | TABLE_SCHEMA | TotalRowCount | TotalTableSize | TotalTableIndex | TotalSize |
+-----------------+--------------+---------------+----------------+-----------------+-----------+
|              15 | wp_lefred    | 0.02M         | 5.41G          | 0.00G           | 5.41G     |
+-----------------+--------------+---------------+----------------+-----------------+-----------+

This schema seems to be very large, and almost all of it is a single table (wp_options). Let’s verify this:

mysql> select * from information_schema.TABLES 
       where table_schema='wp_lefred' and table_name = 'wp_options'\G
*************************** 1. row ***************************
  TABLE_CATALOG: def
   TABLE_SCHEMA: wp_lefred
     TABLE_NAME: wp_options
     TABLE_TYPE: BASE TABLE
         ENGINE: InnoDB
        VERSION: 10
     ROW_FORMAT: Dynamic
     TABLE_ROWS: 3398
 AVG_ROW_LENGTH: 1701997
    DATA_LENGTH: 5783388160
MAX_DATA_LENGTH: 0
   INDEX_LENGTH: 442368
      DATA_FREE: 5242880
 AUTO_INCREMENT: 1763952
    CREATE_TIME: 2018-09-18 00:29:16
    UPDATE_TIME: 2018-09-17 23:44:40
     CHECK_TIME: NULL
TABLE_COLLATION: utf8mb4_unicode_ci
       CHECKSUM: NULL
 CREATE_OPTIONS:
  TABLE_COMMENT:
1 row in set (0.01 sec)

In fact, we can see that the average row length is huge: the DATA_LENGTH estimate of 5,783,388,160 bytes (roughly 5.4G, essentially the 5.41G reported above for the whole schema) corresponds to 3398 rows at an average of about 1.7MB each.

So let’s verify on the disk:

[root@vps21575 database]# ls -lh wp_lefred/wp_options.ibd
-rw-r----- 1 mysql mysql 11M Sep 18 00:31 wp_lefred/wp_options.ibd

11M?! The table is 11M on disk, but Information_Schema thinks it is around 5.4G! Quite a big difference!

In fact, this is because InnoDB generates these statistics from a very small sample of pages by default. So if you have a lot of records with a very variable size, as is the case with this WordPress table, it is safer to increase the number of pages used to generate those statistics:

mysql> set global innodb_stats_transient_sample_pages= 100;
Query OK, 0 rows affected (0.00 sec)

mysql> analyze table wp_lefred.wp_options;
+----------------------+---------+----------+----------+
| Table                | Op      | Msg_type | Msg_text |
+----------------------+---------+----------+----------+
| wp_lefred.wp_options | analyze | status   | OK       |
+----------------------+---------+----------+----------+
1 row in set (0.05 sec)
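Note that innodb_stats_transient_sample_pages only applies to tables using transient statistics; if your tables use persistent statistics (STATS_PERSISTENT=1, the default since MySQL 5.6), the variable to increase would be innodb_stats_persistent_sample_pages instead, along these lines:

-- same idea as above, but for tables using persistent statistics
SET GLOBAL innodb_stats_persistent_sample_pages = 100;
ANALYZE TABLE wp_lefred.wp_options;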

Let's check the table statistics now:
*************************** 1. row ***************************
  TABLE_CATALOG: def
   TABLE_SCHEMA: wp_lefred
     TABLE_NAME: wp_options
     TABLE_TYPE: BASE TABLE
         ENGINE: InnoDB
        VERSION: 10
     ROW_FORMAT: Dynamic
     TABLE_ROWS: 3075
 AVG_ROW_LENGTH: 1198
    DATA_LENGTH: 3686400
MAX_DATA_LENGTH: 0
   INDEX_LENGTH: 360448
      DATA_FREE: 4194304
 AUTO_INCREMENT: 1764098
    CREATE_TIME: 2018-09-18 00:34:07
    UPDATE_TIME: 2018-09-18 00:32:55
     CHECK_TIME: NULL
TABLE_COLLATION: utf8mb4_unicode_ci
       CHECKSUM: NULL
 CREATE_OPTIONS:
  TABLE_COMMENT:
1 row in set (0.01 sec)

We can see that the average row length is much smaller now (and could be smaller with an even bigger sample).
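Keep in mind that TABLE_ROWS itself is also just an estimate for InnoDB; if you need the exact number of rows, you have to scan the table:

-- exact count: requires a full scan of the clustered index
SELECT COUNT(*) FROM wp_lefred.wp_options;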

Let’s verify:

mysql> SELECT CONCAT(table_schema, '.', table_name) as 'TABLE', ENGINE,
    -> CONCAT(ROUND(table_rows / 1000000, 2), 'M')  `ROWS`,
    -> sys.format_bytes(data_length) DATA,
    -> sys.format_bytes(index_length) IDX,
    -> sys.format_bytes(data_length + index_length) 'TOTAL SIZE',
    -> round(index_length / data_length,2) IDXFRAC
    -> FROM   information_schema.TABLES 
    -> WHERE table_schema='wp_lefred' and table_name = 'wp_options';
+----------------------+--------+-------+----------+------------+------------+---------+
| TABLE                | ENGINE | ROWS  | DATA     | IDX        | TOTAL SIZE | IDXFRAC |
+----------------------+--------+-------+----------+------------+------------+---------+
| wp_lefred.wp_options | InnoDB | 0.00M | 3.52 MiB | 352.00 KiB | 3.86 MiB   |    0.10 |
+----------------------+--------+-------+----------+------------+------------+---------+

In fact, this table stores its values in a LONGTEXT column (option_value), which can contain a lot of data or almost nothing.

We can verify that for this particular table we have some very large values:

mysql> select CHAR_LENGTH(option_value), count(*) 
       from wp_lefred.wp_options group by 1 order by 1 desc limit 10;
+---------------------------+----------+
| CHAR_LENGTH(option_value) | count(*) |
+---------------------------+----------+
|                    245613 |        1 |
|                    243545 |        2 |
|                    153482 |        1 |
|                    104060 |        1 |
|                     92871 |        1 |
|                     70468 |        1 |
|                     60890 |        1 |
|                     41116 |        1 |
|                     33619 |        5 |
|                     33015 |        2 |
+---------------------------+----------+

Even though the majority of the records are much smaller:

mysql> select CHAR_LENGTH(option_value), count(*) 
       from wp_lefred.wp_options group by 1 order by 2 desc limit 10;
+---------------------------+----------+
| CHAR_LENGTH(option_value) | count(*) |
+---------------------------+----------+
|                        10 |     1485 |
|                        45 |      547 |
|                        81 |      170 |
|                         6 |      167 |
|                         1 |       84 |
|                        83 |       75 |
|                        82 |       65 |
|                        84 |       60 |
|                        80 |       44 |
|                        30 |       42 |
+---------------------------+----------+

Conclusion

So, in general, using Information_Schema provides a good overview of table sizes, but please always verify the size on disk to see if it matches. When a table contains records with a large variable size, those statistics are often incorrect because the InnoDB page sample used to compute them is too small.
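One way to cross-check the size on disk without leaving the mysql client (a sketch for 5.7, where the view is called INNODB_SYS_TABLESPACES; in 8.0 it is INNODB_TABLESPACES):

-- file-per-table tablespaces are named 'schema/table'
SELECT NAME,
       sys.format_bytes(FILE_SIZE) FILE_SIZE,
       sys.format_bytes(ALLOCATED_SIZE) ALLOCATED
FROM information_schema.INNODB_SYS_TABLESPACES
WHERE NAME = 'wp_lefred/wp_options';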

But don’t forget that, on disk, tablespaces can also be fragmented!
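If a tablespace file is much larger than the data it actually holds, rebuilding the table reclaims the space; for an InnoDB table, something like:

OPTIMIZE TABLE wp_lefred.wp_options;
-- or, equivalently for InnoDB, rebuild the table with a null ALTER:
ALTER TABLE wp_lefred.wp_options ENGINE=InnoDB;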

15 Comments

  1. The reason the table appeared very large at first is because your original query is missing the where condition "WHERE table_schema='wp_lefred' and table_name = 'wp_options'"

    • Thx for your comment Brian, in fact that large table is pretty much the only large one in that schema. I limited to 1 because the other schemas are not interesting (you can see it from the top 10 largest tables). But good catch if you want to reproduce this on your own database.
      Cheers,

    • Very good article. Any idea how the statistics are created or calculated for the TokuDB engine (on Percona or MariaDB)? Is there an equivalent of the innodb_stats_transient_sample_pages variable? I can't find any information on this and have problems with TokuDB reporting incorrect statistics in the INFORMATION_SCHEMA.STATISTICS and INFORMATION_SCHEMA.TABLES tables.

  2. So, what is the recommended value or is there any logic to calculate the value for the innodb_stats_transient_sample_pages parameter?

    • Hi Stefka,

      In general, the default value is correct. It's the role of the DBA to verify the query execution plan, compare the execution time of equivalent queries, and if you see inconsistent results, then you might increase this a bit. It all depends.

      Regards,

      • This is so tricky.
        I tried with 100 and the huge memory usage was still happening, but only once every few days; then I tried with 300 and it looks like everything is normal.

        Are there any cons to using a large value like 300?
        Also, when a database table shows huge memory usage, does that mean the MySQL server will really use that much memory (if there is enough free memory), or is it just a visual issue?

        • Hi Stefka,

          It all depends of course. If you see more memory usage with 100 than 300, I’m sure the memory usage you see is not related to that but more about the workload.

          How do you measure that memory usage?

          I also published some articles on memory and MySQL, you might find some useful info in them too.

          Cheers,

          • I'm checking the database size with PHPMyAdmin as well as with a custom query:

            mysql -u root -Bse "SELECT table_schema 'table name', sum(data_length + index_length) 'Data Base Size in MB' FROM information_schema.TABLES WHERE table_schema='{0}' GROUP BY table_schema;"

            Notes:
            1. the performance_schema is turned off.
            2. A database table grew to 35GB in just 5 days, growing constantly every hour…

            Do you see anything wrong with my setup?

          • Hi Stevo,

            No idea of course… how could I know?

            What type of data is it? Is the tablespace on the filesystem also growing that much? If so, it just means you are adding a lot of data.

            Cheers,
