MySQL Invisible Column: part II

This article is the second part of the series related to MySQL Invisible Column started here.

This post covers why Invisible Column is important for InnoDB Storage Engine.

To start, let me explain briefly how InnoDB deals with Primary Key and why an good primary key is important. And finally, why having a Primary Key is also important.

How does InnoDB Stores Data?

InnoDB stores data in table spaces. The records are stored and sorted using the clustered index (the primary key): they are called index-organized tables.

All secondary indexes also contain the primary key as the right-most column in the index (even if this is not exposed). That means when a secondary index is used to retrieve a record, two indexes are used: first the secondary one pointing to the primary key that will be used to finally retrieve the record.

The primary key impact the ratio between random and sequential I/O and the size of the secondary indexes.

Random or Sequential Primary Key?

As written above, the data is stored on the tablespace following the clustered index. Meaning that if you don’t use a sequential index, when performing inserts, InnoDB will have to heavily rebalance all the pages of the tablespace.

If we use InnoDB Ruby to illustrate this process, the picture below shows how a tablespace is updated when inserting records using a random string as Primary Key:

every time there is an insert almost all pages are touched

And now the same insert operations when using an auto_increment integer as Primary Key:

with auto_increment PK only some first pages and last pages are touched

Let’s try to explain this with a high level example:

Let’s imagine one InnoDB Page can store 4 records (disclaimer: this is just a fiction for the example), and we have inserted some records using a random Primary Key:

Now we need to insert a new record and the Primary Key is AA !

All pages were modified to “rebalance” the clustered index, in case of a sequential PK, only the last page would have been touched. Imagine the extra work when thousands of inserts are happening.

This means that choosing a good primary key is important. Two things to keep in mind:

  1. the primary key must be sequential
  2. the primary key must be short

What about UUID?

I always recommend to use auto_increment integers (or bigint…) as primary key but don’t forget to monitor them ! (see this post)

But I also understand that more and more developers prefer to use UUIDs.

If you plan to use UUIDs, you should read this article about UUID support in MySQL 8.0 that recommends to store UUIDs using binary(16).

Like this:

CREATE TABLE t (id binary(16) PRIMARY KEY);
 
INSERT INTO t VALUES(UUID_TO_BIN(UUID()));

However, I don’t share at 100% the same opinion… why ?

Because the use of uuid_to_bin() might change the sequential behavior of the UUID implementation of MySQL (read the Extra section for more info).

But if you need UUIDs, then you also need to pay the price of large indexes, so please don’t waste storage and ram with unnecessary secondary indexes 😉
select * from sys.schema_unused_indexes where object_schema not in ('performance_schema', 'mysql');

And without any Primary Key ?

For InnoDB tables, when no primary key is defined, the first unique not null key is used. And if none is available, InnoDB will create an hidden primary key (6 bytes).

The problem with such key is that you don’t have any control on it
and worse, this value is global to all tables without primary keys and can be a contention problem if you perform multiple simultaneous writes on such tables (dict_sys->mutex).

Invisible Column at the rescue

With the new invisible column, we have now an option to add an optimal primary key to a table without Primary Key if the application doesn’t allow a new column.

The first step is to detect such table using the query of Roland Bouman (already used in this post):

SELECT tables.table_schema , tables.table_name , tables.engine  
FROM information_schema.tables  LEFT JOIN (     
   SELECT table_schema , table_name
   FROM information_schema.statistics
   GROUP BY table_schema, table_name, index_name 
   HAVING SUM( 
     case when non_unique = 0 and nullable != 'YES' then 1 else 0 end ) = count(*) ) puks   
   ON tables.table_schema = puks.table_schema 
  AND tables.table_name = puks.table_name   
  WHERE puks.table_name IS null 
  AND tables.table_type = 'BASE TABLE' 
  AND Engine="InnoDB";
+--------------+--------------+--------+
| TABLE_SCHEMA | TABLE_NAME   | ENGINE |
+--------------+--------------+--------+
| test         | table2       | InnoDB |
+--------------+--------------+--------+

You can also use MySQL Shell with the check plugin: https://github.com/lefred/mysqlshell-plugins/wiki/check#getinnodbtableswithnopk

Let’s check the table’s definition:

show create table table2\G
*************** 1. row ***************
       Table: table2
Create Table: CREATE TABLE table2 (
  name varchar(20) DEFAULT NULL,
  age int DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci

And the current records:

select * from table2;
+--------+-----+
| name   | age |
+--------+-----+
| mysql  |  25 |
| kenny  |  35 |
| lefred |  44 |
+--------+-----+

Now let’s update the schema to add that specific invisible primary key:

alter table table2 
     add column id int unsigned auto_increment 
     primary key invisible first;

Now let’s add one record:

insert into table2 (name, age) values ('PHP', 25);

select * from table2;
+--------+-----+
| name   | age |
+--------+-----+
| mysql  |  25 |
| kenny  |  35 |
| lefred |  44 |
| PHP    |  25 |
+--------+-----+

And if we want to verify the primary key:

select id, table2.* from table2;
+----+--------+-----+
| id | name   | age |
+----+--------+-----+
|  1 | mysql  |  25 |
|  2 | kenny  |  35 |
|  3 | lefred |  44 |
|  4 | PHP    |  25 |
+----+--------+-----+

Conclusion

Now you know why Primary Keys are important in InnoDB and why a good Primary Key is even more important.

And since MySQL 8.0.23, you also have solution for tables without Primary Key with the invisible column !

Extra

Just for fun and to illustrate my opinion regarding the use of UUID_TO_BIN(UUID()) as Primary Key, let’s redo the example of the invisible column but this time using UUIDs.

alter table table2 add column id binary(16) invisible first;
alter table table2 modify column id binary(16) 
      default (UUID_TO_BIN(UUID())) invisible;
update table2 set id=uuid_to_bin(uuid());
alter table table2 add primary key(id);

So far nothing special, it was just a bit more tricky to create that invisible Primary Key.

Let’s query:

select * from table2;
+--------+-----+
| name   | age |
+--------+-----+
| mysql  |  25 |
| kenny  |  35 |
| lefred |  44 |
+--------+-----+

And now, we will insert a new record and query the table again:

insert into table2 (name, age) values ('PHP', 25);

select * from table2;
+--------+-----+
| name   | age |
+--------+-----+
| PHP    |  25 |
| mysql  |  25 |
| kenny  |  35 |
| lefred |  44 |
+--------+-----+

Mmmm.. why is PHP the first one now ?

Because the uuid() is not really sequential…

select bin_to_uuid(id), table2.* from table2;
+--------------------------------------+--------+-----+
| bin_to_uuid(id)                      | name   | age |
+--------------------------------------+--------+-----+
| 05aedcbd-5b36-11eb-94c0-c8e0eb374015 | PHP    |  25 |
| af2002e8-5b35-11eb-94c0-c8e0eb374015 | mysql  |  25 |
| af20117a-5b35-11eb-94c0-c8e0eb374015 | kenny  |  35 |
| af201296-5b35-11eb-94c0-c8e0eb374015 | lefred |  44 |
+--------------------------------------+--------+-----+

Do we have an alternative ?

Yes, if we follow the manual, we can see that the function uuid_to_bin() allows a swap flag. Let’s try it:

alter table table2 add column id binary(16) invisible first;
alter table table2 modify column id binary(16)        
                  default (UUID_TO_BIN(UUID(),1)) invisible;
update table2 set id=uuid_to_bin(uuid(),1);

Now every time we add a record, the insert will be sequential as expected:

select bin_to_uuid(id,1), table2.* from table2;
+--------------------------------------+--------+-----+
| bin_to_uuid(id,1)                    | name   | age |
+--------------------------------------+--------+-----+
| 5b3711eb-023c-e634-94c0-c8e0eb374015 | mysql  |  25 |
| 5b3711eb-0439-e634-94c0-c8e0eb374015 | kenny  |  35 |
| 5b3711eb-0471-e634-94c0-c8e0eb374015 | lefred |  44 |
| f9f075f4-5b37-11eb-94c0-c8e0eb374015 | PHP    |  25 |
| 60ccffda-5b38-11eb-94c0-c8e0eb374015 | PHP8   |   1 |
| 9385cc6a-5b38-11eb-94c0-c8e0eb374015 | Python |  20 |
+--------------------------------------+--------+-----+

Subscribe to Blog via Email

Enter your email address to subscribe to this blog and receive notifications of new posts by email.

3 Comments

  1. One can think up use cases for invisible columns, but I wouldn’t put invisible primary key makes it impossible for users to unambiguously identify individual rows in queries, edits &c.

    • Hi Peter,

      I do agree with you that this is not the best choice but it’s a much better choice than not having one at all. You would be surprised about the amount of applications not defining Primary Keys when using InnoDB.

      So we agree that having a good Primary Key (sequential and short) is the best but in case of none, and with an application not allowing to have an extra column that can be defined as PK, the invisible column is a good choice.

      Cheers,

Leave a Reply to Peter BrawleyCancel Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.