MySQL & UUIDs

More and more people are using UUID’s to identify records in their database.

As you already know, for MySQL’s storage engine (InnoDB) the primary key is very important ! (for performance, memory and disk space).

See the following links:

Problems

There are 2 major problems having a UUID as Primary Key in InnoDB:

  1. generally they are random and cause clustered index to be rebalanced
  2. they are included in each secondary indexes (consuming disk and memory)

Let’s have a look at this example:

MySQL > CREATE TABLE my_table ( 
       uuid VARCHAR(36) DEFAULT (UUID()) PRIMARY KEY, 
       name VARCHAR(20), beers int unsigned);
...

MySQL > SELECT * FROM my_table;
+--------------------------------------+---------+-------+
| uuid                                 | name    | beers |
+--------------------------------------+---------+-------+
| 17cd1188-1fa0-11ed-ba36-c8cb9e32df8e | Kenny   |     0 |
| 17cd12e2-1fa0-11ed-ba36-c8cb9e32df8e | lefred  |     1 |
| 478368a0-1fa0-11ed-ba36-c8cb9e32df8e | Scott   |     1 |
| 47836a77-1fa0-11ed-ba36-c8cb9e32df8e | Lenka   |     0 |
+--------------------------------------+---------+-------+

Now, let’s insert 2 new records:

MySQL > INSERT INTO my_table (name, beers) VALUES ("Luis",1), ("Miguel",5);

We can check the content of the table:

MySQL > SELECT * FROM my_table;
+--------------------------------------+---------+-------+
| uuid                                 | name    | beers |
+--------------------------------------+---------+-------+
| 17cd1188-1fa0-11ed-ba36-c8cb9e32df8e | Kenny   |     0 |
| 17cd12e2-1fa0-11ed-ba36-c8cb9e32df8e | lefred  |     1 |
| 36f1ce9a-1fa1-11ed-ba36-c8cb9e32df8e | Luis    |     1 |  <--
| 36f1d158-1fa1-11ed-ba36-c8cb9e32df8e | Miguel  |     5 |  <--
| 478368a0-1fa0-11ed-ba36-c8cb9e32df8e | Scott   |     1 |
| 47836a77-1fa0-11ed-ba36-c8cb9e32df8e | Lenka   |     0 |
+--------------------------------------+---------+-------+

We can see that the two new records were not inserted at the end of the table but in the middle. InnoDB had to move two old records to be able to insert the two new before them. On such small table (all records are on the same page) that doesn’t cause any problem, but imagine this table is 1TB large !

Additionally, if we keep the VARCHCAR datatype for our uuid, the primary key could take 146 bytes per row (some utf8 characters can take up to 4 bytes + the 2 bytes marking the end of the VARCHAR):

MySQL > EXPLAIN SELECT * FROM my_table WHERE 
        uuid='36f1d158-1fa1-11ed-ba36-c8cb9e32df8e'\G
 *************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: my_table
   partitions: NULL
         type: const
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 146        <--
          ref: const
         rows: 1
     filtered: 100
        Extra: NULL

Solutions

Of course there are some best practices that MySQL users can follow to avoid those problems:

  1. use a smaller datatype to store their UUIDs: BINARY(16)
  2. store the UUIDs sequentially: use UUID_TO_BIN(..., swap_flag)
    • The time-low and time-high parts (the first and third groups of hexadecimal digits, respectively) are swapped.

Let’s see this in action with the following example:

MySQL > CREATE TABLE my_table2 ( 
           uuid BINARY(16) DEFAULT (UUID_TO_BIN(UUID(), 1)) PRIMARY KEY, 
           name VARCHAR(20), beers int unsigned);

MySQL > SELECT * FROM my_table2;
+------------------------------------+--------+-------+
| uuid                               | name   | beers |
+------------------------------------+--------+-------+
| 0x11ED1F9F633ECB6CBA36C8CB9E32DF8E | Kenny  |     0 |
| 0x11ED1F9F633ECD6FBA36C8CB9E32DF8E | lefred |     1 |
+------------------------------------+--------+-------+

As the UUID is now binary, we need to decode it using the function BIN_TO_UUID() and not forget the swap flag:

MySQL > SELECT BIN_TO_UUID(uuid,1), name, beers FROM my_table2;
+--------------------------------------+--------+-------+
| BIN_TO_UUID(uuid,1)                  | name   | beers |
+--------------------------------------+--------+-------+
| 633ecb6c-1f9f-11ed-ba36-c8cb9e32df8e | Kenny  |     0 |
| 633ecd6f-1f9f-11ed-ba36-c8cb9e32df8e | lefred |     1 |
+--------------------------------------+--------+-------+

And now we can verify that when we add new entries they are added to the end of the table:

MySQL > INSERT INTO my_table2 (name, beers) VALUES ("Scott",1), ("Lenka",5); 

MySQL > SELECT * FROM my_table2;
+------------------------------------+---------+-------+
| uuid                               | name    | beers |
+------------------------------------+---------+-------+
| 0x11ED1F9F633ECB6CBA36C8CB9E32DF8E | Kenny   |     0 |
| 0x11ED1F9F633ECD6FBA36C8CB9E32DF8E | lefred  |     1 |
| 0x11ED1FA537C57361BA36C8CB9E32DF8E | Scott   |     1 |  <--
| 0x11ED1FA537C5752DBA36C8CB9E32DF8E | Lenka   |     5 |  <--
+------------------------------------+---------+-------+

and we can of course decode the UUID and see that without the swap flag, InnoDB would have to rebalance the clustered index:

MySQL > SELECT BIN_TO_UUID(uuid,1), name, beers FROM my_table2;
+--------------------------------------+---------+-------+
| BIN_TO_UUID(uuid,1)                  | name    | beers |
+--------------------------------------+---------+-------+
| 633ecb6c-1f9f-11ed-ba36-c8cb9e32df8e | Kenny   |     0 |
| 633ecd6f-1f9f-11ed-ba36-c8cb9e32df8e | lefred  |     1 |
| 37c57361-1fa5-11ed-ba36-c8cb9e32df8e | Scott   |     1 |  <--
| 37c5752d-1fa5-11ed-ba36-c8cb9e32df8e | Lenka   |     5 |  <--
+--------------------------------------+---------+-------+

And of course, now the size of the primary key is smaller and fixed to 16 bytes. Only those 16 bytes are added to all secondary indexes:

MySQL > EXPLAIN SELECT * FROM my_table2 
        WHERE uuid=UUID_TO_BIN("37c5752d-1fa5-11ed-ba36-c8cb9e32df8e",1)\G
 *************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: my_table2
   partitions: NULL
         type: const
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 16        <---
          ref: const
         rows: 1
     filtered: 100
        Extra: NULL

UUID v1

MySQL generates UUID v1 as described in RFC4122.

  • UUID v1 : is a universally unique identifier that is generated using a timestamp and the MAC address of the computer on which it was generated.
  • UUID v4 : is a universally unique identifier that is generated using random numbers.

With UUID v4m it’s not possible to generate any sequential ouput and this is why those random UUID should never be used as Primary Key with InnoDB.

UUID v4

Some developers keep asking about UUIDv4 and how to generate them for MySQL. Browsing the Internet, you can find several store procedures trying to achieve this.

This one, found on StackOverflow, is maybe my favorite:

CREATE FUNCTION uuid_v4s()
    RETURNS CHAR(36)
BEGIN
    -- 1th and 2nd block are made of 6 random bytes
    SET @h1 = HEX(RANDOM_BYTES(4));
    SET @h2 = HEX(RANDOM_BYTES(2));

    -- 3th block will start with a 4 indicating the version, remaining is random
    SET @h3 = SUBSTR(HEX(RANDOM_BYTES(2)), 2, 3);

    -- 4th block first nibble can only be 8, 9 A or B, remaining is random
    SET @h4 = CONCAT(HEX(FLOOR(ASCII(RANDOM_BYTES(1)) / 64)+8),
                SUBSTR(HEX(RANDOM_BYTES(2)), 2, 3));

    -- 5th block is made of 6 random bytes
    SET @h5 = HEX(RANDOM_BYTES(6));

    -- Build the complete UUID
    RETURN LOWER(CONCAT(
        @h1, '-', @h2, '-4', @h3, '-', @h4, '-', @h5
    ));
END

Unfortunately this function cannot be used as default expression for a column.

I also wrote a component using boost’s uuid library: https://github.com/lefred/mysql-component-uuid_v4

But this new function is also not usable as default value expression.

MySQL error code MY-003770 (ER_DEFAULT_VAL_GENERATED_NAMED_FUNCTION_IS_NOT_ALLOWED): Default value expression of column '%s' contains a disallowed function: %s.

This means that every new record needs to provide the uuid column… this is not too complicated anyway.

Let’s see an example:

MySQL > install component "file://component_uuid_v4";

MySQL > select uuid_v4() ;
+--------------------------------------+
| uuid_v4()                            |
+--------------------------------------+
| 9944272b-e3f9-4778-9c54-818f0baa87da |
+--------------------------------------+
1 row in set (0.0002 sec)

Now we will create a new table, but as recommended, we won’t use the uuid as Primary Key ! We will use a new feature of MySQL 8.0.30: GIPK Mode !

GIPK stands for Generated Invisible Primary Key, check the manual for more info.

MySQL > SET sql_generate_invisible_primary_key=1;
 
MySQL > CREATE TABLE my_table3 (   
            uuid BINARY(16) NOT NULL UNIQUE,
            name VARCHAR(20), beers INT UNSIGNED);

MySQL > SHOW CREATE TABLE my_table3\G
*************************** 1. row ***************************
       Table: my_table3
Create Table: CREATE TABLE `my_table3` (
  `my_row_id` bigint unsigned NOT NULL AUTO_INCREMENT /*!80023 INVISIBLE */,
  `uuid` binary(16) NOT NULL,
  `name` varchar(20) DEFAULT NULL,
  `beers` int unsigned DEFAULT NULL,
  PRIMARY KEY (`my_row_id`),
  UNIQUE KEY `uuid` (`uuid`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci

Now let’s insert some records and see if they are inserted sequentially and if the UUID’s value is completely random:

MySQL > INSERT INTO my_table3 (uuid, name, beers) 
        VALUES (UUID_TO_BIN(uuid_v4()),'Kenny', 3),
               (UUID_TO_BIN(uuid_v4()), 'lefred', 1);

MySQL > SELECT * FROM my_table3;
+------------------------------------+--------+-------+
| uuid                               | name   | beers |
+------------------------------------+--------+-------+
| 0x5A28E5482CDF4B3D89A298ECA3F3703B | Kenny  |     3 |
| 0x94662BF4DC2F469489D868820B7B31E5 | lefred |     1 |
+------------------------------------+--------+-------+

MySQL > SELECT BIN_TO_UUID(uuid), name, beers FROM my_table3;
+--------------------------------------+--------+-------+
| bin_to_uuid(uuid)                    | name   | beers |
+--------------------------------------+--------+-------+
| 5a28e548-2cdf-4b3d-89a2-98eca3f3703b | Kenny  |     3 |
| 94662bf4-dc2f-4694-89d8-68820b7b31e5 | lefred |     1 |
+--------------------------------------+--------+-------+

So far, so good.. let’s add some more records:

MySQL > INSERT INTO my_table3 (uuid, name, beers) 
        VALUES (UUID_TO_BIN(uuid_v4()),'Scott', 10), 
               (UUID_TO_BIN(uuid_v4()), 'Lenka', 0);

MySQL > SELECT BIN_TO_UUID(uuid), name, beers FROM my_table3;
+--------------------------------------+--------+-------+
| bin_to_uuid(uuid)                    | name   | beers |
+--------------------------------------+--------+-------+
| 5a28e548-2cdf-4b3d-89a2-98eca3f3703b | Kenny  |     3 |
| 94662bf4-dc2f-4694-89d8-68820b7b31e5 | lefred |     1 |
| 615fae32-d6c8-439c-9520-5d3c8bfa934b | Scott  |    10 |
| 80a01a29-489b-419d-bca1-05a756ad9d9d | Lenka  |     0 |
+--------------------------------------+--------+-------+

We can see that indeed, the UUIDs are completely random and sequentially added to the table. The reason of that optimal sequential insertion is that the invisible Primary Key is an auto_increment.

It’s possible to also display it on demand:

MySQL > SELECT my_row_id, BIN_TO_UUID(uuid), name, beers FROM my_table3;
+-----------+--------------------------------------+--------+-------+
| my_row_id | bin_to_uuid(uuid)                    | name   | beers |
+-----------+--------------------------------------+--------+-------+
|         1 | 5a28e548-2cdf-4b3d-89a2-98eca3f3703b | Kenny  |     3 |
|         2 | 94662bf4-dc2f-4694-89d8-68820b7b31e5 | lefred |     1 |
|         3 | 615fae32-d6c8-439c-9520-5d3c8bfa934b | Scott  |    10 |
|         4 | 80a01a29-489b-419d-bca1-05a756ad9d9d | Lenka  |     0 |
+-----------+--------------------------------------+--------+-------+

Conclusion

In summary, if you want to use UUID’s in MySQL, it’s recommended to use UUID v1, those generated by MySQL, and store them as binary using the swap flag.

If for some reason you need UUID v4, it is recommended to let MySQL and InnoDB handle the primary key by enabling GIPK mode.

Enjoy MySQL !

Subscribe to Blog via Email

Enter your email address to subscribe to this blog and receive notifications of new posts by email.

3 Comments

    • Hi,

      Unfortunately, the answer is as usual in this case: “it depends” 😉

      If the table is not too large, and if there are enough reserved free space in the page (InnoDB tries to reserve some space) that avoid the need of splitting it and if random IOs are fast and Buffer Pool usage is correct, the expected performance might not be visible, it could even be the reverse as it seems the function is “slower”:


      SQL > select uuid_to_bin(uuid());
      +------------------------------------+
      | uuid_to_bin(uuid()) |
      +------------------------------------+
      | 0x60B4141A340111ED9591C8CB9E32DF8E |
      +------------------------------------+
      1 row in set (0.0001 sec)

      SQL > select uuid_to_bin(uuid(),1);
      +------------------------------------+
      | uuid_to_bin(uuid(),1) |
      +------------------------------------+
      | 0x11ED34016190B4369591C8CB9E32DF8E |
      +------------------------------------+
      1 row in set (0.0002 sec)

Leave a Reply to lefredCancel Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

As MySQL Community Manager, I am an employee of Oracle and the views expressed on this blog are my own and do not necessarily reflect the views of Oracle.

You can find articles I wrote on Oracle’s blog.