As I wrote earlier, a good Primary Key is important for InnoDB (storage, IOPS, secondary indexes, memory…) but there is another important domain where a Primary Key is important in MySQL: replication !
When using “traditional” replication, if you modify a record (
DELETE), the record(s) to modify on the replica are identified using indexes, and of course the Primary Key if any. The hidden global 6-bytes auto generated by InnoDB primary key is never used as never exposed and as it’s global, there is no warranty to have the same on the source and the replica. You shouldn’t consider it at all.
If the algorithm was not able to find a suitable index, or was only able to find an index that was non-unique or contained nulls, a hash table is used to assist in identifying the table records. The algorithm creates a hash table containing the rows in the
DELETE operation, with the key as the full before-image of the row. The algorithm then iterates over all the records in the target table, using the selected index if it found one, or else performing a full table scan (see the manual).
So if your application doesn’t support an extra key to be used as a Primary Key, using an hidden column as primary key is a option to speed up the operations by the replication applier.
Here is an example of statement where an additional column as Primary Key would break the application:
SQL create table t1 (name varchar(20), age int); SQL insert into t1 values ('mysql',25),('kenny', 35),('lefred', 44);
Now let’s add a auto_increment column as Primary Key:
SQL alter table t1 add id int auto_increment primary key first;
And now let’s add a record using the
INSERT statement as specified in the application:
SQL insert into t1 values ('python',20);
ERROR: 1136: Column count doesn't match value count at row 1
The best is of course to modify the application to change that
INSERT statement, but is it possible?
Also how many application are still using
SELECT * and then reference the column like
If this is your case, you have two options:
In this particular case, the choice is easy (at least for somebody as lazy as I am 😉 ):
SQL alter table t1 modify id int auto_increment invisible; SQL insert into t1 values ('python',20); Query OK, 1 row affected (0.0887 sec)
Easy, isn’t it ?
MySQL InnoDB Cluster uses another type of replication: Group Replication.
Let’s recreate the table without the Primary Key as on the example above and check if the instance can be used in an InnoDB Cluster:
The message is clear, modifications on that table won’t be spread in the cluster.
So let’s add the invisible primary key and check again:
All good !
This means that if you have an application using tables without Primary Key that didn’t allow you to migrate to a full HA solution like MySQL InnoDB Cluster, now thanks to the invisible column, this is possible.
This would help solving Hadoop Hive support for MySQL InnoDB Cluster for example (see HIVE-17306).