Recently during the Swedish MySQL User Group (SMUG), I presented a session dedicated to MySQL InnoDB Primary Keys.
I forgot to mention a detail that many people are not aware, but Jeremy Cole has pointed out.
Primary Key always included in secondary indexes at the right-most column
When we define a secondary index, the secondary index includes the Primary Key as the right-most column of the index. It’s silently added, meaning that it’s not visible but it’s used to point back to the record in the clustered index.
This is an example with a table having a Primary Key composed of multiple columns:
CREATE TABLE `t1` (
`a` int NOT NULL,
`b` int NOT NULL,
`c` int NOT NULL,
`d` int NOT NULL,
`e` int NOT NULL,
`f` varchar(10) DEFAULT 'aaa',
`inserted` datetime DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY(`a`,`b`,`c`) ) ENGINE=InnoDB;
And this is the table’s content (only 2 records):
SELECT * FROM t1;
+---+---+---+----+----+-----+---------------------+
| a | b | c | d | e | f | inserted |
+---+---+---+----+----+-----+---------------------+
| 1 | 2 | 3 | 4 | 5 | abc | 2024-02-11 17:37:16 |
| 7 | 8 | 9 | 10 | 11 | def | 2024-02-11 17:37:26 |
+---+---+---+----+----+-----+---------------------+
Now let’s create a secondary key for the column f:
ALTER TABLE t1 ADD INDEX f_idx(f);
This key will then include the Primary Key as the right-most column(s) on the secondary index:
The orange filled entries are the hidden one.
Let’s verify this on the InnoDB page for that index:
And indeed, we can see that the Primary Key columns (in red) are included in each entry of the secondary index (in purple).
But not always !
When we have a Primary Key or Part of a Primary Key included in a Secondary Index, only the eventual missing columns of the Primary Key index will be added as right-most and hidden entries to the secondary index.
Let’s create a secondary index where the column b
will be missing:
ALTER TABLE t1 ADD INDEX sec_idx (`d`,`c`,`e`,`a`);
The column b
will be indeed added as the right-most hidden column of the index. Let’s verify this:
We can see above, that indeed, the value of column b
is added. Same for the second record:
If we check in the InnoDB source code, this is also commented:
But what will happen, if we just use a prefix part of the Primary Key in the secondary index?
Let’s try:
CREATE TABLE `t1` (
`a` varchar(10) NOT NULL DEFAULT 'aaaaaaaaaa',
`b` varchar(10) NOT NULL DEFAULT 'bbbbbbbbbb',
`c` int NOT NULL DEFAULT '1',
`f` varchar(10) DEFAULT NULL,
PRIMARY KEY (`a`,`b`,`c`),
KEY `sec_idx` (`c`,`f`,`a`(2))
) ENGINE=InnoDB
SELECT * FROM t1;
+------------+------------+---+-----+
| a | b | c | f |
+------------+------------+---+-----+
| aaaaaaaaaa | bbbbbbbbbb | 1 | abc |
| ccccccccc | dddddddddd | 2 | def |
+------------+------------+---+-----+
We can see that only 2 characters of the column a
are used in the secondary index.
If we check in the InnoDB page, we can notice that in-fact, the full column will also be added as the right-most hidden part of the secondary index:
So InnoDB needs to have the full PK, visible or hidden in the secondary index.
This is something not often known. But let’s hope this blog post clarifies it.
Enjoy MySQL & InnoDB… and of course, don’t forget to use Primary Keys !
Hi – great information as usual. But like in ORACLE DB — is there no ROWID implicit column in mysql or is there?
And if PK is NOT included — then will secondary index has to point to data block row id – somehow – what would it be? (I guess that is what is equivalent of ROWID I am looking for)
thank you
If you check the slides, you will see that when there is not PK defined, the first not null unique key will be used. If there are not, then InnoDB will use a hidden 6-bytes key that will be shared between all tables without valid PK which will cause mutex issues in case of concurrent writes. Also this particular key won’t be used by replication as it’s never externalized, also explained in the slides.
Long story short: use and define Primary Keys with InnoDB !
Oracle rowid represents address of the row in the table, every index has it.
I like this implementation but in mySQL world the decision made to use clustered indexes. Since clustered index is the table and the way to organize the table and each row have unique “address” by the auto-incremented field I would say that you may consider an auto-incremented field as a rowid.
There is, ofcourse, some implementation details of how InnoDB engine in mySQL get the physical block that contain the row but I believe those details, although interesting, out of scope of this answer.
I.e. it seems reasonable, in mysql, to define an auto-incremented numeric field as PK (using numeric datatype as small as possible but still covering business needs) and then add secondary valid (natural) unique key.
The auto-incremented would be relatively small overhead on the rest of the indexes (just like rowid).
Alway keep in mind that PK is part of all secondary indexes (which is demonstrated very efficiently by Frederik in this post), just like oracle rowid part of all leafs of all indexes.
Hi, this is very interesting conversation (for me at least) – from ORACLE DB perspective we have few optimization principles – (1) LIO optimization (2) total work optimization – depending on goal
SELECT EMPNO,FNAME,LNAME,SAL,EMPNO from EMP WHERE DEPTNUM=10;
This will result in lookup via EMP_SI index – it should locate the ROW ADDRESSes — then database will read rest of the data from row block.
Thus we can say its best path to EMP_SI- tree I/O cost (say 3) + (1 I/O to table (random read) x number of entries ) (if same blocks there will be saving)
If you have MySQL index for same table – are we talking first obtain the index entry using deptno=10 – then – it will be PK values of the table lets say EMPNO 501, 1232, 2333 — then it will traverse 501 get the row address of the block? and then same for 1232, get data block, then get 2333 index and get data block?
seems like more logical I/O in mysql .. This is what I am curious to find in mysql.
one possible positive of mysql — if ORACLE row moves to another row — inedex will either have chaining issue or some such. In mysql – it will be still PK value thats with index – so no impact due to row movment i guess …
Also if I have tables with 15 indexes – AND I have a index with multi-column PK – god knows what overhead we are talking about! JDE, SIEBEL, PSOFT, EBS — all have very high number of indexes — of course mySQL is not supported for these products (that I know of) – but in general issue is good to know
thanks
hi, what tools you used to analyze the InnoDB page in this blog? thank you!
Hello,
This is only
xxd
on Linux.Cheers,
Btw, which tool is used to analyse the binary index file as shown in the image above?
Hi Siva,
This is `xxd` in a terminal.
Cheers