MySQL InnoDB: Primary Key always included in secondary indexes as the right-most columns… or not

Recently during the Swedish MySQL User Group (SMUG), I presented a session dedicated to MySQL InnoDB Primary Keys.

I forgot to mention a detail that many people are not aware, but Jeremy Cole has pointed out.

Primary Key always included in secondary indexes at the right-most column

When we define a secondary index, the secondary index includes the Primary Key as the right-most column of the index. It’s silently added, meaning that it’s not visible but it’s used to point back to the record in the clustered index.

This is an example with a table having a Primary Key composed of multiple columns:

CREATE TABLE `t1` (
`a` int NOT NULL,
`b` int NOT NULL, 
`c` int NOT NULL,
`d` int NOT NULL,
`e` int NOT NULL,
`f` varchar(10) DEFAULT 'aaa',
`inserted` datetime DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY(`a`,`b`,`c`) ) ENGINE=InnoDB; 

And this is the table’s content (only 2 records):

SELECT * FROM t1;
+---+---+---+----+----+-----+---------------------+
| a | b | c | d  | e  | f   | inserted            |
+---+---+---+----+----+-----+---------------------+
| 1 | 2 | 3 |  4 |  5 | abc | 2024-02-11 17:37:16 |
| 7 | 8 | 9 | 10 | 11 | def | 2024-02-11 17:37:26 |
+---+---+---+----+----+-----+---------------------+

Now let’s create a secondary key for the column f:

ALTER TABLE t1 ADD INDEX f_idx(f);

This key will then include the Primary Key as the right-most column(s) on the secondary index:

The orange filled entries are the hidden one.

Let’s verify this on the InnoDB page for that index:

And indeed, we can see that the Primary Key columns (in red) are included in each entry of the secondary index (in purple).

But not always !

When we have a Primary Key or Part of a Primary Key included in a Secondary Index, only the eventual missing columns of the Primary Key index will be added as right-most and hidden entries to the secondary index.

Let’s create a secondary index where the column b will be missing:

ALTER TABLE t1 ADD INDEX sec_idx (`d`,`c`,`e`,`a`);

The column b will be indeed added as the right-most hidden column of the index. Let’s verify this:

We can see above, that indeed, the value of column b is added. Same for the second record:

If we check in the InnoDB source code, this is also commented:

But what will happen, if we just use a prefix part of the Primary Key in the secondary index?

Let’s try:

CREATE TABLE `t1` (
  `a` varchar(10) NOT NULL DEFAULT 'aaaaaaaaaa',
  `b` varchar(10) NOT NULL DEFAULT 'bbbbbbbbbb',
  `c` int NOT NULL DEFAULT '1',
  `f` varchar(10) DEFAULT NULL,
  PRIMARY KEY (`a`,`b`,`c`),
  KEY `sec_idx` (`c`,`f`,`a`(2))
) ENGINE=InnoDB 

SELECT * FROM t1;
+------------+------------+---+-----+
| a          | b          | c | f   |
+------------+------------+---+-----+
| aaaaaaaaaa | bbbbbbbbbb | 1 | abc |
| ccccccccc  | dddddddddd | 2 | def |
+------------+------------+---+-----+

We can see that only 2 characters of the column a are used in the secondary index.

If we check in the InnoDB page, we can notice that in-fact, the full column will also be added as the right-most hidden part of the secondary index:

So InnoDB needs to have the full PK, visible or hidden in the secondary index.

This is something not often known. But let’s hope this blog post clarifies it.

Enjoy MySQL & InnoDB… and of course, don’t forget to use Primary Keys !

Subscribe to Blog via Email

Enter your email address to subscribe to this blog and receive notifications of new posts by email.

8 Comments

  1. Hi – great information as usual. But like in ORACLE DB — is there no ROWID implicit column in mysql or is there?

    And if PK is NOT included — then will secondary index has to point to data block row id – somehow – what would it be? (I guess that is what is equivalent of ROWID I am looking for)

    thank you

    • If you check the slides, you will see that when there is not PK defined, the first not null unique key will be used. If there are not, then InnoDB will use a hidden 6-bytes key that will be shared between all tables without valid PK which will cause mutex issues in case of concurrent writes. Also this particular key won’t be used by replication as it’s never externalized, also explained in the slides.

      Long story short: use and define Primary Keys with InnoDB !

    • Oracle rowid represents address of the row in the table, every index has it.
      I like this implementation but in mySQL world the decision made to use clustered indexes. Since clustered index is the table and the way to organize the table and each row have unique “address” by the auto-incremented field I would say that you may consider an auto-incremented field as a rowid.
      There is, ofcourse, some implementation details of how InnoDB engine in mySQL get the physical block that contain the row but I believe those details, although interesting, out of scope of this answer.

      I.e. it seems reasonable, in mysql, to define an auto-incremented numeric field as PK (using numeric datatype as small as possible but still covering business needs) and then add secondary valid (natural) unique key.
      The auto-incremented would be relatively small overhead on the rest of the indexes (just like rowid).

      Alway keep in mind that PK is part of all secondary indexes (which is demonstrated very efficiently by Frederik in this post), just like oracle rowid part of all leafs of all indexes.

      • Hi, this is very interesting conversation (for me at least) – from ORACLE DB perspective we have few optimization principles – (1) LIO optimization (2) total work optimization – depending on goal

        SELECT EMPNO,FNAME,LNAME,SAL,EMPNO from EMP WHERE DEPTNUM=10;

        This will result in lookup via EMP_SI index – it should locate the ROW ADDRESSes — then database will read rest of the data from row block.
        Thus we can say its best path to EMP_SI- tree I/O cost (say 3) + (1 I/O to table (random read) x number of entries ) (if same blocks there will be saving)

        If you have MySQL index for same table – are we talking first obtain the index entry using deptno=10 – then – it will be PK values of the table lets say EMPNO 501, 1232, 2333 — then it will traverse 501 get the row address of the block? and then same for 1232, get data block, then get 2333 index and get data block?

        seems like more logical I/O in mysql .. This is what I am curious to find in mysql.

        one possible positive of mysql — if ORACLE row moves to another row — inedex will either have chaining issue or some such. In mysql – it will be still PK value thats with index – so no impact due to row movment i guess …

        Also if I have tables with 15 indexes – AND I have a index with multi-column PK – god knows what overhead we are talking about! JDE, SIEBEL, PSOFT, EBS — all have very high number of indexes — of course mySQL is not supported for these products (that I know of) – but in general issue is good to know

        thanks

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

As MySQL Community Manager, I am an employee of Oracle and the views expressed on this blog are my own and do not necessarily reflect the views of Oracle.

You can find articles I wrote on Oracle’s blog.