MySQL Document Store: creating generated columns like a boss ;)

on
Share on FacebookShare on Google+Tweet about this on TwitterShare on LinkedIn

Last Thursday, I was introducing MySQL Document Store in Ghent, BE at Percona University.

I was explaining how great is this technology and how MySQL can replace your NoSQL database but still provides you all the benefits from a RDBMS.

This is the full presentation:

Introduction to MySQL Document Store from Frederic Descamps

Then somebody came with a nice question. Let me put first some context:

  • we will create a collection to add people in it
  • we will create a virtual column on the age
  • we will index that column
  • we will query and add records to that collection

Collection creation and add some users

mysql-js> schema = session.getSchema('docstore')

mysql-js> collection = schema.createCollection('users')

mysql-js> collection.add({name: "Descamps", firtname: "Frederic", age: "41"}).execute();
mysql-js> collection.add({name: "Cottyn", firtname: "Yvan", age: "42"}).execute();
mysql-js> collection.add({name: "Buytaert", firtname: "Kris", age: "41"}).execute();

mysql-js> \sql
Switching to SQL mode... Commands end with ;
mysql-sql> select * from users;
+------------------------------------------------------------------------------------------------------+----------------------------------+
| doc                                                                                                  | _id                              |
+------------------------------------------------------------------------------------------------------+----------------------------------+
| {"_id": "06ab653c0c58e7117611685b359e77d5", "age": "41", "name": "Descamps", "firtname": "Frederic"} | 06ab653c0c58e7117611685b359e77d5 |
| {"_id": "9828dd6e0c58e7117611685b359e77d5", "age": "41", "name": "Buytaert", "firtname": "Kris"}     | 9828dd6e0c58e7117611685b359e77d5 |
| {"_id": "f24730610c58e7117611685b359e77d5", "age": "42", "name": "Cottyn", "firtname": "Yvan"}       | f24730610c58e7117611685b359e77d5 |
+------------------------------------------------------------------------------------------------------+----------------------------------+
3 rows in set (0.00 sec)

Virtual Column Creation

Usually when we create a virtual generated column, we do like this:

mysql-sql> alter table users add column age varchar(2) 
           GENERATED ALWAYS AS (json_unquote(json_extract(`doc`,'$.age'))) VIRTUAL;
Query OK, 0 rows affected (0.19 sec)

mysql-sql> select _id, age from users;
+----------------------------------+-----+
| _id                              | age |
+----------------------------------+-----+
| 06ab653c0c58e7117611685b359e77d5 | 41  |
| 9828dd6e0c58e7117611685b359e77d5 | 41  |
| f24730610c58e7117611685b359e77d5 | 42  |
+----------------------------------+-----+

The first question I got was related to the data type. As age are integers, could we use it as an integer too ?

The answer is of course yes, but be careful:

mysql-sql> alter table users drop column age;

mysql-sql> alter table users add column age int 
           GENERATED ALWAYS AS (json_unquote(json_extract(`doc`,'$.age'))) VIRTUAL;

mysql-sql> select _id, age from users;
+----------------------------------+-----+
| _id                              | age |
+----------------------------------+-----+
| 06ab653c0c58e7117611685b359e77d5 |  41 |
| 9828dd6e0c58e7117611685b359e77d5 |  41 |
| f24730610c58e7117611685b359e77d5 |  42 |
+----------------------------------+-----+
3 rows in set (0.00 sec)
mysql-sql> show create table users\G
*************************** 1. row ***************************
       Table: users
Create Table: CREATE TABLE `users` (
  `doc` json DEFAULT NULL,
  `_id` varchar(32) 
        GENERATED ALWAYS AS 
         (json_unquote(json_extract(`doc`,'$._id'))) STORED NOT NULL,
  `age` int(11) 
        GENERATED ALWAYS AS 
         (json_unquote(json_extract(`doc`,'$.age'))) VIRTUAL,
  PRIMARY KEY (`_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4
1 row in set (0.00 sec)

Ok, so now we are using a virtual column as an integer…. but as you know know, NoSQL does’nt really care, we could add there any value of any type.

But what will MySQL think about it ?

Let’s verify:

mysql-js> collection.add({name: "Vanoverbeke", firtname: "Dimitri", age: "kid"}).execute();
Incorrect integer value: 'kid' for column 'age' at row 1 (MySQL Error 1366)

As you can see above, the virtual column causes an error and we are not able to add such value… but if we really want ? #freedomeverywhere !

But before, let’s remove that generated column and add the record first and then create again the virtual column:

mysql-sql> alter table users drop column age;
mysql-sql> \js
Switching to JavaScript mode...
mysql-js> collection.add({name: "Vanoverbeke", firtname: "Dimitri", age: "kid"}).execute();
Query OK, 1 item affected (0.05 sec)
mysql-sql> alter table users 
           add column age int 
           GENERATED ALWAYS AS (json_unquote(json_extract(`doc`,'$.age'))) VIRTUAL;

This works like a charm (see note at the end of the post)

mysql-sql> select _id, age from users;
+----------------------------------+-----+
| _id                              | age |
+----------------------------------+-----+
| 06ab653c0c58e7117611685b359e77d5 |  41 |
| 9828dd6e0c58e7117611685b359e77d5 |  41 |
| ca4f7fba1058e7117611685b359e77d5 |   0 |
| f24730610c58e7117611685b359e77d5 |  42 |
+----------------------------------+-----+
4 rows in set (0.00 sec)

But we can’t add another one anyway:

mysql-js> collection.add({name: "Gryp", firtname: "Kenny", age: "teenager"}).execute();
Incorrect integer value: 'teenager' for column 'age' at row 1 (MySQL Error 1366)

So let’s remove the column and recreate it with an index on it too:

mysql-sql> alter table users drop column age;
mysql-sql> alter table users add column age int 
           GENERATED ALWAYS AS 
           (json_unquote(json_extract(`doc`,'$.age'))) VIRTUAL, add index age_idx(age) ;
ERROR: 1366: Incorrect integer value: 'kid' for column 'age' at row 1

Then it doesn’t work anymore, even with data already present in the collection.

CAST( )

OK, we should find another solution. Let’s try with the CAST() function that will return 0 if it cannot find a integer in the value:

mysql-sql> SELECT CAST("kid" AS UNSIGNED);
+-------------------------+
| CAST("kid" AS UNSIGNED) |
+-------------------------+
| 0                       |
+-------------------------+
1 row in set, 1 warning (0.00 sec)

This seems to be what we are looking for, let’s use it:

mysql-sql> alter table users add column age int 
           GENERATED ALWAYS AS 
           (cast((json_unquote(json_extract(`doc`,'$.age'))) AS SIGNED)) 
           VIRTUAL, add index age_idx(age) ;
ERROR: 1292: Truncated incorrect INTEGER value: 'kid'

Not the same error, but it doesn’t work.

You might then think that instead of having a VIRTUAL column, we should STORE it and index it…

Unfortunately this is neither an option:

mysql-sql> alter table users add column age int 
           GENERATED ALWAYS AS 
           (cast((json_unquote(json_extract(`doc`,'$.age'))) AS SIGNED)) STORED;
ERROR: 1292: Truncated incorrect INTEGER value: 'kid'

And so ?

So the second question was if we type our fields and we want to index them, as in json they are not typed, what will happen ?

If you are not sure the same type will be used in the document for the same attribute, as you can see, it’s not working very well. Such check must be done in the application that uses MySQL Document Store or you will face some problems as described above.

But of course there is a solution (if there is no solution, there is no problem, isn’t it ?)

Solution

Instead of using the CAST() function we will create our generated column like a boss and use IF with an old trick of adding 0 (IF( ) with CAST() would also work):

mysql-sql> alter table users add column age int 
           GENERATED ALWAYS 
           AS (IF(doc->>"$.age"+0=0,NULL,doc->>"$.age")) VIRTUAL, WITH VALIDATION;
Query OK, 4 rows affected (0.81 sec)

I’ve also added WITH VALIDATION. This means that the ALTER TABLE copies the table and if an out-of-range or any other error occurs, the statement fails. So you are familiar with this too. The default is WITHOUT VALIDATION and this is why one of our previous statement worked like a charm.

Let’s have a look at our users:

mysql-sql> select * from users; 
+---------------------------------------------------------------------------------------------------------+----------------------------------+------+
| doc                                                                                                     | _id                              | age  |
+---------------------------------------------------------------------------------------------------------+----------------------------------+------+
| {"_id": "06ab653c0c58e7117611685b359e77d5", "age": "41", "name": "Descamps", "firtname": "Frederic"}    | 06ab653c0c58e7117611685b359e77d5 | 41   |
| {"_id": "9828dd6e0c58e7117611685b359e77d5", "age": "41", "name": "Buytaert", "firtname": "Kris"}        | 9828dd6e0c58e7117611685b359e77d5 | 41   |
| {"_id": "c4f986214e58e711434d685b359e77d5", "age": "kid", "name": "Vanoverbeke", "firtname": "Dimitri"} | c4f986214e58e711434d685b359e77d5 | null |
| {"_id": "f24730610c58e7117611685b359e77d5", "age": "42", "name": "Cottyn", "firtname": "Yvan"}          | f24730610c58e7117611685b359e77d5 | 42   |
+---------------------------------------------------------------------------------------------------------+----------------------------------+------+

Now we can try to add the index:

mysql-sql> alter table users add index age_idx(age);

And we can even add new data having the age not being the expected integer:

mysql-js> collection.add({name: "Gryp", firtname: "Kenny", age: "teenager"}).execute();

As you can see, it requires some extra effort if you want to type JSON attributes in MySQL’s virtual columns but this allows you to mix both worlds, NoSQL and SQL, very easily using one single platform !

Share on FacebookShare on Google+Tweet about this on TwitterShare on LinkedIn

Leave a Reply

Your email address will not be published. Required fields are marked *

recent

Last Tweets Last Tweets

Locations of visitors to this page
categories