Since the release of MySQL 8.0, the MySQL X Dev API has given users the convenient ability to use MySQL without writing a single line of SQL!
The MySQL X Dev API brings support for CRUD operations on JSON documents stored in MySQL. MySQL Document Store is ACID compliant and compatible with everything MySQL: replication, InnoDB Cluster, …
The MySQL X Dev API is available using the MySQL X Protocol, listening by default on port 33060.
If you are interested in learning more about MySQL Document Store, please refer to these presentations [1], [2], [3].
OCI MySQL HeatWave Database Service
The MySQL DBaaS on OCI is the only one providing access to the X Protocol. This means you can create applications that don’t use any SQL to add, delete, modify, and retrieve JSON documents stored in the DB System.
Let’s add some documents to our DB System. We will use the restaurants collection used in various MongoDB examples.
We first connect to our DB System (MySQL HeatWave instance in OCI) using MySQL Shell and the X Protocol (default). You can use a VPN, a Compute Instance, a bastion host…
Next, we create a new schema and we import all the JSON documents using the importJson() utility:
JS > session.createSchema('docstore')
<Schema:docstore>
JS > \u docstore
Default schema `docstore` accessible through db.
JS > util.importJson('restaurants.json', {convertBsonOid: true})
Importing from file "restaurants.json" to collection
`docstore`.`restaurants` in MySQL Server at 10.0.1.249:33060
.. 25359.. 25359
Processed 15.60 MB in 25359 documents in 0.9856 sec (25.36K documents/s)
Total successfully imported documents 25359 (25.36K documents/s)
Done! It’s fast and easy.
CRUD
It’s time to verify the import and read the data we have just loaded.
Let’s start by displaying the first document and verifying that it’s indeed a valid JSON document:
JS > db.restaurants.find().limit(1)
{
    "_id": "000065796b9c0000000000000001",
    "name": "Howard-Reyes",
    "grades": [
        {
            "date": {
                "$date": "2023-03-12T00:00:00.000+0000"
            },
            "grade": "B",
            "score": 5
        }
    ],
    "address": {
        "coord": [
            8.7301765,
            52.705775
        ],
        "street": "Ward Branch",
        "zipcode": "22737",
        "building": "74523"
    },
    "borough": "West Brandimouth",
    "cuisine": "Hawaiian"
}
1 document in set (0.0016 sec)
Let’s display 10 restaurants and their type of cuisine that are in Brooklyn:
JS > db.restaurants.find("borough = 'Brooklyn'").fields("name", "cuisine").limit(10)
{
    "name": "Wendy'S",
    "cuisine": "Hamburgers"
}
{
    "name": "Riviera Caterer",
    "cuisine": "American"
}
{
    "name": "Wilken'S Fine Food",
    "cuisine": "Delicatessen"
}
{
    "name": "Regina Caterers",
    "cuisine": "American"
}
{
    "name": "Taste The Tropics Ice Cream",
    "cuisine": "Ice Cream, Gelato, Yogurt, Ices"
}
{
    "name": "C & C Catering Service",
    "cuisine": "American"
}
{
    "name": "May May Kitchen",
    "cuisine": "Chinese"
}
{
    "name": "Seuda Foods",
    "cuisine": "Jewish/Kosher"
}
{
    "name": "Carvel Ice Cream",
    "cuisine": "Ice Cream, Gelato, Yogurt, Ices"
}
{
    "name": "Nordic Delicacies",
    "cuisine": "Delicatessen"
}
10 documents in set (0.3392 sec)
Adding Documents
We can also generate JSON documents without a single line of SQL using the X Dev API.
In this example, we use Python (mysql-connector-python3-8.2.0-1.el8.x86_64) and we add a document using the add() method of a collection (line 72 of the script).
You can find on GitHub all the required files: https://github.com/lefred/restaurants-mysql-ds
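As an illustration only (not the exact code from the repository, and with placeholder connection parameters and made-up document values), adding a document with the Python X DevAPI (the mysqlx module shipped with mysql-connector-python) looks roughly like this:

import mysqlx

# Placeholder endpoint/credentials for the DB System; adapt to your setup.
session = mysqlx.get_session(host="10.0.1.249", port=33060,
                             user="admin", password="********")

schema = session.get_schema("docstore")
restaurants = schema.get_collection("restaurants")

# Insert a new JSON document into the collection, no SQL involved.
result = restaurants.add({
    "name": "Chez Fred",
    "borough": "Brooklyn",
    "cuisine": "Belgian",
    "address": {
        "street": "Grand Street",
        "building": "100",
        "zipcode": "11211",
        "coord": [-73.95, 40.71]
    },
    "grades": []
}).execute()

print("Generated _id:", result.get_generated_ids())
session.close()

get_generated_ids() returns the _id values the server created for documents that were added without one.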
In case you need large JSON documents, you can increase the value of mysqlx_max_allowed_packet in your DB System’s configuration.
This will allow you to store JSON documents of up to 1GB!
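The change itself is made in the DB System’s configuration on OCI, but you can verify the value currently in effect from any client session. A minimal sketch, reusing the same placeholder connection parameters as above:

import mysqlx

# Placeholder endpoint/credentials for the DB System.
session = mysqlx.get_session(host="10.0.1.249", port=33060,
                             user="admin", password="********")

# Check the maximum X Protocol message size currently in effect.
row = session.sql(
    "SHOW VARIABLES LIKE 'mysqlx_max_allowed_packet'"
).execute().fetch_one()
print(row[0], row[1])  # 67108864 (64MB) by default, up to 1073741824 (1GB)

session.close()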
HeatWave Accelerator
Now that we have all our documents stored in a MySQL HeatWave instance on OCI, could we also benefit from a HeatWave Cluster to boost our queries?
Since the release of 8.2.0-u1 in OCI, MySQL HeatWave also supports the JSON datatype in the HeatWave Cluster. See Accelerating JSON Query Processing using MySQL HeatWave.
When using MySQL Document Store, to be able to load a collection into the HeatWave Cluster, we need to modify how the Primary Key is stored: by default it is a VARBINARY column, which is not supported by the HeatWave secondary engine.
If you run the HeatWave cluster nodes estimation you will get the following message:
Error comment: UNABLE TO LOAD TABLE WHEN PRIMARY KEY COLUMN(S) CANNOT BE LOADED
Columns to be loaded:1 (100% VARLEN-encoded columns)
Estimated number of rows:24082
This is the statement to perform the Primary Key change:
SQL> alter table restaurants modify _id char(28) GENERATED ALWAYS AS
(json_unquote(json_extract(`doc`,_utf8mb4'$._id'))) STORED NOT NULL;
As soon as we have enabled the HeatWave Cluster, we can load our collection into it.
But before loading it, let’s run a SQL query (yes, you can run SQL queries on the JSON documents inside a collection) to get a baseline we can compare against afterwards:
SQL > select doc->>"$.borough" borough, count(*) tot,
max(length(doc->>"$.name")) longest_name,
min(length(doc->>"$.name")) shortest_name,
round(avg(length(doc->>"$.name")),2) avg_name_length
from restaurants
where doc->>"$.cuisine" collate utf8mb4_0900_ai_ci like 'belgian'
group by borough order by tot desc limit 10;
+---------------+-----+--------------+---------------+-----------------+
| borough | tot | longest_name | shortest_name | avg_name_length |
+---------------+-----+--------------+---------------+-----------------+
| Lake Michael | 17 | 28 | 9 | 18.59 |
| Port Michael | 11 | 29 | 8 | 19.91 |
| East Michael | 11 | 25 | 8 | 13.64 |
| South Michael | 11 | 27 | 11 | 19.64 |
| South David | 11 | 30 | 8 | 13.82 |
| Port Matthew | 11 | 27 | 10 | 18.45 |
| Michaelmouth | 10 | 25 | 9 | 15.50 |
| Port Jennifer | 10 | 29 | 7 | 18.50 |
| West Michael | 10 | 26 | 10 | 21.10 |
| Lake James | 10 | 27 | 9 | 15.70 |
+---------------+-----+--------------+---------------+-----------------+
10 rows in set (0.5191 sec)
This took a bit more than half a second (0.5191 sec) to parse 962,520 JSON documents.
Now we load our data into our HeatWave Cluster:
SQL > call sys.heatwave_load(JSON_ARRAY('docstore'), NULL);
+------------------------------------------+
| INITIALIZING HEATWAVE AUTO PARALLEL LOAD |
+------------------------------------------+
| Version: 2.41 |
| |
| Load Mode: normal |
| Load Policy: disable_unsupported_columns |
| Output Mode: normal |
| |
+------------------------------------------+
6 rows in set (19.47 sec)
...
+------------------------------------------+
| LOADING TABLE |
+------------------------------------------+
| TABLE (2 of 2): `docstore`.`restaurants` |
| Commands executed successfully: 2 of 2 |
| Warnings encountered: 0 |
| Table loaded successfully! |
| Total columns loaded: 2 |
| Table loaded using 32 thread(s) |
| Elapsed time: 19.47 s |
| |
+------------------------------------------+
8 rows in set (19.47 sec)
+----------------------------------------------------------------+
| LOAD SUMMARY |
+----------------------------------------------------------------+
| |
| SCHEMA TABLES TABLES COLUMNS LOAD |
| NAME LOADED FAILED LOADED DURATION |
| ------ ------ ------ ------- -------- |
| `docstore` 1 0 2 19.47 s |
| |
+----------------------------------------------------------------+
6 rows in set (19.47 sec)
Let’s now run the query again:
SQL > select doc->>"$.borough" borough, count(*) tot,
max(length(doc->>"$.name")) longest_name,
min(length(doc->>"$.name")) shortest_name,
round(avg(length(doc->>"$.name")),2) avg_name_length
from restaurants
where doc->>"$.cuisine" collate utf8mb4_0900_ai_ci like 'belgian'
group by borough order by tot desc limit 10;
+---------------+-----+--------------+---------------+-----------------+
| borough | tot | longest_name | shortest_name | avg_name_length |
+---------------+-----+--------------+---------------+-----------------+
| Lake Michael | 17 | 28 | 9 | 18.59 |
| South David | 11 | 30 | 8 | 13.82 |
| Port Michael | 11 | 29 | 8 | 19.91 |
| South Michael | 11 | 27 | 11 | 19.64 |
| Port Matthew | 11 | 27 | 10 | 18.45 |
| East Michael | 11 | 25 | 8 | 13.64 |
| Port Jennifer | 10 | 29 | 7 | 18.50 |
| West Michael | 10 | 26 | 10 | 21.10 |
| Lake James | 10 | 27 | 9 | 15.70 |
| Michaelmouth | 10 | 25 | 9 | 15.50 |
+---------------+-----+--------------+---------------+-----------------+
10 rows in set (0.1017 sec)
It’s faster! But it was already very fast. I even had to reduce the value of secondary_engine_cost_threshold because the query cost was just below the default threshold and MySQL HeatWave was not offloading the query to the accelerator (see the sketch below).
But with even more data you can really benefit from the acceleration.
For example, with 2,162,520 documents we go from 1.1713 seconds to 0.1922 seconds when using the HeatWave Cluster.
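For completeness, here is a minimal sketch of lowering the threshold for a session and checking with EXPLAIN whether the secondary engine is used. It assumes the docstore schema from this post, uses placeholder connection parameters, and a shortened version of the query above:

import mysqlx

# Placeholder endpoint/credentials for the DB System.
session = mysqlx.get_session(host="10.0.1.249", port=33060,
                             user="admin", password="********")

# Lower the threshold for this session so that cheap queries are still
# considered for offload to the HeatWave secondary engine (RAPID).
session.sql("SET SESSION secondary_engine_cost_threshold = 0").execute()

# When the query is offloaded, EXPLAIN reports
# 'Using secondary engine RAPID' in the Extra column.
row = session.sql(
    "EXPLAIN SELECT doc->>'$.borough' AS borough, COUNT(*) AS tot "
    "FROM docstore.restaurants GROUP BY borough ORDER BY tot DESC"
).execute().fetch_one()
print(row["Extra"])

session.close()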
Limitations
All is well in the best of worlds! Almost! Using MySQL Document Store with MySQL HeatWave Database Service in OCI comes with no particular limitations. However, if you want to enable the HeatWave Cluster (the accelerator), a few restrictions should be taken into consideration.
The JSON document cannot be larger than 65532 bytes. As the default character set is utf8mb4, where a character can take up to 4 bytes, the maximum size of a JSON document is then about 16K characters (65532 / 4 = 16383).
Currently, the JSON_TABLE() function is not supported.
Conclusion
MySQL Document Store is highly valued by developers. It is available in MySQL Community Edition and, when opting for a hosted and managed MySQL service in the cloud, only in MySQL HeatWave.
With the latest update, it’s also possible to benefit from the HeatWave Cluster accelerator when using MySQL Document Store on OCI, which makes it a very good choice for those dealing with large amounts of JSON documents, as in IoT systems for example.
Enjoy using MySQL without a single line of SQL… even in the cloud!