MySQL Document Store in OCI with MySQL HeatWave

Since the release of MySQL 8.0, the MySQL X Dev API has given users the convenient ability to use MySQL without writing a single line of SQL!

The MySQL X Dev API brings support for CRUD operations on JSON documents stored in MySQL. MySQL Document Store is ACID compliant and compatible with everything MySQL: replication, InnoDB Cluster, …

The MySQL X Dev API is available using the MySQL X Protocol, listening by default on port 33060.

If you are interested in learning more about MySQL Document Store, please refer to these presentations [1], [2], [3].

OCI MySQL HeatWave Database Service

The MySQL DBaaS on OCI is the only one providing access to the X Protocol. This means you can create applications that don’t use any SQL to add, delete, modify, and retrieve JSON documents stored in the DB System.
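As a quick illustration, here is a minimal Python sketch using the mysqlx module of MySQL Connector/Python (the connection settings are placeholders, the restaurant is made up, and the docstore schema and restaurants collection are the ones we create below):

import mysqlx

# Connect to the DB System over the X Protocol (port 33060)
session = mysqlx.get_session(
    {
        "host": "db instance IP",
        "port": 33060,
        "user": "login",
        "password": "password",
        "ssl-mode": "REQUIRED",
    }
)

db = session.get_schema("docstore")
col = db.get_collection("restaurants")

# Add a JSON document, no SQL involved
col.add({"name": "Chez Lefred", "cuisine": "Belgian"}).execute()

# And read it back
result = col.find("name = :name").bind("name", "Chez Lefred").execute()
print(result.fetch_one())

session.close()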

Let’s add some documents to our DB System. We will use the restaurants collection used in various MongoDB examples.

We first connect to our DB System (MySQL HeatWave instance in OCI) using MySQL Shell and the X Protocol (default). You can use a VPN, a Compute Instance, a bastion host…
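For example, from a Compute Instance in the same VCN, using the DB System’s private IP (the admin user below is just a placeholder for your own credentials):

$ mysqlsh mysqlx://admin@10.0.1.249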

Next, we create a new schema and we import all the JSON documents using the importJson() utility:

JS > session.createSchema('docstore')
<Schema:docstore>
JS > \u docstore
Default schema `docstore` accessible through db.
JS > util.importJson('restaurants.json', {convertBsonOid: true})
Importing from file "restaurants.json" to collection 
`docstore`.`restaurants` in MySQL Server at 10.0.1.249:33060

.. 25359.. 25359
Processed 15.60 MB in 25359 documents in 0.9856 sec (25.36K documents/s)
Total successfully imported documents 25359 (25.36K documents/s)

Done! It’s fast and easy.

CRUD

It’s time to test the import and read the data.

Let’s start by displaying the first document to verify it’s indeed a valid JSON document:

JS > db.restaurants.find().limit(1)
{
    "_id": "000065796b9c0000000000000001",
    "name": "Howard-Reyes",
    "grades": [
        {
            "date": {
                "$date": "2023-03-12T00:00:00.000+0000"
            },
            "grade": "B",
            "score": 5
        }
    ],
    "address": {
        "coord": [
            8.7301765,
            52.705775
        ],
        "street": "Ward Branch",
        "zipcode": "22737",
        "building": "74523"
    },
    "borough": "West Brandimouth",
    "cuisine": "Hawaiian"
}
1 document in set (0.0016 sec)

Let’s display 10 restaurants located in Brooklyn and their type of cuisine:

JS > db.restaurants.find("borough = 'Brooklyn'").fields("name", "cuisine").limit(10)
{
    "name": "Wendy'S",
    "cuisine": "Hamburgers"
}
{
    "name": "Riviera Caterer",
    "cuisine": "American"
}
{
    "name": "Wilken'S Fine Food",
    "cuisine": "Delicatessen"
}
{
    "name": "Regina Caterers",
    "cuisine": "American"
}
{
    "name": "Taste The Tropics Ice Cream",
    "cuisine": "Ice Cream, Gelato, Yogurt, Ices"
}
{
    "name": "C & C Catering Service",
    "cuisine": "American"
}
{
    "name": "May May Kitchen",
    "cuisine": "Chinese"
}
{
    "name": "Seuda Foods",
    "cuisine": "Jewish/Kosher"
}
{
    "name": "Carvel Ice Cream",
    "cuisine": "Ice Cream, Gelato, Yogurt, Ices"
}
{
    "name": "Nordic Delicacies",
    "cuisine": "Delicatessen"
}
10 documents in set (0.3392 sec)
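Updating and deleting documents uses the same filter syntax as find(). For example, we could change the cuisine of one of the restaurants above and then remove it, still without a single line of SQL (the new cuisine value is of course made up):

JS > db.restaurants.modify("name = 'Nordic Delicacies'").set("cuisine", "Scandinavian")
JS > db.restaurants.remove("name = 'Nordic Delicacies'")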

Adding Documents

We can also generate JSON documents without a single line of SQL using the X Dev API.

In this example, we use Python (mysql-connector-python3-8.2.0-1.el8.x86_64) and we add each document using the add() method of a collection (see the col.add() call near the end of the script):

#!/usr/bin/python3.8
import sys
from random import choice, randrange

import mysqlx
from faker import Faker


def connect():
    # Connect to the DB System using the X Protocol (port 33060)
    session = mysqlx.get_session(
        {
            "host": "db instance IP",
            "port": 33060,
            "user": "login",
            "password": "password",
            "ssl-mode": "REQUIRED",
        }
    )
    return session


def gen_cuisine():
    # Return a random type of cuisine
    cuisine_list = [
        "Belgian",
        "Italian",
    ]
    return choice(cuisine_list)


session = connect()
db = session.get_schema("docstore")
col = db.get_collection("restaurants")

fake = Faker()
print("Generating new documents.", end="", flush=True)

# Number of documents to generate: 1000 by default,
# can be overridden on the command line
total = 1000
if len(sys.argv) > 1:
    if sys.argv[1]:
        total = int(sys.argv[1])

for i in range(total):
    # Build a document with the same structure as the
    # imported restaurants collection
    doc = {}
    doc["name"] = fake.company()
    address = {}
    address["street"] = fake.street_name()
    address["building"] = fake.building_number()
    address["zipcode"] = fake.postcode()
    doc["borough"] = fake.city()
    doc["cuisine"] = gen_cuisine()
    coord = []
    coord.append(float(fake.latitude()))
    coord.append(float(fake.longitude()))
    address["coord"] = coord
    doc["address"] = address
    grades = []
    for _ in range(randrange(5)):
        grade = {}
        grade_date = {}
        date = fake.date_time_this_decade()
        grade_date["$date"] = date.strftime("%Y-%m-%dT00:00:00.000+0000")
        grade_note = choice(["A", "B", "C"])
        grade_score = randrange(20)
        grade["date"] = grade_date
        grade["grade"] = grade_note
        grade["score"] = grade_score
        grades.append(grade)
    doc["grades"] = grades
    # Insert the document using the X Dev API, no SQL involved
    col.add(doc).execute()
    if total > 100000:
        # for large batches, only print a progress dot every 1000 documents
        if not i % 1000:
            print(".", end="", flush=True)
    else:
        print(".", end="", flush=True)

print("\nDone ! {} documents generated.".format(total))

You can find all the required files on GitHub: https://github.com/lefred/restaurants-mysql-ds

In case you need larger JSON documents, you can increase the value of mysqlx_max_allowed_packet in your DB System’s configuration. This will allow you to store JSON documents of up to 1GB!
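You can check the value currently in effect from any session:

SQL > show global variables like 'mysqlx_max_allowed_packet';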

HeatWave Accelerator

Now that all our documents are stored in a MySQL HeatWave instance on OCI, can we also benefit from a HeatWave Cluster to boost our queries?

Since the release of 8.2.0-u1 in OCI, MySQL HeatWave also supports the JSON datatype in the HeatWave Cluster. See Accelerating JSON Query Processing using MySQL HeatWave.

When using MySQL Document Store, to be able to load a collection into the HeatWave Cluster, we need to modify how the primary key is stored: by default, a VARBINARY column is used, which is not supported by the HeatWave secondary engine.

If you run the HeatWave cluster nodes estimation, you will get the following message:

Error comment: UNABLE TO LOAD TABLE WHEN PRIMARY KEY COLUMN(S) CANNOT BE LOADED
Columns to be loaded:1 (100% VARLEN-encoded columns)
Estimated number of rows:24082
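Note that you can also perform a similar check in SQL using the auto parallel load procedure in dryrun mode, which only reports what would be loaded (and what would fail) without loading anything:

SQL > call sys.heatwave_load(JSON_ARRAY('docstore'), JSON_OBJECT('mode', 'dryrun'));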

This is the statement to perform the Primary Key change:

SQL> alter table restaurants modify _id char(28) GENERATED ALWAYS AS
     (json_unquote(json_extract(`doc`,_utf8mb4'$._id'))) STORED NOT NULL;

As soon as we have enabled the HeatWave Cluster, we can load our collection into it.

But before loading it, let’s run a SQL query (yes, you can run SQL queries on the JSON documents stored in a collection) to get a baseline we can compare against:

SQL > select doc->>"$.borough" borough, count(*) tot,
             max(length(doc->>"$.name")) longest_name, 
             min(length(doc->>"$.name")) shortest_name,
             round(avg(length(doc->>"$.name")),2) avg_name_length 
      from restaurants 
      where doc->>"$.cuisine" collate utf8mb4_0900_ai_ci like 'belgian' 
      group by borough order by tot desc limit 10;
+---------------+-----+--------------+---------------+-----------------+
| borough       | tot | longest_name | shortest_name | avg_name_length |
+---------------+-----+--------------+---------------+-----------------+
| Lake Michael  |  17 |           28 |             9 |           18.59 |
| Port Michael  |  11 |           29 |             8 |           19.91 |
| East Michael  |  11 |           25 |             8 |           13.64 |
| South Michael |  11 |           27 |            11 |           19.64 |
| South David   |  11 |           30 |             8 |           13.82 |
| Port Matthew  |  11 |           27 |            10 |           18.45 |
| Michaelmouth  |  10 |           25 |             9 |           15.50 |
| Port Jennifer |  10 |           29 |             7 |           18.50 |
| West Michael  |  10 |           26 |            10 |           21.10 |
| Lake James    |  10 |           27 |             9 |           15.70 |
+---------------+-----+--------------+---------------+-----------------+
10 rows in set (0.5191 sec)

This took a bit more than half a second (0.5191 sec) to parse 962,520 JSON documents.

Now we load our data into our HeatWave Cluster:

SQL > call sys.heatwave_load(JSON_ARRAY('docstore'), NULL);
+------------------------------------------+
| INITIALIZING HEATWAVE AUTO PARALLEL LOAD |
+------------------------------------------+
| Version: 2.41                            |
|                                          |
| Load Mode: normal                        |
| Load Policy: disable_unsupported_columns |
| Output Mode: normal                      |
|                                          |
+------------------------------------------+
6 rows in set (19.47 sec)
...
+------------------------------------------+
| LOADING TABLE                            |
+------------------------------------------+
| TABLE (2 of 2): `docstore`.`restaurants` |
| Commands executed successfully: 2 of 2   |
| Warnings encountered: 0                  |
| Table loaded successfully!               |
|   Total columns loaded: 2                |
|   Table loaded using 32 thread(s)        |
|   Elapsed time: 19.47 s                  |
|                                          |
+------------------------------------------+
8 rows in set (19.47 sec)
+----------------------------------------------------------------+
| LOAD SUMMARY                                                   |
+----------------------------------------------------------------+
|                                                                |
| SCHEMA           TABLES       TABLES      COLUMNS         LOAD |
| NAME             LOADED       FAILED       LOADED     DURATION |
| ------           ------       ------      -------     -------- |
| `docstore`            1            0            2      19.47 s |
|                                                                |
+----------------------------------------------------------------+
6 rows in set (19.47 sec)

Let’s now run the query again:

SQL > select doc->>"$.borough" borough, count(*) tot,
             max(length(doc->>"$.name")) longest_name, 
             min(length(doc->>"$.name")) shortest_name,
             round(avg(length(doc->>"$.name")),2) avg_name_length 
      from restaurants 
      where doc->>"$.cuisine" collate utf8mb4_0900_ai_ci like 'belgian' 
      group by borough order by tot desc limit 10;
+---------------+-----+--------------+---------------+-----------------+
| borough       | tot | longest_name | shortest_name | avg_name_length |
+---------------+-----+--------------+---------------+-----------------+
| Lake Michael  |  17 |           28 |             9 |           18.59 |
| South David   |  11 |           30 |             8 |           13.82 |
| Port Michael  |  11 |           29 |             8 |           19.91 |
| South Michael |  11 |           27 |            11 |           19.64 |
| Port Matthew  |  11 |           27 |            10 |           18.45 |
| East Michael  |  11 |           25 |             8 |           13.64 |
| Port Jennifer |  10 |           29 |             7 |           18.50 |
| West Michael  |  10 |           26 |            10 |           21.10 |
| Lake James    |  10 |           27 |             9 |           15.70 |
| Michaelmouth  |  10 |           25 |             9 |           15.50 |
+---------------+-----+--------------+---------------+-----------------+
10 rows in set (0.1017 sec)

It’s faster! But it was already very fast. I even had to reduce the value of secondary_engine_cost_threshold because the query cost was just below the threshold and MySQL HeatWave was not offloading the query to the accelerator.
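If you want to experiment with the offload yourself, you can lower the threshold for your session; EXPLAIN then shows "Using secondary engine RAPID" in the Extra column when the query is offloaded:

SQL > set session secondary_engine_cost_threshold=0;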

But with even more data, you could really benefit from the acceleration.

For example, with 2,162,520 documents we go from 1.1713 seconds to 0.1922 seconds when using the HeatWave Cluster.

Limitations

All is well in the best of worlds! Almost! When using MySQL Document Store with MySQL HeatWave Database Service in OCI, there are no limitations. However, if you intend to enable the HeatWave Cluster (Accelerator), certain restrictions should be taken into consideration.

A JSON document cannot be larger than 65532 bytes. As the default character set is utf8mb4, where a single character can require up to 4 bytes, the maximum size of a JSON document is then 16K characters (65532 / 4 = 16383).
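If you want to verify whether some of your documents exceed that limit before loading, a quick (hypothetical) check is to measure the byte length of the serialized documents, for example:

SQL > select count(*) from restaurants where length(cast(doc as char)) > 65532;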

Currently, the JSON_TABLE() function is not supported.

Conclusion

MySQL Document Store is highly valued by developers and is available in MySQL Community Edition; when opting for a hosted and managed MySQL service in the cloud, it is available only in MySQL HeatWave.

With the latest update, it’s also possible to benefit from the HeatWave Cluster Accelerator when using MySQL Document Store on OCI, which makes it a very good choice for those dealing with large amounts of JSON documents, as in IoT systems for example.

Enjoy using MySQL without a single line of SQL… even in the cloud!

