JSON in MySQL HeatWave: why is data or my query not off-loaded to HeatWave Cluster ?

We recently saw that we can benefit from using JSON documents with MySQL HeatWave cluster in OCI (HeatWave accelerator for MySQL Database Service).

However sometimes the data can’t be loaded to HeatWave Cluster or the query cannot use it.

And this is not always easy to understand why. Let’s get familiar on how to find the info.

Data not loaded in HW Cluster

Let’s see an example with this table (collection):

SQL > show create table listingsAndReviews\G
*************************** 1. row ***************************
       Table: listingsAndReviews
Create Table: CREATE TABLE `listingsAndReviews` (
  `doc` json DEFAULT NULL,
  `_id` char(28) GENERATED ALWAYS AS
 (json_unquote(json_extract(`doc`,_utf8mb4'$._id'))) STORED NOT NULL,
  `_json_schema` json GENERATED ALWAYS AS 
  (_utf8mb4'{"type":"object"}') VIRTUAL,
  PRIMARY KEY (`_id`),
  CONSTRAINT `$val_strict_B70EB65BDDBAB20B0EE5BB7532C9060C422A06F8` 
  CHECK (json_schema_valid(`_json_schema`,`doc`)) /*!80016 NOT ENFORCED */
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
 SECONDARY_ENGINE=rapid

Now when we try to load the data int HeatWave Cluster we get the following output:

SQL > alter table listingsAndReviews secondary_load;
Query OK, 0 rows affected, 3 warnings (19.1727 sec)
Warning (code 3877): Guided Load executed 
'ALTER TABLE `docstore`.`listingsAndReviews` MODIFY `_json_schema` json 
GENERATED ALWAYS AS (_utf8mb4'{"type":"object"}') VIRTUAL 
COMMENT '' NOT SECONDARY'.
Warning (code 3877): Guided Load executed 'ALTER TABLE
`docstore`.`listingsAndReviews` MODIFY `doc` json 
DEFAULT NULL COMMENT '' NOT SECONDARY'.
Warning (code 3877): Guided Load requested 11 InnoDB parallel read 
threads to load docstore.listingsAndReviews (session variable value: 32).

We can verify which columns are loaded in HeatWave cluster from that table:

SQL> select name, column_name, data_placement_type, nrows 
   from rpd_column_id ci 
   join rpd_tables t on t.id=ci.table_id 
     join rpd_table_id ri on ri.id=t.id 
     where table_name='listingsAndReviews';
+-----------------------------+-------------+---------------------+-------+
| name                        | column_name | data_placement_type | nrows |
+-----------------------------+-------------+---------------------+-------+
| docstore.listingsAndReviews | _id         | PrimaryKey          |  5555 |
+-----------------------------+-------------+---------------------+-------+
1 row in set (0.0010 sec)

We can see that the column doc, the JSON one, is not loaded. But why ?

If we use in OCI Console the HeatWave Cluster node estimation tool, we also get a warning notifying us that only one column will be loaded:

But once again, we don’t know the reason and as we are using MySQL 8.2.0-u1, JSON is supported, so the column doc should also be loaded into the secondary engine.

When we try with the auto pilot load (call sys.heatwave_load(JSON_ARRAY('docstore'), NULL)) we don’t get much more details.

But in the error_log (available in performance_schema) we have the info we are looking for:

{"warn": "Column size is more than 65532 bytes", "table_name": "listingsAndReviews", "column_name": "doc", "schema_name": "docstore"}

We can see why the JSON column was not loaded, it exceeds the maximum column size allowed of 65532 bytes.

The information is also available in sys.heatwave_autopilot_report after using the auto pilot load procedure (call sys.heatwave_load()).

Query not off-loaded to secondary engine

Now let’s see if a query is off-loaded to HeatWave cluster (secondary engine). So first, we check how much queries have been offloaded:

SQL >  show status like 'rapid%query%';
+---------------------------+-------+
| Variable_name             | Value |
+---------------------------+-------+
| rapid_query_offload_count | 59    |
+---------------------------+-------+

And then we run a query:

SQL > with cte1 as (
       select doc->>"$.name" as name,  doc->>"$.cuisine" as cuisine, 
       (
        select avg(score) 
         from json_table (
            doc, "$.grades[*]" columns (score int path "$.score")) as r
         ) as  avg_score 
        from restaurants) select *, rank() 
        over (  
          partition by cuisine order by avg_score desc) as `rank`  
     from cte1 order by `rank`, avg_score desc limit 5;
+-----------------------+--------------------------------+-----------+------+
| name                  | cuisine                        | avg_score | rank |
+-----------------------+--------------------------------+-----------+------+
| Juice It Health Bar   | Juice, Smoothies, Fruit Salads |   75.0000 |    1 |
| Golden Dragon Cuisine | Chinese                        |   73.0000 |    1 |
| Palombo Pastry Shop   | Bakery                         |   69.0000 |    1 |
| Go Go Curry           | Japanese                       |   65.0000 |    1 |
| Koyla                 | Middle Eastern                 |   61.0000 |    1 |
+-----------------------+--------------------------------+-----------+------+
5 rows in set (31.4319 sec)

31 .4 seconds, this doesn’t seem to be using the HeatWave cluster.

Let’s check again the amount of query offloaded:

SQL > show status like 'rapid%query%';
+---------------------------+-------+
| Variable_name             | Value |
+---------------------------+-------+
| rapid_query_offload_count | 59    |
+---------------------------+-------+

Same amount….

We can verify the Query Execution Plan using EXPLAIN:

EXPLAIN: -> Limit: 5 row(s)
    -> Sort row IDs: rank, cte1.avg_score DESC
        -> Table scan on <temporary>  (cost=2.5..2.5 rows=0)
            -> Temporary table  (cost=0..0 rows=0)
                -> Window aggregate: rank() OVER (PARTITION BY cte1.cuisine ORDER BY cte1.avg_score desc ) 
                    -> Sort row IDs: cte1.cuisine, cte1.avg_score DESC  (cost=5.08e+6..5.08e+6 rows=2.1e+6)
                        -> Table scan on cte1  (cost=438291..464507 rows=2.1e+6)
                            -> Materialize CTE cte1  (cost=438291..438291 rows=2.1e+6)
                                -> Table scan on restaurants  (cost=228577 rows=2.1e+6)
                                -> Select #3 (subquery in projection; dependent)
                                    -> Aggregate: avg(r.score)  (cost=0 rows=1)
                                        -> Materialize table function

There is indeed not sign of using the secondary engine.

Usually, when using HeatWave cluster, we can enable the optimizer trace:

SQL > set  optimizer_trace="enabled=on";
SQL > explain format=tree with cte1 as .... <THE QUERY> ... desc limit 5\G
[the output of the QEP]

And then we verify the reason in information_schema.optimizer_trace:

SQL > select query, trace->'$**.Rapid_Offload_Fails' 'Offload Failed',
             trace->'$**.secondary_engine_not_used' 'HeatWave Not Used'
      from information_schema.optimizer_trace\G
*************************** 1. row ***************************
            query: explain format=tree with cte1 as (select doc->>"$.name" as name,
  doc->>"$.cuisine" as cuisine, (select avg(score) from json_table (doc,
 "$.grades[*]" columns (score int path "$.score")) as r) as  avg_score from
 restaurants) select *, rank() over (  partition by cuisine order by avg_score
 desc) as `rank`  from cte1 order by `rank`, avg_score desc limit 5
   Offload Failed: NULL
HeatWave Not Used: NULL

Most of the time, when a query is not offloaded to the secondary engine, we have the reason… but not this time, the value is NULL !

This could be related to the JSON datatype which was very recently supported.

Let’s have a look in the error log then:

SQL> select * from (select * from performance_schema.error_log 
     where subsystem='RAPID' order by 1 desc limit 3) a order by 1\G
*************************** 1. row ***************************
    LOGGED: 2023-12-15 17:41:58.876588
 THREAD_ID: 3535
      PRIO: System
ERROR_CODE: MY-011071
 SUBSYSTEM: RAPID
      DATA: Attempting offload of query for HeatWave. qid=142, my_qid=5253596,
 DB = "docstore". Query = "explain format=tree with cte1 as 
(select doc->>"$.name" as name,  doc->>"$.cuisine" as cuisine,
(select avg(score) from json_table (doc, "$.grades[*]" columns 
(score int path "$.score")) as r) as  avg_score from restaurants) 
select *, rank() over (  partition by cuisine order by avg_score desc)
 as `rank`  from cte1 order by `rank`, avg_score desc limit 5"
*************************** 2. row ***************************
    LOGGED: 2023-12-15 17:41:58.876733
 THREAD_ID: 3535
      PRIO: System
ERROR_CODE: MY-011071
 SUBSYSTEM: RAPID
      DATA: Starting costing for HeatWave query qid=142
*************************** 3. row ***************************
    LOGGED: 2023-12-15 17:41:58.876763
 THREAD_ID: 3535
      PRIO: System
ERROR_CODE: MY-011071
 SUBSYSTEM: RAPID
      DATA: Heatwave chosen plan quality accepted: qid=142 : total SG pairs: 0, Plan Quality metric: 1.0 , Plan Quality threshold threshold: 90.0

We can see that the MySQL attempted to offload the query….

So, how can we know the reason why the secondary engine is not used ?

There is an Optimizer Hint that will help us finding the reason: /*+ set_var(use_secondary_engine=forced) */.

When used, this optimizer hint will try to offload the query to HeatWave cluster but won’t run it on InnoDB when it fails to run on the secondary engine. Let’s try:

SQL > with cte1 as (
       select /*+ set_var(use_secondary_engine=forced) */
       doc->>"$.name" as name,  doc->>"$.cuisine" as cuisine, 
       (
        select avg(score) 
         from json_table (
            doc, "$.grades[*]" columns (score int path "$.score")) as r
         ) as  avg_score 
        from restaurants) select *, rank() 
        over (  
          partition by cuisine order by avg_score desc) as `rank`  
     from cte1 order by `rank`, avg_score desc limit 5;
ERROR: 3889: Secondary engine operation failed. Reason: "SUBQUERY not yet
 supported", "Table ``.`/db/tmp/#sql36f2_dcf_135` is not loaded in HeatWave".

Here we have a the reason why the query is not running in our HeatWave Clustser.

And in fact this unsupported subquery is related to the function JSON_TABLES() that is not supported in HeatWave cluster.

Conclusion

When encountering issues while utilizing JSON documents in the MySQL HeatWave Database Service in OCI and enabling the HeatWave cluster, it is crucial to become familiar with the proper way to obtain the necessary information for resolution. We learn how users can find those resource that can guide them in troubleshooting and rectifying any unexpected functionality problems.

Enjoy fast JSON in MySQL HeatWave on OCI !

Subscribe to Blog via Email

Enter your email address to subscribe to this blog and receive notifications of new posts by email.

2 Comments

  1. there’s spurious “Reason: “SUBQUERY not yet supported”, “Table “.`/db/tmp/#sql36f2_dcf_135`
    is not loaded in HeatWave””

    in outputs 1 and 3 in the middle of the other output, looks like it shouldn’t be there.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

As MySQL Community Manager, I am an employee of Oracle and the views expressed on this blog are my own and do not necessarily reflect the views of Oracle.

You can find articles I wrote on Oracle’s blog.