We recently saw that we can benefit from using JSON documents with MySQL HeatWave cluster in OCI (HeatWave accelerator for MySQL Database Service).
However sometimes the data can’t be loaded to HeatWave Cluster or the query cannot use it.
And this is not always easy to understand why. Let’s get familiar on how to find the info.
Data not loaded in HW Cluster
Let’s see an example with this table (collection):
SQL > show create table listingsAndReviews\G
*************************** 1. row ***************************
Table: listingsAndReviews
Create Table: CREATE TABLE `listingsAndReviews` (
`doc` json DEFAULT NULL,
`_id` char(28) GENERATED ALWAYS AS
(json_unquote(json_extract(`doc`,_utf8mb4'$._id'))) STORED NOT NULL,
`_json_schema` json GENERATED ALWAYS AS
(_utf8mb4'{"type":"object"}') VIRTUAL,
PRIMARY KEY (`_id`),
CONSTRAINT `$val_strict_B70EB65BDDBAB20B0EE5BB7532C9060C422A06F8`
CHECK (json_schema_valid(`_json_schema`,`doc`)) /*!80016 NOT ENFORCED */
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
SECONDARY_ENGINE=rapid
Now when we try to load the data int HeatWave Cluster we get the following output:
SQL > alter table listingsAndReviews secondary_load;
Query OK, 0 rows affected, 3 warnings (19.1727 sec)
Warning (code 3877): Guided Load executed
'ALTER TABLE `docstore`.`listingsAndReviews` MODIFY `_json_schema` json
GENERATED ALWAYS AS (_utf8mb4'{"type":"object"}') VIRTUAL
COMMENT '' NOT SECONDARY'.
Warning (code 3877): Guided Load executed 'ALTER TABLE
`docstore`.`listingsAndReviews` MODIFY `doc` json
DEFAULT NULL COMMENT '' NOT SECONDARY'.
Warning (code 3877): Guided Load requested 11 InnoDB parallel read
threads to load docstore.listingsAndReviews (session variable value: 32).
We can verify which columns are loaded in HeatWave cluster from that table:
SQL> select name, column_name, data_placement_type, nrows
from rpd_column_id ci
join rpd_tables t on t.id=ci.table_id
join rpd_table_id ri on ri.id=t.id
where table_name='listingsAndReviews';
+-----------------------------+-------------+---------------------+-------+
| name | column_name | data_placement_type | nrows |
+-----------------------------+-------------+---------------------+-------+
| docstore.listingsAndReviews | _id | PrimaryKey | 5555 |
+-----------------------------+-------------+---------------------+-------+
1 row in set (0.0010 sec)
We can see that the column doc
, the JSON one, is not loaded. But why ?
If we use in OCI Console the HeatWave Cluster node estimation tool, we also get a warning notifying us that only one column will be loaded:
But once again, we don’t know the reason and as we are using MySQL 8.2.0-u1, JSON is supported, so the column doc
should also be loaded into the secondary engine.
When we try with the auto pilot load (call sys.heatwave_load(JSON_ARRAY('docstore'), NULL)
) we don’t get much more details.
But in the error_log (available in performance_schema
) we have the info we are looking for:
{"warn": "Column size is more than 65532 bytes", "table_name": "listingsAndReviews", "column_name": "doc", "schema_name": "docstore"}
We can see why the JSON column was not loaded, it exceeds the maximum column size allowed of 65532 bytes.
The information is also available in sys.heatwave_autopilot_report
after using the auto pilot load procedure (call sys.heatwave_load()
).
Query not off-loaded to secondary engine
Now let’s see if a query is off-loaded to HeatWave cluster (secondary engine). So first, we check how much queries have been offloaded:
SQL > show status like 'rapid%query%';
+---------------------------+-------+
| Variable_name | Value |
+---------------------------+-------+
| rapid_query_offload_count | 59 |
+---------------------------+-------+
And then we run a query:
SQL > with cte1 as (
select doc->>"$.name" as name, doc->>"$.cuisine" as cuisine,
(
select avg(score)
from json_table (
doc, "$.grades[*]" columns (score int path "$.score")) as r
) as avg_score
from restaurants) select *, rank()
over (
partition by cuisine order by avg_score desc) as `rank`
from cte1 order by `rank`, avg_score desc limit 5;
+-----------------------+--------------------------------+-----------+------+
| name | cuisine | avg_score | rank |
+-----------------------+--------------------------------+-----------+------+
| Juice It Health Bar | Juice, Smoothies, Fruit Salads | 75.0000 | 1 |
| Golden Dragon Cuisine | Chinese | 73.0000 | 1 |
| Palombo Pastry Shop | Bakery | 69.0000 | 1 |
| Go Go Curry | Japanese | 65.0000 | 1 |
| Koyla | Middle Eastern | 61.0000 | 1 |
+-----------------------+--------------------------------+-----------+------+
5 rows in set (31.4319 sec)
31 .4 seconds, this doesn’t seem to be using the HeatWave cluster.
Let’s check again the amount of query offloaded:
SQL > show status like 'rapid%query%';
+---------------------------+-------+
| Variable_name | Value |
+---------------------------+-------+
| rapid_query_offload_count | 59 |
+---------------------------+-------+
Same amount….
We can verify the Query Execution Plan using EXPLAIN
:
EXPLAIN: -> Limit: 5 row(s)
-> Sort row IDs: rank, cte1.avg_score DESC
-> Table scan on <temporary> (cost=2.5..2.5 rows=0)
-> Temporary table (cost=0..0 rows=0)
-> Window aggregate: rank() OVER (PARTITION BY cte1.cuisine ORDER BY cte1.avg_score desc )
-> Sort row IDs: cte1.cuisine, cte1.avg_score DESC (cost=5.08e+6..5.08e+6 rows=2.1e+6)
-> Table scan on cte1 (cost=438291..464507 rows=2.1e+6)
-> Materialize CTE cte1 (cost=438291..438291 rows=2.1e+6)
-> Table scan on restaurants (cost=228577 rows=2.1e+6)
-> Select #3 (subquery in projection; dependent)
-> Aggregate: avg(r.score) (cost=0 rows=1)
-> Materialize table function
There is indeed not sign of using the secondary engine.
Usually, when using HeatWave cluster, we can enable the optimizer trace:
SQL > set optimizer_trace="enabled=on";
SQL > explain format=tree with cte1 as .... <THE QUERY> ... desc limit 5\G
[the output of the QEP]
And then we verify the reason in information_schema.optimizer_trace
:
SQL > select query, trace->'$**.Rapid_Offload_Fails' 'Offload Failed',
trace->'$**.secondary_engine_not_used' 'HeatWave Not Used'
from information_schema.optimizer_trace\G
*************************** 1. row ***************************
query: explain format=tree with cte1 as (select doc->>"$.name" as name,
doc->>"$.cuisine" as cuisine, (select avg(score) from json_table (doc,
"$.grades[*]" columns (score int path "$.score")) as r) as avg_score from
restaurants) select *, rank() over ( partition by cuisine order by avg_score
desc) as `rank` from cte1 order by `rank`, avg_score desc limit 5
Offload Failed: NULL
HeatWave Not Used: NULL
Most of the time, when a query is not offloaded to the secondary engine, we have the reason… but not this time, the value is NULL !
This could be related to the JSON datatype which was very recently supported.
Let’s have a look in the error log then:
SQL> select * from (select * from performance_schema.error_log
where subsystem='RAPID' order by 1 desc limit 3) a order by 1\G
*************************** 1. row ***************************
LOGGED: 2023-12-15 17:41:58.876588
THREAD_ID: 3535
PRIO: System
ERROR_CODE: MY-011071
SUBSYSTEM: RAPID
DATA: Attempting offload of query for HeatWave. qid=142, my_qid=5253596,
DB = "docstore". Query = "explain format=tree with cte1 as
(select doc->>"$.name" as name, doc->>"$.cuisine" as cuisine,
(select avg(score) from json_table (doc, "$.grades[*]" columns
(score int path "$.score")) as r) as avg_score from restaurants)
select *, rank() over ( partition by cuisine order by avg_score desc)
as `rank` from cte1 order by `rank`, avg_score desc limit 5"
*************************** 2. row ***************************
LOGGED: 2023-12-15 17:41:58.876733
THREAD_ID: 3535
PRIO: System
ERROR_CODE: MY-011071
SUBSYSTEM: RAPID
DATA: Starting costing for HeatWave query qid=142
*************************** 3. row ***************************
LOGGED: 2023-12-15 17:41:58.876763
THREAD_ID: 3535
PRIO: System
ERROR_CODE: MY-011071
SUBSYSTEM: RAPID
DATA: Heatwave chosen plan quality accepted: qid=142 : total SG pairs: 0, Plan Quality metric: 1.0 , Plan Quality threshold threshold: 90.0
We can see that the MySQL attempted to offload the query….
So, how can we know the reason why the secondary engine is not used ?
There is an Optimizer Hint that will help us finding the reason: /*+ set_var(use_secondary_engine=forced) */
.
When used, this optimizer hint will try to offload the query to HeatWave cluster but won’t run it on InnoDB when it fails to run on the secondary engine. Let’s try:
SQL > with cte1 as (
select /*+ set_var(use_secondary_engine=forced) */
doc->>"$.name" as name, doc->>"$.cuisine" as cuisine,
(
select avg(score)
from json_table (
doc, "$.grades[*]" columns (score int path "$.score")) as r
) as avg_score
from restaurants) select *, rank()
over (
partition by cuisine order by avg_score desc) as `rank`
from cte1 order by `rank`, avg_score desc limit 5;
ERROR: 3889: Secondary engine operation failed. Reason: "SUBQUERY not yet
supported", "Table ``.`/db/tmp/#sql36f2_dcf_135` is not loaded in HeatWave".
Here we have a the reason why the query is not running in our HeatWave Clustser.
And in fact this unsupported subquery is related to the function JSON_TABLES()
that is not supported in HeatWave cluster.
Conclusion
When encountering issues while utilizing JSON documents in the MySQL HeatWave Database Service in OCI and enabling the HeatWave cluster, it is crucial to become familiar with the proper way to obtain the necessary information for resolution. We learn how users can find those resource that can guide them in troubleshooting and rectifying any unexpected functionality problems.
Enjoy fast JSON in MySQL HeatWave on OCI !
there’s spurious “Reason: “SUBQUERY not yet supported”, “Table “.`/db/tmp/#sql36f2_dcf_135`
is not loaded in HeatWave””
in outputs 1 and 3 in the middle of the other output, looks like it shouldn’t be there.
Thank you, I fixed it… no clue how it ended up there :'(