This post is the third and last of a series of articles dedicated to MySQL Database Service (MDS):
- Using MySQL Database Service in OCI – Part 1: creating a MySQL DB System
- Using MySQL Database Service in OCI – Part 2: importing Data to MySQL DB System
- Using MySQL Database Service in OCI – Part 3: faster with MySQL HeatWave !
The goal of this series if to get familiar with MDS, HeatWave and useful MySQL Shell utilities.
We can now start a query on our MDS instance that has been populated with >200M rows:
SQL > SELECT year, Reporting_Airline, AVG(ArrDelay) AS avgArrDelay FROM ontime WHERE Reporting_Airline IN ('AA', 'UA', 'DL') GROUP BY Reporting_Airline, year ORDER BY year, Reporting_Airline; +------+-------------------+----------------------+ | year | Reporting_Airline | avgArrDelay | +------+-------------------+----------------------+ | 1987 | AA | 4.839905281581386 | | 1987 | DL | 11.78429388686475 | | 1987 | UA | 8.559413984694412 | | 1988 | AA | 4.172048644346153 | | 1988 | DL | 6.116446922543346 | ... | 2021 | UA | 3.7546497666434027 | +------+-------------------+----------------------+ 105 rows in set (8 min 42.8133 sec)
This is our reference time.
Enabling HeatWave Cluster
We start by enabling HeatWave Cluster for our MySQL DB System from the OCI console:
You can choose either ways to enable HeatWave to your MDS instance.
Then on the next screen of the Wizard, I recommend you to click on Estimate Node Count. This will start some processing that will advice you a valid number of nodes for your current data. If your data grows over time, I recommend to run the estimate periodically:
You can then choose the schemas or tables you want to load into HeatWave:
The node count estimation determined that 1 node was enough but a minimum of 2 is mandatory.
We also get the load command to load that schema into HeatWave once the cluster will be ready:
Let’s go then for 2 and we can see in the MySQL DB System details page that the cluster is being created:
When ready, these orange dots become green:
From MySQL Shell, we can check some status variables to see if the HeatWave is ready:
SQL > select * from performance_schema.global_status where variable_name like 'rapid%er%status'; +----------------------+----------------+ | VARIABLE_NAME | VARIABLE_VALUE | +----------------------+----------------+ | rapid_cluster_status | ON | | rapid_service_status | ONLINE | +----------------------+----------------+ 2 rows in set (0.0011 sec)
Perfect !
Now let’s run our query again:
SQL > SELECT year, Reporting_Airline, AVG(ArrDelay) AS avgArrDelay FROM ontime WHERE Reporting_Airline IN ('AA', 'UA', 'DL') GROUP BY Reporting_Airline, year ORDER BY year, Reporting_Airline; +------+-------------------+----------------------+ | year | Reporting_Airline | avgArrDelay | +------+-------------------+----------------------+ | 1987 | AA | 4.839905281581386 | | 1987 | DL | 11.78429388686475 | | 1987 | UA | 8.559413984694412 | | 1988 | AA | 4.172048644346153 | | 1988 | DL | 6.116446922543346 | ... | 2021 | UA | 3.7546497666434027 | +------+-------------------+----------------------+ 105 rows in set (8 min 38.2327 sec)
euh… ??
In fact this is expected. Once your HeatWave cluster is ready, you still need to load the data you want to with it. Having the possibility to load and unload data from HeatWave on demand allows you to pay only for what you consume.
Remember when we ran the Estimation Count, we received a statement to load the data. Let’s use it:
SQL > CALL sys.heatwave_load(JSON_ARRAY('airline'), NULL);
This will load the data and return a report. At the end of it we can see:
+-----------------------------------------------------------------------+ | LOAD SUMMARY | +-----------------------------------------------------------------------+ | | | SCHEMA TABLES TABLES COLUMNS LOAD | | NAME LOADED FAILED LOADED DURATION | | ------ ------ ------ ------- -------- | | `airline` 1 0 110 2.39 min | | | +-----------------------------------------------------------------------+ 6 rows in set (2 min 23.5079 sec)
Let’s run it again now:
105 rows in set (0.6596 sec)
Wow ! This is what I call a Query Response Time improvement !
Operating MySQL HeatWave
We can easily verify that the query was offloaded to HeatWave as there is a global status variable that counts all the queries which used our HeatWave cluster:
SQL > select * from performance_schema.global_status where variable_name ='rapid_query_offload_count'; +---------------------------+----------------+ | VARIABLE_NAME | VARIABLE_VALUE | +---------------------------+----------------+ | rapid_query_offload_count | 1 | +---------------------------+----------------+ 1 row in set (0.0010 sec)
We can always check is the HeatWave is enabled and the cluster online:
SQL > select * from performance_schema.global_status where variable_name like 'rapid%er%status'; +----------------------+----------------+ | VARIABLE_NAME | VARIABLE_VALUE | +----------------------+----------------+ | rapid_cluster_status | ON | | rapid_service_status | ONLINE | +----------------------+----------------+ 2 rows in set (0.0013 sec)
As we saw in previously, it’s important to be sure the tables we want to use have been loaded to HeatWave.
The list is also available from performance_schema
:
SQL > select rpt.NAME, rp.* from performance_schema.rpd_tables rp join performance_schema.rpd_table_id rpt on rpt.id = rp.id\G *************************** 1. row *************************** NAME: airline.ontime ID: 2 SNAPSHOT_SCN: 3 PERSISTED_SCN: 3 POOL_TYPE: RAPID_LOAD_POOL_TRANSACTIONAL DATA_PLACEMENT_TYPE: PrimaryKey NROWS: 201575308 LOAD_STATUS: AVAIL_RPDGSTABSTATE LOAD_PROGRESS: 100 SIZE_BYTES: 66597158912 QUERY_COUNT: 1 LAST_QUERIED: 2021-12-16 18:57:36.49062 LOAD_END_TIMESTAMP: 2021-12-16 18:54:16.582146
To see if a query will be offloaded to HeatWave or not, EXPLAIN
can also be used:
SQL > EXPLAIN SELECT year, Reporting_Airline, AVG(ArrDelay) AS avgArrDelay FROM ontime WHERE Reporting_Airline IN ('AA', 'UA', 'DL') GROUP BY Reporting_Airline, year ORDER BY year, Reporting_Airline\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: ontime partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 178572003 filtered: 30.000001907348633 Extra: Using where; Using temporary; Using filesort; Using secondary engine RAPID
In a future post, I will also show how to get more information related to queries not offloaded to HeatWave
Faster with Machine Learning
MySQL HeatWave includes several Autopilot processes. As we already ran one query, let’s see if Autopilot Encoding will be able to improve our query even more:
SQL > CALL sys.heatwave_advisor( JSON_OBJECT("auto_enc",JSON_OBJECT("mode", "recommend")));
This generates a report with some encoding suggestions.
Let’s retrieve them in a format we can copy/paste to apply them in MySQL Shell:
SQL > \P cut -d '|' -f2 SQL > SELECT log->>"$.sql" AS "SQL Script" FROM sys.heatwave_advisor_report WHERE type = "sql" ORDER BY id; ..... SQL > \P
And now we cut and paste all the statements in the output above.
This is a bunch of SET
and ALTER
statements:
SQL > SET SESSION innodb_parallel_read_threads = 32; SQL > ALTER TABLE `airline`.`ontime` SECONDARY_UNLOAD; Query OK, 0 rows affected (0.0030 sec) SQL > ALTER TABLE `airline`.`ontime` SECONDARY_ENGINE=NULL; Query OK, 0 rows affected (0.0160 sec) SQL > ALTER TABLE `airline`.`ontime` MODIFY `ArrTimeBlk` varchar(9) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT NULL COMMENT ' RAPID_COLUMN_AUTODB=ENCODING=SORTED RAPID_COLUMN=ENCODING=SORTED '; Query OK, 0 rows affected (0.0298 sec) SQL > ALTER TABLE `airline`.`ontime` MODIFY `DepTimeBlk` varchar(9) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT NULL COMMENT ' RAPID_COLUMN_AUTODB=ENCODING=SORTED RAPID_COLUMN=ENCODING=SORTED '; Query OK, 0 rows affected (0.0174 sec) [...] SQL > ALTER TABLE `airline`.`ontime` SECONDARY_ENGINE=RAPID; Query OK, 0 rows affected (0.0130 sec) SQL > ALTER TABLE `airline`.`ontime` SECONDARY_LOAD; Query OK, 0 rows affected (3 min 7.6535 sec)
We can re-run our query:
SQL > SELECT year, Reporting_Airline, AVG(ArrDelay) AS avgArrDelay FROM ontime WHERE Reporting_Airline IN ('AA', 'UA', 'DL') GROUP BY Reporting_Airline, year ORDER BY year, Reporting_Airline; 105 rows in set (0.6087 sec)
As we had only one query and a single table, there’s not much improvement. But if you use multiple queries and joins, this can provide even better results !
With multiple tables and queries, I also recommend to use the Autopilot Placement advisor running the following statement (with a single table, it’s already optimal):
SQL > CALL sys.heatwave_advisor(JSON_OBJECT("target_schema", JSON_ARRAY("airline")));
Conclusion
In this series, you learned how to use MySQL Database Service on OCI and how to enable HeatWave to reduce the query response time on heavy (OLAP) workload.
In this example, we reduced the response time of a query from 8 min 42.8133 sec to 0.6087 sec !! 858x faster !
And as usual, enjoy MySQL, MySQL Database Service with HeatWave and MySQL Shell !
[…] Using MySQL Database Service in OCI – Part 3: faster with MySQL HeatWave ! […]
[…] Using MySQL Database Service in OCI – Part 3: faster with MySQL HeatWave ! […]
Thanks for article !
I waiting for more 😉