Using MySQL Database Service in OCI – Part 3: faster with MySQL HeatWave !

This post is the third and last of a series of articles dedicated to MySQL Database Service (MDS):

The goal of this series if to get familiar with MDS, HeatWave and useful MySQL Shell utilities.

We can now start a query on our MDS instance that has been populated with >200M rows:

 SQL > SELECT year, Reporting_Airline,  
              AVG(ArrDelay) AS avgArrDelay 
       FROM ontime 
       WHERE Reporting_Airline IN ('AA', 'UA', 'DL')
    GROUP BY Reporting_Airline, year ORDER BY year, Reporting_Airline;
+------+-------------------+----------------------+
| year | Reporting_Airline | avgArrDelay          |
+------+-------------------+----------------------+
| 1987 | AA                |    4.839905281581386 |
| 1987 | DL                |    11.78429388686475 |
| 1987 | UA                |    8.559413984694412 |
| 1988 | AA                |    4.172048644346153 |
| 1988 | DL                |    6.116446922543346 |
...
| 2021 | UA                |   3.7546497666434027 |
+------+-------------------+----------------------+
105 rows in set (8 min 42.8133 sec)

This is our reference time.

Enabling HeatWave Cluster

We start by enabling HeatWave Cluster for our MySQL DB System from the OCI console:

You can choose either ways to enable HeatWave to your MDS instance.

Then on the next screen of the Wizard, I recommend you to click on Estimate Node Count. This will start some processing that will advice you a valid number of nodes for your current data. If your data grows over time, I recommend to run the estimate periodically:

You can then choose the schemas or tables you want to load into HeatWave:

The node count estimation determined that 1 node was enough but a minimum of 2 is mandatory.

We also get the load command to load that schema into HeatWave once the cluster will be ready:

Let’s go then for 2 and we can see in the MySQL DB System details page that the cluster is being created:

When ready, these orange dots become green:

From MySQL Shell, we can check some status variables to see if the HeatWave is ready:

 SQL > select * from performance_schema.global_status 
       where variable_name like 'rapid%er%status';
+----------------------+----------------+
| VARIABLE_NAME        | VARIABLE_VALUE |
+----------------------+----------------+
| rapid_cluster_status | ON             |
| rapid_service_status | ONLINE         |
+----------------------+----------------+
2 rows in set (0.0011 sec)

Perfect !

Now let’s run our query again:

 SQL > SELECT year, Reporting_Airline,  
              AVG(ArrDelay) AS avgArrDelay 
       FROM ontime 
       WHERE Reporting_Airline IN ('AA', 'UA', 'DL')
    GROUP BY Reporting_Airline, year ORDER BY year, Reporting_Airline;
+------+-------------------+----------------------+
| year | Reporting_Airline | avgArrDelay          |
+------+-------------------+----------------------+
| 1987 | AA                |    4.839905281581386 |
| 1987 | DL                |    11.78429388686475 |
| 1987 | UA                |    8.559413984694412 |
| 1988 | AA                |    4.172048644346153 |
| 1988 | DL                |    6.116446922543346 |
...
| 2021 | UA                |   3.7546497666434027 |
+------+-------------------+----------------------+
105 rows in set (8 min 38.2327 sec)

euh… ??

In fact this is expected. Once your HeatWave cluster is ready, you still need to load the data you want to with it. Having the possibility to load and unload data from HeatWave on demand allows you to pay only for what you consume.

Remember when we ran the Estimation Count, we received a statement to load the data. Let’s use it:

 SQL > CALL sys.heatwave_load(JSON_ARRAY('airline'), NULL);

This will load the data and return a report. At the end of it we can see:

+-----------------------------------------------------------------------+
| LOAD SUMMARY                                                          |
+-----------------------------------------------------------------------+
|                                                                       |
| SCHEMA                  TABLES       TABLES      COLUMNS         LOAD |
| NAME                    LOADED       FAILED       LOADED     DURATION |
| ------                  ------       ------      -------     -------- |
| `airline`                    1            0          110     2.39 min |
|                                                                       | 
+-----------------------------------------------------------------------+
6 rows in set (2 min 23.5079 sec)

Let’s run it again now:

105 rows in set (0.6596 sec)

Wow ! This is what I call a Query Response Time improvement !

Operating MySQL HeatWave

We can easily verify that the query was offloaded to HeatWave as there is a global status variable that counts all the queries which used our HeatWave cluster:

 SQL > select * from performance_schema.global_status 
       where variable_name ='rapid_query_offload_count';
+---------------------------+----------------+
| VARIABLE_NAME             | VARIABLE_VALUE |
+---------------------------+----------------+
| rapid_query_offload_count | 1              |
+---------------------------+----------------+
1 row in set (0.0010 sec)

We can always check is the HeatWave is enabled and the cluster online:

 SQL > select * from performance_schema.global_status 
       where variable_name like 'rapid%er%status';
+----------------------+----------------+
| VARIABLE_NAME        | VARIABLE_VALUE |
+----------------------+----------------+
| rapid_cluster_status | ON             |
| rapid_service_status | ONLINE         |
+----------------------+----------------+
2 rows in set (0.0013 sec)

As we saw in previously, it’s important to be sure the tables we want to use have been loaded to HeatWave.

The list is also available from performance_schema:

 SQL > select rpt.NAME, rp.* 
       from performance_schema.rpd_tables rp 
       join performance_schema.rpd_table_id rpt 
         on rpt.id = rp.id\G
*************************** 1. row ***************************
               NAME: airline.ontime
                 ID: 2
       SNAPSHOT_SCN: 3
      PERSISTED_SCN: 3
          POOL_TYPE: RAPID_LOAD_POOL_TRANSACTIONAL
DATA_PLACEMENT_TYPE: PrimaryKey
              NROWS: 201575308
        LOAD_STATUS: AVAIL_RPDGSTABSTATE
      LOAD_PROGRESS: 100
         SIZE_BYTES: 66597158912
        QUERY_COUNT: 1
       LAST_QUERIED: 2021-12-16 18:57:36.49062
 LOAD_END_TIMESTAMP: 2021-12-16 18:54:16.582146

To see if a query will be offloaded to HeatWave or not, EXPLAIN can also be used:

 SQL > EXPLAIN SELECT year, Reporting_Airline,  
              AVG(ArrDelay) AS avgArrDelay 
       FROM ontime 
       WHERE Reporting_Airline IN ('AA', 'UA', 'DL')
    GROUP BY Reporting_Airline, year ORDER BY year, Reporting_Airline\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: ontime
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 178572003
     filtered: 30.000001907348633
        Extra: Using where; Using temporary; Using filesort;
               Using secondary engine RAPID

In a future post, I will also show how to get more information related to queries not offloaded to HeatWave

Faster with Machine Learning

MySQL HeatWave includes several Autopilot processes. As we already ran one query, let’s see if Autopilot Encoding will be able to improve our query even more:

SQL > CALL sys.heatwave_advisor(
      JSON_OBJECT("auto_enc",JSON_OBJECT("mode", "recommend")));

This generates a report with some encoding suggestions.

Let’s retrieve them in a format we can copy/paste to apply them in MySQL Shell:

 SQL > \P cut -d '|' -f2
 SQL > SELECT log->>"$.sql" AS "SQL Script" 
      FROM sys.heatwave_advisor_report 
      WHERE type = "sql" ORDER BY id;
.....
SQL > \P

And now we cut and paste all the statements in the output above.

This is a bunch of SET and ALTER statements:

 SQL >  SET SESSION innodb_parallel_read_threads = 32;                                                                                                                                                                  
 SQL >  ALTER TABLE `airline`.`ontime` SECONDARY_UNLOAD;                                                                                                                                                                
Query OK, 0 rows affected (0.0030 sec)
 SQL >  ALTER TABLE `airline`.`ontime` SECONDARY_ENGINE=NULL;                                                                                                                                                           
Query OK, 0 rows affected (0.0160 sec)
 SQL >  ALTER TABLE `airline`.`ontime` MODIFY `ArrTimeBlk` 
        varchar(9) CHARACTER SET utf8mb4 
        COLLATE utf8mb4_0900_ai_ci DEFAULT NULL 
       COMMENT ' RAPID_COLUMN_AUTODB=ENCODING=SORTED
         RAPID_COLUMN=ENCODING=SORTED ';       
Query OK, 0 rows affected (0.0298 sec)
 SQL >  ALTER TABLE `airline`.`ontime` MODIFY `DepTimeBlk` 
        varchar(9) CHARACTER SET utf8mb4 
        COLLATE utf8mb4_0900_ai_ci DEFAULT NULL 
        COMMENT ' RAPID_COLUMN_AUTODB=ENCODING=SORTED
          RAPID_COLUMN=ENCODING=SORTED ';       
Query OK, 0 rows affected (0.0174 sec)
 [...] 
 SQL >  ALTER TABLE `airline`.`ontime` SECONDARY_ENGINE=RAPID;                                                                                                                                                          
Query OK, 0 rows affected (0.0130 sec)
 SQL >  ALTER TABLE `airline`.`ontime` SECONDARY_LOAD;
Query OK, 0 rows affected (3 min 7.6535 sec)

We can re-run our query:

SQL > SELECT year, Reporting_Airline,  
              AVG(ArrDelay) AS avgArrDelay 
       FROM ontime 
       WHERE Reporting_Airline IN ('AA', 'UA', 'DL')
    GROUP BY Reporting_Airline, year ORDER BY year, Reporting_Airline;
105 rows in set (0.6087 sec)

As we had only one query and a single table, there’s not much improvement. But if you use multiple queries and joins, this can provide even better results !

With multiple tables and queries, I also recommend to use the Autopilot Placement advisor running the following statement (with a single table, it’s already optimal):

SQL > CALL sys.heatwave_advisor(JSON_OBJECT("target_schema",
          JSON_ARRAY("airline")));

Conclusion

In this series, you learned how to use MySQL Database Service on OCI and how to enable HeatWave to reduce the query response time on heavy (OLAP) workload.

In this example, we reduced the response time of a query from 8 min 42.8133 sec to 0.6087 sec !! 858x faster !

And as usual, enjoy MySQL, MySQL Database Service with HeatWave and MySQL Shell !

Subscribe to Blog via Email

Enter your email address to subscribe to this blog and receive notifications of new posts by email.

3 Comments

Leave a Reply

Your email address will not be published. Required fields are marked *

As MySQL Community Manager, I am an employee of Oracle and the views expressed on this blog are my own and do not necessarily reflect the views of Oracle.

You can find articles I wrote on Oracle’s blog.