Machine Learning with MySQL HeatWave and predictions in Grafana

In MySQL HeatWave, the Database as a Service (DBaaS) offered by Oracle, Machine Learning is incorporated to provide database administrators (DBAs) and developers with an optimal experience.

Those tools are the Auto ML advisors like the autopilot indexing for example.

MySQL HeatWave is also an invaluable asset for Data Analysts, offering straightforward tools that streamline the process of creating predictive models through data training and scoring.

The Data

The data we are using is the Bank Marketing data related to marketing campaigns over phone calls of a Portuguese banking institution. The classification goal is to predict if the client will subscribe to a term deposit. This is the variable y in the dataset.

The data is available here. We will use bank-full.csv to load into our DB System to train and create our model.

Then, we will use bank.csv to test it.

Importing the data

It’s not mandatory to load the data into HeatWave’s secondary engine to create the model, it can stay in InnoDB.

We need first to create a schema and a new table to match our data:

MySQL> CREATE DATABASE marketing;
MySQL> CREATE TABLE bank_marketing(
        age int,
        job varchar(255),
        marital varchar(255),
        education varchar(255),
        default1 varchar(255),
        balance float,
        housing varchar(255),
        loan varchar(255),
        contact varchar(255),
        day int,
        month varchar(255),
        duration float,
        campaign int,
        pdays float,
        previous float,
        poutcome varchar(255),
        y varchar(255)
    );

After that, we can load the data using MySQL Shell’s util.importTable() method:

JS> util.importTable('bank-full.csv', 
             {table: 'bank_marketing', skipRows: 1,
             fieldsTerminatedBy: ';', dialect:  'csv-unix'});
Importing from file '/home/opc/bank-full.csv' to table 
                                              `marketing`.`bank_marketing` 
in MySQL Server at 10.0.1.35:3306 using 1 thread
[Worker000] bank-full.csv: Records: 45211  
                                      Deleted: 0  Skipped: 0  Warnings: 0
99% (4.61 MB / 4.61 MB), 0.00 B/s
File '/home/opc/bank-full.csv' (4.61 MB) 
                                  was imported in 0.4034 sec at 4.61 MB/s
Total rows affected in marketing.bank_marketing: Records: 45211  
                                     Deleted: 0  Skipped: 0  Warnings: 0

We can already create an identical table called bank_marketing_test this time.

And then load the test data into it:

JS> util.importTable('bank.csv', 
             {table: 'bank_marketing_test', skipRows: 1,
              fieldsTerminatedBy: ';', dialect:  'csv-unix'});
Importing from file '/home/opc/bank.csv' to table 
                                         `marketing`.`bank_marketing_test`
in MySQL Server at 10.0.1.35:3306 using 1 thread
[Worker000] bank.csv: Records: 4521  
                                 Deleted: 0  Skipped: 0  Warnings: 0
99% (461.32 KB / 461.47 KB), 0.00 B/s
File '/home/opc/bank.csv' (461.32 KB) 
                            was imported in 0.1058 sec at 461.32 KB/s
Total rows affected in marketing.bank_marketing_test: Records: 4521  
                                 Deleted: 0  Skipped: 0  Warnings: 0

Training & Inference

Now we need to create our model by training the data. If we plan to use a dedicated user for the Grafana dashboard later, it’s also mandatory to provide some privileges to that specific user.

If the user is called grafana, the following grants are mandatory:

MySQL> GRANT ALL PRIVILEGES ON `ML_SCHEMA_grafana`.* TO grafana;
MySQL> GRANT SELECT, EXECUTE ON `sys`.* TO grafana;
MySQL> GRANT SELECT ON `marketing`.* TO grafana;
MySQL> GRANT ALTER ON marketing.* to grafana;

It is also very important to create the schema where the model will be stored as the dedicated user won’t be able to create it:

MySQL> CREATE DATABASE ML_SCHEMA_grafana;
MySQL> \c grafana@10.0.1.35
Creating a session to 'grafana@10.0.1.35'
Please provide the password for 'grafana@10.0.1.35': ************
Save password for 'grafana@10.0.1.35'? [Y]es/[N]o/Ne[v]er (default No): yes

Let’s train the data now:

MySQL> CALL sys.ML_TRAIN('marketing.bank_marketing',
               'y', JSON_OBJECT('task', 'classification'), @grafana_model);
Query OK, 0 rows affected (10 min 6.6280 sec)

We can check the value of @grafana_model:

MySQL> select @grafana_model;
+------------------------------------------------+
| @grafana_model                                 |
+------------------------------------------------+
| marketing.bank_marketing_grafana_1716846458328 |
+------------------------------------------------+
1 row in set (0.0014 sec)

To validate the model, we need to score it. Therefore, we will load the model and test it against the testing data:

MySQL> CALL sys.ML_MODEL_LOAD(@grafana_model, NULL) ;
Query OK, 0 rows affected (0.1200 sec)

MySQL> CALL sys.ML_SCORE('marketing.bank_marketing_test', 'y', 
                     @grafana_model,  'accuracy', @score, null);
Query OK, 0 rows affected (0.9615 sec)

MySQL> SELECT @score;
+--------------------+
| @score             |
+--------------------+
| 0.9303251504898071 |
+--------------------+
1 row in set (0.0007 sec)

This is a very good score that validates our model!

Prediction

We can also try to predict with our model of a special set of attributes will be willing to subscribe or not to our term deposit:

MySQL> SELECT JSON_PRETTY(sys.ML_PREDICT_ROW(
        JSON_OBJECT("age", "30", 
                    "job", "services",
                    "marital", "married", 
                    "education", "secondary",
                    "default1", "no",     
                    "balance", "7032",     
                    "housing", "no",      
                    "loan", "no",
                    "contact", "cellular",     
                    "day", "17",      
                    "month", "jul",      
                    "duration", "402",
                    "campaign", "1",      
                    "pdays", "-1",     
                    "previous", "0",
                    "poutcome", "unknown", 
                    "id", "0"), @grafana_model, NULL)) prediction\G
*************************** 1. row ***************************
prediction: {
  "id": "0",
  "age": 30.0,
  "day": 17.0,
  "job": "services",
  "loan": "no",
  "month": "jul",
  "pdays": -1.0,
  "balance": 7032.0,
  "contact": "cellular",
  "housing": "no",
  "marital": "married",
  "campaign": 1.0,
  "default1": "no",
  "duration": 402.0,
  "poutcome": "unknown",
  "previous": 0.0,
  "education": "secondary",
  "Prediction": "no",
  "ml_results": {
    "predictions": {
      "y": "no"
    },
    "probabilities": {
      "no": 0.9,
      "yes": 0.1
    }
  }
}
1 row in set (0.2049 sec)

We can see that with those characteristics, the customer won’t subscribe to our term deposit.

Grafana

Grafana is an open source dashboard to visualize and query your data.

I’ve installed a Grafana server on a compute instance (public subnet) in the same VCN as my DB System.

When connected to Grafana, we need to create a new data source to connect to MySQL:

And we use the credentials of the dedicated user we created (grafana) and pointing the schema where the data is stored (marketing):

We can create a dashboard to visualize the current data in the bank_marketing table:

Prediction with Grafana

The funniest part is to visualize the prediction when changing some of the values:

Here are the two dashboards in case you want to import them:

The query used to build the Probability pie chart is the following:

SELECT
  prediction, prob_y as "yes", prob_n as "no"
FROM
  (
    SELECT
      sys.ML_PREDICT_ROW(
        json_object(
          "age", "${age}",
          "job", "${job}",
          "marital", "${marital}",
          "education", "${education}",
          "default1",  "no",
          "balance",    "${balance}",
          "housing",    "${housing}",
          "loan",       "${loan}",
          "contact",    "${contact}",
          "day",        "10",
          "month",      "${month}",
          "duration",   "600",
          "campaign",   "1",
          "pdays",      "-1",
          "previous",   "0",
          "poutcome",   "unknown",
          "id",         "0"
        ),
        (
          SELECT
            model_handle
          FROM
            ML_SCHEMA_grafana.MODEL_CATALOG
          ORDER BY
            model_id DESC
          LIMIT
            1
        ), NULL
      ) as a
    
  ) as t
  JOIN JSON_TABLE(
    t.a,
    '$' COLUMNS(
      prediction varchar(3) PATH "$.ml_results.predictions.y",
      prob_y decimal(4, 2) PATH "$.ml_results.probabilities.yes",
      prob_n decimal(4, 2) PATH "$.ml_results.probabilities.no"
    )
  ) as jt;

As you can see above, the tricky part is the model catalog to use. Here we read the first line in the schema ML_SCHEMA_grafana where grafana is the user connecting to the DB System.

Conclusion

MySQL HeatWave’s Machine Learning capabilities simplify tasks for data analysts by using simple SQL calls instead of extensive Python code for example. It also easily integrates with third-party tools like Grafana through a simple MySQL connection.

Enjoy MySQL and enjoy discovering the MySQL HeatWave Machine Learning capabilities, you are now ready to make predictions!

Subscribe to Blog via Email

Enter your email address to subscribe to this blog and receive notifications of new posts by email.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

As MySQL Community Manager, I am an employee of Oracle and the views expressed on this blog are my own and do not necessarily reflect the views of Oracle.

You can find articles I wrote on Oracle’s blog.