In MySQL HeatWave, the Database as a Service (DBaaS) offered by Oracle, Machine Learning is incorporated to provide database administrators (DBAs) and developers with an optimal experience. Examples of such tools are the MySQL Autopilot advisors, like auto indexing.
MySQL HeatWave is also an invaluable asset for Data Analysts, offering straightforward tools that streamline the process of creating predictive models through data training and scoring.
The Data
The data we are using is the Bank Marketing dataset, related to marketing campaigns over phone calls of a Portuguese banking institution. The classification goal is to predict whether the client will subscribe to a term deposit; this is the variable y in the dataset.
The data is publicly available in the UCI Machine Learning Repository. We will use bank-full.csv, loaded into our DB System, to train and create our model. Then, we will use bank.csv to test it.
Importing the data
It’s not mandatory to load the data into HeatWave’s secondary engine to create the model; the data can stay in InnoDB.
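That said, if we later want HeatWave-accelerated queries on the table, it can be loaded into the secondary engine with a simple ALTER TABLE (assuming a HeatWave cluster is attached to the DB System):

MySQL> ALTER TABLE marketing.bank_marketing SECONDARY_LOAD;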
We need first to create a schema and a new table to match our data:
MySQL> CREATE DATABASE marketing;
MySQL> CREATE TABLE bank_marketing(
age int,
job varchar(255),
marital varchar(255),
education varchar(255),
default1 varchar(255), -- `default` is a reserved word, hence `default1`
balance float,
housing varchar(255),
loan varchar(255),
contact varchar(255),
day int,
month varchar(255),
duration float,
campaign int,
pdays float,
previous float,
poutcome varchar(255),
y varchar(255)
);
After that, we can load the data using MySQL Shell’s util.importTable()
method:
JS> util.importTable('bank-full.csv',
{table: 'bank_marketing', skipRows: 1,
fieldsTerminatedBy: ';', dialect: 'csv-unix'});
Importing from file '/home/opc/bank-full.csv' to table
`marketing`.`bank_marketing`
in MySQL Server at 10.0.1.35:3306 using 1 thread
[Worker000] bank-full.csv: Records: 45211
Deleted: 0 Skipped: 0 Warnings: 0
99% (4.61 MB / 4.61 MB), 0.00 B/s
File '/home/opc/bank-full.csv' (4.61 MB)
was imported in 0.4034 sec at 4.61 MB/s
Total rows affected in marketing.bank_marketing: Records: 45211
Deleted: 0 Skipped: 0 Warnings: 0
We can then create an identical table, called bank_marketing_test this time.
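One straightforward way to create it is with CREATE TABLE … LIKE:

MySQL> CREATE TABLE bank_marketing_test LIKE bank_marketing;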
And then load the test data into it:
JS> util.importTable('bank.csv',
{table: 'bank_marketing_test', skipRows: 1,
fieldsTerminatedBy: ';', dialect: 'csv-unix'});
Importing from file '/home/opc/bank.csv' to table
`marketing`.`bank_marketing_test`
in MySQL Server at 10.0.1.35:3306 using 1 thread
[Worker000] bank.csv: Records: 4521
Deleted: 0 Skipped: 0 Warnings: 0
99% (461.32 KB / 461.47 KB), 0.00 B/s
File '/home/opc/bank.csv' (461.32 KB)
was imported in 0.1058 sec at 461.32 KB/s
Total rows affected in marketing.bank_marketing_test: Records: 4521
Deleted: 0 Skipped: 0 Warnings: 0
Training & Inference
Now we need to create our model by training on the data. If we plan to use a dedicated user for the Grafana dashboard later, we must also grant some privileges to that specific user.
If the user is called grafana
, the following grants are mandatory:
MySQL> GRANT ALL PRIVILEGES ON `ML_SCHEMA_grafana`.* TO grafana;
MySQL> GRANT SELECT, EXECUTE ON `sys`.* TO grafana;
MySQL> GRANT SELECT ON `marketing`.* TO grafana;
MySQL> GRANT ALTER ON marketing.* to grafana;
It is also very important to create the schema where the model will be stored, as the dedicated user won’t be able to create it:
MySQL> CREATE DATABASE ML_SCHEMA_grafana;
MySQL> \c grafana@10.0.1.35
Creating a session to 'grafana@10.0.1.35'
Please provide the password for 'grafana@10.0.1.35': ************
Save password for 'grafana@10.0.1.35'? [Y]es/[N]o/Ne[v]er (default No): yes
Let’s train the data now:
MySQL> CALL sys.ML_TRAIN('marketing.bank_marketing',
'y', JSON_OBJECT('task', 'classification'), @grafana_model);
Query OK, 0 rows affected (10 min 6.6280 sec)
We can check the value of @grafana_model
:
MySQL> select @grafana_model;
+------------------------------------------------+
| @grafana_model |
+------------------------------------------------+
| marketing.bank_marketing_grafana_1716846458328 |
+------------------------------------------------+
1 row in set (0.0014 sec)
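The model handle returned in @grafana_model points to an entry in the user’s model catalog. We can list the stored models there (the catalog schema is named ML_SCHEMA_<user>):

MySQL> SELECT model_id, model_handle
       FROM ML_SCHEMA_grafana.MODEL_CATALOG;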
To validate the model, we need to score it. Therefore, we will load the model and test it against the testing data:
MySQL> CALL sys.ML_MODEL_LOAD(@grafana_model, NULL) ;
Query OK, 0 rows affected (0.1200 sec)
MySQL> CALL sys.ML_SCORE('marketing.bank_marketing_test', 'y',
@grafana_model, 'accuracy', @score, null);
Query OK, 0 rows affected (0.9615 sec)
MySQL> SELECT @score;
+--------------------+
| @score |
+--------------------+
| 0.9303251504898071 |
+--------------------+
1 row in set (0.0007 sec)
With an accuracy above 93%, this is a very good score that validates our model!
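ML_SCORE also supports other metrics. Since the y variable is quite imbalanced in this dataset (most clients don’t subscribe), balanced_accuracy can be an interesting complement to plain accuracy — a sketch, using the same syntax as above:

MySQL> CALL sys.ML_SCORE('marketing.bank_marketing_test', 'y',
       @grafana_model, 'balanced_accuracy', @score2, NULL);
MySQL> SELECT @score2;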
Prediction
We can also use our model to predict whether a client with a specific set of attributes would subscribe to our term deposit:
MySQL> SELECT JSON_PRETTY(sys.ML_PREDICT_ROW(
JSON_OBJECT("age", "30",
"job", "services",
"marital", "married",
"education", "secondary",
"default1", "no",
"balance", "7032",
"housing", "no",
"loan", "no",
"contact", "cellular",
"day", "17",
"month", "jul",
"duration", "402",
"campaign", "1",
"pdays", "-1",
"previous", "0",
"poutcome", "unknown",
"id", "0"), @grafana_model, NULL)) prediction\G
*************************** 1. row ***************************
prediction: {
"id": "0",
"age": 30.0,
"day": 17.0,
"job": "services",
"loan": "no",
"month": "jul",
"pdays": -1.0,
"balance": 7032.0,
"contact": "cellular",
"housing": "no",
"marital": "married",
"campaign": 1.0,
"default1": "no",
"duration": 402.0,
"poutcome": "unknown",
"previous": 0.0,
"education": "secondary",
"Prediction": "no",
"ml_results": {
"predictions": {
"y": "no"
},
"probabilities": {
"no": 0.9,
"yes": 0.1
}
}
}
1 row in set (0.2049 sec)
We can see that, with those characteristics, the model predicts the customer won’t subscribe to our term deposit (with a probability of 0.9).
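To predict for many rows at once instead of a single JSON row, there is also ML_PREDICT_TABLE, which writes the predictions for an entire input table into an output table — a sketch (the output table must not already exist):

MySQL> CALL sys.ML_PREDICT_TABLE('marketing.bank_marketing_test',
       @grafana_model, 'marketing.bank_marketing_predictions', NULL);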
Grafana
Grafana is an open source dashboard tool to visualize and query your data.
I’ve installed a Grafana server on a compute instance (public subnet) in the same VCN as my DB System.
When connected to Grafana, we need to create a new data source to connect to MySQL:
And we use the credentials of the dedicated user we created (grafana
), pointing to the schema where the data is stored (marketing
):
We can create a dashboard to visualize the current data in the bank_marketing
table:
Prediction with Grafana
The most fun part is to visualize the prediction while changing some of the values:
Here are the two dashboards in case you want to import them:
The query used to build the Probability pie chart is the following:
SELECT
prediction, prob_y as "yes", prob_n as "no"
FROM
(
SELECT
sys.ML_PREDICT_ROW(
json_object(
"age", "${age}",
"job", "${job}",
"marital", "${marital}",
"education", "${education}",
"default1", "no",
"balance", "${balance}",
"housing", "${housing}",
"loan", "${loan}",
"contact", "${contact}",
"day", "10",
"month", "${month}",
"duration", "600",
"campaign", "1",
"pdays", "-1",
"previous", "0",
"poutcome", "unknown",
"id", "0"
),
(
SELECT
model_handle
FROM
ML_SCHEMA_grafana.MODEL_CATALOG
ORDER BY
model_id DESC
LIMIT
1
), NULL
) as a
) as t
JOIN JSON_TABLE(
t.a,
'$' COLUMNS(
prediction varchar(3) PATH "$.ml_results.predictions.y",
prob_y decimal(4, 2) PATH "$.ml_results.probabilities.yes",
prob_n decimal(4, 2) PATH "$.ml_results.probabilities.no"
)
) as jt;
As you can see above, the tricky part is finding the model handle to use. Here we read the most recent entry in the MODEL_CATALOG table of the schema ML_SCHEMA_grafana
, where grafana is the user connecting to the DB System.
Conclusion
MySQL HeatWave’s Machine Learning capabilities simplify tasks for data analysts: simple SQL calls replace, for example, extensive Python code. It also integrates easily with third-party tools like Grafana through a simple MySQL connection.
Enjoy MySQL and enjoy discovering the MySQL HeatWave Machine Learning capabilities, you are now ready to make predictions!