MySQL 8.0: if I should optimize only one query on my application, which one should it be ?

on

Answering this question is not easy. Like always, the best response is “it depends” !

But let’s try to give you all the necessary info the provide the most accurate answer. Also, may be fixing one single query is not enough and looking for that specific statement will lead in finding multiple problematic statements.

The most consuming one

The first candidate to be fixed is the query that consumes most of the execution time (latency). To identify it, we will use the sys schema and join it with events_statements_summary_by_digest from performance_schemato retrieve a real example of the query (see this post for more details).

Let’s take a look at what sys schema has to offer us related to our mission:

> show tables like 'statements_with%';
+---------------------------------------------+
| Tables_in_sys (statements_with%)            |
+---------------------------------------------+
| statements_with_errors_or_warnings          |
| statements_with_full_table_scans            |
| statements_with_runtimes_in_95th_percentile |
| statements_with_sorting                     |
| statements_with_temp_tables                 |
+---------------------------------------------+

We will then use the statements_with_runtimes_in_95th_percentile to achieve our first task. However we will use the version of the view with raw data (not human readable formatted), to be able to sort the results as we want. The raw data version of sysschema views start with x$:

SELECT schema_name, format_time(total_latency) tot_lat,   
       exec_count, format_time(total_latency/exec_count) latency_per_call, 
       query_sample_text 
  FROM sys.x$statements_with_runtimes_in_95th_percentile AS t1
  JOIN performance_schema.events_statements_summary_by_digest AS t2 
    ON t2.digest=t1.digest 
 WHERE schema_name NOT in ('performance_schema', 'sys') 
ORDER BY (total_latency/exec_count) desc LIMIT 1\G
*************************** 1. row ***************************
      schema_name: library
          tot_lat: 857.29 ms
       exec_count: 1
 latency_per_call: 857.29 ms
query_sample_text: INSERT INTO `books` (`doc`) VALUES ('{\"_id\": \"00005d44289d000000000000007d\", \"title\": \"lucky luke, tome 27 : l alibi\", \"isbn10\": \"2884710086\", \"isbn13\": \"978-2884710084\", \"langue\": \"français\", \"relié\": \"48 pages\", \"authors\": [\"Guylouis (Auteur)\", \"Morris (Illustrations)\"], \"editeur\": \"lucky comics (21 décembre 1999)\", \"collection\": \"lucky luke\", \"couverture\": \" ...
1 row in set (0.2838 sec)

This statement is complicated to optimize as it’s a simple insert, and it was run only once. Insert can be slower because of disk response time (I run in full durability of course). Having too many indexes may also increase the response time, this is why I invite you to have a look at these two sysschema tables:

  • schema_redundant_indexes
  • schema_unused_indexes

You will have to play with the limit of the query to find some valid candidates and then, thanks to the query_sample_text we have the possibility to run an EXPLAIN on the query without having to rewrite it !

Full table scans

Another query I would try to optimize is the one doing full table scans:

SELECT schema_name, sum_rows_examined, (sum_rows_examined/exec_count) avg_rows_call,
       format_time(total_latency) tot_lat, exec_count,
       format_time(total_latency/exec_count) AS latency_per_call,
       query_sample_text 
  FROM sys.x$statements_with_full_table_scans AS t1
  JOIN performance_schema.events_statements_summary_by_digest AS t2 
    ON t2.digest=t1.digest 
 WHERE schema_name NOT in ('performance_schema', 'sys') 
ORDER BY (total_latency/exec_count) desc LIMIT 1\G
*************************** 1. row ***************************
       schema_name: wp_lefred
 sum_rows_examined: 268075
     avg_rows_call: 3277.0419
           tot_lat: 31.31 s
        exec_count: 124
  latency_per_call: 252.47 ms
 query_sample_text: SELECT count(*) as mytotal
                 FROM wp_posts
                 WHERE (post_content LIKE '%youtube.com/%' 
                   OR post_content LIKE '%youtu.be/%')
                 AND post_status = 'publish'
 1 row in set (0.0264 sec)

We can then see that this query was executed 124 times for a total execution time of 31.31 seconds which makes 252.47 milliseconds per call. We can also see that this query examined more than 268k rows which means that on average those full table scans are examining 3277 records per query.

This is a very good one for optimization.

Temp tables

Creating temporary tables is also sub optimal for your workload, if you have some slow ones you should have identified them already with the previous queries. But if you want to hunt those specifically, once again, sys schema helps you to catch them:

SELECT schema_name, format_time(total_latency) tot_lat, exec_count, 
       format_time(total_latency/exec_count) latency_per_call, query_sample_text 
  FROM sys.x$statements_with_temp_tables AS t1
  JOIN  performance_schema.events_statements_summary_by_digest AS t2
    ON t2.digest=t1.digest 
 WHERE schema_name NOT in ('performance_schema', 'sys') AND disk_tmp_tables=1
ORDER BY 2 desc,(total_latency/exec_count) desc LIMIT 1\G

Fortunately, I had none on my system.

Query optimization is not the most exciting part of the DBA job… but it has to be done ;-). You have now an easy method to find where to start, good luck ! And don’t forget that if you need any help, you can always joins the MySQL Community Slack channel !

5 thoughts on “MySQL 8.0: if I should optimize only one query on my application, which one should it be ?

    1. Hi Mohsin, if you check the first article linked, you will see that QUERY_SAMPLE_TEXT is an addition in MySQL 8.0 only. You can use the SYS schema in 5.7 without any join.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

recent
categories