MySQL 8.0: if I could optimize only one query in my application, which one should it be?

Answering this question is not easy. As always, the best answer is “it depends”!

But let’s try to give you all the information needed to provide the most accurate answer. Also, fixing a single query may not be enough, and looking for that specific statement will often lead you to several problematic statements.

The most consuming one

The first candidate to be fixed is the query that consumes the most execution time (latency). To identify it, we will use the sys schema and join it with events_statements_summary_by_digest from performance_schema to retrieve a real example of the query (see this post for more details).

Let’s take a look at what the sys schema has to offer for our mission:

> show tables like 'statements_with%';
+---------------------------------------------+
| Tables_in_sys (statements_with%)            |
+---------------------------------------------+
| statements_with_errors_or_warnings          |
| statements_with_full_table_scans            |
| statements_with_runtimes_in_95th_percentile |
| statements_with_sorting                     |
| statements_with_temp_tables                 |
+---------------------------------------------+

We will use statements_with_runtimes_in_95th_percentile for our first task. However, we will use the version of the view with raw data (not formatted for human reading) so that we can sort the results as we want. The raw-data versions of the sys schema views are prefixed with x$:

SELECT schema_name, format_time(total_latency) tot_lat,   
       exec_count, format_time(total_latency/exec_count) latency_per_call, 
       query_sample_text 
  FROM sys.x$statements_with_runtimes_in_95th_percentile AS t1
  JOIN performance_schema.events_statements_summary_by_digest AS t2 
    ON t2.digest=t1.digest 
 WHERE schema_name NOT in ('performance_schema', 'sys') 
ORDER BY (total_latency/exec_count) desc LIMIT 1\G
*************************** 1. row ***************************
      schema_name: library
          tot_lat: 857.29 ms
       exec_count: 1
 latency_per_call: 857.29 ms
query_sample_text: INSERT INTO `books` (`doc`) VALUES ('{\"_id\": \"00005d44289d000000000000007d\", \"title\": \"lucky luke, tome 27 : l alibi\", \"isbn10\": \"2884710086\", \"isbn13\": \"978-2884710084\", \"langue\": \"français\", \"relié\": \"48 pages\", \"authors\": [\"Guylouis (Auteur)\", \"Morris (Illustrations)\"], \"editeur\": \"lucky comics (21 décembre 1999)\", \"collection\": \"lucky luke\", \"couverture\": \" data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDABQODxIPDRQSEBIXFRQYHjIhHhwcHj0sLiQySUBMS0dARkVQWnNiUFVtVkVGZIhlbXd7gYKBTmCNl4x9lnN+gXz/2wBDARUXFx4aHjshITt8U0ZTfHx8fHx8fHx8fHx8fHx8fHx8fHx8fHx8fHx8fHx8fHx8fHx8fHx8fHx8fHx8fHx8fHz/wAARCAEfANwDASIAAhEBAxEB/8QAHwAAAQUBAQEBAQEAAAAAAAAAAAECAwQFBgcICQoL/8QAtRAAAgEDAwIEAwUFBAQAAAF9AQIDAAQRBRIhMUEGE1FhByJxFDKBkaEII0KxwRVS0fAkM2JyggkKFhcYGRolJicoKSo0NTY3ODk6Q0RFRkdISUpTVFVWV1hZWmNkZWZnaGlqc3R1dnd4eXqDhIWGh4iJipKTlJWWl5iZmqKjpKWmp6ipqrKztLW2t7i5usLDxMXGx8jJytLT1NXW19jZ2uHi4+Tl5ufo6erx8vP09fb3+Pn6/8QAHwEAAwEBAQEBAQEBAQAAAAAAAAECAwQFBgcICQoL/8QAtREAAgECBAQDBAcFBA...
1 row in set (0.2838 sec)
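
The query above ranks statements by latency per call. If you are after cumulative time instead, a variant along these lines might help. It is only a sketch: it assumes MySQL 8.0.16 or later, where the built-in FORMAT_PICO_TIME() is available as a replacement for sys.format_time(), and it adds a schema match to the join on the assumption that the same digest may be recorded under more than one schema. The LIMIT of 5 is arbitrary.

```sql
-- Sketch: same views as above, but ranked by total (cumulative) latency.
-- Assumes MySQL 8.0.16+ for FORMAT_PICO_TIME().
SELECT t2.schema_name,
       FORMAT_PICO_TIME(t1.total_latency) AS tot_lat,
       t1.exec_count,
       FORMAT_PICO_TIME(t1.total_latency / t1.exec_count) AS latency_per_call,
       t2.query_sample_text
  FROM sys.x$statements_with_runtimes_in_95th_percentile AS t1
  JOIN performance_schema.events_statements_summary_by_digest AS t2
    ON t2.digest = t1.digest
   AND t2.schema_name <=> t1.db   -- NULL-safe match on the schema
 WHERE t2.schema_name NOT IN ('performance_schema', 'sys')
 ORDER BY t1.total_latency DESC   -- raw picoseconds, so the sort is numeric
 LIMIT 5;
```

A statement that is cheap per call but executed very often can end up at the top of this list while never appearing in the per-call ranking.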

This statement is complicated to optimize as it’s a simple insert that was run only once. An insert can be slow because of disk response time (I run with full durability, of course). Having too many indexes may also increase the response time, which is why I invite you to have a look at these two sys schema views:

schema_redundant_indexes
schema_unused_indexes
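
As a quick illustration, both views can be queried directly. The sketch below filters on the library schema seen in the output above; replace it with your own schema name.

```sql
-- Indexes made redundant by another (dominant) index on the same table.
-- sql_drop_index contains a ready-made ALTER TABLE ... DROP INDEX statement.
SELECT table_name, redundant_index_name, dominant_index_name, sql_drop_index
  FROM sys.schema_redundant_indexes
 WHERE table_schema = 'library';

-- Indexes for which no usage has been recorded by performance_schema.
SELECT object_name, index_name
  FROM sys.schema_unused_indexes
 WHERE object_schema = 'library';
```

Keep in mind that schema_unused_indexes relies on statistics collected since the server started, so an index listed there may still be needed by a job that has not run yet.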

You will have to play with the LIMIT of the latency query to find some valid candidates. Then, thanks to query_sample_text, we can run an EXPLAIN on the query without having to rewrite it!

Full table scans

Another query I would try to optimize is the one doing full table scans:

SELECT schema_name, sum_rows_examined, (sum_rows_examined/exec_count) avg_rows_call,
       format_time(total_latency) tot_lat, exec_count,
       format_time(total_latency/exec_count) AS latency_per_call,
       query_sample_text 
  FROM sys.x$statements_with_full_table_scans AS t1
  JOIN performance_schema.events_statements_summary_by_digest AS t2 
    ON t2.digest=t1.digest 
 WHERE schema_name NOT in ('performance_schema', 'sys') 
ORDER BY (total_latency/exec_count) desc LIMIT 1\G
*************************** 1. row ***************************
       schema_name: wp_lefred
 sum_rows_examined: 268075
     avg_rows_call: 3277.0419
           tot_lat: 31.31 s
        exec_count: 124
  latency_per_call: 252.47 ms
 query_sample_text: SELECT count(*) as mytotal
                 FROM wp_posts
                 WHERE (post_content LIKE '%youtube.com/%' 
                   OR post_content LIKE '%youtu.be/%')
                 AND post_status = 'publish'
 1 row in set (0.0264 sec)

We can see that this query was executed 124 times for a total execution time of 31.31 seconds, which makes 252.47 milliseconds per call. We can also see that it examined more than 268k rows in total, and avg_rows_call reports about 3277 rows examined per call.

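This is a very good candidate for optimization: both LIKE patterns start with a wildcard, so a regular B-tree index on post_content cannot be used to narrow the search.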
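
Since query_sample_text holds a complete statement, it can be pasted straight after EXPLAIN. A minimal sketch, assuming the wp_lefred schema from the output above:

```sql
-- Sketch: run EXPLAIN on the sampled statement, unchanged.
USE wp_lefred;

EXPLAIN
SELECT count(*) as mytotal
  FROM wp_posts
 WHERE (post_content LIKE '%youtube.com/%'
    OR post_content LIKE '%youtu.be/%')
   AND post_status = 'publish';
```

In the resulting plan, a value of ALL in the type column for wp_posts would confirm the full table scan.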

Temp tables

Creating temporary tables is also suboptimal for your workload. If you have some slow ones, you should already have identified them with the previous queries. But if you want to hunt for them specifically, once again, the sys schema helps you catch them:

SELECT schema_name, format_time(total_latency) tot_lat, exec_count, 
       format_time(total_latency/exec_count) latency_per_call, query_sample_text 
  FROM sys.x$statements_with_temp_tables AS t1
  JOIN  performance_schema.events_statements_summary_by_digest AS t2
    ON t2.digest=t1.digest 
 WHERE schema_name NOT in ('performance_schema', 'sys') AND disk_tmp_tables=1
ORDER BY 2 desc,(total_latency/exec_count) desc LIMIT 1\G

Fortunately, I had none on my system.
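
Note that the query above keeps only digests whose disk_tmp_tables counter is exactly 1, and that its first sort key is the formatted tot_lat string. If you want every statement that created at least one temporary table on disk, sorted numerically, a variant like the following could be used. It is a sketch under a few assumptions: MySQL 8.0.16 or later for FORMAT_PICO_TIME(), an added schema match in the join, and an arbitrary LIMIT of 5.

```sql
-- Sketch: statements that created one or more on-disk temporary tables.
-- Assumes MySQL 8.0.16+ for FORMAT_PICO_TIME().
SELECT t2.schema_name,
       FORMAT_PICO_TIME(t1.total_latency) AS tot_lat,
       t1.exec_count,
       t1.disk_tmp_tables,            -- total on-disk temp tables for this digest
       t1.tmp_tables_to_disk_pct,     -- share of temp tables that went to disk
       t2.query_sample_text
  FROM sys.x$statements_with_temp_tables AS t1
  JOIN performance_schema.events_statements_summary_by_digest AS t2
    ON t2.digest = t1.digest
   AND t2.schema_name <=> t1.db
 WHERE t2.schema_name NOT IN ('performance_schema', 'sys')
   AND t1.disk_tmp_tables > 0
 ORDER BY t1.total_latency DESC       -- raw picoseconds, so the sort is numeric
 LIMIT 5;
```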

Query optimization is not the most exciting part of the DBA job… but it has to be done ;-). You now have an easy method to find where to start. Good luck! And don’t forget that if you need any help, you can always join the MySQL Community Slack channel!

8 Comments

Edward Soares

August 18, 2019 / 08:21

Query optimization is not the most exiting part of the DBA job…

“exiting”
- lefred
  
  August 18, 2019 / 11:48
  
  Thank you for pointing this mistake, I updated the post.
Mohsin Ali

August 18, 2019 / 08:34

Hi Lefred,

How can we use above SQL queries in 5.7?

Regards!
Mohsin
- lefred
  
  August 18, 2019 / 09:18
  
  Hi Mohsin, if you check the first article linked, you will see that QUERY_SAMPLE_TEXT is an addition in MySQL 8.0 only. You can use the SYS schema in 5.7 without any join.
  - Mohsin Ali
    
    August 18, 2019 / 11:20
    
    Thanks
Ben Nadel

December 18, 2020 / 13:56

This is great. I’m just upgrading a DB from 5.6 to 5.7 and I’m researching how I can best leverage the new sys schema. This has been very helpful. Also, one of the best things that I learned from this post is that the `x$` tables have the _full SQL statement_, whereas the “user friendly” tables have really brief digests. I thought I was going to have to pick those apart – but, being able to `JOIN` to the non “user friendly” table to get the full SQL statement is such a huge win! Thanks.
3tagroup

June 26, 2022 / 13:54

you will see that QUERY_SAMPLE_TEXT
Analyzing queries in MySQL Database Service – lefred blog: tribulations of a MySQL Evangelist

October 13, 2022 / 15:57

[…] already wrote an article on how to use Performance_Schema and Sys, let’s point out the query consuming most of the execution time (latency) with this new […]
