A graph a day, keeps the doctor away ! – Full Table Scans - lefred blog: tribulations of a MySQL Evangelist

Full table scans can be problematic for performance. Certainly if the scanned tables are large. The worst case is when full table scans are involved in joins and particularly when the scanned table is not the first one (this was dramatic before MySQL 8.0 as Block Nested Loop was used) !

A full table scans means that MySQL was not able to use an index (no index or no filters using it).

Effects

When Full Table Scans happen (depending of the size of course), a lot of data gets pulled into the Buffer Pool and maybe other important data from the working set is pulled out. Most of the time that new data in the Buffer Pool might even not be required by the application, what a waste of resources !

You then understand that another side effect of Full Table Scans is the increase of I/O operations.

The most noticeable symptoms of Full Table Scans are:

increase of CPU usage
increase of disk I/O (depending on the size of the tables and the size of the Buffer Pool)
increase of accessed rows

Handler API

Each storage engine is a class with each instance of the class communicating with the MySQL server through a special handler interface.

The handler API is then the interface between MySQL and the storage engine. The MySQL server communicates with the storage engines through that API and it’s the storage engine’s responsibility to manage data storage and index management.

The Handler_% variables count handler operations, such as the number of times MySQL asks a
storage engine to read the next row from an index.

This is exactly the values of those handler_% variables that are plotted in the graph above.

Let’s have a quick look at some interesting ones:

handler_read_first: The number of times the first entry in an index was read. If this value is high, it suggests that the server is doing a lot of full index scans
handler_read_next: The number of requests to read the next row in key order. This value is incremented if you are querying an index column with a range constraint or if you are doing an index scan.
handler_read_rnd_next: The number of requests to read the next row in the data file. This value is high if you are doing a lot of table scans. Generally this suggests that your tables are not properly indexed or that your queries are not written to take advantage of the indexes you have. Handler_read_rnd_next is incremented when handler::rnd_next() is called. This operation advances the cursor position to the next row.

For more information about other handler status variable, check the manual.

I rewrote a query from High Performance MySQL, 3rd edition, O’Reilly to be compatible with MySQL 8.0 to illustrate the instrumentation of the handler counters:

SELECT * FROM (
  SELECT STRAIGHT_JOIN
    LOWER(gs0.VARIABLE_NAME) AS variable_name,
    gs0.VARIABLE_VALUE AS value_0, gs1.VARIABLE_VALUE AS value_1,
    ROUND((gs1.VARIABLE_VALUE - gs0.VARIABLE_VALUE) / 60, 2) AS per_sec,
    (gs1.VARIABLE_VALUE - gs0.VARIABLE_VALUE)  AS per_min
  FROM 
  (
    SELECT VARIABLE_NAME, VARIABLE_VALUE
      FROM performance_schema.global_status
    UNION ALL
    SELECT '', SLEEP(60) FROM DUAL
  ) AS gs0
  JOIN performance_schema.global_status gs1 
  USING (VARIABLE_NAME)
  WHERE gs1.VARIABLE_VALUE <> gs0.VARIABLE_VALUE
) a 
WHERE variable_name LIKE 'handler%';

This is an example of this query’s output:

Please node that the SHOW VARIABLES command is a full table scan and the handler_read_rnd_next counter will be incremented when that command is executed.

JOINS

As written above, when Full Tables Scans are involved in Joins, it usually makes things even worse. That information can also be plotted:

select_range: The number of joins that used ranges on the first table. This is normally not a critical issue even if the value is quite large.
select_scan: The number of joins that did a full scan of the first table.
select_full_range_join: The number of joins that used a range search on a reference table. In other words, this is the number of joins that used a value from table n to retrieve rows from a range of the reference index in table n + 1. Depending on the query, this can be more or less costly than Select_scan.
select_range_check: The number of joins without keys that check for key usage after each row. If this is not 0, you should carefully check the indexes of your tables as this query plan has very high overhead.
select_full_join: This is the counter you don’t wan t to see with high value. It represents the number of joins that perform table scans because they do not use indexes, the number of cross joins, or joins without any criteria to match rows in the tables. When checking this value for a specific query, the value of rows examined is the product of the number of rows in each table. Should be absolutely avoided !

How to find those Queries ?

Performance_Schema and Sys Schema have all the necessary resources to retrieve the query performing Full Table Scans. Let’s have a look at the output of the following query:

SELECT format_pico_time(total_latency) total_time, 
       db, 
       exec_count, 
       no_index_used_count, 
       no_good_index_used_count, 
       no_index_used_pct, 
       rows_examined, 
       rows_sent_avg, 
       rows_examined_avg, 
       t1.first_seen, t1.last_seen, 
       query_sample_text 
FROM sys.x$statements_with_full_table_scans AS t1 
JOIN performance_schema.events_statements_summary_by_digest AS t2 
  ON t2.digest=t1.digest ORDER BY total_latency DESC\G

This is an example of one returned row:

              total_time: 2.05 s
                      db: sbtest
              exec_count: 3
     no_index_used_count: 3
no_good_index_used_count: 0
       no_index_used_pct: 100
           rows_examined: 859740
           rows_sent_avg: 1
       rows_examined_avg: 286580
              first_seen: 2022-06-16 12:21:14.450874
               last_seen: 2022-06-16 12:23:48.577439
       query_sample_text: select count(*) from sbtest2 
                          join sbtest1 using(k) where sbtest1.c 
                          like '%1414%' or sbtest2.c like '%1424%'

We can see that on average this query is scanning more than 280k rows each time it’s executed.

Let’s have a look a the Query Execution Plan for that specific query and confirm it does Full Table Scans:

And this is even more obvious when we use the Tree format:

To illustrate another bad behavior, I will remove the index on the k column of those two tables and use the same query.

We will also check the handler_% and Select_% status variables:

We can see that this is exactly the situation we should avoid, especially if you are not using MySQL 8.0. Without Hash Joins this will be worse !

Conclusion

In general, Full Table Scans should be avoided but of course this also depends on the size of the tables involved, the performance of the storage, the storage engine used (Performance_Schema is not a problem) and how the buffer pool is used.

Query Optimization is the solution, this consists in eventually adding indexes, having to rewrite the query, … however this is not always easy.

If you need to deal with such operations, I recommend you to read Chapter 24 of MySQL 8 Query Performance Tuning, Jesper Wisborg Krog, Apress, 2020.

Enjoy MySQL !

Cookie	Duration	Description
cookielawinfo-checbox-analytics	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checbox-functional	11 months	The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checbox-others	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-necessary	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-performance	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
viewed_cookie_policy	11 months	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.

A graph a day, keeps the doctor away ! – Full Table Scans

Effects

Trending

Handler API

JOINS

How to find those Queries ?

Conclusion

follow me

Subscribe to Blog via Email

Leave a ReplyCancel Reply