A graph a day, keeps the doctor away ! – Full Table Scans

Full table scans can be problematic for performance. Certainly if the scanned tables are large. The worst case is when full table scans are involved in joins and particularly when the scanned table is not the first one (this was dramatic before MySQL 8.0 as Block Nested Loop was used) !

A full table scans means that MySQL was not able to use an index (no index or no filters using it).

Effects

When Full Table Scans happen (depending of the size of course), a lot of data gets pulled into the Buffer Pool and maybe other important data from the working set is pulled out. Most of the time that new data in the Buffer Pool might even not be required by the application, what a waste of resources !

You then understand that another side effect of Full Table Scans is the increase of I/O operations.

The most noticeable symptoms of Full Table Scans are:

  • increase of CPU usage
  • increase of disk I/O (depending on the size of the tables and the size of the Buffer Pool)
  • increase of accessed rows

Trending

What is the best way to see if we have an increase of Full Table Scans ?

MySQL doesn’t provide a metric with the exact amount of table scans and additionally, if the full table scan is performed against a table with only 1 record, is it problematic ?

To determine if we have an increase in Full Table Scans, we will use the handler API metrics and precisely handler_read_rnd_next:

handler_read_rnd_next represents the number of requests to read the next row in the data file. In other words, it represents the number of non-indexed reads.

Handler API

Each storage engine is a class with each instance of the class communicating with the MySQL server through a special handler interface.

The handler API is then the interface between MySQL and the storage engine. The MySQL server communicates with the storage engines through that API and it’s the storage engine’s responsibility to manage data storage and index management.

The Handler_% variables count handler operations, such as the number of times MySQL asks a
storage engine to read the next row from an index.

This is exactly the values of those handler_% variables that are plotted in the graph above.

Let’s have a quick look at some interesting ones:

  • handler_read_first: The number of times the first entry in an index was read. If this value is high, it suggests that the server is doing a lot of full index scans
  • handler_read_next: The number of requests to read the next row in key order. This value is incremented if you are querying an index column with a range constraint or if you are doing an index scan.
  • handler_read_rnd_next: The number of requests to read the next row in the data file. This value is high if you are doing a lot of table scans. Generally this suggests that your tables are not properly indexed or that your queries are not written to take advantage of the indexes you have. Handler_read_rnd_next is incremented when handler::rnd_next() is called. This operation advances the cursor position to the next row.

For more information about other handler status variable, check the manual.

I rewrote a query from High Performance MySQL, 3rd edition, O’Reilly to be compatible with MySQL 8.0 to illustrate the instrumentation of the handler counters:

SELECT * FROM (
  SELECT STRAIGHT_JOIN
    LOWER(gs0.VARIABLE_NAME) AS variable_name,
    gs0.VARIABLE_VALUE AS value_0, gs1.VARIABLE_VALUE AS value_1,
    ROUND((gs1.VARIABLE_VALUE - gs0.VARIABLE_VALUE) / 60, 2) AS per_sec,
    (gs1.VARIABLE_VALUE - gs0.VARIABLE_VALUE)  AS per_min
  FROM 
  (
    SELECT VARIABLE_NAME, VARIABLE_VALUE
      FROM performance_schema.global_status
    UNION ALL
    SELECT '', SLEEP(60) FROM DUAL
  ) AS gs0
  JOIN performance_schema.global_status gs1 
  USING (VARIABLE_NAME)
  WHERE gs1.VARIABLE_VALUE <> gs0.VARIABLE_VALUE
) a 
WHERE variable_name LIKE 'handler%';

This is an example of this query’s output:

Please node that the SHOW VARIABLES command is a full table scan and the handler_read_rnd_next counter will be incremented when that command is executed.

JOINS

As written above, when Full Tables Scans are involved in Joins, it usually makes things even worse. That information can also be plotted:

MySQL provides status variables allowing to control how the joins are made during a SELECT statement:
  • select_range: The number of joins that used ranges on the first table. This is normally not a critical issue even if the value is quite large.
  • select_scan: The number of joins that did a full scan of the first table.
  • select_full_range_join: The number of joins that used a range search on a reference table. In other words, this is the number of joins that used a value from table n to retrieve rows from a range of the reference index in table n + 1. Depending on the query, this can be more or less costly than Select_scan.
  • select_range_check: The number of joins without keys that check for key usage after each row. If this is not 0, you should carefully check the indexes of your tables as this query plan has very high overhead.
  • select_full_join: This is the counter you don’t wan t to see with high value. It represents the number of joins that perform table scans because they do not use indexes, the number of cross joins, or joins without any criteria to match rows in the tables. When checking this value for a specific query, the value of rows examined is the product of the number of rows in each table. Should be absolutely avoided !

How to find those Queries ?

Performance_Schema and Sys Schema have all the necessary resources to retrieve the query performing Full Table Scans. Let’s have a look at the output of the following query:

SELECT format_pico_time(total_latency) total_time, 
       db, 
       exec_count, 
       no_index_used_count, 
       no_good_index_used_count, 
       no_index_used_pct, 
       rows_examined, 
       rows_sent_avg, 
       rows_examined_avg, 
       t1.first_seen, t1.last_seen, 
       query_sample_text 
FROM sys.x$statements_with_full_table_scans AS t1 
JOIN performance_schema.events_statements_summary_by_digest AS t2 
  ON t2.digest=t1.digest ORDER BY total_latency DESC\G

This is an example of one returned row:

              total_time: 2.05 s
                      db: sbtest
              exec_count: 3
     no_index_used_count: 3
no_good_index_used_count: 0
       no_index_used_pct: 100
           rows_examined: 859740
           rows_sent_avg: 1
       rows_examined_avg: 286580
              first_seen: 2022-06-16 12:21:14.450874
               last_seen: 2022-06-16 12:23:48.577439
       query_sample_text: select count(*) from sbtest2 
                          join sbtest1 using(k) where sbtest1.c 
                          like '%1414%' or sbtest2.c like '%1424%'

We can see that on average this query is scanning more than 280k rows each time it’s executed.

Let’s have a look a the Query Execution Plan for that specific query and confirm it does Full Table Scans:

And this is even more obvious when we use the Tree format:

To illustrate another bad behavior, I will remove the index on the k column of those two tables and use the same query.

We will also check the handler_% and Select_% status variables:

We can see that this is exactly the situation we should avoid, especially if you are not using MySQL 8.0. Without Hash Joins this will be worse !

Conclusion

In general, Full Table Scans should be avoided but of course this also depends on the size of the tables involved, the performance of the storage, the storage engine used (Performance_Schema is not a problem) and how the buffer pool is used.

Query Optimization is the solution, this consists in eventually adding indexes, having to rewrite the query, … however this is not always easy.

If you need to deal with such operations, I recommend you to read Chapter 24 of MySQL 8 Query Performance Tuning, Jesper Wisborg Krog, Apress, 2020.

Enjoy MySQL !

Subscribe to Blog via Email

Enter your email address to subscribe to this blog and receive notifications of new posts by email.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

As MySQL Community Manager, I am an employee of Oracle and the views expressed on this blog are my own and do not necessarily reflect the views of Oracle.

You can find articles I wrote on Oracle’s blog.