Full table scans can be problematic for performance, especially when the scanned tables are large. The worst case is when full table scans are involved in joins, and particularly when the scanned table is not the first one (this was dramatic before MySQL 8.0, when Block Nested Loop was used)!
A full table scan means that MySQL was not able to use an index (either because there is no index, or because no filter can use one).
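The easiest way to spot a full table scan for a given query is EXPLAIN: an access type of ALL means no index is used. As a minimal sketch (the table and index names below are just an illustration, not from this article):

```sql
-- Hypothetical table with no index on the filtered column
CREATE TABLE t1 (id INT PRIMARY KEY, c VARCHAR(50));

-- No index on c: EXPLAIN reports access type "ALL" (full table scan)
EXPLAIN SELECT * FROM t1 WHERE c = 'foo';

-- After adding an index, the same query can use access type "ref"
ALTER TABLE t1 ADD INDEX idx_c (c);
EXPLAIN SELECT * FROM t1 WHERE c = 'foo';
```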
Effects
When a Full Table Scan happens (depending on the size of the table, of course), a lot of data gets pulled into the Buffer Pool, possibly evicting other important data from the working set. Most of the time that new data in the Buffer Pool might not even be required by the application: what a waste of resources!
Another side effect of Full Table Scans, then, is an increase in I/O operations.
The most noticeable symptoms of Full Table Scans are:
- increased CPU usage
- increased disk I/O (depending on the size of the tables and the size of the Buffer Pool)
- an increase in the number of rows accessed
Trending
What is the best way to see if we have an increase in Full Table Scans?
MySQL doesn't provide a metric with the exact number of table scans, and besides, if a full table scan is performed against a table with only 1 record, is it really problematic?
To determine if we have an increase in Full Table Scans, we will use the handler API metrics, and more precisely handler_read_rnd_next: it represents the number of requests to read the next row in the data file. In other words, it represents the number of non-indexed reads.
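Assuming a running server, this raw counter can be read directly, either with SHOW or from Performance Schema:

```sql
-- Global counter since server start
SHOW GLOBAL STATUS LIKE 'Handler_read_rnd_next';

-- Or the same value from Performance Schema
SELECT VARIABLE_VALUE
  FROM performance_schema.global_status
 WHERE VARIABLE_NAME = 'Handler_read_rnd_next';
```

The absolute value matters less than its growth rate, which is why the trending query later in this post computes per-second deltas.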
Handler API
Each storage engine is a class, and each instance of the class communicates with the MySQL server through a special handler interface.
The handler API is thus the interface between MySQL and the storage engines. The MySQL server communicates with the storage engines through that API, and it is the storage engine's responsibility to manage data storage and index management.
The Handler_% variables count handler operations, such as the number of times MySQL asks a storage engine to read the next row from an index.
These are exactly the values of those Handler_% variables that are plotted in the graph above.
Let's have a quick look at some interesting ones:
- handler_read_first: the number of times the first entry in an index was read. If this value is high, it suggests that the server is doing a lot of full index scans.
- handler_read_next: the number of requests to read the next row in key order. This value is incremented if you are querying an indexed column with a range constraint or if you are doing an index scan.
- handler_read_rnd_next: the number of requests to read the next row in the data file. This value is high if you are doing a lot of table scans. Generally, this suggests that your tables are not properly indexed or that your queries are not written to take advantage of the indexes you have. Handler_read_rnd_next is incremented when handler::rnd_next() is called; this operation advances the cursor position to the next row.
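To see how these counters are incremented by a single statement, you can reset the session counters, run the statement, and inspect the result (the table and predicate below are just an example, using one of the sysbench tables from later in this post):

```sql
FLUSH STATUS;  -- reset the session-level status counters

-- A non-indexed predicate: forces a full table scan
SELECT COUNT(*) FROM sbtest1 WHERE c LIKE '%1414%';

-- Handler_read_rnd_next is now roughly the number of rows in the table
SHOW SESSION STATUS LIKE 'Handler_read%';
```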
For more information about the other handler status variables, check the manual.
I rewrote a query from High Performance MySQL, 3rd edition, O'Reilly, to be compatible with MySQL 8.0 and illustrate the instrumentation of the handler counters:
SELECT * FROM (
    SELECT STRAIGHT_JOIN
           LOWER(gs0.VARIABLE_NAME) AS variable_name,
           gs0.VARIABLE_VALUE AS value_0,
           gs1.VARIABLE_VALUE AS value_1,
           ROUND((gs1.VARIABLE_VALUE - gs0.VARIABLE_VALUE) / 60, 2) AS per_sec,
           (gs1.VARIABLE_VALUE - gs0.VARIABLE_VALUE) AS per_min
      FROM (
           SELECT VARIABLE_NAME, VARIABLE_VALUE
             FROM performance_schema.global_status
            UNION ALL
           SELECT '', SLEEP(60) FROM DUAL
           ) AS gs0
      JOIN performance_schema.global_status gs1 USING (VARIABLE_NAME)
     WHERE gs1.VARIABLE_VALUE <> gs0.VARIABLE_VALUE
) a WHERE variable_name LIKE 'handler%';
This is an example of this query’s output:
Please note that the SHOW VARIABLES command performs a full table scan, so the handler_read_rnd_next counter will be incremented when that command is executed.
JOINS
As written above, when Full Table Scans are involved in joins, things usually get even worse. That information can also be plotted:
- select_range: the number of joins that used ranges on the first table. This is normally not a critical issue, even if the value is quite large.
- select_scan: the number of joins that did a full scan of the first table.
- select_full_range_join: the number of joins that used a range search on a reference table. In other words, this is the number of joins that used a value from table n to retrieve rows from a range of the reference index in table n + 1. Depending on the query, this can be more or less costly than Select_scan.
- select_range_check: the number of joins without keys that check for key usage after each row. If this is not 0, you should carefully check the indexes of your tables, as this query plan has very high overhead.
- select_full_join: this is the counter you don't want to see with a high value. It represents the number of joins that perform table scans because they do not use indexes, the number of cross joins, and joins without any criteria to match rows in the tables. When checking this value for a specific query, the number of rows examined is the product of the number of rows in each table. This should absolutely be avoided!
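These join-related counters can all be retrieved at once; a quick way to check them on a running server:

```sql
-- All Select_% counters (select_scan, select_full_join, ...)
SHOW GLOBAL STATUS LIKE 'Select_%';
```

As with the Handler_% variables, look at how fast Select_full_join and Select_scan grow over time rather than at their absolute values.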
How to find those Queries?
Performance_Schema and Sys Schema have all the necessary resources to retrieve the queries performing Full Table Scans. Let's have a look at the output of the following query:
SELECT format_pico_time(total_latency) AS total_time,
       db, exec_count, no_index_used_count, no_good_index_used_count,
       no_index_used_pct, rows_examined, rows_sent_avg, rows_examined_avg,
       t1.first_seen, t1.last_seen, query_sample_text
  FROM sys.x$statements_with_full_table_scans AS t1
  JOIN performance_schema.events_statements_summary_by_digest AS t2
    ON t2.digest = t1.digest
 ORDER BY total_latency DESC\G
This is an example of one returned row:
              total_time: 2.05 s
                      db: sbtest
              exec_count: 3
     no_index_used_count: 3
no_good_index_used_count: 0
       no_index_used_pct: 100
           rows_examined: 859740
           rows_sent_avg: 1
       rows_examined_avg: 286580
              first_seen: 2022-06-16 12:21:14.450874
               last_seen: 2022-06-16 12:23:48.577439
       query_sample_text: select count(*) from sbtest2 join sbtest1 using(k)
                          where sbtest1.c like '%1414%' or sbtest2.c like '%1424%'
We can see that on average this query is scanning more than 280k rows each time it’s executed.
Let's have a look at the Query Execution Plan for that specific query and confirm it does Full Table Scans:
And this is even more obvious when we use the Tree format:
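For reference, the plans shown above can be reproduced with EXPLAIN on the sampled query: the traditional format reports type: ALL for the scanned tables, and FORMAT=TREE spells out "Table scan on ..." directly:

```sql
-- Traditional format: look for type: ALL in each row
EXPLAIN SELECT COUNT(*) FROM sbtest2 JOIN sbtest1 USING(k)
 WHERE sbtest1.c LIKE '%1414%' OR sbtest2.c LIKE '%1424%'\G

-- Tree format (MySQL 8.0): table scans and the join algorithm are explicit
EXPLAIN FORMAT=TREE SELECT COUNT(*) FROM sbtest2 JOIN sbtest1 USING(k)
 WHERE sbtest1.c LIKE '%1414%' OR sbtest2.c LIKE '%1424%'\G
```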
To illustrate another bad behavior, I will remove the index on the k column of those two tables and use the same query.
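Dropping the indexes could look like the following; note that the index names here are assumptions (sysbench usually names the secondary index on k after the table number), so check SHOW CREATE TABLE first:

```sql
-- Index names are assumptions: verify them with SHOW CREATE TABLE sbtest1\G
ALTER TABLE sbtest1 DROP INDEX k_1;
ALTER TABLE sbtest2 DROP INDEX k_2;
```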
We will also check the handler_% and Select_% status variables:
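Both families of counters can be checked in one statement, which is a convenient way to observe the effect of the experiment:

```sql
-- Handler and join counters side by side
SHOW GLOBAL STATUS
WHERE Variable_name LIKE 'Handler_read%'
   OR Variable_name LIKE 'Select_%';
```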
We can see that this is exactly the situation we should avoid, especially if you are not using MySQL 8.0: without Hash Joins, this is even worse!
Conclusion
In general, Full Table Scans should be avoided, but of course this also depends on the size of the tables involved, the performance of the storage, the storage engine used (Performance_Schema is not a problem), and how the buffer pool is used.
Query Optimization is the solution; it may consist of adding indexes, rewriting the query, and so on. However, this is not always easy.
If you need to deal with such operations, I recommend you read Chapter 24 of MySQL 8 Query Performance Tuning, Jesper Wisborg Krogh, Apress, 2020.
Enjoy MySQL !