Ioannis Bekiaris

Efficient Pagination on MySQL – Iterate Over Large Tables

The web is full of discussions regarding slow pagination queries in MySQL. Some devs are complaining that their queries - using offset - are getting really slow when they iterate over a really big MySQL table.

Back in the day, I was using offset and limit for pagination as well. Most of the well-known frameworks were doing the same. To be honest, I ’ve never faced a big issue using them in my past implementations. That probably happened because I was mostly working with databases that were not as big and/or my applications’ users rarely visited the last page of paginated results.

However, years later I realized that offset is not the best approach, especially when the table contains a lot of data and/or has many rows.

What is the problem with offset?

By design, offset in MySQL works like the following: MySQL fetches the data and limits them down by dropping those before the offset. That means that usage of offset puts a lot of effort on the MySQL side, and that affects query performance in high offsets.

But the problems with offset don’t stop here. Imagine that you iterate over a MySQL table and you’re fetching data in batches of 10 (ten) rows. After the first iteration 3 (three) new rows are inserted just in the beginning. What is going to happen with your second batch? You’re going to have duplications on your iteration results.

Is there a solution to those offset issues?

In fact, yes, there is. Keyset pagination is the solution for that offset problem. The idea is simple. As previously mentioned, we want to iterate over a table by fetching 10 (ten) results per iteration. We have to order our data by id as we did when using offset.

That implies of course that we have a “sortable” id such as a rearranged UUID (using timestamp to make UUID sequential) or an auto-increment id.

Read: UUIDs in real-life applications

Explaining performance

Let’s try an example. There is the articles_tags table that relates articles with tags. Let’s investigate a bit each case using EXPLAIN for the respective queries:

explain extended select * from articles_tags order by id  desc limit 10 offset 90000;
explain extended select * from articles_tags where id < 633782 order by id  desc limit 10;

The main difference that affects performance of both queries is the ‘type’ (join type). For offset pagination the ‘type’ is ‘index’ and for keyset pagination is ‘range’. According to MySQL documentation the index join type is the same as ALL, except that the index tree is scanned. On the other hand for range type Only rows that are in a given range are retrieved, using an index to select the row. So, range type is faster than index.

One more thing that we should mention is that the number of rows that MySQL will examine increases as we increase the offset in offset pagination.

Keyset “pagination” example in PHP

A simple example using an active record implementation.


do {
            $query = ArticlesTags::find();
                
            if (isset($lastId)) {
                $query->where(['<', 'id', $lastId]);
            }

            $records = $query
                ->orderBy('id DESC')
                ->limit(10)
                ->all();

            foreach ($records as $record) {
                $lastId = $record['id'];
                 yield ArticlesTagsConvertor::deserialize($record);
            }
 } while (count($records) === $batchSize);


Read: PHP Generators: To Yield or Not to Yield?

Summary

Think twice before using offsets to scan large MySQL tables. Key-set pagination is always the best solution if you have sortable ids, and always remember to use EXPLAIN in front of you MySQL queries to detect performance bottlenecks!