MySQL Query Cache

The following article is an in-depth look at Query Cache, mentioned briefly in our post about Caching Strategies.

With Query Cache, the result set from SELECT queries are cached with the query command itself as key. Meaning that if a SELECT query initially executes in ten seconds, consecutive times the _exact_ (including whitespace, case and more. The query must be an identical string.) same query is executed, the server gets an hit in Query Cache and returns the cached data in milliseconds, normally giving performance boosts between a few and many orders of magnitude. QC is a global server setting and has three modes; ON, OFF and DEMAND. Setting QC to OFF disables it, with it to ON makes the server look for a cached result for all SELECT queries while setting it to DEMAND will have the server only do a QC lookup if the query has SQL_CACHE after the SELECT command. It’s recommended to put this MySQL-only extension in a comment to keep with compatibility with SQL standards. (SELECT /*SQL_CACHE*/ * FROM brille) A QC miss will add up to about 20% overhead compared to the same query executed directly without QC lookup. QC is guaranteed to deliver fresh data as all QC on a table is invalidated when a INSERT, DELETE or UPDATE command is executed on it. This means that for a table with frequent writes, QC on reads will probably decrease performance. On tables with a high number of both reads and writes, setting QC to DEMAND and not use it on queries hitting that table might be a good idea. When actively using query cache on tables with dates it’s also important to write reusable and cacheable queries. Any use of NOW() or equivalent non-deterministic functions will render query cache unused even within the same second, so date and time should be calculated outside of MySQL. When doing so, first analyze the need for an exact query. If you can get away with < 1 minute accuracy, don’t add date( ‘Y-m-d H:i:s’ ) but date( ‘Y-m-d H:i:00′ ) so any identical queries within the same minute will get the cached result. Subselects will not use QC. On high performance applications, QC should be kept in mind when designing the database and queries. Below are two examples where performance characteristics could be radically different:

First table structure:

CREATE TABLE `users` (
`id` int(10) unsigned NOT NULL auto_increment,
`username` varchar(16) collate utf8_danish_ci NOT NULL,
`password` text collate utf8_danish_ci NOT NULL,
`salt` text collate utf8_danish_ci NOT NULL,
`last_time_visited` datetime NOT NULL default '0000-00-00 00:00:00',
PRIMARY KEY  (`id`),
UNIQUE KEY `username` (`username`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8 COLLATE=utf8_danish_ci;

Second table structure:

CREATE TABLE `users` (
`id` int(10) unsigned NOT NULL auto_increment,
`username` varchar(16) collate utf8_danish_ci NOT NULL,
`password` text collate utf8_danish_ci NOT NULL,
`salt` text collate utf8_danish_ci NOT NULL,
PRIMARY KEY  (`id`),
UNIQUE KEY `username` (`username`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8 COLLATE=utf8_danish_ci;
CREATE TABLE `user_visited` (
`id` int(10) unsigned NOT NULL,
`last_time_visited` datetime NOT NULL default '0000-00-00 00:00:00',
PRIMARY KEY  (`id`),
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_danish_ci;

If the first table was used on a high traffic forum, every time a user loads a page the QC would be invalidated as last_time_visited had to be updated. By denormalization and having two tables where one is often read and the other is often written to, QC on the first table would be kept until a user changed password/username or a new user is registered which is probably less frequent than pageloads.

But the way the SELECT query is created is also important. Lets say you normaly just need the basic information about a user, but every time you visit the users profile, last_time_visited is needed:

First set of queries:
Normal display of user:

SELECT /*SQL_CACHE*/ id, username FROM users WHERE id = 1;

Display of users profile:

SELECT /*SQL_CACHE*/ u.id, u.username, uv.last_time_visited FROM users AS u LEFT JOIN user_visited AS uv ON ( u.id = uv.id ) WHERE u.id = 1;

Second set of queries:
Normal display of user:

SELECT /*SQL_CACHE*/ id, username FROM users WHERE id = 1;

Display of users profile:

SELECT /*SQL_CACHE*/ id, username FROM users WHERE id = 1;
SELECT last_time_visited FROM user_visited WHERE id = 1;

In the first set, two different queries are used to fetch the same information, the username. This means the second query won’t be able to use the QC of the first and more frequent query. Also, the often-read table is joined with the often-written which means INSERT, DELETE and UPDATE queries on _both_ tables will clear cache for this query. In the second set of queries, the same query is used two times, increasing the chance of QC hit. And instead of joining the two tables in one query, a second and separate query is executed on the second table, here without the SQL_CACHE keyword. Two queries increases overhead, but the gain in higher cache hit (The first query can even get a hit in a Memcache-lookup) can easily outweigh this.