Caching Strategies (AMS week #43)

This is the second post in our series of lecture notes from Aptoma Monday School (AMS). As these are lecture notes, you should expect the texts to be a bit rough around the edges. We’ve decided to discuss and revise our caching strategies, and this week’s session was spent settling on a set of cache layers.


To make things perfectly clear: a cache is a temporary storage area where frequently accessed data can be stored for rapid access.

1. Page caching (Reverse proxy caching)

Caching the entire page is typically done with Varnish (see the technology references at the end of this article). Our experience with Varnish is that it is stable and performs very well. However, people who really know how to configure it properly do not come by the dozen. As much as we would hope that all our hosting providers see it as their job to know this kind of blazing technology intimately, we cannot rely on it, and thus we will have to build up our own competency further in this area. A weak point of page caching through Varnish is that once you set a client cookie, Varnish is forced to either send traffic straight through (i.e. no caching whatsoever) or dispose of the cookie altogether (i.e. break application functionality). Local storage (the “new cookie”, as discussed last week) does not create any such problems for Varnish.
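As a minimal sketch (assuming Varnish in front of a PHP application; the renderFrontPage() helper and the 60-second lifetime are made up for illustration), the application itself can help the proxy by sending an explicit cache lifetime and by not setting cookies on cacheable pages:

<?php
// Tell the reverse proxy it may keep this response for 60 seconds.
header('Cache-Control: public, s-maxage=60');

// Avoid setcookie() and session_start() on cacheable pages: the resulting
// Set-Cookie header makes a default Varnish setup send the response straight through.
echo renderFrontPage();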

2. View / subview caching (Application Caching)

In MVC terms, the view is the rendered HTML from the application. Caching the full view would be like the page caching described above, only done within the application itself, which would overlap with the Varnish strategy. Even though this approach would solve the cookie problem described above, it is rarely used this way anymore now that Varnish is available.

Subview caching, however, can still be of good use. A subview is simply a part of a page. This is useful when you, for example, have Varnish set to cache for one minute, but you know that parts of the page could easily be cached for an hour without becoming obsolete. Examples include a seldom-changing navigation menu, a generated thumbnail image, or page headers and footers. Subview caching involves putting your subview data into memcached or a file cache, or you can achieve something similar using Varnish Edge Side Includes (ESI).
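A minimal sketch of subview caching, assuming the PHP Memcached extension (the key name, the one-hour lifetime and the renderNavigationMenu() helper are made up for illustration):

<?php
function getNavigationMenuHtml(Memcached $memcached)
{
 // Reuse the rendered fragment if it is already cached
 $html = $memcached->get('subview_navigation_menu');
 if ($html !== false)
 {
  return $html;
 }

 // Cache miss: render the subview and keep it for an hour
 $html = renderNavigationMenu();
 $memcached->set('subview_navigation_menu', $html, 3600);
 return $html;
}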

3. Data caching (Application Caching)

Data caching is typically done in your model layer (again with the MVC). A class function in your business logic checks memcached for a cached version of the data it is about to fetch, and only proceeds with the SQL statement if nothing is found in the cache; afterwards it updates the cache with the fresh data. This allows very fine-grained control of cache times for different data. Data that is heavy on the ol’ computer and can be cached for a very long time is even a candidate for being serialized and stored in a database, for easy recovery after reboots and the like. Involving a database for handling large data sets also provides you with more sophisticated access, update and delete schemes beyond what the neanderthal key/value store in memcached can offer.

We rely heavily on this strategy in our products, and it is typically a good strategy for creating robust APIs. We also use it in our in-house framework (Aptoma FrameWork, AFW) for the autoloader (automatic discovery and loading of classes), where file paths are stored in memcached for rapid access on subsequent requests. Used in combination with APC, this gives us a blazing fast framework.
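As a rough illustration of that autoloader idea (this is not the actual AFW code; classNameToPath() and the global $memcached instance are assumptions), the resolved file path for a class is looked up in memcached before falling back to the slow file-system discovery:

<?php
function autoloadClass($className)
{
 global $memcached; // assumed Memcached instance

 // Look up the class's file path in memcached first
 $path = $memcached->get('classpath_' . $className);
 if ($path === false)
 {
  // Not cached yet: do the slow file-system discovery and remember the result
  $path = classNameToPath($className);
  $memcached->set('classpath_' . $className, $path);
 }

 require_once $path;
}

spl_autoload_register('autoloadClass');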

Let’s take an example of data caching

An Event object that handles data in a roster scheduler is stored in the “events” database table. To load an event, we have to fetch a lot of extra information related to it: employee, department, workplace, conflicts with interfering events, and so on. All of this is time consuming, so we should store the assembled data in the cache for re-use!

public function loadEvent($id)
{
 // Try to fetch the event from Memcache data storage
 $event = Cache::getObject($id, 'Event');
 if (!empty($event))
 {
  return $event;
 }

 // If the Event does not exist in memcached,
 // we do the conventional, time-consuming processing here.

 // Finally, we cache the Event for posterity
 Cache::setObject($event, 'Event');
 return $event;
}

Usually we also use cron jobs to pre-load this data.

A main challenge for this kind of caching scheme is to prevent cached data from becoming stale or inconsistent. If an interfering event changes and we fail to invalidate the cache from the example above, we will be left with bad data in our cache. Keep this in mind when designing your data caching.
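A minimal sketch of one way to handle this, assuming a hypothetical Cache::deleteObject() that mirrors the getObject()/setObject() calls above (saveEvent(), $this->db->persist() and getConflictingEventIds() are also assumptions for illustration): when an event is saved, we drop its cached copy and the cached copies of the events it conflicts with, so the next loadEvent() call rebuilds them from the database.

public function saveEvent($event)
{
 // Persist the change first (assumed persistence call)
 $this->db->persist($event);

 // Invalidate the cached Event itself
 // (Cache::deleteObject is a hypothetical counterpart to setObject above)
 Cache::deleteObject($event->id, 'Event');

 // ...and the cached copies of any events it conflicts with
 foreach ($event->getConflictingEventIds() as $conflictId)
 {
  Cache::deleteObject($conflictId, 'Event');
 }
}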

4. Local Client caching

Local client caching is something as simple as the client itself keeping a version of some data instead of asking the server for a new one. This can be cookies, a CSS file or any other form of static content (JavaScript or images).

There are two strategies for client caching:
a) force very, very long cache times on the client, and change file names whenever the contents of, for instance, a CSS file change, or
b) keep cache times low to ensure that the client asks for an updated version from time to time.

Strategy a) adds complexity at the application level, as you will have to handle changing file names, and changing file names might break external dependencies on the file. However, it ensures that changes to static content reach the clients immediately. Yahoo recommends this strategy. Strategy b) will delay updates of static content until the cache at the client has expired, but keeps things simpler for the programmer and more stable for any external dependencies on the file names.
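A minimal sketch of strategy a), assuming assets live under a /static directory: the file’s modification time is embedded in the file name, so the client can be given a very long cache lifetime yet still picks up changes immediately. The assetUrl() helper is made up for illustration, and a rewrite rule (not shown) is assumed to map the versioned name back to the real file on disk.

<?php
function assetUrl($path)
{
 // Version the name with the file's modification time,
 // e.g. "style.css" becomes "style.1273498812.css"
 $version = filemtime(__DIR__ . '/static/' . $path);
 return '/static/' . preg_replace('/\.(\w+)$/', '.' . $version . '.$1', $path);
}

// In a template, echo assetUrl('style.css') inside the href attribute of the <link> tag.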

5. Query Cache

SQL query results can be cached internally by MySQL. The query cache is configured at server level as OFF / ON / DEMAND. A cache lookup that finds no hit costs roughly 20% in performance. DEMAND requires the SQL_CACHE keyword right after SELECT to force caching; wrapping it in a MySQL-specific comment, SELECT /*! SQL_CACHE */ …, keeps the statement compatible with engines other than MySQL, which simply ignore the comment.

We recommend using the ON or DEMAND setting on your installation. If you have never touched this setting, it is probably set to ON.

If you need to squeeze every bit of performance out of your application, you should switch to the DEMAND setting, review all your SQL statements and use the SQL_CACHE hint judiciously. If you have the time and skill to do this, you will benefit from it. You will want to use it on tables with a high ratio of reads to writes; using SQL_CACHE on tables with a lot of writes will only hurt your overall performance. (Lars has more to say on the topic of the query cache.)
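A minimal sketch of the DEMAND approach, assuming a PDO connection in $pdo and a made-up departments table: only statements that explicitly ask for it are cached, and other engines simply ignore the MySQL-specific comment.

<?php
// Only cached because we ask for it explicitly (query_cache_type = DEMAND)
$sql = 'SELECT /*! SQL_CACHE */ id, name FROM departments';
$departments = $pdo->query($sql)->fetchAll();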

Technology references

  • Varnish – high-performance HTTP accelerator. (see also Varnish ESI).
  • Memcache – high-performance, distributed memory object caching system.
  • APC – caching and optimizing PHP intermediate code.
  • AFW – our in-house framework, which is tightly coupled to and integrated with all of the above technologies (due for release under the New BSD License in 2010).