7 Approved Caching Technologies

(These are the notes from Aptoma Monday School for week #44 and #45)

We have recently been blogging our notes on different modi operandi of caching. To sum it up, these were: reverse proxy caching, application caching (view caching, subview caching and object/data-caching), opcode caching, client caching and query caching.

We have spent the last two weeks discussing which technologies fit our needs best. Our product installations handle high traffic (i.e. millions of views a day), and we also have products that produce a lot of data that is heavy to compute (i.e. large data sets computed from complex database queries). Thus we have to scale both vertically and horizontally, so to speak. Our caching needs are therefore quite broad, and we have to shop around for several technologies to cover them all. This brings us to our …

List of 7 Approved Caching Technologies

In this post we will provide a few notes on each technology.

1. Varnish.
Used for reverse proxy caching. We are not happy with hosting providers' ability to configure and fine-tune Varnish, so honing your own skills in this is certainly not a waste of time.

2. APC Opcode Cache.
Use it. Always. We cannot find examples of when not to. It speeds up execution with no known disadvantages. APC seems to perform better than both competing alternatives (XCache and Zend Accelerator) under all circumstances.

3. APC for data caching.
You can use APC for data caching as well. It will outperform memcache by a factor of 10 to 50(!). As opposed to memcache, it does not have distributed access. We have no idea what happens when you exhaust the assigned RAM for data caching. We will have to find out, won’t we? (Yes we will)

4. Memcache for data caching.
Memcache is still one of the most stable and high performing technologies for its use. It is nevertheless annoying that its performance is significantly slowed down by the fact that it runs on top of TCP/IP (which is also an advantage when it comes to flexibility), even in those cases where only local access is necessary. It seems to us that memcache is run on the same server as the application in more than 90% of all cases.

5. Varnish ESI.
As the name implies, this is a feature of Varnish. ESI is very interesting as it provides subview caching without having to implement it at the application level (which is slower). Implementing subview caching at the reverse proxy level can speed things up significantly given the right circumstances. A disadvantage of Varnish ESI is that it introduces more complexity to your source code, as you’ll have to write “Varnish markup” in order to have Varnish do edge side includes for you (<esi:include>). Not a big deal, but definitely a declared enemy of the simplicity ninja. The benefits might make it worthwhile, nevertheless.
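
To make the idea of subview caching through includes concrete, here is a minimal Python sketch. It is not how Varnish works internally; it only mimics the principle that a page contains <esi:include> tags and each fragment is fetched and cached on its own. The function names, the fragment URLs and the dict used as cache are assumptions made for illustration.

```python
import re

# Matches self-closing include tags with a double-quoted src,
# e.g. <esi:include src="/menu"/>. Other ESI syntax is ignored in this sketch.
ESI_INCLUDE = re.compile(r'<esi:include\s+src="([^"]+)"\s*/>')


def render_with_esi(page, fetch_fragment, fragment_cache):
    """Replace each include tag with its fragment, caching fragments by URL."""

    def resolve(match):
        src = match.group(1)
        if src not in fragment_cache:
            # Cache miss: ask the backend (the application) for this fragment only.
            fragment_cache[src] = fetch_fragment(src)
        return fragment_cache[src]

    return ESI_INCLUDE.sub(resolve, page)


# Hypothetical backend that records which fragments it had to render.
calls = []


def backend(src):
    calls.append(src)
    return "<div>content of %s</div>" % src


cache = {}
page = '<html><esi:include src="/menu"/><p>article</p><esi:include src="/footer"/></html>'
first = render_with_esi(page, backend, cache)
second = render_with_esi(page, backend, cache)
```

On the second render the backend is not called at all, which is the gain subview caching is after: only fragments that are missing from the cache cost application time.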

6. MySQL for data caching.
This is basically “use MySQL the way you would use memcache”. MySQL for data caching performs quite well, but not astonishingly so (memcache is about twice as fast in our tests). MySQL is a convenient technology, as we already require it for all our products; it is also one of the more widely available technologies regardless of hosting partner (and we have quite a few of them). Another advantage of using database technologies for this purpose is that more sophisticated queries can be applied for purging and invalidating data than you can use in the simpler (but faster) key-value databases (memcache et al.).

7. SQLite for data caching.
Same use case as with MySQL above, and with the same pros and cons. It does perform a little better than MySQL. We have decided to look more into this one. Another advantage over memcache is that SQLite and MySQL are persistent caches (whereas memcache is volatile, i.e. will be blank after a restart).
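
The point about more sophisticated purging in a relational cache can be sketched as follows, using Python's sqlite3 module. The table layout, the "section" column and the function names are assumptions for illustration, not part of our framework.

```python
import sqlite3
import time

# ":memory:" keeps the sketch self-contained; pass a file path for a persistent cache.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE cache ("
    " key TEXT PRIMARY KEY,"
    " value TEXT NOT NULL,"
    " section TEXT NOT NULL,"  # hypothetical grouping column used for purging
    " expires REAL NOT NULL)"
)


def cache_set(key, value, section, ttl):
    db.execute(
        "INSERT OR REPLACE INTO cache (key, value, section, expires) VALUES (?, ?, ?, ?)",
        (key, value, section, time.time() + ttl),
    )
    db.commit()


def cache_get(key):
    # Expired rows are treated as misses.
    row = db.execute(
        "SELECT value FROM cache WHERE key = ? AND expires > ?", (key, time.time())
    ).fetchone()
    return row[0] if row else None


def purge_section(section):
    # One query invalidates every entry of a section; a plain key-value
    # store would need to know each key in advance.
    cur = db.execute("DELETE FROM cache WHERE section = ?", (section,))
    db.commit()
    return cur.rowcount


cache_set("article:1", "<h1>One</h1>", "sports", ttl=60)
cache_set("article:2", "<h1>Two</h1>", "sports", ttl=60)
cache_set("article:3", "<h1>Three</h1>", "news", ttl=60)
removed = purge_section("sports")
```

After the purge, both "sports" entries are gone while the "news" entry is still served from the cache.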

List of Discarded Technologies

APC as data cache and Varnish ESI are the only technologies which we have not exhaustively used in production for years. Nevertheless we will seek to improve our way of using all these technologies in the time to come, and we will be looking for how to implement support for these in our framework (Aptoma Framework, AFW). To show that we did not just stick to our guns on this one, we present this short list of alternative technologies which we also explored during our tests.

  • Tokyo Tyrant (reason to discard : somewhere in between MySQL and Memcache in performance, and brings no new advantages to the blend)
  • Nginx (not really a caching technology, merely a faster web-server in special circumstances)
  • Lighttpd (Not a caching technology, but can provide faster delivery of static content than Apache can deliver. On dynamic content it does not perform better than a properly stripped and fine tuned Apache.)
  • DBA (Does not perform as well as APC in our tests).

We have more discussions to come!

Our wish-list (todo-list) for caching and performance discussions is as follows.

Better benchmarking

  • Siege
  • Jmeter
  • Apache Benchmark (ab)
  • Httperf

Caching techniques

  1. Pre-loading cache (warming)
  2. Event triggered cache invalidation (cache on update)
  3. Stale cache (set flag but don’t purge, combined with a grace time in reverse proxy)
  4. Better caching on logged in users (Varnish ESI use case)
  5. Setting proper headers (for improved client caching and more)
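
As a starting point for the discussion of event triggered invalidation and stale cache, here is a minimal Python sketch of the "set flag but don't purge" idea. It runs at the application level rather than in a reverse proxy, and the class name, method names and the 30 second grace time are assumptions for illustration only.

```python
import time


class GraceCache:
    """Entries are flagged stale on update instead of being purged immediately."""

    def __init__(self, grace_seconds, clock=time.time):
        self.grace_seconds = grace_seconds
        self.clock = clock
        self.entries = {}  # key -> [value, stale_since or None]

    def set(self, key, value):
        self.entries[key] = [value, None]

    def mark_stale(self, key):
        # Event triggered invalidation: flag the entry, keep the old value around.
        entry = self.entries.get(key)
        if entry is not None and entry[1] is None:
            entry[1] = self.clock()

    def get(self, key):
        """Return (value, needs_refresh); value is None once the grace time is over."""
        entry = self.entries.get(key)
        if entry is None:
            return None, True
        value, stale_since = entry
        if stale_since is None:
            return value, False
        if self.clock() - stale_since <= self.grace_seconds:
            return value, True  # serve stale content while a refresh is pending
        del self.entries[key]
        return None, True


# A fake clock makes the behaviour easy to follow without sleeping.
now = [1000.0]
cache = GraceCache(grace_seconds=30, clock=lambda: now[0])
cache.set("frontpage", "v1")
fresh = cache.get("frontpage")
cache.mark_stale("frontpage")
now[0] += 10
within_grace = cache.get("frontpage")
now[0] += 60
after_grace = cache.get("frontpage")
```

Within the grace time the old value is still returned together with a hint that it should be refreshed; only after the grace time has passed does the entry disappear.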

Notes From Some of Our Benchmark Tests

Tokyo Tyrant

Tokyo Tyrant is a memcache-like layer on top of Tokyo Cabinet, which is a fast key-value database (see: http://sameerparwani.com/posts/tokyo-cabinet-and-tokyo-tyrant). Installation was easy; everything required was available at http://1978th.net/. As a PHP wrapper we used http://mamasam.indefero.net/p/tyrant/downloads/2/

We tested Tokyo Tyrant with default settings (defaults matter). The same goes for the systems we compared it with, Memcache and MySQL (query cache is off). The times given below are the time it took to set and get 1000 variables of 1 kB.
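
The shape of such a test can be sketched in Python as below. This is not the script we ran; the backend here is a plain dict standing in for a real client, and the class and function names are assumptions. The sketch reports seconds, because that is what time.perf_counter returns.

```python
import time


class DictBackend:
    """Stand-in backend; a real test would wrap a Memcache, MySQL or Tokyo Tyrant client."""

    def __init__(self):
        self.data = {}

    def put(self, key, value):
        self.data[key] = value

    def get(self, key):
        return self.data.get(key)


def benchmark(backend, count=1000, size=1024):
    payload = "x" * size  # 1 kB of dummy data per variable
    keys = ["key%d" % i for i in range(count)]

    # Time all writes first ...
    start = time.perf_counter()
    for key in keys:
        backend.put(key, payload)
    put_seconds = time.perf_counter() - start

    # ... then time all reads.
    start = time.perf_counter()
    values = [backend.get(key) for key in keys]
    get_seconds = time.perf_counter() - start

    return put_seconds, get_seconds, values


put_seconds, get_seconds, values = benchmark(DictBackend())
print("put: %f" % put_seconds)
print("get: %f" % get_seconds)
```

Swapping DictBackend for a class with the same put and get methods around a real client keeps the measurement loop identical across backends, which is what makes the numbers comparable.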

TokyoTyrant

  1. put: 0.102526473999
  2. get: 0.121464586258

TokyoTyrant disk hash

  1. put: 0.108086037636
  2. get: 0.123809480667

TokyoTyrant disk B-Tree

  1. put: 0.111338186264
  2. get: 0.129682970047

Memcache

  1. put: 0.0864425897598
  2. get: 0.0702331066132

MySQL without Query Cache

  1. put: 0.112287640572
  2. get: 0.164221072197

Tokyo Tyrant is slower than Memcache. Tokyo Cabinet can probably outperform memcached if accessed directly, but no PHP-bindings for this purpose were available for our tests.

What is exciting is that Tokyo Tyrant is faster than MySQL for persistent data caching. Tokyo Tyrant is also supposed to have some other features which we have not yet had time to test. Please share any experiences you might have with Tokyo Tyrant.

MySQL, SQLite, DBA, Memcached and APC tests

The test: write 100 000 MD5 hashes, then read them back and add them to an array. For the relational databases, the insert is done with a multi-row insert or a transaction.
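
For readers who want to try something similar, here is a minimal Python sketch of the relational variant of this test against SQLite. It is not the original test script: the table name, the use of an in-memory database and the choice to hash the numbers 0 to 99 999 are all assumptions.

```python
import hashlib
import sqlite3
import time


def run_test(count):
    # Assumption: the hashed input is simply the running number.
    hashes = [hashlib.md5(str(i).encode()).hexdigest() for i in range(count)]
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE hashes (value TEXT)")

    start = time.perf_counter()
    with db:  # one transaction around all inserts, committed on exit
        db.executemany("INSERT INTO hashes (value) VALUES (?)", ((h,) for h in hashes))
    create_ms = (time.perf_counter() - start) * 1000

    start = time.perf_counter()
    result = []
    for (value,) in db.execute("SELECT value FROM hashes"):
        result.append(value)  # read each row and add it to an array
    read_ms = (time.perf_counter() - start) * 1000

    db.close()
    return create_ms, read_ms, result


create_ms, read_ms, result = run_test(100000)
print("Create : %f ms" % create_ms)
print("Read : %f ms" % read_ms)
```

Wrapping all inserts in a single transaction is what gives the relational databases their best case in this kind of test; committing after every row would be far slower.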

MySQL

  • Create : 2038.615942 ms
  • Read : 13782.1378708 ms

SQLite

  • Create : 2084.01703835 ms
  • Read : 4064.75901604 ms

DBA (NOTE! Only with 10 000 elements this time)

  • Create : 10192.3089027 ms
  • Read : 10065.6449795 ms

Memcached

  • Create : 3493.89410019 ms
  • Read : 3219.08593178 ms

APC

  • Create : 25127.6450157 ms
  • Read : 172.363996506 ms

MySQL inserts a large number of rows at about the same speed as SQLite. Without query cache, its reads are outperformed by SQLite by a factor of three.

SQLite supports :memory: instead of files, which can speed it up as a relational cache, but then it is no longer persistent between restarts. MySQL's MEMORY engine has the same trade-off.

DBA is copy-on-write, which means that every write increases its file size. This makes all operations slower the more writes you do. The performance loss was at times huge, but it can be fixed by issuing an optimize command. DBA writes slowly (50x slower than the relational databases, and much slower than the other hash-buckets). Reading from a clean file makes it perform somewhere in between the hash-buckets (memcache, APC) and the relational databases.

Memcached is a little slower to write to than the relational databases, because the databases do all writes in a single command. Reads are somewhat faster than with SQLite and about four times as fast as with MySQL in this test. (Bear in mind that this is a best-case scenario for the relational databases.)

APC is 5-10x slower than Memcached in writes, but 10-20x faster to read from in this test.
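
The factors quoted in these notes can be checked directly against the figures listed earlier. The short Python sketch below does the arithmetic, assuming all four sets of figures share the same unit. DBA is left out because it was run with 10 000 elements only; the dictionary and function names are ours, for illustration.

```python
# Timings copied from the results listed in this post (100 000 elements each).
results = {
    "MySQL": {"create": 2038.615942, "read": 13782.1378708},
    "SQLite": {"create": 2084.01703835, "read": 4064.75901604},
    "Memcached": {"create": 3493.89410019, "read": 3219.08593178},
    "APC": {"create": 25127.6450157, "read": 172.363996506},
}


def ratio(slower, faster, operation):
    """How many times longer 'slower' took than 'faster' for the given operation."""
    return results[slower][operation] / results[faster][operation]


mysql_vs_sqlite_read = ratio("MySQL", "SQLite", "read")
apc_vs_memcached_create = ratio("APC", "Memcached", "create")
memcached_vs_apc_read = ratio("Memcached", "APC", "read")

print("MySQL vs SQLite, read:      %.1fx" % mysql_vs_sqlite_read)
print("APC vs Memcached, create:   %.1fx" % apc_vs_memcached_create)
print("Memcached vs APC, read:     %.1fx" % memcached_vs_apc_read)
```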

Test conclusions

DBA has few advantages over the relational databases. APC can replace Memcache in some of the areas in which we use memcache today (data/object-caching).