MySQL – What Engine to Use AKA ‘in retrospect my first mistake’

This is the first post in a short series on MySQL. In the end of this post I will try to list as many sources as I can remember to who and what told me these lies. Most of this will be copy-paste from our sekrit wiki and requires some knowledge of SQL, databases and MySQL.

One of MySQL’s more powerful features is its support for multiple different database engines and the first thing you should do when creating a table is to select the storage engine best suitable for your needs. In MySQL 5.1 there are three engines worth mentioning, MyISAM, InnoDB and MEMORY but new interesting engines are in development such as Falcon, Maria, PBXT and solidDB.
http://solutions.mysql.com/engines.html

MyISAM

MyISAM is the default engine of MySQL and a good choice when your access levels are >90% read or >90% write on a high load system or in any case on a low-load system. Both SELECTs and INSERTs on MyISAM tables can be incredible fast, but where it falls through is how it handles locks. The problem is that table-level read/write locks are the only ones supported. This means that when a UPDATE or DELETE operation is executed all other queries are put on hold until completed, and when a INSERT operation is executed all other INSERTs are on hold. ( Note: this means that several read operations can be executed simultaneously. I’ll discuss more on INSERT DELAYED in a future post.) This could result in multi-second or even multi-minute delays on SELECTs when the SELECT/INSERT ratio is outside the aforementioned thresholds. MyISAM has fast imports and exports of backups and has mysqlhotcopy for faster dumps.

CREATE TABLE t ( i INT ) ENGINE = MYISAM;
CREATE TABLE t ( i INT, j INT, PRIMARY KEY ( i ), INDEX ( j ) ) ENGINE = MYISAM;

A thousand times faster full-text index. MyISAM handles indexes by storing the record byte offset with each index row. The result of this is that all index lookups happen at the same speed and that UPDATEs and INSERTs will not always result in an index update as the byte offset doesn’t always change. The overhead when adding several indexes is also kept at a minimum as each index just need to keep track of the byte offset. INSERT DELAYED will also delay updating indexes until other queries have finished. MyISAM also supports full-text index on VARCHAR, TEXT, etc where each word is stored in a B-tree. Full-text index does consume a lot of space, but a query such as this:

SELECT * FROM articles WHERE body = "% database %";

might run thousands of times faster when a full-text index is added and the query is re-written as:

SELECT * FROM articles (body) MATCH AGAINST ('database');

Please note that MATCH AGAINST won’t display partial word-hits, e.g. “base” won’t locate “database”.

http://dev.mysql.com/doc/refman/5.1/en/myisam-storage-engine.html

InnoDB

The InnoDB engine is an overall good engine that supports some more advanced features that you don’t get with MyISAM with row-level locks and transactions being the most important ones. The row-level lock means that a table can have multiple read and writes at the same time and because it supports Multi version concurrency control (MVCC) a row can be read while being written to. Where there is little system to the data structure in MyISAM, InnoDB orders it by primary key. The result is that for each INSERT, DELETE or UPDATE, rows need to be moved, row position recalculated and indexes potentially updated. This makes INSERTs into a InnoDB table slower than a MyISAM table, but since data is ordered by primary key, selecting based on it is extremely fast. Where MyISAM first reads primary key index file, finds byte offset for the data row and then reads the database file, InnoDB would do this in one step.

InnoDB handles indexes by storing the primary key clustered with all index records, something that could result in a large overhead. It’s therefore recommended to have a small integer or some other space-efficient data type as primary key and not a wide VARCHAR that might be 10-20 times the size.

CREATE TABLE t ( i INT ) ENGINE = INNODB;
CREATE TABLE t ( i INT, j INT, PRIMARY KEY ( i ), INDEX ( j ) ) ENGINE = INNODB;

http://www.innodb.com/innodb

Memory

A MEMORY table is in it self persistent, but its data is not, as it’s stored in memory. This makes it very fast, but prone to data loss. Obvious usage for a MEMORY table is to cache subsets of other tables ( SELECT INTO ) or processed subsets but still retain the power to do SQL queries for even more limited subsets with increased speed. Compared to Memcached this is in some cases far superior in simplicity and probably also in speed. A MEMORY table can of course also be used to store a serialized object, but unless the power of a conditional query is needed, Memcached is probably better for the job. The size of all MEMORY tables are governed by max_heap_table_size with a default value of 16 MB. Worth mentioning is that memory is not recovered when deleting a row in a MEMORY table, but will be reused when inserting a new row. To recover the unused memory do a

ALTER TABLE table ENGINE=MEMORY

to rebuild the table. The default index type is hash, but BTREE is also supported. Using hash is several times faster, but allows identical hashes on identical data that can lead to different kind of issues. These issues are nonexistent with BTREE.

CREATE TABLE t ( i INT ) ENGINE = MEMORY;
CREATE TABLE t ( i INT, INDEX ( i ) ) ENGINE = MEMORY;
CREATE TABLE t ( i INT, INDEX USING BTREE ( i ) ) ENGINE = MEMORY;

http://dev.mysql.com/doc/refman/5.1/en/memory-storage-engine.html

Summary

In conclusion, data in a MEMORY table isn’t persistent but works well for temporary data such as serialized objects or statistics counters and is great for storing calculated subsets of large tables and then do heavy subset calculations on that subset again.

If you don’t need transactions, MyISAM is good for any short table because of the high SELECT/INSERT speed and when the dataset is kept trim the table-lock won’t be a problem. Another case where MyISAM performs well is when a table is either very read- or write-oriented. Multiple SELECTS can query a table without blocking each other and the raw speed of MyISAMs writes results in good performace even with table-lock for each operation.

If you need transactions, InnoDB is one of the better alternatives for MySQL at the moment but is also a good choice for active tables where table-locks will kill your performance.

We’ve had problems with large statistical data in our product NettTV, in combination with MyISAM. Large selects would block the inserts for minutes, at worst. During this period, the tables were effectively constipated, and one would be led to believe that no statistics were gathered during this interval. Later on, the statistics would flod in. MyISAM, referring to the earlier discussion on it, just does not fit with this use-pattern. InnoDB with its row-locks and MVCC probably wouldn’t give the same performance on the statistics part and would still use a lot of CPU but wouldn’t constipate the table the same way.

The newer MySQL engines are exciting but most are at the moment too unstable and with too variable performance for anything but testing, but in a five year perspective a few of the mentioned names probably will see its usage.

Some links on MySQL performance, ordered by importance

http://video.google.com/videoplay?docid=2524524540025172110&q=google+engedu
http://www.amazon.com/gp/product/0596101716
http://www.mysqlperformanceblog.com/2007/05/02/uc2007-presentation-and-notes/
http://jayant7k.blogspot.com/2007/07/mysql-query-cache.html
http://www.petefreitag.com/item/430.cfm
http://forge.mysql.com/wiki/MySQL_Tutorials

Some links on the new MySQL-engines

http://www.mysqlperformanceblog.com/2007/08/01/landscape-of-transactional-storage-engines-for-mysql/
http://forge.mysql.com/wiki/Maria_RoadMap_Design
http://dev.mysql.com/tech-resources/articles/falcon-transactional-engine-part1.html

To be continued

More information related to the MySQL engines can be read in later posts. We will attempt to release one each week to come. Stay tuned using rss.