MySQL – Configuration, query cache and other thingamajigs (part 2 of 2)

This is the third post in a short series about MySQL. Read the first here, and the second here. This one covers this and that and has no set topic. The post is quite long, so I’ve split it in two (read the first part here). This also hopefully make you add us to your rss-reader, and keep you coming back for more.

Nested updates

When INSERTing, REPLACEing, UPDATEing or DELETEing several rows, nesting them together in a single long query is faster, preferably with INSERT DELAYED where applicable. The nested queries are faster in them self since the query is only interpreted once and only needs to update indexes once whereas several single rows inserted require one update each. When considering this remember that on a MyISAM table with a more even read/write load, the higher overhead of multiple fast queries might be better than a rare longer lasting table lock. UPDATEs can’t be nested, but can be written as a temporary INSERT followed by an UPDATE.

/* CREATE TABLE */
CREATE TABLE associates ( id INT, value INT, name VARCHAR(10) );
CREATE TABLE tmp_associates ( id INT, value INT, name VARCHAR(10) );
/* INSERT */
INSERT DELAYED INTO associates SET ( value, name ) VALUES
		( 4, 'Mr Pink' ), ( 2, 'Mr White' ),
		( 4, 'Mr Orange' ), ( 6, 'Mr Brown' );
/* UPDATE */
TRUNCATE TABLE tmp_associates;
INSERT INTO tmp_associates VALUES ( 3000, 'Mr Pink' ),
		( 22, 'Mr White' ), ( 11, 'Mr Orange' ),
		( 91, 'Mr Brown' );
UPDATE associates, tmp_associates
		SET associates.value = tmp_associates.value
		WHERE tmp_associates.name = associates.name;

Subselects

In most cases, don’t use them, use JOINs. The foreachesque way of thinking that leads to subselects is logical and tempting to use, but as most RDBs work in sets, they are in most cases dead slow compared to JOINs. It should be noted that in some cases subselects can be hundred of times faster, but those cases are special and you should really know what you are doing before opting for it. Of all the JOINs (inner, outer, left, right, cross and more) STRAIGHT_JOIN is one you shouldn’t use. STRAIGHT_JOIN tells the database to always join in the order that you specify and to not use its optimizer. Have we learned anything from Hollywood it’s that computers (and dirty apes) are smarter than humans, don’t flatter yourself by thinking otherwise. :)

Boolean flags

Apparently Torfinn here at Aptoma did research into this some time ago and discovered that NULL values are much slower compared to the number zero and allows nicer query-syntax. In MySQL TINYINT, BOOL and BOOLEAN are all synonyms for the same single-byte data type so it doesn’t matter which one you choose, but TINYINT with the values 1 and 0 are more logical and easier to understand for someone who doesn’t know this. The CREATE string for avoiding NULL values is like this:

CREATE TABLE t ( deleted TINYINT NOT NULL DEFAULT 0 );

The conditional part of a query can then be written like this:

SELECT id FROM t WHERE !deleted;

Temp tables

If you feel the write rate of your table is too high for optimal query cache usage and you don’t have any rush inserting new rows, consider inserting them into a temp table and at set intervals INSERT INTO the main table. This leads to faster initial INSERTs, potentially more query cache hits and less overhead updating indexes.

Do more once, not some foreach

In normal programming languages foreach is great, but for SQL it’s seldom optimal. If you end up doing a query foreach something, look into doing two foreachs in PHP and one larger SQL-query. One foreach to gather conditional data (build a string suitable for an IN condition) and one after the query is executed where you first locate the row(s) you need. Ordering the data in the same order in both PHP and SQL will save you lots power as you can read and match rows from the returned data set sequentially.

I’ll provide an example. The previous paragraph could be a bit hard to grasp. In this first function a date restricted query is executed foreach user looking for its name. In the second function all user names are added to an IN() function and a single query gathers the data for all listed users within the same date restriction and is ordered by user name. Then some PHP orders the users in the same order ( hopefully — be careful about collation and special chars) and foreach user locate matching row in the result set.

The bad, with potentially a bunch of queries:

function weekReport() {
	$lusers = array( 'Joe', 'Frank', 'Bill', 'Ted' );
	$errors = array();
	$today = date( 'Y-m-d 00:00:00' );
	$sql = "SELECT SUM( errors ) AS errors FROM problems WHERE
	created < '{$today}'
		AND CREATED >=
		DATE_SUB( '{$today}', INTERVAL 7 DAY ) AND name = ";
	foreach ( $lusers as $luser ) {
		$result = mysql_query( $sql . "'{$luser}';";
		if ( $row = mysql_fetch_assoc( $result ) ) {
			$errors[$luser] = $row['errors'];
		}
	}
	return $errors;
}

The potentially good way, with one query:

function weekReport() {
	$lusers = array( 'Joe', 'Frank', 'Bill', 'Ted' );
	$errors = array();
	$today = date( 'Y-m-d 00:00:00' );
	$sql = "SELECT name, SUM( errors ) AS errors
		FROM problems WHERE
		created <  '{$today}'
		AND CREATED >=
		DATE_SUB( '{$today}', INTERVAL 7 DAY )
		AND name
		IN ( '" .  implode( '','', $lusers ) . ''' . ' )
		GROUP BY name, ORDER BY name DESC;';
	$result = mysql_query( $sql );
	$lusers = array_sort( $lusers );
	$found = true;
	$row = false;
	foreach ( $lusers as $luser ) {
		if ( $found ) {
			$found = false;
			$row = mysql_fetch_assoc( $result );
		}
		if ( $row && $row['name'] == $luser ) {
			$found = true;
			$errors[$luser] = $row['errors'];
		}
	}
	return $errors;
}

Configuration

The default cache sizes aren’t very performance tuned so you probably want to increase them if you have the resources and access to do so.

SHOW VARIABLES;

will show current MySQL settings and these are some of the more important ones:

When using InnoDB
innodb_buffer_pool_size – Total cache size for InnoDB databases. Default 8 MB and can be set to up to 50-80% of total system memory when only using InnoDB tables.
innodb_log_buffer_size – How much log to buffer in memory before writing to disk. Increasing this could give a performance boost.

When using MyISAM
key_buffer_size – Total cache size for MyISAM databases. Default 8 MB and can be set to up to 50-80% of total system memory when only using MyISAM tables.

Query Cache
query_cache_size – Total cache size for query results. Must be set to more than 0 byte and a few megabyte should be enough in most cases.
query_cache_type – Query Cache status. 0 or OFF is deactivated, 1 or ON is always on and 2 or DEMAND is on when using key word SQL_CACHE. ON is the easiest as no modification to the SQL statements are required but will decrease performance on cache misses. DEMAND won’t have the same performance hit on all queries and could be the best choice when all programmers know each tables access levels and know when Query Cache will be needed and when not.

When using MEMORY
max_heap_table_size – Total data size for MEMORY tables.

Helpful commands

Here in the end I will just list a few commands you should know. Search the MySQL manual to see what they do.

SHOW VARIABLES;
SHOW PROCESSLIST;
SHOW FULL PROCESSLIST;
SHOW STATUS;
FLUSH STATUS;
SHOW INNODB STATUS;
SHOW TABLE STATUS table;
SHOW CREATE TABLE table;
SHOW INDEXES FROM table;
SHOW KEYS FROM table;
OPTIMIZE TABLE table;
ANALYZE TABLE table;

Presentations

MySQL Performance  Blog from the authors of High Performance MySQL has several good presentations worth reading. In some of them you need to read between the lines ( as they for the most part consist of bullet points ), but that should be easy enough.
http://www.mysqlperformanceblog.com/mysql-performance-presentations/