Optimising MySQL
By Flib
2009-05-11
Category: MySQL
The Problem
Many times you find that as you grow the bottleneck becomes the database. While you can scale out across multiple machines in many ways, for many people this just isn't an option.
In this article I try to explain a few ways to optimise your MySQL use to allow you to squeeze just a little more work from it before you have to scaleup to a bigger server or scaleout to many servers. I assume that you have little control over the basic server parameters besides perhaps basic my.cnf access.
The Solution
The first thing to realise, is that unless your server is very badly setup, optimising the queries and the application itself will pay off more than optimising the server itself.
Turn On Slow-query Logs
To paraphrase Richard Koch, 'your server will spend 80% of the time working on 20% of the queries'. It makes sense to use this as a starting point, since if you can cut that 80% in half, then you have just gained an extra 40% of capcity on your server.
The slow query log will log any queries that take longer than a certain time. You can enable it by editing the /etc/my.cnf (on CentOS/RHEL) or the /etc/mysql/my.cnf (on Debian) config file and adding or enabling the following in the [mysqld] section.
log_slow_queries = /var/log/mysql/mysql-slow.log
long_query_time = 2
This is not free though, all logs take some overhead to write to the filesystem, while not massive, it can make a difference on a loaded filesystem.
Simply turning on the logging isn't going to speed anything up, but it does give you the tools to see what is slowing it down. I will cover the actual optimisation of a query in a later in this article.
Turn On Logs For Unindexed Queries
Further to the slow query log, you can go one better and log all queries where indexes aren't used. You can do this my putting the following in the [mysqld] section of the my.conf file.
log-queries-not-using-indexes
While these queries aren't always slow to the extent that they get in the slow log by default, they are normally very inefficient and as such should also be candidates to be optimised.
Simplify Your Permissions And Remove Resource Limits
MySQL is fairly intelligent when it comes to permissions. If you haven't set any table-level or column-level restrictions for any user, then it will skip bothering to check them at all.
The same thing goes with resource limits for a user. If you don't restrict the number of connections per hour or the number of queries per hour, then MySQL wont bother keeping track of these values in the first place.
While not a massive overhead, on a heavily loaded server they can make a big difference.
Avoid Mixing Reads and Writes on MyISAM tables
MySQL uses a cache to speed up access to data in tables. This works very well when you are only reading from the table as each read can benefit from the cache left from the read before, meaning that the more reads that are issued the faster they get up to a certain point. However, as soon as a write (INSERT/UPDATE/REPLACE etc) then this cache is flushed and reads are slow again.
If you must mix writes but knowning when the data is written is unimportant, then INSERT DELAYED can help for MyISAM based tables. This will allow writes to be queued up (configurable in my.cnf (delayed_insert_limit and delayed_queue_size)). What you can get away with depends on your application.
Avoid Unnecessary Connections
Each connection has a cost. For a connection with only a single query sent over it, this can be up to 30% of the total effort required to service the request. This is one of the reasons persistant connections are popular.
Not every language has them, and they aren't usable in all cases, but they can make a massive difference where they can be used.
PHP has the mysql_pconnect() function. This is almost a drop in replacement for the mysql_connect() function, however if a connection is already present it will use that instead of opening and closing another connection to the database.
Analysing Query Execution
With any of the slow or unidexed queries identified in the above sections you can do the following.
Take the query as is, prefix it with EXPLAIN, so you should have something similar to the following
EXPLAIN SELECT * FROM table1, table2 WHERE table1.id = table2.table1_id;
Upon executing this query you should get a response from the server. This will contain at least one row for every table touched in the query; table1 and table2 in this example.
ie.| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
|---|---|---|---|---|---|---|---|---|---|
| 1 | SIMPLE | table1 | ALL | id | NULL | NULL | NULL | 17 | |
| 1 | SIMPLE | table2 | eq_ref | PRIMARY | PRIMARY | 4 | exampledb.table1.id | 1 |
As you can see, the first row has one possible key, however no key was available to be used. This will have slowed down the query. If we add a key for this column, then we should be able to increase the speed of execution and reduce the load on the server.