Scaling with MySQL replication

№ 178JUL 13, 2007

To deal with Drupal's growth, we're adding a second database server to drupal.org which is useful for at least two reasons. First, we'll be able to handle more SQL queries as we can distribute them between multiple database servers (load balancing). Secondly, this new server can act as a "hot spare" that can immediately take over if the other database server fails (high availability / fail-over).

The current plan is to configure both database servers in master-slave configuration, which is the most common replication model for websites. This model provides scalability, but not necessarily fail-over. With a master-slave configuration, all data modification queries (like INSERT, UPDATE and DELETE queries) are sent to the master. The master writes updates to a binary log file, and serves this log file to the slaves. The slaves read the queries from the binary log, and execute them against their local copy of the data. While all data modification queries go to the master, all the read-only queries (most notably the SELECT queries) can be distributed among the slaves. By following this model, we can spread the workload amongst multiple database servers. And as Drupal.org's traffic continues to grow, we can scale horizontally by adding more slaves.

While MySQL does the database replication work, it doesn't do the actual load balancing work. That is up to the application or the database abstraction layer to implement. To be able to distribute queries among multiple database servers, the application needs to distinguish between data modification queries and read-only queries.

Care needs to be taken, as the data on the slaves might be slightly out of sync. It may or may not be practical to guarantee a low-latency environment. In those cases, the application might want to require that certain read-only queries go to the master.

There are different ways to accomplish this:

Drupal executes all SQL queries through db_query(). Traditionally, big Drupal sites manually patched db_query() to use query parsing (regular expression foo) to separate read queries from write queries. This is not convenient and it doesn't provide a good solution to deal with lag. Fortunately, work is being done to provide better support for database replication in Drupal 6. It's our intend to backport this to Drupal 5 so we can use it on drupal.org until Drupal 6 has been released and drupal.org has been upgraded to use Drupal 6.
MediaWiki, the software behind Wikipedia uses $db->select() and $db->insert(). They have some documented best practices to deal with lag.
Neither the Pear DB database abstraction layer or its successor Pear MDB2 seem to support database replication.
Wordpress uses HyperDB, a drop-in replacement for Wordpress' default database abstraction layer that provides support for replication. It was developed for use on Wordpress.com, a mass hosting provider for WordPress blogs. Because HyperDB is a drop-in replacement, they don't have a clean API and just like Drupal 5, they have to use query parsing to separate read queries from write queries. It's not clear how they deal with lag.
Joomla! 1.0 does not separate read queries from write queries, but Joomla! 1.5 will use functions like $db->insertObject() and $db->updateObject(). Joomla! 1.5 won't support replication out of the box, but their API allows a clean drop-in replacement to be developed. It's not clear how they would deal with lag.
PostNuke uses the ADOdb database abstraction library, which at first glance does not support database replication either.
Java applications use the statement.executeQuery() and statement.executeUpdate() that are part of the standard class libraries. It's not clear how they deal with lag.

What other popular applications support database replication, and what do their APIs look like? I'd like to find out what the ideal API looks like so we can still push that for inclusion in Drupal 6.

Based on the research above, I think we should get the best of all worlds by introducing these three functions (and deprecating db_query()):

db_select_slave(); // results might be slightly out of date
db_select_master(); // results always up to date
db_update(); // for UPDATE, INSERT, DELETE, etc

Even if they don't actually do anything useful in Drupal 6, and just map onto the deprecated db_query(), they set a better standard, and they allow for a drop-in replacement to be developed during Drupal 6's lifetime. Ideally, however, Drupal 6 would ship with a working implementation.