I’ve recently been working with Redis and Mongodb more, and I wanted to write up the blog post that I wish I had read a few weeks ago. There’s a couple of concepts in databases - replication and sharding - that are quite well explained for individual database technologies. I have not however, been able to find a general explanation (database agnostic) for these concepts.
Replication is what it sounds like, you replicate your data across multiple nodes in a cluster of servers. If your database supports replication, it has some election process in the cluster to elect a master. The usual scheme is that all writes and reads go to the master and then those writes and reads get propagated asynchronously to all the other members of the cluster (slaves).
Each database makes different guarantees on when this data will be consistent between all nodes. What exactly happens when a master goes down and one of the slaves takes over is also different between most systems. In general though the data is eventually consistent between database nodes, and when a master goes down the remaining slaves hold an election. A new master is elected by consensus and that master takes over serving all requests.
There’s some tricky stuff about how replicas handle network partitions in which there can be two different masters in the cluster for an indeterminate amount of time, and how the data is merged back together. That’s outside the scope of this post, but you can read more about how Redis handles this type of situtation by checking the Consistency under partitions section.
Sharding sounds sort of complicated but it’s a really simple concept, if a bit difficult to actually pull off. In applications using a traditional stack (like LAMP for instance) you have a single database that’s serving requests from a set of clients (maybe web API servers that are making queries against it). As you scale up the API servers, you put more and more load on this database server. Since it’s only one server, your options for scaling it up are limited to getting a bigger and bigger machine.
If your on premises, this gets tough fast. You have to keep making room in your rack and re-racking/re-purchasing bigger hardware for this database server. If you didn’t buy big enough up front, then you could have serious problems.
If you’re using a cloud provider, this also gets difficult. You have to keep upping the instance size and there’s downtime for taking your database offline. You can do this after hours but if your product is used globally, it’s downtime no matter what you do. The cost of increasing the instance size can also be prohibitively expensive past a certain point.
Faced with this problem, what would the ideal solution be. Well what do you do if you need to add more capacity to your API servers? Throw a new server behind the load balancer and boom, instant capacity. You can do the same thing here (with some caveats) and that’s sharding.
With sharding, the idea is to spread the load on your big DB instance across a number of smaller DB instances. You do this by:
- Choosing a good key to shard on (a way to distribute the data)
- Choosing a scheme to route queries against that data to the correct instance
- Bringing up the right number of instances
Say for example that I have a database with one billion records. I can stick those billion records in one database, or I can choose to distribute those records across a cluster of five servers with 200 million records each.
+-----------------------------------+ | | | Shard 1 - Records 1-200M | | | +-----------------------------------+ | | | Shard 2 - Records (200M + 1)-400M | | | +-----------------------------------+ ... +-----------------------------------+ | | | Shard 5 - Records (800M + 1)-1B | | | +-----------------------------------+
You can do this with no support from the database drivers themselves. You just need some code to figure out which server a particular record would be on for any given query. If you want to abstract this, you can even make a makeshift load balancer which will route queries to different shards and return the result (as long as that process would not consume too many resources that is).
Choosing a good sharding key is the hardest part of the process. You don’t necessarily want to distribute the data evenly (data only consumes storage after all), you want to distribute the access (and thus the load generated on memory and CPU) evenly across all servers. It’s possible to do sharding with a skew of load against different shards and then buy bigger instances for shards which require more resource, but I think getting an even distribution is a better objective in general.
This isn’t quite as easy as adding a new server behind a load balancer and calling it good. Every new server that gets added has to take records from some other servers in order to distribute the load (re-sharding). If you have to re-shard you might need to take some downtime, or figure out a way to live migrate the documents and serve some type of error message for users affected while the migration for records they are searching for is in progress. It really depends on the individual use case, but in almost all cases adding a new database server even in a sharded setup is painful.
When to use each
The simple answer is - Replication is for high availability and Sharding is for performance reasons. Using replication, you are protecting yourself against the possibility that your database (a single point of failure for your application) goes down for any reason. If it does, it’s fast to choose a new master and database drivers for databases that support replication can handle this situation automatically.
With Sharding, you distribute the data across multiple nodes, but each node holds some unique piece of data that no other node has. If one of the nodes go down, you loose access to whatever records were sitting on that shard. It’s true that you have some additional fault tolerance by dint of having your data distributed but each shard is not fault tolerant on it’s own.
For high traffic systems, replication and sharding are two great tastes that go great together. In the previous example with 1B records for instance, I might choose to shard 200M records to each database server and then make each shard a replica set with 3 servers each. That way I get better performance across the masters of the cluster and each master is fault tolerant.