Redis has often been praised for its quick speed, its ability to store data types such as lists or sets, and because it offers persistence, unlike other in-memory key value stores. It has been widely popular in the tech community and has been utilized heavily by Pinterest, Twitter, StackOverflow, and many others. However, as the applications using Redis have grown and as Redis has developed into a more mature and integral portion of a wide array of architectures, it has become clear that two important features were missing: horizontal scalability and high availability.
Horizontal scalability, or scaling by adding more nodes to a system, is especially important in Redis because each instance can only be scaled vertically. That is, the only way to improve the performance or capacity of a Redis instance is to give the individual process more CPU or memory, which can quickly become expensive and unmanageable. You can scale horizontally in Redis by partitioning data across multiple Redis instances.
Several users and companies have rolled their own partitioning implementations for Redis. Most of these rely on client side partitioning. In client side partitioning, clients are responsible for determining where to write or read a given key. The main downside to this approach is that each client must implement the same partitioning strategy, and multiple clients using the same Redis instance will become tightly coupled.
Another partitioning strategy that decouples the clients and the instances is called proxy assisted partitioning. Twitter’s Twemproxy takes this approach. In proxy assisted partitioning, a proxy that looks like a single Redis instance sits between the clients and the Redis instances, managing the partitioning. While this decouples the clients from the routing logic, the proxy introduces a single point of failure into the application.
Another issue is adding or removing nodes in the future. What happens when you need to repartition your data? If you are using Redis as a cache, the answer is not much. The partitioning rules are simply changed to point each key to its new home, and the data will be refreshed the next time it is requested. If you are using Redis as a data store, however, just changing the partitioning rules will result in lost data, as keys will be reconfigured to point to different instances that won’t initially contain the data. The data must be migrated.
When you are relying on a database to power a large enterprise application, downtime is unacceptable. Modern databases need to be fault tolerant and resist failure as much as possible without human intervention. Though crashes are rare in Redis instances, there are error cases that can cause instances to fail or be unreachable.
Since there was no good solution for achieving high availability (HA) with Redis, some companies tried to roll their own. Many others went without (notably StackOverflow). Seeing that these solutions were highly specialized and that regular users were being left without HA options, the programmer of Redis developed an intermediate solution called Redis Sentinel. Redis Sentinel is an HA solution that allows users to configure master Redis instances with a list of replicating slaves. The tool then monitors the masters and votes to promote slaves when masters die. It is a decent HA solution, but it does introduce some additional complexity. It does not include a solution for scaling horizontally between its masters, and it requires at least three additional servers to be set up on machines that will survive independently of the Redis instances.
Philosophy of Redis Cluster
In response to the problems listed above, Redis Cluster was released with the 3.0.0 release of Redis on April 1, 2015 after over four years of development. According to the founder, it is: “…basically a data sharding strategy, with the ability to reshard keys from one node to another while the cluster is running, together with a failover procedure that makes sure the system is able to survive certain kinds of failures.”
The design of Redis Cluster is more advantageous than many of the other solutions described above, mainly because it is not a standalone application or proxy. It is a built-in mode of operation for the Redis instances themselves. That is, the nodes handle all the sharding, replication, and routing. This means you can interact with the Redis instances almost exactly as before and the routing mechanism is centralized. Since the configuration is shared among all of the Redis instances, it is guaranteed to have the same uptime as the Redis instances themselves.
Redis Cluster was designed as a general solution for high availability and horizontal scalability for all users of the database. As such, it was designed from the ground up with the major value additions to Redis in mind: performance and a strong data model. Because of this, Redis Cluster implements neither true availability nor consistency of the CAP theorem.
True consistency is given up in favor of performance. The slowdown incurred by executing all writes synchronously would violate the performance guarantees that Redis makes. All replication is thus performed asynchronously. It is possible to lose data if a failure occurs after a master has acknowledged a write but before replication has completed.
Similarly, strong availability is given up in favor of retaining Redis’ solid data model. Strong availability requires that all writes be successful, even when the nodes cannot reach each other across the network. This means that in the case where there are two masters for the same data (e.g., a master is split from the cluster, one of the master’s slaves is promoted in its place, and the master is still reachable by clients), both must accept writes. When the old master rejoins the cluster, the data from both must be merged. Redis never actually merges data because it is difficult to correctly merge lists, sets, and some of the other advanced data structures Redis supports. Instead, it will always accept the most recently written state of the data. Luckily, there is only a small window of time when these network partitions can occur because Redis masters stop accepting writes as soon as they recognize they have been separated from the cluster.
Overall, the design of Redis Cluster is always focused on providing as much availability and consistency as possible while retaining Redis’ speed and strong data model. It won’t be perfect for all cases, but it is perfectly reasonable for most applications that would otherwise use Redis.
How It Works
Redis Cluster is an active-passive cluster implementation that consists of master and slave nodes. The cluster uses hash partitioning to split the key space into 16,384 key slots, with each master responsible for a subset of those slots. Each slave replicates a specific master and can be reassigned to replicate another master or be elected to a master node as needed. Replication is completely asynchronous and does not block the master or the slave. Masters receive all read and write requests for their slots; slaves never have communication with clients.
In addition to sharding at database inception, Redis Cluster supports resharding, or migrating keys to different instances. Unlike most other solutions, however, it also supports migrating the data associated with those keys. This is especially valuable if you are using Redis as a data store and need to add or remove masters. Redis Cluster will manage the transition and ensure that all of the data is available throughout and after the migration.
Each node in a cluster requires two TCP ports. One port is used for client connections and communications. This is the port you would configure into client applications or command line tools. The second required port is reserved for node-to-node communication, or gossip, that occurs in a binary protocol and allows the nodes to discuss configuration and node availability.
When a master fails or is found to be unreachable by the majority of the cluster as determined by the nodes’ communication via the gossip port, the remaining masters hold a vote and elect one of the failing masters’ slaves to take its place. If the new master does not have a slave after the election, one of the redundant slaves can be reshuffled to replicate the new master using replicas migration. When the failing master eventually rejoins the cluster, it will join as a slave and begin to replicate another master.
Changes from Redis
Redis Cluster allows you to use almost all of the original functionality of Redis with two major exceptions:
Multi-key operations such as set operations only work if all of the keys involved in the operation are contained on the same node. These operations require all of the involved keys to be in memory, and copying the data across the network is not currently supported. There are ways to ensure that keys are hashed to the same node, but the best bet is probably to perform these operations on the client as you would with any other sharding solution.
Additionally Redis Cluster does not support multiple databases. The non-distributed version of Redis allows you to have separate databases running on the same instance. The main purpose of this is to store different kinds of data in different databases to avoid collisions if you have a monolithic Redis instance. Multiple databases should no longer be necessary with Redis Cluster since monolithic instances are discouraged in favor of distributed, targeted clusters.
Redis Cluster is a high availability and horizontal scalability solution for Redis.
The cluster is self-managed. It abstracts configuration away from clients and removes the single point of failure introduced by proxy servers.
Redis Cluster maintains the performance and strong data model you know and love from Redis.
It does not provide perfect availability or consistency, but it does provide pretty good guarantees that make it a solid solution for many projects.