Should I Containerize My Database?

JW Walton

Containers, which are isolated processes running on a shared kernel, are dramatically changing the world of software development. The benefits are easy to see: faster deployments, smaller attack surfaces, consistent delivery, and easier horizontal scaling. One might wonder, “Why not run everything in containers?” That is a great question, and one our clients ask constantly. Unfortunately, the answer isn’t always black and white, but we have developed some guidelines based on successful engagements that can help guide your decision.

First, putting any part of your application in a container comes with overhead, such as maintaining a container orchestration layer. Most of the time, for things like a front-end web app or a stateless API layer, the benefits easily outweigh the cost of getting a container environment up and running. However, as you move toward the bottom of the application stack, the cost/benefit analysis of containerizing a database service becomes more complicated, so this is where our guidance is focused.

Two Types of Database Applications

We think of data management solutions in two broad classes:

Solutions that rely on vertical scaling and resiliency at the storage level (e.g., RAID). These include traditional RDBMS solutions such as MySQL, PostgreSQL, and SQL Server.
Solutions that scale horizontally and handle resilience at the application level. These include “NoSQL” solutions such as Elasticsearch or Hadoop-based solutions.

Vertical Scaling Solutions

Generally, vertical scaling solutions like MySQL, PostgreSQL, and SQL Server should not go in containers. These database platforms typically need high I/O throughput and shared disks or block storage, and they hold persistent data. They are generally not designed to handle the loss of a node in a cluster gracefully, which often happens in a container-based ecosystem. It is possible to overcome most of the obstacles to running these platforms in a container, either by mapping persistent storage from outside the cluster or by mapping volumes on each cluster node. However, this adds to the complexity of the overall solution and leaves you with all the same challenges (i.e., preserving the persistent data locations and maintaining I/O performance). If you are in the cloud, the best course with these database platforms is to use your cloud provider’s Platform as a Service offering, where you gain the flexibility of containers without the added complexity. If you’re not in the cloud, just keep these databases running on physical or virtual machines where you have granular control of the I/O and storage.

Horizontally Scalable Applications

For horizontally scalable applications (Elasticsearch, Cassandra, Kafka, etc.), containers can and probably should be used. The defining characteristics of database applications in this category are that they can withstand the loss of a node in the database cluster and that the database application can rebalance on its own. Getting these platforms working well in a container cluster will take more planning and design consideration than a web app. For example, great care must be taken to ensure the minimum number of database nodes needed for a quorum stays available, container hosts must be sized appropriately, and database containers should be distributed across multiple hosts of the container cluster to preserve the data in case of a node failure. The good news is most container orchestrators have tools to address these requirements.
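To make the quorum and placement point concrete, here is a minimal Python sketch. It assumes a simple majority quorum (more than half of the database nodes), which is only an illustration: the real quorum rules differ from one database to another, and the function and host names are hypothetical. It checks whether a given placement of database nodes onto container hosts would still have a quorum after losing any single host.

```python
from collections import Counter


def quorum_size(cluster_nodes: int) -> int:
    """Smallest simple majority for a cluster of this many voting nodes.

    Assumption: majority quorum. Check your database's own rules.
    """
    return cluster_nodes // 2 + 1


def survives_host_loss(placement: dict[str, str]) -> bool:
    """Return True if a quorum remains after losing any one container host.

    `placement` maps a database node name to the container host it runs on.
    """
    if not placement:
        return False
    total = len(placement)
    nodes_per_host = Counter(placement.values())
    # The worst case is losing the host that carries the most database nodes.
    worst_loss = max(nodes_per_host.values())
    return total - worst_loss >= quorum_size(total)


# Three database nodes spread over three hosts tolerate one host failure.
spread = {"db-0": "host-a", "db-1": "host-b", "db-2": "host-c"}
# Two of three database nodes on one host do not.
stacked = {"db-0": "host-a", "db-1": "host-a", "db-2": "host-b"}
```

In practice you would not run this check by hand. Most orchestrators let you declare the spreading rule and enforce it for you, but the arithmetic behind that rule is the same.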

Which Type Is My Database Application?

Often, you will hear advice like “put NoSQL databases in containers.” We would approach this advice (and the advisors!) with caution. You can’t necessarily decide by the type of database. For example, some graph databases like Neo4j should not be deployed in a container because they don’t use redundant storage techniques, whereas other graph databases such as Giraph could work well in a container. The key is to understand how your solution manages and stores data. One good indicator you are probably in the vertically scaling camp (and should not containerize) is that your solution recommends RAID storage.
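The rule of thumb from this article can be written down as a small Python sketch. The class and field names are hypothetical, and the three questions are a simplification of the guidance above, not a complete assessment of any particular product.

```python
from dataclasses import dataclass


@dataclass
class DatabaseTraits:
    """Answers to the questions this article suggests asking (hypothetical names)."""
    tolerates_node_loss: bool   # can the cluster lose a node without losing data?
    rebalances_itself: bool     # does the database redistribute data on its own?
    recommends_raid: bool       # does the vendor recommend RAID storage?


def containerize_in_production(traits: DatabaseTraits) -> bool:
    """Apply the article's rule of thumb for production deployments."""
    # A RAID recommendation signals resiliency at the storage level,
    # which points to the vertically scaling camp.
    if traits.recommends_raid:
        return False
    return traits.tolerates_node_loss and traits.rebalances_itself


vertical = DatabaseTraits(tolerates_node_loss=False, rebalances_itself=False, recommends_raid=True)
horizontal = DatabaseTraits(tolerates_node_loss=True, rebalances_itself=True, recommends_raid=False)
```

The point of the sketch is that the inputs are properties of how the solution manages and stores data, not its label as “SQL” or “NoSQL.”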

Containers for Development

There is one area where it makes sense to run any and all applications in a container: your development environment. Not only does this provide consistency across different developers’ environments, but it also allows for faster development environment setup and can include unique extras, like pre-seeding a development database image with test data that can be reliably reset to a known state. The concerns around running vertically scaling solutions don’t usually apply to development environments. As an example of how easy it can be to set up your database application in a container, all three of the platforms called out above as vertically scaling (i.e., MySQL, PostgreSQL, and SQL Server) have base images in the Docker Store, so getting started in a development environment is as easy as pulling the public image and starting it in your development container environment.
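As a rough illustration of that “pull and start” step, the Python sketch below builds a `docker run` command for a throwaway PostgreSQL development container. It assumes Docker is installed and uses the public `postgres` image with its `POSTGRES_PASSWORD` setting. The container name, password, port, and tag are placeholder values chosen for this example, and the helper function is hypothetical.

```python
def dev_postgres_command(
    name: str = "dev-postgres",
    password: str = "dev-only-password",  # placeholder, for local development only
    host_port: int = 5432,
    tag: str = "16",
) -> list[str]:
    """Build the argument list for starting a disposable PostgreSQL container."""
    return [
        "docker", "run",
        "--rm",        # remove the container when it stops, so state resets
        "--detach",    # run in the background
        "--name", name,
        "--env", f"POSTGRES_PASSWORD={password}",
        "--publish", f"{host_port}:5432",  # expose PostgreSQL's default port
        f"postgres:{tag}",
    ]


command = dev_postgres_command()
# To actually start the container (requires Docker on the machine):
#   import subprocess
#   subprocess.run(command, check=True)
```

Because the container is started with `--rm` and no mounted volume, stopping it discards the data, which is usually what you want for a development database that should return to a known state.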

Get Help With Containers

The process of containerizing an application can be a complex undertaking, depending on many factors that are often unique to each stack. Credera has helped a number of clients not only deploy their applications and data in containers, but also re-architect legacy or monolithic applications to benefit from a microservices architecture that works well with containerized deployments. If you have questions about containerizing data or modernizing your application, we would love to help. Contact us at marketing@credera.com or in the form below.