Not long ago, a relational database was all you needed to persist and query an application’s data. However, the current generation of applications is more demanding in terms of scale, data flexibility, and automation. Meeting these demands means picking the right database for the workload, which lets app developers focus on adding business value rather than constantly maintaining a sub-optimal database.
Just as a video streaming platform like Netflix needs to cache data at various stages so that millions of viewers can watch shows without lag, an ecommerce store needs to let its users search its product catalog at blazing speed. These applications require versatile means of storage, whether it’s a caching layer, full-text search, or an online transaction processing (OLTP) database.
Moreover, new paradigms like serverless programming are slowly encroaching into the storage layer as well. It’s fundamental for the developers, data modelers, and architects of today to incorporate these new paradigms into their tool belts.
NoSQL databases are part of this modern architecture. They are emerging as a leader in meeting these challenging workload requirements, and they fit well with the cloud native approach. Before we can understand the need for NoSQL, we must discuss the challenges associated with relational databases (RDBS), now often considered the legacy solution. Relational databases are a mature technology, having been around for over 40 years, and substantial expertise is available in the market if you are modeling relational data. SQL is popular because it is a powerful yet approachable tool that can answer complex questions in seconds or minutes. However, SQL databases have two major limitations that NoSQL can address.
1. The RDBS Scalability Challenge
One limitation of RDBS is poor scalability. How will a relational database behave when your application grows and needs to handle terabytes of data? Will it perform as well at 10 TB as it does at 10 GB? Likely not, as query performance tends to decline as the data grows.
To combat this problem, we could beef up hardware and install expensive SAN storage for large amounts of data. Unfortunately, this component becomes a single point of failure for your organization. You would need the best database administrators to maintain the system, which is a losing battle as the data continues to grow.
To gain performance in relational databases, we can use read replicas to separate the read and write workloads. However, as the data grows, queries become slower and manual intervention is required to tune the database. Horizontal scaling is not a viable option for most relational databases, as they were designed in a pre-cloud era to run on a single node.
The NoSQL Solution:
How does a NoSQL database solve the scalability problem better than RDBS? Many NoSQL databases can scale out almost without limit because the storage engine is a distributed hash table spread across a cluster of commodity hardware. The diagram below shows how data partitions are spread across different nodes and how a primary key is mapped to a specific partition. By comparison, a traditional RDBS is typically deployed as a single monolithic instance.
Horizontal partitioning and load balancing in NoSQL
When querying a NoSQL database, you use the key column to look up, via the hash table, which node the data lives on. This lookup has a time complexity of O(1), that is, constant time regardless of data size. As a result, key-based lookups perform about the same whether your database is 10 GB or 10 TB and whether it runs on a single node or a 1,000-node cluster.
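The key-to-node lookup described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular product implements it: the partition count, the node names, and the functions partition_for and node_for are all hypothetical.

```python
import hashlib

NUM_PARTITIONS = 8                      # hypothetical partition count
NODES = ["node-a", "node-b", "node-c"]  # hypothetical cluster members


def partition_for(key: str) -> int:
    """Hash the primary key to a partition number in constant time."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS


def node_for(key: str) -> str:
    """Map the key's partition to the node that owns it (simple modulo placement)."""
    return NODES[partition_for(key) % len(NODES)]


if __name__ == "__main__":
    for key in ("user#1", "user#2", "order#9001"):
        print(key, "-> partition", partition_for(key), "on", node_for(key))
```

The cost of finding the owning node depends only on hashing the key, not on how much data the cluster holds. The modulo placement here is a simplification; production systems often use consistent hashing or a similar scheme so that adding a node relocates only a fraction of the keys.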
Many NoSQL databases also sustain high write throughput because they use log-structured merge-trees (LSM trees), which are append-only and rely on sequential I/O writes that are much faster than the random I/O writes of the balanced trees (B-trees) used by RDBS engines.
Not having to worry about how your database size impacts the application enables your organization to scale quickly, without the cost of re-architecting the data layer as it grows.
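To make the append-only idea concrete, here is a toy, in-memory sketch of an LSM-style store. It is an illustration under simplifying assumptions: the class name TinyLSM and the memtable_limit parameter are hypothetical, nothing is written to disk, and compaction of old segments is left out.

```python
import bisect


class TinyLSM:
    """Toy LSM tree: a mutable memtable plus immutable sorted segments."""

    def __init__(self, memtable_limit: int = 2):
        self.memtable = {}    # recent writes, held in memory
        self.segments = []    # sorted lists of (key, value), oldest first
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        # Writes only touch the memtable; existing segments are never modified.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        # On a real engine this would be one sequential write of a sorted file.
        self.segments.append(sorted(self.memtable.items()))
        self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        # Search segments newest first so the latest value for a key wins.
        for segment in reversed(self.segments):
            i = bisect.bisect_left(segment, (key,))
            if i < len(segment) and segment[i][0] == key:
                return segment[i][1]
        return None


if __name__ == "__main__":
    db = TinyLSM(memtable_limit=2)
    db.put("a", 1)
    db.put("b", 2)   # memtable is full, so it is flushed as a sorted segment
    db.put("a", 3)   # newer value shadows the one in the older segment
    print(db.get("a"), db.get("b"), db.get("missing"))
```

An update never rewrites an earlier segment in place; it is appended as newer data and shadows the old value at read time. That is what keeps writes sequential, at the price of reads that may need to check several segments.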
2. The Inflexible Schema Challenge
Another challenge for SQL databases is handling frequent schema changes. For example, an online retailer may have product attributes that are only relevant to specific types of products (e.g., amount of RAM or SSD type for computer parts, or fabric type, color, and size for apparel). Managing all these specific attributes in normalized tables can become cumbersome and make the schema brittle.
You would need to constantly alter your table to add extra columns, and you may eventually hit the maximum column limits imposed by RDBS engines. Storing all these specific attributes as columns is also not an efficient use of disk space, since most of them are empty for any given product.
The NoSQL Solution:
NoSQL storage engines were designed to address inflexible schema concerns. One family is document databases, in which each record is encapsulated in a single JSON object. For example, while an RDBS may store the information in an applicant’s resume across various normalized tables, a document database would capture it in a single discrete record.
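As a rough illustration of the resume example, the Python sketch below builds one self-contained record and serializes it to JSON. Every field name and value is made up for the example; no particular database’s API is shown.

```python
import json

# One resume as a single document. In a normalized relational design, the
# skills, experience, and education entries would typically live in
# separate tables joined by applicant_id.
resume = {
    "applicant_id": "a-1001",  # hypothetical identifier
    "name": "Jane Doe",
    "skills": ["SQL", "Python"],
    "experience": [
        {"employer": "Example Corp", "title": "Analyst", "years": 3},
    ],
    "education": [
        {"school": "Example University", "degree": "BSc"},
    ],
}

document = json.dumps(resume)    # what a document store would persist
restored = json.loads(document)  # reading it back needs no joins

if __name__ == "__main__":
    print(restored["experience"][0]["employer"])
```

Adding a new section to one applicant’s resume, such as certifications, only changes that document; no table has to be altered.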
Wide column databases are powerful, since they can have both common columns for all records and flexible columns for specific records (similar to attaching key-value labels to a record). A wide column database like DynamoDB or Cassandra could be the tool your organization needs to effectively manage your semi-structured flexible data.
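The mix of common and flexible columns can be sketched with plain Python dictionaries. This is only a model of the idea, not the API of DynamoDB or Cassandra; make_item, extra_attributes, and the product data are hypothetical.

```python
COMMON_COLUMNS = ("product_id", "name", "price")  # present on every record


def make_item(product_id: str, name: str, price: float, **attributes) -> dict:
    """Build a record with the common columns plus any product-specific ones."""
    item = {"product_id": product_id, "name": name, "price": price}
    item.update(attributes)
    return item


def extra_attributes(item: dict) -> dict:
    """Return only the flexible, per-record attributes."""
    return {k: v for k, v in item.items() if k not in COMMON_COLUMNS}


catalog = [
    make_item("p-1", "Laptop", 999.0, ram_gb=16, ssd_type="NVMe"),
    make_item("p-2", "T-shirt", 19.0, fabric="cotton", color="blue", size="M"),
]

if __name__ == "__main__":
    for item in catalog:
        print(item["name"], extra_attributes(item))
```

Each record stores only the attributes that apply to it, so the laptop carries no empty fabric or size fields and a new product type does not require a schema change.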
NoSQL databases can be adopted easily by any organization through managed, cloud-based services such as AWS DynamoDB, MongoDB, and Cassandra. Managed NoSQL databases free database administrators from tasks like capacity planning and maintaining indexes and statistics for better performance, so they can focus on picking the right database for their workload. Developers don’t have to worry about handling flexible data structures for an evolving application. They can instead focus on developing features that solve their customers’ problems.
If you’re interested in continuing the discussion on how NoSQL databases can unlock value for your organization, please reach out at firstname.lastname@example.org.