One of the main reasons to move to the cloud is scalability. In previous posts I have discussed the scalability of applications using Windows Azure. The focus now is on the scalability of the data these applications run on. If you do not plan for data scalability, you could end up with an architecture in which every instance of the application depends on a single database.
In this scenario, although the application is load balanced and scalable, the database is still a bottleneck because it does not scale with the application. The application can only perform as fast as the data can be read and written.
In Windows Azure, the maximum database size is 50 GB. The reason for this? The idea is that the data should be "partitioned" along logical divisions that make the data as scalable as the application is. Essentially, the goal is to change the scenario above into one where the data tier scales out alongside the application.
In this scenario, the data spans several databases and possibly several datacenters. This is not redundancy, although Windows Azure certainly supports that; it is partitioning of the data. Each of these four databases holds a portion of the total data for the application, and to get the most out of this separation, careful thought must go into how the data is divided.
So, how should the data be divided? Geographic region is usually a good first thought. For instance, consider an application that serves up data for a distributorship. Let's say that this distributorship serves 5 geographic regions. The Windows Azure application could use 5 separate SQL Azure databases, rather than one large one. Each database stores the data for one geographic region (accounts, contracts, sales, etc.). The application would have access to all 5 of the databases; however, the vast majority of queries will only need to hit one of them. Dividing the data in this manner makes it significantly more scalable than a single database would be.
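To make the routing idea concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the region names are made up, the helper names (`open_shards`, `shard_for`, and so on) are hypothetical, and in-memory SQLite databases stand in for the 5 SQL Azure databases so the example runs anywhere. The point is simply that each request is sent to the one database that owns its region.

```python
import sqlite3

# Hypothetical region names; the post only says there are 5 regions.
REGIONS = ["north", "south", "east", "west", "central"]


def open_shards(regions):
    """Open one database per region.

    In-memory SQLite is a stand-in for SQL Azure here (an assumption
    made so the sketch is runnable without any cloud account).
    """
    shards = {}
    for region in regions:
        conn = sqlite3.connect(":memory:")
        conn.execute(
            "CREATE TABLE accounts (id INTEGER PRIMARY KEY, name TEXT)"
        )
        shards[region] = conn
    return shards


def shard_for(shards, region):
    """Return the single database that holds this region's data."""
    try:
        return shards[region]
    except KeyError:
        raise ValueError(f"unknown region: {region!r}") from None


def add_account(shards, region, name):
    """Write an account to its region's database only."""
    conn = shard_for(shards, region)
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO accounts (name) VALUES (?)", (name,))


def accounts_in(shards, region):
    """Read accounts for one region; touches exactly one database."""
    conn = shard_for(shards, region)
    rows = conn.execute("SELECT name FROM accounts ORDER BY name").fetchall()
    return [row[0] for row in rows]


shards = open_shards(REGIONS)
add_account(shards, "north", "Acme Supply")
add_account(shards, "south", "Bayou Freight")
print(accounts_in(shards, "north"))  # ['Acme Supply']
```

In a real deployment the dictionary would map each region to a connection string rather than an open connection, but the lookup step would look much the same.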
The catch here is that it is extremely important to put a great deal of thought into how to divide the data. If the application often reads or writes data from several of these databases at once, then partitioning could actually slow the application down. Every application will be different, and once the decision is made, it will likely be difficult to change the way the data is partitioned. But done correctly, the application will scale much better.
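The cost of crossing partitions can be seen in a small sketch. Again, this is an illustration and not a benchmark: in-memory SQLite databases stand in for SQL Azure, the three regions and their sales figures are invented, and the function names are hypothetical. A regional question needs one query, while a company-wide question needs one query per database plus a merge step in the application.

```python
import sqlite3


def make_shard(amounts):
    """Create a stand-in regional database holding some sales rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (amount INTEGER)")
    conn.executemany(
        "INSERT INTO sales (amount) VALUES (?)", [(a,) for a in amounts]
    )
    conn.commit()
    return conn


# Invented sample data: three regions, one database each.
shards = {
    "north": make_shard([100, 250]),
    "south": make_shard([75]),
    "east": make_shard([]),
}

TOTAL_SQL = "SELECT COALESCE(SUM(amount), 0) FROM sales"


def regional_total(shards, region):
    """Total for one region: a single query against a single database."""
    total = shards[region].execute(TOTAL_SQL).fetchone()[0]
    return total, 1  # (result, number of databases queried)


def company_total(shards):
    """Total across all regions: every database must be queried."""
    total = 0
    queried = 0
    for conn in shards.values():
        total += conn.execute(TOTAL_SQL).fetchone()[0]
        queried += 1
    return total, queried


print(regional_total(shards, "north"))  # (350, 1)
print(company_total(shards))            # (425, 3)
```

If most of an application's queries look like `company_total` rather than `regional_total`, that is a sign the chosen partition key does not match how the data is actually used.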