Define the parameters for your project: the data (what kind, how much, what size and format, and from which sources), how you are going to use it, what growth you are projecting, how many concurrent users your site must support, and your performance and uptime targets. Know which criteria are essential to your business requirements and rank them in order of importance. This is a long list, but it will help your evaluation by allowing you to ask the right questions.
Some considerations when evaluating your solution:
* Scalability. There are many aspects of scalability. For data alone, you need to understand how much data you will add to the database per day, how long the data stays relevant, and what you will do with older data (offload it to other storage for analysis, keep it in the database but move it to a different storage tier, both, or does it not matter?). You also need to know which sources the data comes from, whether it arrives in real time or in batches, whether it needs any pre-processing, and how easy it is to add to your database.
In some situations your overall data size stays the same; in others, the data continues to accumulate and grow. How is your database going to handle this growth? Can it easily grow by adding new resources, such as servers or storage space? How easy will it be to add those resources? Will the database redistribute the data automatically, or does that require manual intervention? Will there be any downtime during this process?
How many servers and how much disk capacity are required to handle the data you will store? Too many servers translate into higher hardware, data center and personnel costs. In some situations there may be significant peaks and valleys in your data usage, such as ecommerce on Black Friday or during holiday shopping in December. How easy is it to scale up and down? Can the cloud be used during the periods of higher resource usage?
You must be able to make projections about all aspects of your data and database growth. No matter how well a database handles all of these things, you should still monitor resource usage continuously so you can scale up proactively, stay ahead of demand and avoid overloading the database.
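The sizing questions above can be turned into a rough back-of-the-envelope projection. The following is a minimal sketch, not a vendor formula: the function name `servers_needed`, its parameters and the 30% default headroom are all illustrative assumptions, and it assumes a steady-state retention window with a fixed number of replicas.

```python
import math


def servers_needed(daily_ingest_gb, retention_days, replication_factor,
                   usable_gb_per_server, headroom=0.3):
    """Rough server count for a steady-state retention window.

    All parameters are hypothetical planning inputs, not measured values.
    """
    # Data kept in the database at any one time, before replication.
    raw_gb = daily_ingest_gb * retention_days
    # Each replica stores a full copy of the retained data.
    stored_gb = raw_gb * replication_factor
    # Keep a fraction of every server's disk free as headroom for peaks.
    effective_gb_per_server = usable_gb_per_server * (1 - headroom)
    return math.ceil(stored_gb / effective_gb_per_server)


if __name__ == "__main__":
    # Example inputs: 50 GB/day, one year of retention, 3 copies, 4 TB usable per server.
    print(servers_needed(50, 365, 3, 4000))
```

Rerunning the estimate with your peak-season ingest rate instead of the average gives a sense of how much extra capacity, on premises or in the cloud, a Black Friday-style spike would require.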
* Uptime. Applications differ in when they need to be accessible: some only during trading hours, others 24x7 with five-nines (99.999%) availability, though what their owners really mean is 100% of the time. Is this possible? Absolutely!
This covers a number of features. Replication keeps multiple copies of the data within the database, so that should a single node or storage device go down, the data remains available. Failover and high availability then allow your application to continue its CRUD (Create, Read, Update and Delete) operations without interruption.
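To make an availability target concrete, it helps to convert the percentage into a downtime budget. The sketch below is illustrative only; the function name `downtime_budget_minutes` and the 365-day year are assumptions chosen for the example.

```python
def downtime_budget_minutes(availability_pct, days=365):
    """Minutes of downtime permitted over `days` at the given availability."""
    total_minutes = days * 24 * 60
    return total_minutes * (1 - availability_pct / 100)


if __name__ == "__main__":
    # Compare common availability targets, from two nines to five nines.
    for target in (99.0, 99.9, 99.99, 99.999):
        print(f"{target}% -> {downtime_budget_minutes(target):.2f} minutes per year")
```

At five nines the budget is only a little over five minutes per year, which is why automatic failover, rather than manual recovery, is usually needed to meet it.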