This vendor-written tech primer has been edited by Network World to eliminate product promotion, but readers should note it will likely favor the submitter's approach.
The economics, scale, and manageability of cloud storage simply cannot be matched even by the largest enterprise datacenters.
Hyperscale cloud storage providers like AWS, Google Cloud and Microsoft Azure dropped prices by up to 65 percent last year and promised a Moore's Law pricing model going forward. AWS provides eleven nines (99.999999999 percent) of durability, meaning if you store 10,000 objects with Amazon S3, you can, on average, expect to lose a single object once every 10,000,000 years. Further, Amazon S3 is designed to sustain the concurrent loss of data in two facilities by storing objects on multiple devices across multiple facilities.
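The durability math above is straightforward to check. A sketch of the arithmetic, assuming an annual per-object loss probability of 10^-11 (eleven nines of durability):

```python
# Assumed: eleven nines of durability means a 1e-11 annual loss
# probability per object (an interpretation of the marketing figure).
annual_loss_prob = 1e-11
objects = 10_000

# Expected object losses per year across all 10,000 objects.
expected_losses_per_year = objects * annual_loss_prob

# Mean time until the first expected loss, in years.
years_per_loss = 1 / expected_losses_per_year
print(f"~{years_per_loss:,.0f} years per lost object")
```

This reproduces the "one object every 10,000,000 years" figure quoted in the text.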
Unfortunately, until recently, cloud storage was really only useful for the data you don't use, not the data you actually use. In other words, cloud storage is cheap and deep, but it hasn't been able to match the performance of local storage. For cloud storage to be useful for unstructured data, it needs to offer flexibility, performance and productivity equivalent to enterprise storage systems. The cost advantage by itself, compelling as it is, simply won't be enough.
In order to use the cloud for both your active and inactive data, it has to feel equal to or better than the local filers already in place. For this to happen, the following key requirements must be met:
* Cache locally: Given the user expectation of LAN-like file access times, active data needs to be cached locally while inactive data is stored in the cloud. While most data isn't accessed very often and is perfectly suited for the cloud, active data needs to remain close to the user. Machine learning based on file usage, "pinned folders," or a combination of both methods needs to be employed to make sure the right files are cached locally while less-used files recede back into the cloud.
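The caching policy described above can be sketched in a few lines. This is an illustrative model, not any vendor's implementation: pinned paths are never evicted, while everything else ages out least-recently-used when the cache exceeds capacity (the class name, capacity units, and eviction rule are all assumptions for the sketch).

```python
from collections import OrderedDict

class EdgeCache:
    """Hypothetical local cache: pinned paths stay resident,
    other files are evicted least-recently-used (LRU)."""

    def __init__(self, capacity_bytes, pinned_prefixes=()):
        self.capacity = capacity_bytes
        self.pinned = tuple(pinned_prefixes)
        self.entries = OrderedDict()  # path -> size in bytes, oldest first

    def _is_pinned(self, path):
        return any(path.startswith(p) for p in self.pinned)

    def access(self, path, size):
        """Record a file access; cache it locally and evict as needed."""
        if path in self.entries:
            self.entries.move_to_end(path)  # mark as most recently used
            return
        self.entries[path] = size
        self._evict()

    def _evict(self):
        # Push least-recently-used, unpinned files back to the cloud
        # until the cache fits within capacity.
        while sum(self.entries.values()) > self.capacity:
            for path in self.entries:
                if not self._is_pinned(path):
                    del self.entries[path]
                    break
            else:
                break  # only pinned files remain; nothing evictable
```

In practice a usage-prediction model could replace plain LRU for choosing what to keep warm, but the pinned-versus-evictable split works the same way.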
* Global deduplication: Global deduplication ensures that only one unique block of data is stored in the cloud and cached locally. Because blocks are common across files, global deduplication reduces the amount of data that is stored in the cloud and sent between the cloud and the local caches, as only changes are stored and sent. For example, when Electronic Arts centralized its data in cloud storage, its total storage footprint dropped from 1.5PB to just 45TB. The time needed to transfer 50GB game builds between offices dropped from up to 10 hours to minutes, as only the changes to the builds were actually sent.
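The mechanism behind block-level deduplication can be sketched with a content-addressed store: files are split into fixed-size blocks, each block is keyed by its hash, and only blocks the store has never seen are uploaded. This is a minimal illustration (the class, block size, and fixed-size chunking are assumptions; production systems typically use variable-size, content-defined chunking), not the system described in the article:

```python
import hashlib

class DedupStore:
    """Illustrative content-addressed store: each unique block is
    stored once in the "cloud" and referenced by its SHA-256 digest."""

    def __init__(self, block_size=4 * 1024 * 1024):
        self.block_size = block_size
        self.blocks = {}     # digest -> block bytes (simulated cloud)
        self.manifests = {}  # file name -> ordered list of digests

    def put(self, name, data):
        """Store a file; returns bytes actually transferred (new blocks only)."""
        uploaded = 0
        digests = []
        for i in range(0, len(data), self.block_size):
            block = data[i:i + self.block_size]
            digest = hashlib.sha256(block).hexdigest()
            if digest not in self.blocks:
                self.blocks[digest] = block  # only unseen blocks are sent
                uploaded += len(block)
            digests.append(digest)
        self.manifests[name] = digests
        return uploaded

    def get(self, name):
        """Reassemble a file from its block manifest."""
        return b"".join(self.blocks[d] for d in self.manifests[name])
```

Uploading a second build that shares most of its blocks with the first transfers only the changed blocks, which is why the article's 50GB builds moved in minutes rather than hours.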