Obviously we can never test infinite scalability. But I can tell you that by design there's nothing in the system that's limiting your growth. You can extrapolate that by looking at the design. The Google File System used to scale to 10,000 nodes and even more, but beyond that most enterprises don't care. But if they want more, we can give them more.
What does your product, the Cohesity Data Platform, actually consist of?
Our building block is a 2U appliance with 4 server nodes in it (though you can get them with 3 nodes), each of which has a dual 10Gbps network connection. Cumulatively, the storage on that clustered system is 96TB of hard drives and 6TB of SSD storage. Software comes integrated with the system. [While the early access program is focused mainly on backups of unstructured data in VMware environments, support for structured data in Oracle environments should be available by the time of general availability in Q3 or Q4, Aron says. Down the road, look for easier integration with various network devices as well, he says.]
What might your system replace?
If you look at secondary storage today there is a lot of fragmentation. We have test and development, data protection and analytics environments floating around. Even within data protection you have workflows like backup software, storage, tape/archival and cloud storage, all catered to by different vendors. Our vision is to converge these workflows onto one platform, and when you accomplish that you can see your sprawl going away. In the last 10 years, whatever innovation has been done in secondary storage has really addressed just a point problem, like de-duplication or copy data management. I think we are the first to look at this whole space comprehensively. There is a side benefit, too. Today all the data that sits in secondary storage is "dark": you have no insight into it. Because we're converging analytics onto the same platform, we can light up your dark data. This solution is aimed at disrupting and displacing most if not all of these secondary storage products, though some we can partner with. We come with integrated analytics, but we also wish to expose our underlying distributed file system to others like [Hadoop Distributed File System].
Are primary and secondary storage typically found on different sorts of devices?
There's a hardware and software difference. For hardware, primary storage is for mission-critical stuff, so you'll go buy an all-flash appliance for that to make sure you meet your SLAs. But for secondary that makes no sense; it would be extremely costly for data that you'll hardly ever touch. That can go on cold storage, like tape. If you try to mix this in one appliance the pricing becomes very strange.