Correctly sizing a disk backup system with deduplication to meet your current and future needs is an important part of your data protection strategy. If you ask the right questions upfront and analyze each aspect of your environment that affects backup requirements, you can avoid the consequences of buying an undersized system that quickly runs out of capacity.
First and foremost, it's important to understand that this sizing exercise is different from sizing a primary storage system. In primary storage you can simply say, "I have 8TB to store and so I will buy 10TB." In disk-based backup with deduplication, sizing depends on a number of factors. Here's what to consider:
* Data types. The data types you have directly impact the deduplication ratio and therefore the system you need. If your mix of data types is conducive to deduplication and has high deduplication ratios (e.g., 50:1), then the deduplicated data will occupy less storage space and you need a smaller system. If you have a mix of data that does not deduplicate well (i.e., 10:1 or less data reduction), then you will need a much larger system. What matters is what deduplication ratio is achieved in a real-world environment with a real mix of data types.
* Deduplication method. The deduplication method has a significant impact on deduplication ratio. Not all deduplication approaches are created equal.
Zone-level with byte comparison, or alternatively 8KB block-level with variable-length content splitting, gets the best deduplication ratios. The average is a 20:1 deduplication ratio with a general mix of data types.
64KB and 128KB fixed block produce the lowest deduplication ratios, as the blocks are too big to find many repetitive matches. The average is a 7:1 deduplication ratio.
4KB fixed block comes close to the zone-level and variable-length approaches but often suffers a performance hit. The average is a 13:1 deduplication ratio with a general mix of data types.
* Retention. The number of weeks of retention you keep affects the deduplication ratio as well. The longer the retention, the more repetitive data the deduplication system sees, so the deduplication ratio increases as retention increases. Most vendors will say that they get a deduplication ratio of 20:1, but when you do the math, that typically holds only if the retention period is about 16 weeks. If you keep only two weeks of retention, you may get only about a 4:1 reduction.
Here is an example to highlight this: If you have 10TB of data and you keep four weeks of retention, then without deduplication you would store about 40TB of data (10TB x 4 weeks). With deduplication, assuming a 2% weekly change rate, you would store about 5.6TB of data, so the deduplication ratio is about 7.1:1 (40TB ÷ 5.6TB = 7.1:1). However, if you have 10TB of data and you keep 16 weeks of retention, then without deduplication you would store about 160TB of data (10TB x 16 weeks). With deduplication, assuming a 2% weekly change rate, you would store about 8TB of data, which is a deduplication ratio of 20:1 (160TB ÷ 8TB = 20:1).
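The arithmetic in this example can be sketched in a few lines of Python. The article does not spell out how the 5.6TB and 8TB figures are derived. The sketch below assumes the first full backup reduces to about half its size (2:1) and that each additional week adds only the changed data (2% of 10TB, or 0.2TB). That assumption reproduces the article's numbers, but it is a simplified model rather than any vendor's sizing method, and the function and parameter names are illustrative.

```python
def stored_tb(primary_tb, weeks, weekly_change=0.02, first_full_reduction=2.0):
    """Estimate deduplicated storage (TB) for `weeks` of weekly full backups.

    Assumption (not stated in the article): the first full backup shrinks by
    `first_full_reduction`, and every later week adds only the changed data.
    """
    first_full = primary_tb / first_full_reduction
    weekly_delta = primary_tb * weekly_change
    return first_full + (weeks - 1) * weekly_delta


def dedup_ratio(primary_tb, weeks, weekly_change=0.02, first_full_reduction=2.0):
    """Ratio of data stored without deduplication to data stored with it."""
    logical_tb = primary_tb * weeks  # what you would store without deduplication
    physical_tb = stored_tb(primary_tb, weeks, weekly_change, first_full_reduction)
    return logical_tb / physical_tb


# Compare the retention periods discussed in the article for 10TB of data.
for weeks in (2, 4, 16):
    ratio = dedup_ratio(10, weeks)
    print(f"{weeks:>2} weeks: {stored_tb(10, weeks):.1f}TB stored, about {ratio:.1f}:1")
```

Under these assumptions, four weeks of retention comes out at about 5.6TB stored and 16 weeks at about 8TB, matching the example. Two weeks lands near the 4:1 reduction mentioned earlier. Changing `weekly_change` or `first_full_reduction` shows how sensitive the ratio is to your own data mix.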