Switching to data reduction, the great benefit of this approach is that it has very little system impact. The key is that data is never hashed, so no high-impact calculations are performed. Instead, block changes are logged as they happen. This tracking process barely registers on a server. When it's time to run the backup, the changed blocks are already known and can be moved very efficiently via a block-to-block copy (dedupe products sometimes move data via the file system, creating another point of stress).
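To make the idea of logging block changes concrete, here is a minimal Python sketch. It is a toy in-memory model, not any vendor's implementation: the `TrackedVolume` class, its method names, and the 4 KB `BLOCK_SIZE` are all assumptions made for illustration.

```python
BLOCK_SIZE = 4096  # hypothetical block size in bytes (an assumption)


class TrackedVolume:
    """Toy in-memory volume that logs which blocks change as writes happen."""

    def __init__(self, num_blocks):
        self.data = bytearray(num_blocks * BLOCK_SIZE)
        self.changed = set()  # indexes of blocks written since the last backup

    def write(self, offset, payload):
        """Apply a write and record the blocks it touches. Nothing is hashed."""
        if offset < 0 or offset + len(payload) > len(self.data):
            raise ValueError("write falls outside the volume")
        if not payload:
            return
        self.data[offset:offset + len(payload)] = payload
        first = offset // BLOCK_SIZE
        last = (offset + len(payload) - 1) // BLOCK_SIZE
        self.changed.update(range(first, last + 1))

    def backup_to(self, target):
        """Copy only the logged blocks into target, block to block."""
        copied = sorted(self.changed)
        for index in copied:
            start = index * BLOCK_SIZE
            target[start:start + BLOCK_SIZE] = self.data[start:start + BLOCK_SIZE]
        self.changed.clear()  # the next backup starts from an empty log
        return copied


volume = TrackedVolume(num_blocks=8)
target = bytearray(len(volume.data))
volume.write(4090, b"x" * 10)        # spans blocks 0 and 1
volume.write(5 * BLOCK_SIZE, b"y")   # touches block 5 only
print(volume.backup_to(target))      # the blocks that were copied
```

In this sketch the only work done at write time is adding a few integers to a set, and the backup never reads the untouched blocks at all.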
An added benefit of data reduction: because impact is so low, you can usually run multiple backups per day without much trouble, giving you better recovery points. File size makes no difference, even for large files such as databases, because nothing is done at the file level. With data deduplication it is close to impossible to protect a database more than once a day because the hashing process creates too much impact and takes too much time.
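For contrast, the sketch below shows why hash-based change detection costs more: every block has to be read and hashed on each run, no matter how few have changed. It is a simplified illustration in Python, not the algorithm of any particular deduplication product. The function name `changed_blocks_by_hash`, the use of SHA-256, and the 4 KB block size are assumptions.

```python
import hashlib


def changed_blocks_by_hash(data, previous_hashes, block_size=4096):
    """Return (changed block indexes, current hashes) by hashing every block."""
    changed = []
    current_hashes = []
    for index, start in enumerate(range(0, len(data), block_size)):
        # Every block is read and hashed, even if it was never modified.
        digest = hashlib.sha256(data[start:start + block_size]).digest()
        current_hashes.append(digest)
        if index >= len(previous_hashes) or previous_hashes[index] != digest:
            changed.append(index)
    return changed, current_hashes


data = bytearray(4096 * 4)
_, baseline = changed_blocks_by_hash(data, [])        # first run: all blocks are new
data[4096] = 1                                        # modify one byte in block 1
changed, hashes = changed_blocks_by_hash(data, baseline)
print(changed, len(hashes))                           # changed blocks, blocks hashed
```

Only one block changed in the second run, yet all four were hashed. On a large database that ratio is what turns into CPU load and a long backup window.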
So which to choose? As always, it depends on your specific needs. If reducing the impact of backup processing is a major concern, then data reduction is clearly better. If minimizing bandwidth utilization is key, then deduplication may be better, though the difference will depend on application characteristics.
But don't forget that backup is only part of the data protection challenge. You have to look at recovery scenarios as well, and there again you can find large differences between products and techniques. As a technology buyer, always do your homework, ask lots of questions, and focus like a laser on your specific server and application mix. A product may sound great "in general," but nobody uses a product "in general." You use it in your data center, not someone else's. And if a vendor can't explain precisely how their product will help in your own environment, then the only thing left to show them is the door.