Online backup service Backblaze frequently provides interesting storage analysis based on hard drive statistics gathered from its data center. We’ve seen Backblaze figure out the most reliable hard drives based on tests in 2014 and then again last May. Now the company's talking about how it determines if a hard drive is likely to die, another return to a topic broached in 2014.
Every day Backblaze retrieves a SMART report for each of the more than 67,000 hard drives in its Sacramento data center. The company tracks five specific SMART errors that it says are the most helpful in predicting whether a hard drive is about to fail.
In the company’s experience, 76.7 percent of its failed hard drives reported at least one of these five SMART errors before kicking the bucket. That is a large share, though it still means 23.3 percent of Backblaze’s dead drives gave no warning at all before failing. Meanwhile, only 4.2 percent of still-operational drives have reported any of these five SMART errors.
If you’re not familiar with SMART, it stands for Self-Monitoring, Analysis, and Reporting Technology, a self-analysis feature built into modern hard drives. The catch is that you usually need third-party software to retrieve your hard drive’s SMART report, though you can also fetch it from the command line.
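One widely used command-line option is `smartctl` from the open-source smartmontools package: `smartctl -A /dev/sda` prints a drive’s attribute table. The sketch below runs that command and turns each table row into an attribute ID and raw value. It assumes an ATA/SATA drive, Python 3.9 or later, and smartctl’s default table layout (NVMe drives print a different report). It keeps only the leading integer of each raw value, a simplification, since some vendors pack extra data into that field. The function names are hypothetical and not part of smartmontools.

```python
import re
import subprocess

# One row of the `smartctl -A` attribute table, e.g.
#   5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
# Columns: ID, name, flag, value, worst, threshold, type, updated, when_failed, raw.
ROW = re.compile(
    r"^\s*(\d+)\s+\S+\s+0x[0-9a-fA-F]+"  # ID (captured), attribute name, flag
    r"\s+\d+\s+\d+\s+\S+"                # normalized value, worst, threshold
    r"\s+\S+\s+\S+\s+\S+"                # type, updated, when_failed
    r"\s+(\S+)"                          # first token of the raw value (captured)
)
LEADING_INT = re.compile(r"\d+")


def parse_smart_attributes(text: str) -> dict[int, int]:
    """Map attribute ID -> raw value for each table row in `smartctl -A` output."""
    attrs: dict[int, int] = {}
    for line in text.splitlines():
        row = ROW.match(line)
        if row is None:
            continue  # header, banner, or blank line
        raw = LEADING_INT.match(row.group(2))
        if raw is not None:  # skip raw values that don't start with a number
            attrs[int(row.group(1))] = int(raw.group())
    return attrs


def read_smart_attributes(device: str = "/dev/sda") -> dict[int, int]:
    """Run `smartctl -A` (smartmontools; usually needs root) and parse the result."""
    # smartctl's exit status is a bit mask that can be nonzero even when it
    # prints a full report, so a nonzero status is not treated as an error here.
    result = subprocess.run(
        ["smartctl", "-A", device], capture_output=True, text=True
    )
    return parse_smart_attributes(result.stdout)
```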
The five key SMART errors Backblaze tracks are the following:
SMART 5: Reallocated sectors count
SMART 187: Reported uncorrectable errors
SMART 188: Command timeout
SMART 197: Current pending sector count
SMART 198: Uncorrectable sector count
The last two errors are similar, but Backblaze tracks both because some hard drive makers report one but not the other.
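Because a drive may expose SMART 197 without 198 (or the reverse), a monitoring script shouldn’t assume all five attributes exist. A minimal sketch, assuming raw values have already been collected into a dictionary keyed by attribute ID; `TRACKED`, `tracked_subset`, and `missing_tracked` are illustrative names, not part of any real tool.

```python
# The five SMART attributes Backblaze tracks, with the labels used above.
TRACKED = {
    5: "Reallocated sectors count",
    187: "Reported uncorrectable errors",
    188: "Command timeout",
    197: "Current pending sector count",
    198: "Uncorrectable sector count",
}


def tracked_subset(attrs: dict[int, int]) -> dict[int, int]:
    """Keep only the tracked attributes the drive actually reports."""
    return {attr_id: attrs[attr_id] for attr_id in TRACKED if attr_id in attrs}


def missing_tracked(attrs: dict[int, int]) -> list[int]:
    """List tracked attribute IDs the drive does not report (e.g. 197 but no 198)."""
    return [attr_id for attr_id in TRACKED if attr_id not in attrs]
```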
Each of these SMART errors reports a “raw value.” Unfortunately, what that number means can vary by vendor. Regardless, all Backblaze needs to track is whether the raw value is above zero. If it is, the company takes a harder look at what’s going on with that drive.
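That first-pass rule fits in a few lines. This sketch assumes a hypothetical `{attribute_id: raw_value}` dictionary for one drive and treats an attribute the drive doesn’t report as zero. Because raw values aren’t comparable across vendors, it only checks whether each one is above zero.

```python
TRACKED_IDS = (5, 187, 188, 197, 198)


def warning_attributes(attrs: dict[int, int]) -> list[int]:
    """Tracked attribute IDs whose raw value is above zero.

    An attribute the drive doesn't report is treated as zero.
    """
    return [attr_id for attr_id in TRACKED_IDS if attrs.get(attr_id, 0) > 0]


def needs_closer_look(attrs: dict[int, int]) -> bool:
    """True if any tracked raw value is nonzero."""
    return bool(warning_attributes(attrs))
```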
So how can a regular user employ these error reports to know what’s up with their hard drive? Start by keeping a record of how many of these errors the drive reports over time. That gives you an idea of how serious the problem is.
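Keeping that record can be as simple as appending one dated row per day to a CSV file. This is a hypothetical sketch, not Backblaze’s tooling: the column order and function names are assumptions, and a blank cell marks an attribute the drive didn’t report.

```python
import csv
import datetime as dt
from typing import TextIO

TRACKED_IDS = (5, 187, 188, 197, 198)  # column order after the date


def append_snapshot(log: TextIO, day: dt.date, attrs: dict[int, int]) -> None:
    """Append one dated row of tracked raw values (blank if the drive omits one)."""
    writer = csv.writer(log)
    writer.writerow([day.isoformat()] + [attrs.get(i, "") for i in TRACKED_IDS])


def load_history(log: TextIO) -> list[tuple[dt.date, dict[int, int]]]:
    """Read rows back as (date, {attribute_id: raw_value}) pairs, oldest first."""
    history = []
    for row in csv.reader(log):
        if not row:
            continue  # ignore blank lines
        day = dt.date.fromisoformat(row[0])
        values = {i: int(v) for i, v in zip(TRACKED_IDS, row[1:]) if v != ""}
        history.append((day, values))
    return history
```

With a real file, open it with `newline=""` (in append mode for writing), as the `csv` module documentation recommends.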
As Backblaze’s Andy Klein, director of product marketing, pointed out in the company’s blog post, a hard drive that “jumps from zero to 20 Reported Uncorrectable Errors (SMART 187) in one day” is more likely to fail than one that reports one SMART 187 error every month for five years.
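Klein’s example comes down to the rate of change rather than the total count. A minimal sketch of that idea, assuming one SMART 187 reading per day in chronological order; the threshold of 10 new errors in a day is an arbitrary illustrative value, not a figure from Backblaze.

```python
def largest_daily_jump(counts: list[int]) -> int:
    """Largest increase between consecutive daily readings (0 if fewer than two)."""
    return max((later - earlier for earlier, later in zip(counts, counts[1:])), default=0)


def sudden_jump(counts: list[int], threshold: int = 10) -> bool:
    """Flag a drive whose count rose by at least `threshold` in a single day.

    The default threshold is an illustrative assumption, not a Backblaze value.
    """
    return largest_daily_jump(counts) >= threshold
```

Under this check, the slow-accumulating drive from Klein’s example ends five years with 60 errors but never gains more than one in a day, so it isn’t flagged, while the zero-to-20 drive is.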