Hard Disk SMART data is ineffective at predicting failure
As manufacturers continue to release larger hard disks year after year, the last thing anyone wants to see is a failing drive, particularly one containing a lot of content that is not backed up. Google has recently released an interesting paper detailing statistics on hard disk failures across its infrastructure. Its main findings were that a drive's self-monitoring data (S.M.A.R.T.) does not reliably predict failure, and that drive temperature and usage levels do not correlate with failure in the way one would expect.
What is interesting about Google's hard disk usage is that, unlike most businesses, which typically use 10,000 RPM and 15,000 RPM SCSI hard disks in their servers, Google uses cheaper consumer-grade serial ATA and parallel ATA 5,400 RPM and 7,200 RPM hard disks. Google considers a hard drive failed when it is replaced during a repair. S.M.A.R.T. information was gathered from all drives, with spurious readings excluded.
Going by the statistics, hard drives fail most often early in life, with about 3% failing in the first three months, then at a fairly steady rate after two years, with five years being the typical end-of-life. When analysing the S.M.A.R.T. data, Google found that only four values were closely correlated with failed hard disks: counts of scan errors, sector reallocations, offline reallocations and sectors on probation. One interesting discovery was that not a single hard disk recorded a spindle failure, or even a spin retry count, in its S.M.A.R.T. data.
Unfortunately, even with these four S.M.A.R.T. values, 56% of the drives that failed did not have a single count in any of them, meaning that over half of the failed hard drives gave no warning sign at all in their S.M.A.R.T. data. Finally, when it came to temperature, contrary to most expectations, Google found that the cooler a drive ran, the more prone it was to failure; only at very high temperatures did the failure rate start to increase as well.
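To make the finding concrete, here is a minimal sketch of the kind of check these results suggest: flag a drive if any of the four predictive S.M.A.R.T. raw counts is nonzero. The attribute names are common smartctl labels chosen for illustration; the mapping of "scan errors" and "offline reallocations" to specific attributes is an assumption, not taken from the paper.

```python
# The four attribute categories Google found correlated with failure,
# mapped to illustrative smartctl-style attribute names (assumed mapping).
PREDICTIVE = [
    "Raw_Read_Error_Rate",      # scan errors (assumed mapping)
    "Reallocated_Sector_Ct",    # sector reallocations
    "Offline_Uncorrectable",    # offline reallocations (assumed mapping)
    "Current_Pending_Sector",   # sectors "on probation"
]

def at_risk(smart_raw_counts):
    """Return the predictive attributes with a nonzero raw count."""
    return [name for name in PREDICTIVE
            if smart_raw_counts.get(name, 0) > 0]

# Example: a drive with one pending ("probational") sector.
sample = {
    "Raw_Read_Error_Rate": 0,
    "Reallocated_Sector_Ct": 0,
    "Offline_Uncorrectable": 0,
    "Current_Pending_Sector": 1,
}
print(at_risk(sample))  # ['Current_Pending_Sector']
```

Note that, per the article, an empty result is no guarantee of health: 56% of the drives that failed showed zero in all four counts.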
Even though Google used consumer-grade hard disks, it is worth noting that, unlike in the average home PC, most of the hard disks in Google's servers are likely switched on only once, at installation, and then run continuously in a temperature-controlled environment until they fail. In home PCs, hard disks are regularly spun up and down, and their temperatures swing from room temperature up to operating temperature each time the computer is used. As a result, it would be interesting to see the statistics from a large survey of hard disks used in the home.