Myce.com Latest Updates

Hard Disk SMART data is ineffective at predicting failure

Posted at 28 February 2007 21:00 CET by Seán Byrne

As manufacturers continue to release larger hard disks year after year, the last thing anyone wants to see, whatever drive they buy, is a failing hard disk, particularly if it contains a lot of content that is not backed up.  Google has recently released an interesting paper that details failure statistics for the hard disks in its infrastructure.  Its main findings were that the drive’s self-monitoring data (S.M.A.R.T.) does not reliably predict failure and that failure rates do not rise in step with drive temperature or usage levels.

What is interesting about Google’s hard disk usage is that, unlike most businesses, which typically use 10,000 RPM and 15,000 RPM SCSI hard disks in their servers, Google uses cheaper consumer-grade Serial ATA and Parallel ATA 5,400 RPM and 7,200 RPM hard disks.  They consider a hard drive failed when it is replaced during a repair.  S.M.A.R.T. information was gathered from all drives, with spurious readings excluded.

Going by their statistics, hard drives tend to fail the most in their early stage, with about 3% failing in the first three months, and then at a fairly steady rate after 2 years, with 5 years being the typical end-of-life.  When it came to analysing the S.M.A.R.T. data, they found that only four main values were closely related to the failed hard disks: the counts for scan errors, sector reallocations, offline reallocations and sectors on probation.  One interesting discovery was that no hard disk had a single spindle failure, or at least none recorded a spin retry count in its S.M.A.R.T. data.

Unfortunately, even with these four S.M.A.R.T. values, 56% of the drives that failed did not have a single count in any of them, which means that over half of the failed hard drives gave not even a single warning sign in the S.M.A.R.T. data.  Finally, when it came to temperature, contrary to most expectations, they found that the cooler the drive was, the more prone it was to failing.  Only at very high temperatures did the rate of failure start to increase as well.
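To make the 56% figure concrete, here is a minimal Python sketch of the check it implies: treat a drive as "warned" if any of the four counters is non-zero, then measure how many failed drives slipped through with all four at zero. The field names and the sample fleet are made up for illustration. They are not the names Google used or the names any S.M.A.R.T. tool reports, and the sample is simply sized to mirror the figure quoted above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SmartCounters:
    # Hypothetical field names for the four counters named in the article;
    # real tools expose them under vendor-specific attribute names.
    scan_errors: int = 0
    reallocations: int = 0
    offline_reallocations: int = 0
    probational_sectors: int = 0

    def has_signal(self) -> bool:
        """True if at least one of the four counters is non-zero."""
        return any((self.scan_errors, self.reallocations,
                    self.offline_reallocations, self.probational_sectors))


def missed_fraction(failed_drives):
    """Share of failed drives whose four counters were all zero."""
    if not failed_drives:
        raise ValueError("need at least one failed drive")
    silent = sum(1 for drive in failed_drives if not drive.has_signal())
    return silent / len(failed_drives)


# Made-up sample of 100 failed drives, sized to mirror the article's figure.
sample = ([SmartCounters()] * 56
          + [SmartCounters(reallocations=3)] * 30
          + [SmartCounters(scan_errors=1)] * 14)

print(f"{missed_fraction(sample):.0%} of failed drives gave no warning")
```

In other words, an alert built only on these four counters would, on the article's numbers, stay silent for more than half of the drives that go on to fail.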

Even though Google used consumer-grade hard disks, it is worth noting that, unlike those in the average home PC, most of the hard disks in Google’s servers are probably only ever switched on once at the time of installation, run continuously until the point of failure and operate in a temperature-controlled environment.  In home PCs, the hard disks are regularly spun up and down, and their temperature rises from room temperature to operating temperature each time the computer is used.  As a result, it would be interesting to see what the statistics would be like from a large survey of hard disks used in the home.

Further information can be found in this Ars Technica article and in this Google paper. 

There are 10 comments

No longer with us
Posted on: 01 Mar 07 01:10
So what Google is saying is that S.M.A.R.T. stands for Stupid Moronic Anal Retentive Technology? :B

No longer with us
Posted on: 01 Mar 07 03:18
Hard Drives are now at ridiculous sizes. It takes forever to format them, and who needs such big space unless your pirating movies. They need to do away with current HD technology, as it's now the slowest part of the computer.

cobi64
New on Forum
Posted on: 01 Mar 07 04:52
Hound - So you're saying Google pirates movies? I'd hate to be editing my camcorder video with only a 40gb drive.

Shadowman69
CD Freaks Member
Posted on: 01 Mar 07 13:45
Nothing really new. A lot of users already know this since their SMART HD failed and only AFTER THE FAILURE the SMART started saying "it seems your HD has a problem...". It's a shame that usually is just too late... :r

DrageMester
Retired Moderator
Posted on: 01 Mar 07 19:31
The last harddrive that failed for me didn't have any SMART warnings until after a partial failure, but it gave me approx. one hour to rescue the newest versions of some files that hadn't been backed up for a week, and then it failed more or less completely. @Shadowman69: Apparently the SMART technology would be more appropriately named SMART-ass technology, since it waits until the hard drive has failed before telling you.

Waethorn
CD Freaks Junior Member
Posted on: 01 Mar 07 20:12
You guys should talk to Steve Gibson! Hard drives will show SMART failures usually only after all the reserved sectors are used up. By that time, though, yes, everything is FUBAR. :r

Waethorn
CD Freaks Junior Member
Posted on: 01 Mar 07 20:13
BTW: I wrote to Steve to ask him about his opinion on the article and on SMART Status in general....just waiting for a reply.

DeadMan
MyCE Resident
Posted on: 01 Mar 07 22:02
Such a shame that they did not name names.

No longer with us
Posted on: 02 Mar 07 00:10
SMART didn't warn me one bit. One day I turned on my computer and click...click...click...

No longer with us
Posted on: 02 Mar 07 15:34
You don't need to worry about "click...click...click...", they are just limping bits.
