Table of Contents - 

...

Anything more can shorten the life of the disk, but above all it is a waste of time.

Test | Description | Frequency

Short

A short test takes 2 minutes or less and is meant to identify a defective drive quickly. By performing three separate tests, the disk can reliably be confirmed faulty in a short amount of time. These tests include an electrical test, a mechanical test, and a read/write test.
Long

Conveyance

Scheduling Tests - 

Testing can be scheduled and automated so that nobody has to remember to run tests manually on a regular basis. This can be done in several ways; the approach using smartd.conf is discussed below.

...

  • SMART 5 - Reallocated_Sector_Count - Count of reallocated sectors. The raw value represents a count of the bad sectors that have been found and remapped.[24] Thus, the higher the raw value, the more sectors the drive has had to reallocate. This value is primarily used as a metric of the life expectancy of the drive; a drive which has had any reallocations at all is significantly more likely to fail in the immediate months.
  • SMART 10 - Spin Retry Count - Count of retried spin-start attempts. This attribute stores the total number of spin-start attempts needed to reach full operational speed when the first attempt was unsuccessful. An increase in this attribute value is a sign of problems in the hard disk's mechanical subsystem.
  • SMART 187 - Reported_Uncorrectable_Errors - The count of errors that could not be recovered using hardware ECC.
  • SMART 188 - Command_Timeout - The count of aborted operations due to HDD timeout. Normally this attribute value should be equal to zero.
  • SMART 194 - Temperature - Indicates the device temperature, if the appropriate sensor is fitted.
  • SMART 196 - Reallocation Event Count - Count of remap operations. The raw value of this attribute shows the total count of attempts to transfer data from reallocated sectors to a spare area. Both successful and unsuccessful attempts are counted.
  • SMART 197 - Current_Pending_Sector_Count - Count of "unstable" sectors (waiting to be remapped, because of unrecoverable read errors). If an unstable sector is subsequently read successfully, the sector is remapped and this value is decreased. Read errors on a sector will not remap the sector immediately (since the correct value cannot be read and so the value to remap is not known, and also it might become readable later); instead, the drive firmware remembers that the sector needs to be remapped, and will remap it the next time it's written.[57]

    However, some drives will not immediately remap such sectors when written; instead the drive will first attempt to write to the problem sector and if the write operation is successful then the sector will be marked good (in this case, the "Reallocation Event Count" (0xC4) will not be increased). This is a serious shortcoming, for if such a drive contains marginal sectors that consistently fail only after some time has passed following a successful write operation, then the drive will never remap these problem sectors.

  • SMART 198 - Offline_Uncorrectable - The total count of uncorrectable errors when reading/writing a sector. A rise in the value of this attribute indicates defects of the disk surface and/or problems in the mechanical subsystem.
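The attributes listed above can be picked out of the attribute table automatically. The following is a minimal sketch, not a definitive check: it assumes the standard ten-column table printed by smartctl -A, the helper name flag_smart_attributes is hypothetical, and treating any non-zero raw value as worth a closer look is an assumption (some vendors pack several counters into one raw value). Attribute 194 is skipped because a non-zero temperature is normal.

```shell
# Hypothetical helper: reads `smartctl -A` output on stdin and prints the
# watched attributes (IDs 5, 10, 187, 188, 196, 197, 198) whose raw value
# is non-zero.
flag_smart_attributes() {
    awk '
        BEGIN {
            # Build a lookup table of the attribute IDs to watch.
            n = split("5 10 187 188 196 197 198", ids, " ")
            for (i = 1; i <= n; i++) watch[ids[i]] = 1
        }
        # Attribute rows start with a numeric ID; RAW_VALUE is column 10.
        $1 ~ /^[0-9]+$/ && ($1 in watch) && ($10 + 0) != 0 {
            printf "%s %s raw=%s\n", $1, $2, $10
        }
    '
}

# Example (run as root; /dev/sdX is a placeholder):
#   smartctl -A /dev/sdX | flag_smart_attributes
```

No output means none of the watched attributes has a non-zero raw value; the full smartctl output should still be reviewed before drawing conclusions.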

When a Disk is Suspected Bad -

Confirm the device in question and run a long test so that the entire disk is scanned and tested. As stated earlier, this can take several hours to complete, but it will provide a comprehensive overview of the disk's health.

The SMART log (smartctl -x /dev/<device>) can then be attached to a ticket with Exxact to validate an RMA replacement if the system is within the 3-year warranty we provide. If the system is older than that, most drive manufacturers offer a 5-year warranty and can be contacted directly.
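The steps above can be sketched with a few small wrappers around smartctl. This is only an illustration: the function names, the /dev/sdX placeholder, and the output file name are hypothetical, and it assumes smartmontools is installed and the commands are run as root.

```shell
# Hypothetical helpers around standard smartctl invocations (run as root).
# /dev/sdX below is a placeholder for the suspect device.

# Start an extended (long) self-test; the drive runs it in the background.
start_long_test() {
    smartctl -t long "$1"
}

# Print the self-test log, which lists the results of completed tests.
show_selftest_log() {
    smartctl -l selftest "$1"
}

# Write the full SMART report to a file that can be attached to a ticket.
# smartctl's exit status is a bit mask and can be non-zero for a failing
# disk even when the report was written, so it is deliberately ignored here.
save_smart_log() {
    smartctl -x "$1" > "$2" || true
}

# Example:
#   start_long_test /dev/sdX
#   ... wait for the test to finish ...
#   show_selftest_log /dev/sdX
#   save_smart_log /dev/sdX smart-sdX.log
```

Saving the report to a file keeps the self-test result and the attribute table together in one attachment.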

...