The problem of failing disks during RAID rebuilds and why RAID 5 is past its shelf-life.

Many IT admins contact us in a state of disbelief after discovering multiple cascading disk failures during a RAID rebuild. Why does this happen? When you rebuild a RAID array, every sector of each surviving disk is read. The rebuild process is the equivalent of an Ironman challenge for your disks, and it can make latent disk faults all manifest at once. It’s not that disks actually fail during RAID rebuilds (although this can happen); it’s just that when every sector of a disk is read, previously unnoticed bad sectors are exposed and the failure threshold gets tipped. You can help mitigate this by using disk rotation and employing hot spares on your array.

Avoiding the “bad batch” problem

Cascading disk failure is not the only problem that affects RAID. Another is the so-called bad-batch problem. This can occur if there is a flaw in the manufacturing process or in the design of the disk, which is then replicated in all of the disks from the same batch. If the flaw is serious enough, it can result in the near-simultaneous failure of multiple disks in your array. Some IT administrators procure the same type of disk from different vendors in order to avoid this problem, which can work. However, some RAID controllers don’t play well with mixed firmware versions, which are often found on disks procured from different batches. This can introduce a brand new set of headaches.

The one parity scheme, as used on RAID 5

Why is RAID 5 now (almost) redundant?

RAID 5 was fine back in the day when arrays of smaller-capacity disks were commonplace (like 4 x 250GB). Now, with larger disks (2TB, 6TB etc.), the probability of a failed read during a RAID rebuild becomes too high. During a rebuild, every sector on the remaining disks has to be read. With one disk already gone there is no redundancy left, so if there are any read errors on a second disk, the rebuild will typically halt. With arrays containing individual disks of 2TB+, that is a big ask, and it makes RAID 5 unsuitable for most modern IT environments.
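To put rough numbers on this, here is a back-of-the-envelope sketch in Python. It assumes an unrecoverable read error (URE) rate of 1 per 10^14 bits read, a figure often quoted on consumer disk datasheets, and it assumes errors are independent of each other. Both are simplifying assumptions rather than measurements, and the function names are our own, so treat the output as an illustration of the trend rather than a prediction for your array.

```python
import math


def p_read_error(bytes_to_read, ure_per_bit=1e-14):
    """Probability of at least one unrecoverable read error.

    ure_per_bit is an assumed datasheet-style figure, not a measured one.
    """
    bits = bytes_to_read * 8
    # 1 - (1 - rate) ** bits, written with log1p/expm1 so the tiny
    # per-bit rate is not lost to floating-point rounding.
    return -math.expm1(bits * math.log1p(-ure_per_bit))


def raid5_rebuild_risk(disk_count, disk_tb, ure_per_bit=1e-14):
    """Chance of hitting a URE while rebuilding one failed RAID 5 member.

    Every sector of the (disk_count - 1) surviving disks must be read.
    """
    surviving_bytes = (disk_count - 1) * disk_tb * 1e12
    return p_read_error(surviving_bytes, ure_per_bit)


# Compare the array sizes mentioned above: 4 x 250GB, 4 x 2TB, 4 x 6TB.
for disks, tb in [(4, 0.25), (4, 2), (4, 6)]:
    risk = raid5_rebuild_risk(disks, tb)
    print(f"{disks} x {tb} TB RAID 5: {risk:.1%} chance of a read error")
```

Under these assumptions, the estimated risk climbs steeply as disk capacity grows, simply because so many more bits have to be read without a single error. Disks in the field often behave better than their datasheet rate suggests, so the real figure for any given array may well be lower.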

Two independent parity schemes, as used on RAID 6

RAID 6 to the rescue

Enter RAID 6. This type of array uses two independent parity schemes. So even if a second disk develops unreadable sectors during a rebuild, there is a second set of parity data to fall back on and your RAID rebuild should complete successfully.
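For the curious, the idea of two independent parity schemes can be sketched in a few lines of Python. The example below follows one common textbook construction: a "P" parity that is a plain XOR of the data blocks, and a "Q" parity that weights each block by a power of 2 in the finite field GF(2^8). This is an assumption for illustration only; real controllers may use a different layout or a different code, and the function names here are hypothetical.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8), reducing polynomial 0x11D (assumed)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result


def gf_pow(a, n):
    """Raise a to the n-th power in GF(2^8) by repeated multiplication."""
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result


def gf_inv(a):
    """Multiplicative inverse: a^254 equals 1/a for any non-zero byte."""
    return gf_pow(a, 254)


def compute_parity(data_blocks):
    """Return (P, Q) for one stripe: P is XOR, Q is XOR weighted by 2^i."""
    size = len(data_blocks[0])
    p, q = bytearray(size), bytearray(size)
    for i, block in enumerate(data_blocks):
        coeff = gf_pow(2, i)
        for k, byte in enumerate(block):
            p[k] ^= byte
            q[k] ^= gf_mul(coeff, byte)
    return bytes(p), bytes(q)


def recover_two(blocks, x, y, p, q):
    """Rebuild the data blocks at positions x and y from survivors, P and Q."""
    a, b = bytearray(p), bytearray(q)
    for i, block in enumerate(blocks):
        if i in (x, y):
            continue  # these two blocks are unreadable
        coeff = gf_pow(2, i)
        for k, byte in enumerate(block):
            a[k] ^= byte                  # a becomes Dx ^ Dy
            b[k] ^= gf_mul(coeff, byte)   # b becomes 2^x*Dx ^ 2^y*Dy
    gx, gy = gf_pow(2, x), gf_pow(2, y)
    inv = gf_inv(gx ^ gy)
    # Two equations, two unknowns: solve for Dx, then Dy follows from a.
    dx = bytes(gf_mul(inv, b[k] ^ gf_mul(gy, a[k])) for k in range(len(p)))
    dy = bytes(a[k] ^ dx[k] for k in range(len(p)))
    return dx, dy


# One stripe across four data disks; disks 1 and 3 then become unreadable.
data = [b"disk", b"fail", b"raid", b"six!"]
p, q = compute_parity(data)
dx, dy = recover_two([data[0], None, data[2], None], 1, 3, p, q)
print(dx == data[1] and dy == data[3])
```

With XOR parity alone, two missing blocks leave one equation with two unknowns, which is why a single-parity rebuild cannot get past a read error on a second disk. The second, differently weighted parity supplies the extra equation needed to solve for both.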

Drive Rescue are based in Dublin, Ireland. We offer a complete RAID 0, 5, 6 and 10 data recovery service for HP ProLiant, Dell PowerEdge and Fujitsu PRIMERGY servers. We also offer a NAS data recovery service for Synology, ReadyNAS and QNAP devices.