HP ProLiant servers are among the most common on-premise servers in Ireland. These systems come in two main form factors – rack or tower. Their rack series includes models such as the DL360, DL380 and DL385, while their tower series includes models such as the ML10, ML110 and ML350.
Recently, we recovered data from an HP ProLiant ML350. This Windows Server 2019 server was running VMware virtual machines. It used 4 x HP SAS disks, and its RAID 5 array had gone into degraded mode. While this can be very frustrating, “degraded mode” is actually a self-protection mechanism of the server. It occurs when unrecoverable errors are detected in one or more of the disks. Its role is to prevent any further damage that might occur due to silent data corruption. The server subsequently became unbootable.
Examination of the 4 x 1.2TB HP SAS (EG001200JWFUT) disks (formatted in EXT4) proved interesting. Disk 0 was fine. Disks 1 and 2 were seriously overheating: our infrared thermometer recorded temperatures of 48 and 49 degrees Celsius respectively. Disk 3, meanwhile, was clicking. Great…
We connected each of the 3 working disks to our Areca SAS card using an SFF-8482 cable and made bit images of them. It is important to note that this PCIe card was not a RAID controller. The last thing you want is for a RAID rebuild process to initiate with a missing disk. Using a non-RAID SAS card means the integrity of the images remains sound. The array itself is reassembled later from the images, which requires specialised software.
We now had to ascertain the exact RAID parameters used in the original array. If you don’t use the exact parameters, corrupted files are inevitable. The HP documentation on the parameters used was, unsurprisingly, lousy. Therefore, we used a hex editor to find the original RAID parameters – namely the block size, the offset and the block order. With these parameters recorded using the high-tech medium of Microsoft Notepad, and with specialised RAID rebuild software to hand, we could start the rebuild process. This took a number of hours, but eventually we had several VMDK and -flat.vmdk files on our recovery system. Exactly what we were looking for! Our file integrity checks revealed all files to be intact. The client was extremely fortunate. Some VMDK virtual disk files can be unwieldy, fragmented and liable to corruption during RAID array failure events. Anyway, the client’s data (Excel files, PDFs, ROS certificates and BrightPay payroll data) could now be extracted onto a 4TB external disk for delivery.
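To illustrate why the block size, offset and block order matter, here is a minimal Python sketch of reassembling a RAID 5 volume from disk images when one member is missing. It is not the software we used, and it makes assumptions: the function names (xor_blocks, parity_disk, rebuild_raid5) are hypothetical, the parity rotation shown is a simple "left-asymmetric" layout that may not match what a given HP controller writes, and the images are held in memory as bytes, which only works for toy sizes.

```python
def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together."""
    size = len(blocks[0])
    acc = 0
    for block in blocks:
        acc ^= int.from_bytes(block, "big")
    return acc.to_bytes(size, "big")


def parity_disk(stripe, n_disks):
    """Assumed block order: parity starts on the last disk and rotates left."""
    return (n_disks - 1) - (stripe % n_disks)


def rebuild_raid5(disks, block_size, offset=0):
    """Reassemble the logical volume from RAID 5 member images.

    disks      -- list of bytes objects in member order; None marks the missing disk
    block_size -- stripe unit size in bytes
    offset     -- byte offset at which array data starts on each member
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    n = len(disks)
    present = [d for d in disks if d is not None]
    if len(present) < n - 1:
        raise ValueError("RAID 5 can only tolerate one missing member")

    stripes = (min(len(d) for d in present) - offset) // block_size
    volume = bytearray()
    for s in range(stripes):
        start = offset + s * block_size
        row = [None if d is None else d[start:start + block_size] for d in disks]
        if None in row:
            # The missing block (data or parity) is the XOR of all the others.
            missing = row.index(None)
            row[missing] = xor_blocks([b for b in row if b is not None])
        p = parity_disk(s, n)
        for i in range(n):
            if i != p:  # skip the parity block, keep data blocks in disk order
                volume += row[i]
    return bytes(volume)
```

With the wrong block size, offset or block order, this loop still runs without complaint but stitches the blocks together in the wrong sequence, which is why guessed parameters produce corrupted files rather than an obvious error.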
This recovery process saved this Dublin accountancy practice hours and hours of labour time that would otherwise have been spent reconstructing files.
How RAID 5 failure and recovery could have been prevented…
First of all, RAID 5 should not be considered a backup. In this particular case, the client should have had a valid, up-to-date backup of their main server. There are swathes of virtual machine backup applications (such as Veeam and Nakivo) out there which can back up locally and to the cloud.
RAID 6, which uses dual parity, can sometimes be a better and safer alternative to RAID 5. This is especially true where disks are over 1TB in size, which is now commonplace in even the most basic servers.
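One common argument for why disk size matters is that a degraded RAID 5 array has to read every surviving disk in full to rebuild, and a single unrecoverable read error (URE) along the way can derail it. The Python sketch below is a rough back-of-the-envelope estimate only. The function name is hypothetical, the URE rates passed in are assumed datasheet-style figures rather than measured values for these disks, and errors are treated as independent, which real disks do not guarantee.

```python
import math


def rebuild_ure_probability(surviving_disks, disk_bytes, ure_per_bit):
    """Chance of at least one URE while reading every surviving disk in full.

    ure_per_bit is an assumed rate, e.g. 1e-14 means one error per 10^14 bits read.
    """
    bits_read = surviving_disks * disk_bytes * 8
    # 1 - (1 - rate)^bits, computed in a numerically stable way
    return -math.expm1(bits_read * math.log1p(-ure_per_bit))


if __name__ == "__main__":
    # 4-disk RAID 5 with 1.2TB members: 3 surviving disks must be read in full.
    for rate in (1e-14, 1e-15, 1e-16):
        p = rebuild_ure_probability(3, 1.2e12, rate)
        print(f"assumed URE rate {rate:.0e} per bit: {p:.2%} chance of a URE during rebuild")
```

Under these assumptions, an array of this size comes out at roughly a one-in-four chance at the 1e-14 rate and well under 1% at 1e-16. The risk grows with every extra terabyte that has to be read, and RAID 6's second parity block is what lets a rebuild survive such an error.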
If your on-premise RAID server stores a lot of data that is infrequently accessed, you should have a data scrubbing regime in place. The scrubbing process reads all the data and checks for consistency. For some file systems like BTRFS, you can use the “btrfs scrub” command. EXT4 does not checksum data. However, it does allow for metadata checksumming, which can help detect early disk problems.
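Where the file system does not checksum data itself, a similar read-everything-and-compare check can be approximated at file level. The following Python sketch is one possible approach, not a substitute for a controller-level or file-system scrub: it records a SHA-256 checksum per file, then re-reads the files later and reports anything unreadable or changed. The function names are hypothetical, and it only suits data that is not expected to change, since a legitimate edit will also show up as a mismatch.

```python
import hashlib
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """Read a file in chunks and return its SHA-256 hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its checksum."""
    root = Path(root)
    return {
        str(p.relative_to(root)): file_sha256(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def scrub(root, manifest):
    """Re-read every file in the manifest and list the ones with problems."""
    root = Path(root)
    problems = []
    for rel, expected in manifest.items():
        try:
            actual = file_sha256(root / rel)
        except OSError:
            # Missing file or a read error reported by the operating system
            problems.append((rel, "unreadable"))
            continue
        if actual != expected:
            problems.append((rel, "checksum mismatch"))
    return problems
```

The manifest returned by build_manifest would need to be stored somewhere other than the array being checked (for example as JSON on a separate disk), and scrub run on a schedule so that read errors surface while the array still has its redundancy.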
Drive Rescue is based in Dublin, Ireland. We offer a complete RAID data recovery service for HP ProLiant and HP MicroServer systems. Whether your data is stored in bare metal format or in VMDK, VDI or VHD virtual disk formats – we can help recover your data.