Data recovery from an Intel 1.8” SSD drive


Problem
Recently, a user from Athlone was using his HP Elitebook laptop connected to mains power (his battery was no longer able to hold a charge). Unfortunately, while rushing to answer his phone, he accidentally tripped over his laptop's power lead and his system powered off almost instantaneously. No worries, he thought. This had happened to him before and did not cause him much concern: after a previous sudden shutdown, his laptop had booted up successfully.
He finished his phone call and went to switch his laptop back on. But this time he got the "operating system not found" error message. He turned it off and back on again, but the same error message reappeared. He brought the laptop to his local computer repair shop. They removed the Intel SSD from his system. It appeared on their system alright, but it was detected as having a capacity of just 8MB! They tried running some data recovery programs on the drive, but the data proved elusive. They referred him to Drive Rescue data recovery.

Using our equipment, specialised for recovering data from SSD devices, we discovered that the drive's logical-to-physical (L2P) table had become corrupt. The role of this table is to translate LBAs (logical block addresses) into physical addresses (the chip number and the page number on that chip). When there is a sudden loss of power, this layer often becomes corrupted.

Solution

We first put the drive into safe mode and then used our equipment to find the corrupt L2P module. After we found the corrupt module (containing the corrupt table), we had to upload a "good" module matching the model number (SA1M160G2HP) and firmware version (02HA). After some extensive searching of our firmware database, an exact-match module was found and uploaded into the RAM of our recovery system. After installation, a new volume appeared, but it was only showing 48GB. This was certainly not the size of the original volume; something else was awry. We decided to change the recovery mode of our system from "tech-command" to "translator-table" mode. With this parameter changed, the whole 160GB volume appeared. The downside of TT mode is that it automatically changes the read mode from UDMA to PIO, which gives a data transfer speed of only 4Mbps. But slow-and-thorough is always better than fast-but-incomplete.
All of the client's data was recovered, namely his .PST file, his .ACCDATA file (Sage) and his Microsoft Office files, and copied onto a brand new USB external hard drive. The client informed us that he is now going to decommission the old HP Elitebook which gave him such a nasty surprise in favour of a new MacBook. An early Christmas present to himself.

Recovering data from a Seagate 7200.11 disk with bad sectors


One of the most insidious types of failure on conventional hard drives is caused by bad sectors. A bad sector is a sector on a hard disk that cannot be read, written or corrected by the drive's ECC (Error Correction Code) mechanism. Typically, bad sectors develop slowly over time, and the average computer user often gets no indication that there is a problem with their disk.

Bad Sector Symptoms

A disk with bad sectors will sometimes display error messages such as the following.

"You need to format the disk in drive F before you can use it" (Windows operating system)

"The disk in drive E is not formatted. Do you want to format it now?" (Windows operating system)

"The disk you inserted was not readable by this computer" (Mac operating system)


Modern drives more likely to develop bad sectors

Modern multi-terabyte drives using perpendicular recording are more likely to develop bad sectors than their smaller-capacity brethren. This is because they use higher areal densities (more bits are squeezed into the same space), their track widths are narrower and their disk heads fly lower. These attributes decrease the drive's signal-to-noise ratio (SNR), and as the SNR decreases, the likelihood of bit errors increases. However, this is somewhat compensated for by manufacturers' use of more sophisticated ECC algorithms.

Other factors which increase the probability of bad sectors:

Age – As disks age, the probability of bit-flips and other magnetic distortions occurring increases. Hard disks are not the only magnetic storage medium to deteriorate with age. For example, tape storage is notorious for age-related degeneration.

Thermal Asperities – Tiny particulates of contaminants inside the drive can cause a phenomenon known as thermal asperity. Typically, it occurs when a giant magneto-resistive drive head collides with a contaminant on the disk platter. As a result, heating will occur on the disk head and platter surface. This causes some sectors in the affected area to become unreadable.

Dirty Power – If the host system of your hard disk is delivering intermittent over- and under-voltages to your HDD, then the probability of bad sectors increases. Read/write heads need a stable power supply to perform read, write and erase functions properly.

Recovering data from a drive with bad sectors

In a small number of cases, where the disk has relatively few bad sectors, commercially available data recovery programs can be successful in recovering your data. I emphasize a small number of cases because, as soon as these programs hit contiguous bad sectors, they will often get stuck or freeze the host computer. These programs are really only designed for disks with relatively few bad sectors. If the disk has any underlying read/write head issues, repeated retries can precipitate the failure of one or all of the disk heads. Users who torture damaged drives with these DIY programs risk permanently losing their data.

The right equipment and skills

If your data is in any way important, the best way to recover it from a disk with extensive bad sectors is to take the disk to a professional data recovery company. They will have the equipment to deal with bad sectors. But having the right equipment is not always enough. In the same way that a kitchen fitted out with the latest cookers, mixers and knives does not make a Michelin-starred chef, the data recovery technician must have experience, skill and insight. Some bad-sector recoveries are straightforward procedures; other cases are more complex.

Data recovery from a Seagate Barracuda 7200.11

Take, for example, a client we were helping last week. A veterinary surgeon in Waterford was using an HP desktop PC with a Seagate Barracuda S-ATA 7200.11 as a server for his x-ray machine. Recently, the drive became inaccessible and his local IT support company was unable to retrieve his files. They were of vital importance to him: without them, he would have to schedule appointments with dozens of his customers again in order to x-ray their animals a second time. This would have imposed a huge time burden on his staff and would have been damaging to his reputation.

Our preliminary scans revealed that his disk had extensive bad sectors. To complicate matters, most of these bad sectors were in the inner tracks. This can be the worst place for a drive to develop bad sectors, as the Master File Table (MFT) is usually stored there. This file is important because it acts as an index for the whole drive.

The data recovery from this drive involved a number of steps. Using our equipment, we disabled SMART on the drive. (In this case, SMART had only given the user a warning after the drive failed to boot.) During the recovery process, SMART will actually hinder the recovery. Next, we put the drive into PIO mode instead of UDMA mode. Recovery from disks with bad sectors is made easier in this mode as it enables much better quality reads. Then we set the read time-out and read block size. The optimal read time-out will vary depending on the extent of the damage; sometimes a time-out of 1200 milliseconds is needed for reading badly damaged sectors. In this particular case, a setting of 550 milliseconds proved optimal. We set the read block size to 60 sectors, meaning our equipment would read 60 sectors at a time. This setting proved most suitable. Two hours into the process, the MFT was successfully copied. The rest of the data took nearly 16 hours to copy.
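
(For the curious, GNU ddrescue offers a rough open-source analogue of some of these settings, although it is no substitute for hardware imaging and should never be pointed at a failing drive holding irreplaceable data. The device name and values below are illustrative only.)

            # image the failing disk onto a file, 60 sectors per read, one retry pass over bad areas
            ddrescue --sector-size=512 --cluster-size=60 --retry-passes=1 /dev/sdX barracuda.img barracuda.map

The mapfile (barracuda.map) records which areas have been read successfully, so the run can be stopped and resumed without re-reading good sectors.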


Our process was successful. All of the x-ray files, in DICOM format, were recovered. We were able to use the excellent MicroDicom reader (created by Simeon Antonov Stoykov) to verify the integrity of the images. Our client was able to log in securely to our systems to view his recovered files. A lot of Labradors, poodles and cats were saved a second trip to have their innards photographed!

Protection against bad sectors

For standalone disks, there are some inbuilt protections against bad sectors: Error Correction Code (ECC) and SMART. ECC is designed to detect errors and try to remedy them. But these are merely safeguards and nothing more. Hard disk manufacturers are sometimes reluctant to over-burden their disks with ECC mechanisms for fear of creating too much of an operational overhead on the disk's performance. Then there is the problem with SMART. Theoretically, this mechanism is designed to detect bad sectors and report them to the user. But because a SMART alert entitles the user to an RMA (return merchandise authorisation), manufacturers have set the SMART thresholds extremely high. Often, the user only gets a SMART alert when the drive is at an advanced stage of failure or has already failed.
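
(If you want to keep an eye on these attributes yourself, the open-source smartmontools package can read them directly. A minimal sketch, assuming a Linux system where the disk appears as /dev/sdb:)

            smartctl -H /dev/sdb    # overall SMART health self-assessment
            smartctl -A /dev/sdb    # vendor attributes, including Reallocated_Sector_Ct and Current_Pending_Sector

A steadily climbing reallocated or pending sector count is usually a much earlier warning sign than the overall health flag.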

Myths about bad sectors

A pervasive myth among general computer users is that only disks in Windows systems develop bad sectors. In reality, bad sectors are operating-system agnostic. Whether a disk is running the latest version of Mac OS X, Linux or Windows, it can still develop bad sectors.

Hard Disk Sentinel to the rescue

There is a wealth of "disk health" programs available that promise to monitor the health of your disks. Some of these are adequate; others just duplicate what SMART monitoring already does. However, one disk health application which stands out for its accuracy is Hard Disk Sentinel. It provides some predictive indication of whether bad sectors are developing on your disk.

In the context of RAID arrays, bad sector management becomes much more sophisticated and would merit another blog post to detail its intricacies.

Last word

Ultimately, the best protection against bad sectors is backing up your data on a regular basis. Our vet client now uses an external hard drive to back up his x-rays. We also recommended that he use a quality online backup service as an extra layer of protection.

IP Expo Europe – October 2014 – Latest hard drives and other storage devices…

Some of the Drive Rescue team were at IP Expo in London earlier in the month, investigating some of the newer drive technologies. (Drives which we might be recovering data from in a couple of months' time…) Below are some snippets from the exhibition.


The Western Digital Sentinel DX4200 is a 4-bay NAS device which runs the Windows Storage Server 2012 R2 Workgroup OS and is powered by an Intel Atom dual-core processor. It has 4 x 3.5" storage bays and 1 x 2.5" bay for a boot drive. It uses the relatively new Windows Storage Spaces (software RAID) to create a storage pool, and it comes with 4GB of ECC memory, upgradeable to 16GB. WD's proprietary StorCentral dashboard is used, and it looks very slick. The layout of the GUI is extremely well executed, making the most important functions intuitive to use. WD claims their Smartware Pro is "a single solution for backup and storage", and the generous supply of 4 x USB 3.0 ports adds a lot of credence to this claim. The only bugbear I have about this device is that Storage Spaces is still in its infancy as RAID software; some users might be a bit apprehensive about trusting such a new RAID application with their important data.

A 3TB NASware drive from Western Digital (model WD30ERFX). Commonly known as WD Red drives, they are proving very popular for their good compatibility with NAS devices and their good reliability. (WD smiley mascot not included…)

A Toshiba 240GB enterprise-class SSD. When idle, it has a meagre power consumption of 1 watt. A mechanical HDD of the same capacity would probably consume around 5-6 watts when idle. It is little wonder SSDs are proving so popular in data centres, where operational cost is a key determinant of hard drive choice.


Many users wonder where and how their data is actually stored when they put it in the "Cloud". Here is a 2TB S-ATA drive which Toshiba markets as an Enterprise "Cloud" HDD. You will see from the spec sheet that it claims to handle a workload of 180TB a year with a read error rate of only 1 per 10^14 bits. So if you ever wanted to know what kind of hard drive is used to store your data "in the Cloud", look no further…
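
(A rough back-of-the-envelope check on that figure: 10^14 bits is about 12.5TB, so a drive actually worked to its rated 180TB a year could still statistically expect in the region of 180 ÷ 12.5 ≈ 14 unrecoverable read errors annually. It puts the marketing numbers into some perspective.)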

A Toshiba AL13SXB300N: a 2.5" enterprise-class 15,000rpm SAS drive. The 2.5" form factor is now becoming the de facto standard for enterprise-class drives. A key driver of this trend is their lower power consumption compared to their 3.5" brethren.


While this device might look like a new type of door-entry system, it is actually a portable encrypted hard drive from iStorage. It uses military-grade XTS-AES 256 hardware encryption and requires no drivers. It is the armoured Humvee of the hard drive world. It even has a brute-force "hack defence mechanism" and a "self-destruct" feature. It is available in capacities of between 250GB and 2TB. As all of the encryption is performed on the device itself, it might prove a very useful conduit for super-confidential information without the hassle of setting up encryption software on different systems.


LaCie has now been taken over by Seagate. However, the LaCie brand will be retained, and Seagate intends to push the LaCie Thunderbolt range of drives for Mac users and the creative professional market. This new range of drives comes with Thunderbolt 2 ports as standard. While Thunderbolt 1 had bi-directional transfer speeds of 10 Gbps, Thunderbolt 2 has a whopping 20 Gbps (four times faster than USB 3.0). These mammoth throughput speeds might seem a bit superfluous for the average user, but they are really in demand by those working with large file formats such as ultra-high-definition 4K video.


This is the baby of the LaCie Thunderbolt 2 family, affectionately named the "Little Big Disk". It offers a data transfer rate of 1375 MB/s and daisy-chaining capability. But perhaps the biggest surprise about this little storage device is that, inside its sleek black casing, it uses a PCIe SSD. Combined with a Thunderbolt port, this type of drive is perhaps the future of portable storage devices. The best things in life really do come in small packages.

How to mount and retrieve data from an Apple Filevault encrypted disk

In our last post, we discussed Bitlocker encryption, which is native to some Windows operating systems. However, if you're an Apple user and have encryption enabled on your system, you're probably using Filevault version 1 or 2. (There are other Mac encryption applications out there, like Sophos SafeGuard, but these tend to have a very small user base.)

Filevault was first introduced by Apple in their Panther operating system. This legacy version of Filevault used AES with cipher-block chaining (AES-CBC). But after only a couple of months on the market, the rumour mill in IT security circles started spinning, suggesting that Filevault 1's AES-CBC encryption could easily be hacked. Moreover, Filevault 1 was causing Apple users major file-sharing headaches and misfiring Time Machine backups.

With the introduction of their Lion operating system (10.7), Apple decided to call time on Filevault 1 and launched Filevault 2. With this version, instead of AES-CBC, they decided to use AES-XTS. This offers users a much more secure encryption system. Apple also ironed out the file-sharing and Time Machine glitches.

Even though Filevault 2 offers a much-needed improvement, sometimes, due to a corrupt operating system or corruption in Filevault itself, the disk will have to be slaved and manually mounted on a third-party system. This can be performed with a couple of simple Terminal commands on a Mac.

Last week, we dealt with a very simple recovery. The user, a small charity organisation, had a Filevault-encrypted disk and their operating system had crashed. Their IT admin had removed the disk and slaved it to his Mac, but the volume was proving invisible to his operating system (Mavericks).

The fix for this problem is simple. While the OS may not mount the disk automatically, it can usually be mounted manually using Terminal commands.

We first used the "diskutil list" command in Terminal. This lists all disks attached to the host system.

In this case, the partition which stored the data was "disk1". You can use the "diskutil mount disk1" command to mount it. But occasionally this will not work and you need to try the "diskutil mountDisk disk1" command instead (mountDisk being one word this time). This worked, and we got the message "mounted successfully". The volume now appeared in Disk Utility. When you click on the volume, a Filevault dialog box appears and requests your passphrase, after which you should be able to gain access to your data again.
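
(If no passphrase dialog appears and the disk was encrypted with Filevault 2, the volume is usually a CoreStorage logical volume, which can also be unlocked directly from Terminal. This is only a minimal sketch; the UUID placeholder must be replaced with the Logical Volume UUID reported by the first command.)

            diskutil cs list
            diskutil cs unlockVolume <UUID> -stdinpassphrase

The first command lists the CoreStorage volumes and their UUIDs; the second reads the passphrase from standard input and, if it is correct, unlocks and mounts the volume.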


As this quick job was for a small charity, run by a hardworking and passionate organiser, we decided to do the job gratis.

1TB HGST Bitlocker protected disk : Data Recovery Case Study


Bitlocker is a common encryption application available in Windows Server 2008 and in the Ultimate and Enterprise editions of Windows Vista and Windows 7, as well as the Pro and Enterprise editions of Windows 8. It protects a computer owner from data theft in the event of the loss of a system or storage device, and protects against outside attacks through a network.

Bitlocker uses the Advanced Encryption Standard (AES) algorithm in Cipher Block Chaining (CBC) mode, with or without a diffuser. Most default deployments of Bitlocker use AES 128-bit or 256-bit encryption with the Elephant diffuser algorithm.

Bitlocker uses a Full Volume Encryption Key (FVEK) to protect the data. In turn, this key is protected by a Volume Master Key (VMK). Like a lot of encryption applications, Bitlocker allows for multi-factor authentication via a Trusted Platform Module (TPM) chip, a PIN and a USB key.

Most deployments of Bitlocker are trouble-free. However, occasionally, due to disk failure or corruption of the encryption application itself, data recovery from a Bitlockered disk will be needed.

Last week, one of our clients, a user from an Irish Government agency, had a Bitlocker-protected disk in a Lenovo laptop. The volume became inaccessible. Their I.T. department removed the 1TB HGST disk from the laptop and attached it to another Windows 7 Enterprise system with a TPM chip onboard. The disk would not mount and was "invisible" to the system.

Challenge

The data was of critical importance and of a confidential nature. Due to confidentiality concerns, the user was not backing up to the department’s server.

Solution

We examined the drive. Using specialised tools, our technicians accessed the Host Protected Area (HPA) of the HGST disk. The G-List and Translator tables were corrupt. Using our equipment, which can access the HPA directly, our technicians repaired the corrupt firmware.

This made the drive bootable again. This time, when connected to one of our recovery systems, a Bitlocker volume, or Full Volume Encryption File System (FVE-FS), could be recognised by the "-FVE-FS-" signature at the start of the volume: always a promising juncture when recovering from an encrypted disk. The client then emailed us their 48-digit Bitlocker recovery key in a .txt file.
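
(As an aside, the signature itself is easy to spot. On a Linux-based recovery system, a quick hex dump of the volume's first sector will show the string near the very start; the device name below is purely illustrative.)

            dd if=/dev/sdb1 bs=512 count=1 2>/dev/null | xxd | head -2
            # on a Bitlocker (FVE) volume, bytes 3-10 of the boot sector read "-FVE-FS-"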

We then used the following command to unlock the volume:

manage-bde -unlock e: -RecoveryPassword XXX48-digitkeyXXX

where e: was the Bitlockered volume.
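
(Incidentally, the same manage-bde tool can report how a volume is protected before any unlock attempt is made; a short sketch, again using the e: drive letter from this case:)

            manage-bde -status e:
            manage-bde -protectors -get e:

The first command shows the encryption method and conversion status of the volume; the second lists its key protectors, which is useful when matching a 48-digit recovery password to the right volume.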

Once the key had been entered, the volume's partitions appeared. We invited the client to log in to our systems remotely to view and verify their data.

Result

All of their files were recovered intact. Even though their drive was bootable again, we extracted a copy of the data onto a USB external drive as a precaution.

Lesson: disk encryption and comprehensive backup policies should be in lockstep with each other.

The main takeaway from this case is that disk encryption and comprehensive backup policies should be in lockstep with each other. Disk encryption applications are not like other PC applications, whose actions can be easily reversed. If corruption does occur on a whole-disk-encrypted volume, it is not unknown for users to lose access to their data irreversibly. As for users who deliberately refrain from backing up to their company's or organisation's server out of confidentiality concerns, alternative practical back-up policies should be drawn up. This could be in the form of a local backup or a backup to a personal Cloud-based service.

Data Recovery from G-Tech NAS (RAID 0, HFS+)


A Dublin-based digital marketing agency were using a G-Tech NAS to store their Photoshop and Final Cut Pro files. There were over two years' worth of design work backed up onto the device. Last Monday morning, the folder shares for the device were not accessible from any of their computers. Thinking that it was only a glitch, they rebooted the device. Still no dice. They called their tech support company, whose technician suspected that the RAID array had failed. Not enamoured with the prospect of having to redo years of graphic design work, they took their tech support company's recommendation and contacted Drive Rescue.

Our technicians examined the NAS. RAID 0 was being used, and the drives were formatted in HFS+ (the default file system for Apple). We performed diagnostics on the drives (2 x Hitachi Ultrastar 7K4000 4TB). One drive (drive 0) passed the diagnostic test with flying colours. However, its counterpart (drive 1) had extensive bad sectors. We imaged both drives using a hardware imager designed for data recovery. Then, using the images of drive 0 and drive 1, our technicians used a hex editor to find the exact parameters of the G-Tech's RAID array, including the stripe size, the disk offset and the disk order. Once these had been calculated, it was possible to start the rebuild of the array. This took nearly 13 hours to complete, but the work was not in vain: all of the data was successfully recovered for a very satisfied client.
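
(The rebuild itself was done on specialised equipment, but the same idea can be sketched with standard Linux utilities once the stripe size and disk order are known: attach the two images as loop devices and assemble a metadata-less RAID 0 with mdadm. All device names, paths and the chunk size below are illustrative only; because the work is done on the images, the original drives are never touched.)

            losetup /dev/loop0 drive0.img
            losetup /dev/loop1 drive1.img
            mdadm --build /dev/md0 --level=0 --raid-devices=2 --chunk=64 /dev/loop0 /dev/loop1
            mount -t hfsplus -o ro /dev/md0 /mnt/recovered    # Linux can read HFS+ volumes via the hfsplus driver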

Data Recovery from Iomega Stor Center iX2 NAS

The Iomega Stor Center is a common NAS device used in Irish workplaces and homes. It is fairly robust, intuitive to use and can be easily configured to work with any network.

But, like any storage device, it is prone to failure, as one Dublin-based software development company discovered last week.

They were using their Stor Center ix2 as a surrogate server to store everything from PDF files to back-ups of the C++ and Java source files for their software.

Last week, one user could not access the shares and attributed it to a glitch. Then his colleagues discovered they could not even see the network shares anymore. They investigated further. The indicator warning light on their Iomega StorCenter was flashing. They logged into the management console of the device, and it was then they discovered that one of the disks, disk 0, was offline. As the ix2 is only a two-bay device, it can only be set up in RAID 0, RAID 1 or JBOD. This device was set up in RAID 1 (mirroring), yet with disk 0 offline their data was no longer accessible. They removed disk 0 and slaved it to a PC system, but it was totally dead. In terms of solutions, they had run out of road.

They delivered the two Seagate ES.2 drives to us. Our diagnostics revealed that Disk 1 was in perfect health but Disk 0 had a failed PCB (failed inductor chips).

ES.2 drives use Seagate’s F3 architecture. This means that the ROM on the PCB holds unique adaptive information needed for the operation of the drive.

The drive was brought to our rework station. Here, our technicians used hot air to carefully remove the drive's ROM chip. (De-soldering such a delicate chip can be a messy operation.) We had an ES.2 donor board already in stock. The removed ROM chip was then carefully micro-soldered onto the donor PCB (whose original ROM chip had been removed), and this PCB was then fitted onto the drive.

The drive spun up with a healthy, reassuring spin. Now it was time to image both drives. The imaging process took 3.5 hours to complete.

Using both drive images, our technicians then set about rebuilding the RAID 1 array. Using a hex editor, they determined the exact parameters of the array, such as the disk order, block offset and stripe size. The RAID rebuild took a couple of hours to complete.
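
(Because RAID 1 simply mirrors the same data onto both members, a rough open-source equivalent, assuming the NAS laid down a standard Linux md mirror as these Iomega units typically do, is to attach the healthy image as a loop device and start the mirror in degraded mode. Device names and the partition number are illustrative only.)

            losetup -P /dev/loop0 disk1.img                   # -P scans the image's partition table, creating /dev/loop0p1, /dev/loop0p2, ...
            mdadm --assemble --run /dev/md0 /dev/loop0p2      # --run starts the RAID 1 array even though its mirror partner is missing
            mount -o ro /dev/md0 /mnt/recovered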

Finally, the volume was mountable again. All of their Java, Python and C++ code libraries and PDFs (all of which had taken months to compile) were accessible once again. The result: a very happy software development team who did not have to cover old ground re-writing code and PDF manuals.

Irish Internet Association Seminar – Data Protection and Cyber Security


Last week, Ultan O'Carroll of the Data Protection Commissioner's office gave an excellent presentation on best-practice policies for data protection.

Below is a quick snapshot of some key points.

Knowing your data – "If there is anything you need to know – know what data you have and categorise it in some way – whether it is personal, financial and so on." He further advised delegates that, apart from categorising your data, "you need to know where your data is – whether it is on tape, on disk, on your production server and so on".

"Access control" data among your employees – "Not everyone needs to see all the personal data that you hold." For example, sometimes your admin staff only need to have access to the address details of your customers. If the data is not within their remit, they need not be privy to it. All of this goes back to "knowing your data".

Use access logging – Finding out "who logged in when", "whether it was local or remote" and "what password they used". "We often see things go wrong at this level," said O'Carroll.

Have a plan to deal with data breaches within your organisation –  Dealing with data breaches in an ad-hoc fashion is not the best way. Data controllers must have a plan in place.

Software patching – You should have a policy in place for the patching of software, and it needs to be enforced. "We often find that top-level security patches get released but they are only applied 3-6 months after that. In that window, hackers will try to do some reconnaissance on your site."

Passwords – Having a robust password policy in your business or organisation is essential. For example, users using the same passwords for their Facebook account and their company database is not secure. Moreover, passwords need to be transmitted and stored securely; emailing or storing passwords in clear text is not good practice.

Use third parties to independently test your security – There are specialists who can independently test the security of your I.T. infrastructure. These often have their own sub-specialisations. For example, one penetration tester might specialise in e-commerce payment gateways whilst another might specialise in network penetration testing. "Test it and test it again" is the advice.


Whilst the above points are just guidelines on data protection best practice, the best data protection systems are often built from the ground up. If you want to find out more about implementing better data protection, an excellent resource is "The Privacy Engineer's Manifesto" by Dennedy, Fox and Finneran. The authors espouse the view that "privacy will be an integral part of the next wave in the technology revolution and that innovators who are emphasizing privacy as an integral part of the product life cycle are on the right track".

The ebook version of the book is free to download at:

http://www.apress.com/9781430263555

Bones Break…so do RAID 5 arrays – data recovery for Physiotherapist Practice


Last week, we got a call from a Dublin physiotherapist's practice. Their Dell Poweredge server, configured in RAID 5, had failed.

Their I.T. support technician identified the problem immediately. However, for him, data recovery from a RAID 5 server was unknown territory. For this blog post, here is an abridged version of the RAID recovery process which we used.

For the recovery, we decided to use mdadm. It is a powerful Linux software-RAID management utility. A good knowledge of the Linux command line and in-depth experience with this tool are essential prerequisites for its operation.

The first step in the recovery process was to determine the status of the server's drives in-situ.

We used the following command on every disk in the array:

            mdadm --examine

We were able to determine that the /dev/sdc1 and /dev/sdd1 drives had failed (sdc1 being in the worse condition). Mdadm revealed that this RAID 5 had experienced a double-disk failure. We then carefully labelled each drive and removed them from the server. Then, using a specialised hardware disk imager, we imaged the disks. This meant that we would be working on copies of the disks rather than the originals. In the unlikely event of the data recovery process being unsuccessful, the original configuration and data, as we received it, would still be intact.

The imaging process completed successfully and we put the imaged drives into the server. With all the prep work completed, it was now time to take the RAID array "offline". This can be achieved with the "mdadm --stop" command. The last thing we wanted was for the RAID rebuilding process to start using a failed disk in bad condition (e.g. /dev/sdc1). To prevent this from happening, we cleared the superblock of this drive using the command:

            mdadm --zero-superblock /dev/sdc1

Now, using the output we got from "mdadm --examine", we used the following command to rebuild the array:

            mdadm --verbose --create --metadata=0.90 /dev/md0 --chunk=128 --level=5 --raid-devices=5 /dev/sdd1 /dev/sde1 missing /dev/sda1 /dev/sdb1

We now had to check whether the array was aligned correctly, using the command:

            e2fsck -B 4096 -n /dev/md0

When using e2fsck, it is always helpful to specify the block size before a scan to get a more accurate status of the array. We also used the -n flag so that, if the array was mis-aligned, e2fsck would not attempt to fix anything. (A repair run of e2fsck should never be executed on an array that is potentially mis-aligned.)

E2fsck completed successfully and correctly identified the status and alignment of the array.

It was now safe to proceed with the repair-and-fix command:

            e2fsck -B 4096 /dev/md0

Notice that no “-n” was used this time. The scan took around 5.5 hours to complete. It found over 26 inode errors, hundreds of group errors and some bitmap errors.

Now, it was time to add the first failed drive back into the array. We used the command:

            mdadm --add /dev/md0 /dev/sdc1

The RAID array now began to rebuild. After a couple of hours, the RAID 5 was totally re-created, albeit in degraded mode. But the volume was mountable again and all data was now accessible.
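
(While an md array is rebuilding, its progress can be followed from the command line; a quick sketch:)

            cat /proc/mdstat          # shows each md array, its members and the rebuild percentage
            mdadm --detail /dev/md0   # reports the array state and which devices are active, spare or faulty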

The client had over four years of Sage Micropay and Sage 50 accounts files on the server. In addition, they had over six years' worth of PhysioTools data files; this is a software package which they use to create customised exercise regimes for their patients. Reconstructing accounts and staff payslips would have been very time-consuming and costly, and re-creating patient exercise regimes would have imposed a huge time burden on the staff. Moreover, it would probably have been damaging to their professional reputation if they had to inform their patients that their customised exercise regimes had been "lost".

We advised the client on some best-practice back-up strategies so they could prevent data loss in the future. It is deeply satisfying to help a customer like this when the “plan B” option would have been so disruptive for them. They could now get back to helping their patients with minimum downtime to their business.


The mystery of the continually degrading RAID 5 array



A couple of weeks ago, an I.T. support administrator for a Dublin finance company called us. He was in a spot of bother. The previous week, the RAID array on their HP Proliant server had failed. Luckily, they had a complete back-up and no data had been lost. The I.T. admin decided to replace the four Western Digital Enterprise S-ATA disks (WD2500YS) which had been set up in a RAID 5 configuration. He replaced them with four Western Digital Caviar Green disks (WD10EZRX) and, using S-ATA to SAS adaptors, connected them to the HP Smart Array controller card.

He rebuilt the server and re-installed Windows Server 2008. But three days later, the server was down again. In this short space of time, the RAID array had changed from "normal" to "degraded" status. He ran diagnostics on the disks; all of them passed. He suspected that the Smart Array controller card was at fault. He had a redundant server in his office using the same model of RAID controller card, so he removed it, installed it in the problematic server and, for a second time, rebuilt the array. He tested it for a couple of hours and it worked fine. Then, just as he was about to start the data transfer from the local backup, he rebooted the server. The dreaded "degraded" message appeared on the screen again.

Being a past customer of Drive Rescue data recovery, the I.T. admin telephoned us for advice about this mysterious problem. From his description, we had a fairly good inkling as to what the cause might be. But inklings and assumptions are dangerous. Most of the great technological failures of mankind (nuclear power plant explosions, aircraft disasters, etc.) can be traced to someone, somewhere making a wrong assumption. The same applies to the data recovery process: good data recovery methodology is not based, and never has been, on assumptions. We asked him to email us his server event logs, hard drive model numbers and the exact model of his RAID controller. After looking at his server logs, the specs of his controller card and the model of hard disks used, it became clearer what the root cause of the problem might be.

He got the disks delivered to us and we tested them using our own equipment. His Western Digital Green disks were indeed perfectly healthy. The problem with a lot of WD Green disks (and other non-enterprise-class disks) is that, when they are used in a server or NAS, the RAID controller can erroneously detect them as faulty. The reason for this is quite simple. In some RAID setups, if the controller card detects that a disk is taking too long to respond, it simply drops it out of the array. But it is normal for error recovery in non-enterprise, non-NAS-classified disks to take considerably longer than 8 seconds to complete. With error recovery control (ERC) enabled on a disk, error recovery is usually limited to 8 seconds or under. (For example, on Hitachi-branded disks ERC is limited to 7 seconds.) This means the RAID controller will be less likely to report ERC-related false positives.

In this case, the Smart Array RAID controller, commonly used in HP Proliant servers, was detecting some of these disks as faulty when they were not. The most common type of error recovery control used by Western Digital is TLER (Time-Limited Error Recovery). Most WD Caviar Green drives do not have this function; WD Red (NAS) disks and WD enterprise-class disks do.
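
(On drives that do support error recovery control, the timeout can be queried, and on many models set, with the open-source smartmontools. The setting does not normally survive a power cycle, so it is usually re-applied from a boot script. A sketch, with an illustrative device name:)

            smartctl -l scterc /dev/sda          # shows the current SCT ERC read/write timeouts, or reports that ERC is unsupported
            smartctl -l scterc,70,70 /dev/sda    # sets both timeouts to 7.0 seconds (values are given in tenths of a second)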

 

RAID controllers (especially dedicated hardware controllers such as those from LSI, PERC, etc.) are very sensitive to read/write time delays. When a hard disk does not use error recovery control, a RAID controller will often report false positives about the status of the array or, as happened in this case, will simply drop the "defective" disk out of the array.

Enterprise-class disks (such as the WD Caviar RE2 and RE2-GP) and disks made specifically for NAS devices (such as the WD Red) have error recovery control enabled by default.

In this case, the I.T. admin replaced the WD Caviar Green disks with four 1TB WD RE SAS drives. He then rebuilt the RAID 5 array.

Yesterday, we got a nice email from him. The server has been running smoothly ever since. He has rebooted it a couple of times, and the event logs are free from disk errors. He no longer has to worry about the company's server continually degrading. He can even sleep more soundly at night.