Recently, we came across a user who accidentally wiped important folders from their 2TB PNY SSD. Realising their mistake, they popped open the lid of their Dell desktop system and disconnected the PNY S-ATA SSD from the motherboard. This was a very good first step to take, because a data recovery application should never be installed on the drive you intend to recover from.
The user then placed the PNY SSD inside a 2.5” disk enclosure, downloaded EaseUS, a popular data recovery application, to his laptop, and connected the enclosure containing the disk to it. EaseUS detected the disk straight away and he let the scan run overnight.
What’s wrong with this?
Now you might be thinking, “what’s wrong with this process?”. Surely the user was doing everything right? Well, he actually made a serious error in his data recovery process. When performing a deleted file recovery operation on an SSD, the disk must be imaged first. Why? Unlike a conventional HDD, an SSD uses TRIM and Garbage Collection processes to permanently “clean out” files which have previously been subject to “delete” commands from the OS. Once an SSD is connected to a system, these processes can get activated. And yes, this happens even if you’re connecting your disk via an enclosure, as many enclosures now support the UASP protocol, which in turn supports TRIM. Having TRIM and Garbage Collection active inside your SSD is like having one of those robot-cleaner home vacuuming devices inside your disk, hoovering up deleted or lost file fragments – a process you certainly don’t want happening when trying to recover data. Our user was able to partially recover some of his files, but a substantial number of them were unrecoverable.
How could this have been prevented?
This particular user missed a crucial stage in the data recovery process: imaging. Imaging means making a bit-for-bit copy of the SSD to capture a “point in time” without the influence of file-corroding TRIM or Garbage Collection. There are many disk imaging utilities on the market, such as Macrium Reflect. Once you’ve made a complete disk image, run your recovery software on that image. Performing a deleted file recovery on the original disk is not a good idea!
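To make the imaging idea concrete, here is a minimal Python sketch of what an imaging tool does under the hood: read the source block by block and write every block (padding unreadable ones with zeros) to an image file. The function name and error handling are our own illustration – a real job should use a dedicated tool such as ddrescue, which handles retries, logging and damaged media far more robustly.

```python
import os

def image_device(src_path, dst_path, block_size=4096):
    """Make a read-only, block-by-block copy of a device (or file).

    Unreadable blocks are replaced with zeros so the image stays
    sector-aligned -- a much-simplified version of what dedicated
    imaging tools do. Returns the number of bad blocks skipped.
    """
    bad_blocks = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            try:
                chunk = src.read(block_size)
            except OSError:
                # Read error: pad with zeros and skip past the bad block
                chunk = b"\x00" * block_size
                src.seek(block_size, os.SEEK_CUR)
                bad_blocks += 1
            if not chunk:
                break
            dst.write(chunk)
    return bad_blocks
```

The key design point is that the source is only ever opened read-only: your recovery software then scans the image file, and the original disk is never written to.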
A customer recently contacted us. They had a 4TB WD My Passport USB (WDBYFT0040BBL) external drive which they had accidentally dropped. When they connected the drive to their PC, they could hear a noise as if the “drive arm” was stuck. They asked us: “maybe if I just opened up the drive and moved the arm that sounds stuck – do you think that would work?”. The customer’s query was totally understandable. Because, let’s face it, losing data is not a pleasant experience, and most users just want to get their data recovered quickly.
Is it safe to open up my hard disk with a stuck head?
As you can imagine, opening up a hermetically sealed hard disk drive “to fix a stuck head” is not a good idea for a number of reasons. First of all, there is the issue of dust. When you open up a hard disk, no matter how clean your home or office environment is, dust will start contaminating the platter surfaces very quickly. If you don’t believe this, just get a handheld mirror, wipe the surface clean and then leave it on a table for ten minutes. Unless you happen to live inside a filtered-air laboratory or in a space capsule, you’ll very quickly see specks of dust accumulating on the surface of the mirror. Secondly, after an incident such as an accidental disk drop, there is a high chance that the disk-heads are damaged. Simply opening up a hard disk to “move the arm” is, unfortunately, not going to solve that.
The disk-heads are the tiny components mounted at the end of the disk-head assembly which read the data from the platters. When an external hard disk such as a WD My Passport suffers an accidental fall, these heads often get damaged in the process, making them unable to read data. And unfortunately, one or more deformed disk-heads can also transform into some rather nasty platter-scrapers.
Disk-heads play another crucial function besides just reading data…
But these disk-heads play another crucial role inside your disk. They help position the head-disk assembly (or actuator arm) at the right location at the right time, just when you need to access a particular file or folder. This is precision engineering at its best. In fact, there is probably no other device in your home or office where such an orchestra of precision engineering, electronics and digital signal processing technology converges to perform what superficially seems like a very simple task. Think about that the next time you open up an Excel file…
Disk-head positioning information is calculated by the read channel IC using two variables: the track address and servo-burst patterns. There are thousands of concentrically aligned “tracks” on your disk’s platters, and each location on the platters where the disk-heads could be reading or writing data has an identifier, known as the track address or Track ID. The Track ID is detected by the disk-heads, sent to the pre-amplifier and then on to the read channel IC. The track address alone, however, is insufficient to give the read channel IC a really accurate position for the disk-heads. This is where servo-burst patterns come into play. These are patterns interleaved with the user areas on the platters. They are like road markings on the platters which indicate to the disk-heads what position they are at. While a lot of people think that a disk fresh out of the factory is blank, its platters already come with firmware information in the System Area and servo-burst information between the User Areas. For HDD manufacturers, the process of writing servo-burst information to hard disk platters is actually very complicated. The task requires a special servowriter consisting of a laser interferometer for radial positioning and a clock head for phase information.
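To give a flavour of how servo bursts refine positioning, here is a toy Python sketch of the classic two-burst position error signal (PES): the relative amplitudes of the A and B bursts tell the servo system how far off track-centre the head is. The function name and units are our own simplification; real drives use more burst patterns and far more elaborate signal processing.

```python
def position_error_signal(burst_a, burst_b):
    """Classic two-burst PES: zero when the head is centred on the
    track, and positive or negative as the head drifts towards the
    A or B burst. Inputs are read-back amplitudes of the A and B
    servo bursts (arbitrary units)."""
    total = burst_a + burst_b
    if total == 0:
        raise ValueError("no servo signal detected")
    # Normalised difference: +1.0 fully over burst A, -1.0 fully over B
    return (burst_a - burst_b) / total
```

Equal burst amplitudes give a PES of zero (head on centre); any imbalance produces a signed correction for the actuator.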
MacGyver “fixes” to move stuck but damaged disk-heads to another portion of the platter can make the problem even worse.
When a magnetic hard disk is powered on (initialised), its heads go into seek mode, looking for track and servo-burst data. In a healthy hard disk drive, this only takes a few seconds. However, if you try to manually “lift” a damaged actuator arm off the platter onto another location in the hope that the disk will spin up normally again, you risk incurring even more damage.
Why moving the actuator arm after a fall often won’t fix your disk
Healthy disk-heads are normally meant to skate above the platter surface on a cushion of air. However, damaged disk-heads have the potential to scour sections of the platter surface which were previously healthy. That’s the very last thing you want happening.
In this particular case, luckily, the user eschewed the temptation to open up his WD My Passport and do a MacGyver on it. Instead, in our clean room, we removed the disk-head assembly (on which the heads are mounted) and replaced it with an exact-match donor part. This is a tricky process because all the disk-heads (in this case there were 6) must align perfectly with the platters. Then there is always the possibility that the disk might reject the new disk-head assembly. In order to prevent this, we use a highly specialised hardware disk-imaging device which can image a disk at extremely slow speeds (as low as 10 kB/s) and read sectors in really small increments (such as 32-byte sectors). At the end of the data recovery process, all the effort was worth it: all of the client’s photos and documents were successfully recovered and presented to him on a brand new external hard disk drive. A disk which he promises he’s never going to drop…
Drive Rescue Data Recovery offers a full data recovery service for accidentally dropped disks. Is your disk making a ticking noise? Does the “arm” of your external hard drive sound stuck? Do you need to fix an external WD or Seagate drive with a stuck arm? We can help recover your precious data. Common models we recover from include WD My Passport 4TB, WD My Passport 2TB (WDBYFT0020bbk), WD My Passport Ultra, WD My Passport for Mac, WD Elements 2TB, Seagate Ultra Touch, Seagate One Touch (STKC4000402), Seagate Expansion Portable, Seagate Backup Plus, Seagate Basic (STJL2000400), Adata HV300 and LaCie Rugged Mini. Call us on 01-485 3555.
Users and IT admins sometimes format a hard disk using Windows Explorer or Disk Utility (macOS) even when there is important data stored on it – data which has not been backed up onto any other medium.
When the damage is self-inflicted, users will be absolutely kicking themselves for being so careless. This is especially painful if they’ve just bought a new drive and formatted their old one by mistake. Ouch!
However, accidental disk formatting happens to IT admins just as frequently. The IT admin will usually have to do a lot of explaining to a sometimes very irate user.
Automatic pilot mode – Some people operate on automatic pilot and will willy-nilly issue a format command without first checking the volume ID of the disk they are about to format. This can easily happen if a user has 3-4 different internal or external disks attached to the host system. It’s really not a good idea to be on automatic pilot when formatting disks.
Rushing – Over the years, one thing we’ve noticed with IT admins is that accidental disk or device formatting normally happens on a Friday, when there is a rush to complete tickets before the weekend. Back in the seventeenth century, Molière wrote that “unreasonable haste is the direct road to error”, and that holds as true as ever. Formatting disks needs to be done in a slow and methodical way. Rushing and disk formatting really do not mix.
Can data be recovered from an accidentally formatted disk?
It depends on a number of factors. The first factor you need to consider is whether your disk is an HDD (hard disk drive) or SSD.
SSD – Accidentally formatting (quick format) an SSD means the probability of permanent data erasure is higher. This is because SSD controller processes such as TRIM and garbage collection optimise space by “cleaning up” any deleted files.
HDD – Accidentally formatting (quick format) an HDD means the probability of successful data recovery is somewhat higher. This is because HDDs do not run automatic background clean-up processes on formatted drives.
Other factors – Successful data recovery from an accidentally formatted disk also depends on how full the disk was. When a disk is at near-full capacity, in our experience, “clean-up” algorithms don’t tend to work as effectively. Also bear in mind that if you’ve formatted an SSD, the brand and type of disk controller can have a huge influence. For example, Kingston SSDs use some very efficient SSD controllers on which formatted data can disappear permanently within minutes.
Good to Know:
If you have accidentally formatted a system using an SSD, or an external SSD drive – turn it off immediately. This prevents TRIM and garbage collection processes from running.
Preventing Accidental Formatting of Disks
Label all disks – You can label them any way you want. You can name them after Shakespearean characters or after islands in the west of Ireland. The key thing is that you clearly differentiate your disks. Labelling can be done virtually, via Windows Explorer or Disk Utility, or physically with a printed label. We recommend Brother P-Touch handheld labelling devices. They are inexpensive, reliable and quick.
Disconnect all disks – When performing a format operation on a disk, try to disconnect all other disks from the system.
Disk serial numbers are your friend – If using third-party software to format disks, the disk name might not appear; the disk serial number will be shown instead. This serial is also printed on the disk’s label. Use the last four characters of the label’s serial number to verify that the disk you’re about to format tallies with the one appearing in the software’s GUI.
Don’t place too much trust in the Cloud – Many IT admins simply place too much trust in cloud services such as OneDrive, SharePoint or iCloud. They assume that everything the user needs is there, when in reality the user might not be syncing their important data to the cloud, or there might be a syncing misconfiguration. IT admins need to log in to these services, preferably with the user, and verify the data is there before any format operation is executed on an endpoint device.
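The serial-number tip above amounts to a trivial check, which the short Python sketch below illustrates. The function name and four-character convention are our own illustration; the idea is simply that the suffix printed on the physical label must tally with what the formatting tool displays before you pull the trigger.

```python
def matches_label(label_serial, gui_serial, digits=4):
    """Return True if the last `digits` characters of the serial
    printed on the disk's label tally with the serial shown in the
    formatting software's GUI. Case and stray spaces are ignored."""
    a = label_serial.replace(" ", "").upper()
    b = gui_serial.replace(" ", "").upper()
    return a[-digits:] == b[-digits:]
```

If the check fails, you are almost certainly looking at the wrong disk in the GUI – stop and re-verify before formatting anything.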
At the recent Embedded World 2023 conference and exhibition, the “Great Tesla Recall” of 2021 was still being talked about by speakers, exhibitors, and attendees. Given that Tesla is such a high-profile company, it’s probably no surprise. Now you might be wondering, “What was this recall all about?” Well, most electric vehicles now have their own digital storage device installed. This is essentially the car’s hard drive. Tesla installed an 8GB eMMC module in their “S” and “X” models. It failed prematurely, prompting the manufacturer to instigate a massive recall.
What went wrong with the hard drive installed by Tesla?
Not surprisingly, manufacturers don’t use electro-mechanical hard disk drives (HDDs) in cars but instead use NAND-based storage – the same type of storage you would find in an SSD. Tesla used an 8GB eMMC NAND module in the MCU (Media Control Unit) of these models. However, this NAND module wore out a lot sooner than anticipated, resulting in thousands of owners reporting problems with their in-car systems, such as the touchscreen display, the autopilot, the window demisting system and even the turn indicators. Just like a computer with a failing storage device, hardware and applications start doing some pretty weird things.
Why didn’t Tesla give dealers or owners a new disk that they could slot in to replace the failing one?
Now this is where it gets complicated. Tesla designed the Media Control Unit with an embedded storage device known as an eMMC. Unlike an M.2 SSD, SD card or CFexpress card, it can’t just be slotted out. Embedded NAND usually has to be micro-desoldered off the board.
What can IT technicians learn from Tesla’s hard disk disaster?
Let’s start off with Tesla’s calculation of disk wear-out. Tesla used a TLC NAND module from the Korean manufacturer SK Hynix. (For the record, SK Hynix is a well-respected player in the NAND and DRAM flash market.) This type of NAND has approximately 3,000 Program/Erase (P/E) cycles. Tesla did the maths and calculated that this NAND module would provide a useful life of 11-12 years before it wore out. But that did not happen. For some Tesla users, the problem started to manifest itself after just two years. The miscalculation occurred because some onboard devices needed to access the firmware modules far more than expected, and some executed far more data logging than expected. (Data logging can add a huge write overhead to NAND-based storage due to write-amplification effects.) To compound the issue, these same firmware modules had to be updated more often than expected. So it’s small wonder that this little but constantly accessed 8GB NAND module got exhausted and died.
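The wear-out arithmetic above can be sketched in a few lines of Python. The figures below are illustrative, not Tesla’s actual numbers: a rough endurance estimate is capacity multiplied by P/E cycles, divided by effective daily writes (including any write amplification).

```python
def nand_life_years(capacity_gb, pe_cycles, daily_writes_gb,
                    write_amplification=1.0):
    """Rough NAND wear-out estimate: total endurance (capacity x P/E
    cycles) divided by effective daily write volume, in years.
    Heavier logging or a higher write-amplification factor shortens
    the lifespan dramatically."""
    total_endurance_gb = capacity_gb * pe_cycles
    effective_daily_gb = daily_writes_gb * write_amplification
    return total_endurance_gb / effective_daily_gb / 365

# An 8GB module at 3,000 P/E cycles with ~5GB of writes a day lasts
# over a decade on paper...
optimistic = nand_life_years(8, 3000, 5)
# ...but triple the write volume and double the amplification, and the
# same module is exhausted within a couple of years.
realistic = nand_life_years(8, 3000, 15, write_amplification=2.0)
```

The gap between the two estimates mirrors the gap between the 11-12 years Tesla expected and the roughly two years some owners actually saw.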
Compared to MLC and TLC NAND, QLC NAND does not have great endurance…
Matching the right storage choice with the use-case.
Anybody who performs advanced data recovery on NAND-based devices (such as S-ATA or M.2 NVMe disks) could have told Tesla this. On the NAND plane, the area which stores the disk’s own firmware modules is typically subjected to more repeated reads than any other area. And because this area of the disk is not usually covered by wear-levelling algorithms, it tends to wear out even quicker. Maybe Tesla should have used a pSLC partition specifically for their device firmware modules. Or maybe they should have used a separate NAND IC to store the firmware for their cars’ vital functions. Choosing the right type of storage device to fit the use case is important. Let’s say you support a video-editing business with a moderate to heavy workflow. Installing QLC NAND-based SSDs in their systems would probably not be a very wise decision. Likewise, fitting QLC or TLC-based storage in a CCTV NVR box which uses a ring buffer could be asking for disaster.
Don’t skimp on storage
Let’s be frank here: Tesla used a paltry 8GB NAND storage chip to serve their cars’ control and data systems, when in fact this storage should have been of a much larger capacity. A Tomy toy car would probably use a bigger chip… What a lot of people forget about NAND-based storage such as SSDs and SD cards is that, as a rule of thumb, you should always keep at least 20-30% of the space free. This is because all those disk housekeeping functions, such as TRIM, ECC, wear-levelling and garbage collection, need a bit of free disk space to perform optimally. This is especially true if your storage device does not natively use over-provisioning. To use a real-life example: let’s say you’re assisting a user tomorrow who needs a new SSD in their laptop, and on their existing disk they’ve already used 400GB. Migrating them to a new 512GB SSD is probably not going to be suitable, because in a few months’ time that disk usage could well be up to 450GB, and then all those essential SSD housekeeping functions are going to struggle. This increases the probability of events like uncorrectable bit errors, partition damage or, as with Tesla, exhausted NAND. It is interesting to note that when Tesla revised their chip choice, they chose a 64GB NAND module.
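The 20-30% rule of thumb above can be turned into a simple sizing check. This Python sketch (function name and capacity list are our own illustration) picks the smallest standard SSD capacity that still leaves enough headroom for TRIM, garbage collection and wear-levelling to work with:

```python
def smallest_suitable_ssd(used_gb, headroom=0.25,
                          sizes=(256, 512, 1024, 2048, 4096)):
    """Pick the smallest standard SSD capacity (GB) that keeps at
    least `headroom` (20-30% as a rule of thumb) free for the SSD's
    housekeeping functions to perform optimally."""
    for size in sizes:
        if used_gb <= size * (1 - headroom):
            return size
    raise ValueError("no listed capacity leaves enough headroom")
```

For the 400GB example above, a 512GB disk fails the check (only 22% would be free on day one, and shrinking), so the function steps up to a 1TB drive.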
The benefits of removable storage
There are other things we can learn from the Tesla saga as well. Because Tesla used embedded eMMC NAND, the recall and repair process was very difficult. In a lot of cases, Tesla had to replace the whole daughterboard housing the MCU. In the context of data storage devices, removability allows for serviceability: devices with removable storage can, in general, be serviced more quickly than those using embedded storage. Let’s say you’re the IT manager at an engineering works and you have a problem with a CNC machine. The electronic control unit of the machine might use embedded NAND to store its PLC data. Now let’s say your machine starts to develop programming problems. In the absence of any serial port, wireless or remote access, embedded NAND might prove problematic – especially if the machine’s nearest service centre is in Sweden or Switzerland. On the other hand, a machine which stores all its PLC information on a removable SD or CFexpress card allows for much easier troubleshooting. It’s easier to send an SD card to Malmö than a 100kg machine.
A problem that many technicians working with edge or IoT computing devices are noticing is that 4G or 5G coverage, or Bluetooth connectivity, is not always a given. This makes over-the-air (OTA) device updates impossible. However, using a removable storage device like a microSD card means updated firmware can be applied in a much more flexible way.
So, even though the world of digital storage might be changing at an exponential pace, it all comes back to the brass tacks of matching the right storage device to the right use case. Oh, and factoring in failure as a given…
Last week Drive Rescue attended Embedded World 2023 in Nuremberg, Germany. This is one of the biggest convergences of Asian, European, and American disk manufacturers in Europe. Some of the latest NAND-based storage devices were on display. Moreover, it was a pleasure to discuss the latest storage trends from teams all over the world.
How the EU’s Right to Repair legislation might influence computer manufacturers’ choice of storage disk in the future
The European Commission is expected to implement Right to Repair legislation which will impact manufacturer component choices in devices such as PCs, laptops, and mobile systems. For example, by 2027 most manufacturers selling electronic products in the EU market will have to devise designs for removable batteries. It is also speculated that the implementation of non-removable SSDs (such as eMMC flash memory and BGA SSDs) might also be discouraged by future EU regulation.
The Dangers of QLC NAND
For those involved in the procurement of SSDs, you might have noticed some manufacturers offering QLC NAND in some of their drives. Many disk manufacturers at Embedded World 2023 were unanimous and candid in their sentiments regarding QLC NAND. While QLC NAND (4 bits per cell, as opposed to 3 bits per cell for TLC NAND) is denser and cheaper, it also wears out a lot quicker. For example, some QLC NAND only allows for a paltry 100 Program/Erase (P/E) cycles. Applications such as crypto-mining, data logging, RAW continuous-burst photography and video recording would chew up these cycles in no time – leaving you with a burnt-out SSD. The bottom line is that QLC-based SSDs are fine for use cases such as PCs and laptops used for internet browsing or basic office tasks. But using a QLC SSD for any sort of write-intensive application could be asking for trouble.
SSDs and Vibration
The difference in durability between SSDs and HDDs is stark. Drop a laptop running an HDD on a relatively hard surface and, more likely than not, you could be looking at some disk-head damage. Drop an SSD-running laptop on the same surface and the disk will hardly notice. However, SSDs are not as hardy as you might think: heat can damage them, and so can vibration. The super-nice team from Biwin Storage (OEM manufacturer of HP, Lenovo and Acer branded disks) explained just how insidious continued SSD exposure to vibration can be. In vibration-heavy environments such as manufacturing facilities and ships’ engine rooms, vibration can loosen the solder joints between the NAND ICs and the disk’s PCB, resulting in a failed SSD. For this reason, Biwin Storage have introduced SSDs which use an “underfill” epoxy-resin coating to secure a stronger adhesive bond between the NAND ICs’ solder balls and the PCB. This makes disk failure due to vibration much less likely.
An Apacer CorePower SSD using Tantalum electrolytic capacitors.
Power Loss Protection
NAND-based storage devices with built-in power loss protection were a huge theme at Embedded World 2023. Sudden power loss can be a huge issue in sectors such as manufacturing, where machinery or PLC controllers subjected to sudden power loss can suffer hours (or even days) of downtime. Power loss can occur due to an overburdened power grid (a very common problem in some countries), or it can be the result of human error: some manufacturing operatives will kill power to machinery before it has fully shut down – resulting in data loss.
SSD Controllers and VW Golfs
The controller chip is at the heart of any SSD device. It manages data reads, writes, and erase functions. It performs data scrambling. It performs encryption. It performs error correction, garbage collection, and wear levelling. You could say that the SSD controller is the brain of the disk. A representative from Transcend (a major SSD manufacturer) described controllers as being like VW Golfs. The latest generation Golf is going to be more efficient and more sophisticated than the last one. And that’s an apt analogy. SSD controllers are vastly more sophisticated than those from a decade ago. For a start, they use less power. Some of them have already deployed 5th-generation LDPC error correction. Some use dynamic scheduling. And some controllers now even have the ability to predict bit errors before they happen using Predictive-LDPC (Pre-LDPC). All a far cry from the early 2010s when SandForce controllers were seen as top-end…
A Kioxia CD7 SSD using EDSFF (Enterprise and Datacentre Standard Form Factor)
The new SSD form factor for enterprises and data centres
Just when you thought another SSD form factor was impossible, along comes EDSFF (Enterprise and Datacentre Standard Form Factor). This has been developed by the Storage Networking Industry Association (SNIA) in response to demand from enterprise and data centre customers for an alternative to the M.2 form factor. There are a number of reasons why M.2 is not that suitable for this cohort of customers. Firstly, M.2 disks are relatively small: even the “2280” iteration is only 22mm by 80mm in size. This allows limited space for NAND flash chips. In contrast, the EDSFF E1.L form factor is 318.75mm in length. This extra surface area not only allows for more NAND but also greatly facilitates heat dissipation. Moreover, in terms of data bandwidth, the EDSFF architecture allows for up to PCIe x16, although at the time of writing most manufacturers, such as Kioxia, are just using PCIe x4. Another great advantage of the EDSFF standard is that it’s hot-swappable – in theory at least, making the serviceability of these disks much easier.
We came across a rather interesting case recently. One of our customers, a videographer for a Dublin-based marketing agency, was on assignment in Parma, Italy. The video footage for a multinational food company took two days to shoot. After the first day of filming, the SD card inside their Sony Alpha camera was running a bit low on space. Luckily, they had brought a small portable Seagate One Touch external drive with them. So, after the first day of filming, they were able to transfer their footage (XAVC S, a proprietary Sony video format) to the external drive. Before they wiped the card, they dutifully checked the folder size on the One Touch disk to make sure it was the same as on the SD card. It was, so they formatted the card and popped it into the camera, ready for the shoot the next morning. What could possibly go wrong?
As expected, the second day of shooting did not take as long as the first. After filming, they brought their camera gear back to their hotel and took a stroll around the city. They came across a shop selling artisanal food produce where they bought some cheese and olive oil.
Back in Dublin, they opened up their flight bag and, to their shock, discovered an unbelievable mess inside. It was like a mini-Amoco Cadiz disaster had unfolded – except with extra-virgin artisanal olive oil in lieu of crude oil. It now dawned on them what must have happened. The olive oil bottle had a cork top, which must have popped due to the air pressure on the flight home. Their clothes were sodden with the viscous liquid. And, to their horror, their Seagate One Touch was covered in it. They were now getting flashbacks of formatting that SD card. This was turning into a nightmare. Miraculously, their Sony Alpha camera, ensconced inside a camera case, escaped a soaking. With some trepidation, they connected their One Touch drive to their MacBook’s USB port. It was dead as mutton.
On the recommendation of a colleague, our distraught customer delivered the disk to Drive Rescue. Our diagnostics revealed a dead PCB. Opening the main chamber of the disk (a Seagate ST1000LM024) in our clean room thankfully did not reveal any olive oil ingress. Phew! In this particular case, anything could have been wrong with the PCB. It could have been a failed component such as a diode, a transistor or the motor controller chip. Or the problem could have been a short circuit on one of the tracks. We took the decision to replace the PCB with a new one which we already had in stock.
Now we would need to de-solder the ROM (EEPROM) chip off the old PCB. This is a crucial step because this tiny chip contains adaptive servo parameters unique to the drive, which would need to be transferred to the new board.
The clean-up process:
Before we started the de-soldering process, we used copious amounts of isopropyl alcohol to clean up the olive oil. (In this case, flux would have been of little use because it really only works well at cleaning off oxides.) Using a hot-air gun, and with the assistance of anti-ESD tweezers, we removed the Winbond ROM chip from the Seagate’s PCB. It was vital that the temperature of the hot-air gun was correct so as not to damage this tiny chip.
Winbond ROM Chip pin-out design – such a chip plays a crucial role in storing HDD servo-adaptive information.
We then carried out the same de-soldering process on the new (donor) PCB.
It was time to micro-solder the original ROM from the damaged drive onto the new PCB. This was the most intricate part of the task because each of the chip’s 8 pins must align perfectly with its pad on the board. With just one of these pins out of place, the chip will not make a proper bond with the PCB. The process involves adding a tiny piece of solder to each of the pads. Too much solder applied here will result in “solder bridges” – a surefire way to create a short circuit.
After waiting a while for the solder to cool and settle, it was time to place the PCB back onto the disk and see if it would ID. The disk ID’d successfully and the HFS+ volume appeared. The client’s XAVC S footage could now be transferred to another disk.
Lessons from this case:
Firstly, don’t skimp on SD cards – they are relatively cheap these days. Formatting an SD card with an active workflow on it is really not a good idea, even when you have it backed up to a second location. Secondly, liquids should never be placed in the same luggage as electronic equipment. Ideally, electronic equipment, including portable hard disks, should be transported in a protective case such as those made by Peli. These cases use watertight O-ring seals, which ensure IP67 water resistance, and foam padding to protect your equipment from shock damage. A worthy investment!
Is your Seagate One Touch not showing up on your Windows or Mac computer? Drive Rescue offers a complete clean-room data recovery service for Seagate One Touch external drives which are inaccessible or clicking, such as the Seagate One Touch 1TB, Seagate One Touch 2TB (STKB2000400), Seagate One Touch 4TB (STKC4000402) and Seagate One Touch 5TB (STKC5000400).
One of the system administrators of a healthcare organisation recently contacted us.
They were decommissioning around 18 of their Dell laptops. For data security purposes, the admin removed all the Crucial MX500 S-ATA SSDs from the systems and attempted to use Crucial Storage Executive software (hosted on a desktop PC) to perform a SecureErase on them. The only problem was that SecureErase was not executing on any of them. This left him in a bit of a pickle, because even just formatting the SSDs using Windows Disk Management is not considered secure. There is a high probability that a “Windows format” is going to miss areas of the SSD’s NAND flash such as the over-provisioned space, the spare blocks and remapped bad block locations. SecureErase is designed to get into all of these nooks and crannies.
He was beginning to think the problem was related to the TPM chips inside the Dell laptops and was not relishing the prospect of re-inserting all the SSDs. As a previous customer of Drive Rescue, he contacted us – did we have any suggestions?
Get the Sequence Right…
We did actually! This is a known problem with the Crucial Storage Executive software. Sometimes, the “PSID revert” utility has to be run before “Sanitize”. PSID revert involves reading the disk’s label and inputting the PSID code printed on Crucial MX500 series SSDs into the CSE software. Without following this sequence, the Sanitize (SecureErase) function will not work. This is just a quirk of the SSD management software.
This morning we got a nice Starbucks gift card in the post from the kindly systems admin who was very relieved to have found a quick and secure solution to this problem.
A fire-damaged hard disk with extensive burn marks on its label. However, the label in this case is still partially readable.
Surprising facts about disks and fires…
Successful data recovery from a hard drive which has been exposed to a residential, office or industrial fire depends on a number of factors. These include the level of exposure to the fire, the degree of smoke-particle ingress, whether the label has been burnt, whether the disk is a hard disk drive (HDD) or solid-state drive (SSD), and how much exposure the disk had to fire suppression agents such as water.
Burnt disk labels – If you have an HDD or SSD damaged by fire, sometimes the biggest challenge can be a burnt label. The reason is simple. If an HDD has a fire-damaged PCB but is otherwise mechanically sound, specialised data recovery equipment such as the PC-3000 can emulate its firmware and read the volume. However, in order to emulate a disk’s firmware, you need to know the disk family and the model number; without this information, emulation cannot take place. Similarly, if an HDD involved in a fire requires a head-disk assembly (HDA) replacement, it is also imperative to know the model number, because HDA swap operations need exact-match donor parts. Likewise with an SSD: a fire-damaged SSD might be readable using disk emulation, but you need to know the model first, along with the controller chip the disk is using. We really wish disk manufacturers would use fire-retardant labels…
SSDs will survive a fire better than HDDs – The NAND chips on SSDs can survive temperatures of up to 300 degrees Celsius (controller chips are much more sensitive to heat, though). In contrast, once an HDD is exposed to temperatures of over 60 degrees Celsius, bit errors start to multiply. Moreover, in HDDs exposed to fire, excessive heat makes the disk-heads liable to warp and to make contact with the platters.
The water damage incurred by sprinkler systems or fire crews can be worse than the damage incurred by the fire itself – This one surprises a lot of people, but water (used for fire suppression purposes) often does more damage to hard disks than the fire itself. Within a very short space of time, micro-corrosion sets in on PCB components (such as diodes, capacitors and tracks), causing short circuits. These short circuits can prevent a disk from initialising.
Smoke Damage – Electro-mechanical hard disks are hermetically sealed units designed to block out contaminated air. They use a rubber gasket to secure the seal between the chamber and the lid. Even in polluted industrial environments, this mechanism works well at keeping contaminants out. However, the intense heat of a fire can cause a disk’s rubber gasket to deform or melt, paving the way for the ingress of smoke particles. For the disk, this can be catastrophic. Smoke particles on the platters are the equivalent of rocks on a railway track. These particles can accumulate under the disk-heads, blocking read/write signals and scouring the platter surface, and they can also cause the disk-heads to overheat.
A fire-damaged PCB board.
Off-site backup provides the best protection against data loss due to fire damage. Even if you think your premises has a low fire risk, it can often be an adjoining premises that’s the source.
Your server or comms room should have a high-sensitivity smoke detection (HSSD) system installed which is regularly tested.
Try to maintain an off-site inventory of disks inside your systems. A record of disk model numbers can sometimes make the difference between a failed or successful recovery. IT asset management tools like LanSweeper can automate this task.
If adopting a belt-and-braces approach to mitigating the fire risk to your data, you could consider fire-retardant DAS and NAS solutions from ioSafe. These storage devices, running DSM (from Synology), protect your disks from fires of up to 840 degrees Celsius for up to 30 minutes. They also offer IP68 water protection – very useful against sprinkler systems and over-zealous fire crews.
HP Proliant servers are very common on-premise servers in Ireland. These systems come in two main form factors – rack or tower. The rack series includes models such as the DL360, DL380 and DL385, while the tower series includes models such as the ML10, ML110 and ML350.
Recently, we recovered data from an HP Proliant ML350. This Windows Server 2019 system was running VMware virtual machines. Its RAID 5 array, built from 4 x HP SAS disks, had gone into degraded mode. While this can be very frustrating, “degraded mode” is actually a self-protection mechanism of the server. It occurs when unrecoverable errors are detected in one or more of the disks, and its role is to prevent any further damage that might occur due to silent data corruption. The server subsequently became unbootable.
Examination of the 4 x 1.2TB HP SAS (EG001200JWFUT) disks (formatted in EXT4) proved interesting. Disk 0 was fine. Disks 1 and 2 were seriously overheating: our infrared thermometer recorded temperatures of 48 and 49 degrees Celsius respectively. And disk 3 was clicking. Great…
We made bit images of each of the 3 working disks. Then, using an SFF-8482 cable, we connected each of the disks to our Areca SAS card. It is important to note that this PCIe card was not a RAID controller: the last thing you want is for a RAID rebuild process to initiate with a missing disk. Specialised software is required. Using a non-RAID SAS card means the integrity of the images remains sound.
HP SAS Disk x 4
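The imaging step can be sketched in outline. Below is a minimal, illustrative Python sketch (not our actual tooling, and the names are hypothetical) of the core idea behind forensic imaging tools such as ddrescue: read the source in fixed-size chunks, and when a chunk is unreadable, log it and pad the image with zeros rather than aborting, so sector alignment is preserved. It assumes a POSIX system where the source device can be opened read-only.

```python
import os

def image_disk(device_path, image_path, chunk_size=512 * 1024):
    """Copy a device (or file) to an image, zero-filling unreadable chunks."""
    bad_ranges = []
    fd = os.open(device_path, os.O_RDONLY)  # read-only: never write to the patient disk
    try:
        with open(image_path, "wb") as img:
            offset = 0
            while True:
                try:
                    chunk = os.pread(fd, chunk_size, offset)
                except OSError:
                    # Unreadable region: record it and pad with zeros so the
                    # image keeps its sector alignment.
                    bad_ranges.append((offset, offset + chunk_size))
                    chunk = b"\x00" * chunk_size
                if not chunk:
                    break  # end of device
                img.write(chunk)
                offset += len(chunk)
    finally:
        os.close(fd)
    return bad_ranges
```

On a real job the bad ranges would then be retried with smaller reads, but the principle is the same: only ever read from the failing disk, and run all recovery work against the image.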
We now had to ascertain the exact RAID parameters used in the original array; if you don’t use the exact parameters, corrupted files are inevitable. The HP documentation on the parameters used was, unsurprisingly, lousy. Therefore, we used a hex editor to find the original RAID parameters – namely the block size, the offset and the block order. With these parameters recorded using the high-tech medium of Microsoft Notepad, and using specialised RAID rebuild software, we could start the rebuild process. This took a number of hours, but eventually we had several .vmdk and -flat.vmdk files on our recovery system. Exactly what we were looking for! Our file integrity checks revealed all files to be intact. The client was extremely fortunate: some VMDK virtual disk files can be unwieldy, fragmented and liable to corruption during RAID array failure events. Anyway, the client’s data (Excel files, PDFs, ROS certificates and BrightPay payroll data) could now be extracted onto a 4TB external disk for delivery.
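To see why those parameters matter, it helps to remember what RAID 5 actually does: data blocks are striped across the disks, and each stripe carries one parity block computed as the XOR of the stripe’s data blocks. The following toy Python sketch (purely illustrative, not our rebuild software) demonstrates the property that makes a degraded rebuild possible at all: any single missing block in a stripe, whether data or parity, equals the XOR of the surviving blocks.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together (this is RAID 5 parity)."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

def rebuild_missing_block(surviving_blocks):
    """In RAID 5, any one missing block in a stripe (data or parity)
    is the XOR of all the surviving blocks in that stripe."""
    return xor_blocks(surviving_blocks)

# Hypothetical 4-disk stripe: three data blocks plus one parity block.
d0, d1, d2 = b"\x01\x02", b"\x10\x20", b"\x0f\xf0"
parity = xor_blocks([d0, d1, d2])

# Suppose the disk holding d1 has failed: rebuild it from the rest.
assert rebuild_missing_block([d0, d2, parity]) == d1
```

The block size, offset and block order recovered from the hex editor tell the rebuild software which bytes on which disk belong to which stripe; the XOR itself is the easy part.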
This recovery process saved this Dublin accountancy practice hours and hours of labour time that would otherwise have been spent reconstructing files.
How RAID 5 failure and recovery could have been prevented…
First of all, RAID 5 should not be considered a backup. In this particular case, the client should have had a valid, up-to-date backup of their main server. There are swathes of virtual machine backup applications (such as Veeam and Nakivo) out there which can back up locally and to the cloud.
RAID 6, which uses dual parity, can sometimes be a better and safer alternative to RAID 5. This is especially true where disks are over 1TB in size, which is commonplace in even the most basic servers.
If your on-premise RAID server stores a lot of data that is infrequently accessed, you should have a data-scrubbing regime in place. The scrubbing process reads all the data and checks it for consistency. Some file systems, like Btrfs, have this built in via the “btrfs scrub” command. EXT4 does not checksum data; however, it does support metadata checksumming, which can help detect disk problems early.
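As a rough illustration of what scrubbing does, here is a minimal Python sketch of an application-level equivalent for file systems without data checksumming: build a manifest of SHA-256 checksums, then periodically re-hash everything and flag files whose contents have silently changed. The paths and manifest format are hypothetical.

```python
import hashlib
import os

def build_manifest(root):
    """Record a SHA-256 checksum for every file under `root`."""
    manifest = {}
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            h = hashlib.sha256()
            with open(path, "rb") as f:
                # Hash in 1 MiB blocks so large files don't exhaust memory.
                for block in iter(lambda: f.read(1 << 20), b""):
                    h.update(block)
            manifest[os.path.relpath(path, root)] = h.hexdigest()
    return manifest

def scrub(root, manifest):
    """Re-hash every file and return those whose contents have changed."""
    current = build_manifest(root)
    return [p for p, digest in manifest.items() if current.get(p) != digest]
```

A real scrub at the file-system or RAID level works below the file layer, of course, but the goal is the same: read everything regularly so latent errors are found while redundancy still exists to repair them.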
Drive Rescue is based in Dublin, Ireland. We offer a complete RAID data recovery service for HP Proliant and HP Microserver systems. Whether your data is stored in bare metal format or VMDK, VDI or VHD virtual disk formats – we can help recover your data.
Contrary to popular belief, if threat actors or intelligence agencies cannot access an encrypted storage device, such as a laptop HDD, because they don’t have the encryption key, they will not try to brute-force it. Nor will they use a quantum computer. If the data is really important, more likely than not they will deploy what is known as a side-channel attack. Such an approach does not endeavour to “break” the encryption of the storage device but rather gains access to the protected volume by side-stepping it.
One of the most common side-channel attacks exploits DMA ports. But what are DMA ports? First, some context: in the 1990s, with the proliferation of multimedia use, some computer manufacturers wanted to equip their devices with data transfer speeds faster than the 1.5 Mbps or 12 Mbps afforded by USB 1.0 and USB 1.1. This gave rise to DMA ports such as FireWire (IEEE 1394), which allow peripheral hardware devices to access the host’s memory directly. In the mid-1990s, Sony and Apple were pioneers in equipping their devices with FireWire ports, giving their multimedia users vastly improved data transfer speeds. In the early 2000s, for example, FireWire 800 (IEEE 1394b) enabled transfer speeds of 800 Mbps, well ahead of the 480 Mbps of USB 2.0. Today, on consumer and enterprise-class computing devices, the most common DMA-capable ports are Thunderbolt ports (usually presented as USB Type-C connectors). Lesser-known hardware components with DMA access include network cards and external GPUs.
Thunderbolt USB-C ports on a Mac mini system
How DMA ports can provide a backdoor to your data
Ok, so let’s say you have an HDD or SSD in a laptop which is using a full-disk encryption application such as BitLocker. Could a threat actor access your data? Theoretically, yes! Here are a few side-channel permutations to consider.
Cold Boot Attack – This type of attack occurs when a threat actor performs a memory dump from a computer system’s RAM. This attack vector exploits remanence – a phenomenon where some data still resides in RAM shortly after the power of the host system has been turned off.
A TPM module from Asus. This connects to the Low Pin Count (LPC) bus on a computing device. This same bus can be sniffed…
Recovering a BitLocker key using an FPGA and data-sniffing software – Microsoft and many hardware manufacturers extol the virtues of using a Trusted Platform Module (TPM) to store BitLocker’s cryptographic keys. Unfortunately, this is not as secure as most people think. For example, a field-programmable gate array (FPGA) board (such as a Lattice iCE40), combined with software like LPC_Sniffer, can sniff BitLocker Volume Master Keys from the Low Pin Count bus used by the TPM chip. However, this only works if BitLocker’s pre-boot authentication is disabled.
Combined with software like ThunderClap, an FPGA card such as the Intel Arria can be used to circumvent Apple’s FileVault encryption.
Bypassing Apple FileVault encryption using ThunderClap – Some Apple users believe that if their MacBook is encrypted with FileVault 2, they are immune from such attacks. Not according to the developers of ThunderClap, however. This powerful software, used in conjunction with an FPGA card (such as an Intel Arria), mimics an Ethernet card and enables the sniffing of data packets to and from an encrypted macOS system.
But surely, software and hardware vendors have implemented protections against DMA attacks?
Software and hardware vendors are well aware of such attacks. This is why they have introduced input-output memory management units (IOMMUs). An IOMMU acts as a gatekeeper to system memory, only allowing privileged devices to access sensitive memory regions. Apple was one of the first mainstream computer manufacturers to embrace this technology, enabling it by default in OS X 10.8.2 Mountain Lion. Today, macOS is one of the few mainstream operating systems with the IOMMU enabled by default. However, even in macOS, the implementation is not fully watertight. Some security researchers have found that a single IOMMU page can use shared mappings (i.e. user data could be stored in the same memory space as the peripheral used by the attacker). So, for example, a threat actor or investigator could, in theory, use a modified hardware device such as a trojanised Thunderbolt dock to access the memory of a macOS system. The operating system is supposed to be protected from rogue hardware devices (like a modified Thunderbolt dock) by hardware whitelisting. However, this security mechanism could easily be thwarted by taking an “Apple approved” PCIe bridge board (from a genuine Thunderbolt dock, for example) and using it to bridge a nefarious DMA device.
Aside from the IOMMU, there are other protections against DMA attacks. For example, Microsoft provides Kernel DMA Protection for Windows 10 and Windows 11. But in Microsoft’s documentation there is a rather worrying admission that “This feature doesn’t protect against DMA attacks via 1394/FireWire, PCMCIA, CardBus, ExpressCard, and so on”.
How to access an encrypted SSD just like the CIA…
The “DarkMatter” files of WikiLeaks gave us a brief insight into how intelligence agencies like the CIA access encrypted hard disks. Not surprisingly, they don’t use any FileVault or BitLocker “brute-forcing software” which tries multiple password combinations to bypass disk authentication. Instead, they exploit DMA ports. More specifically, it was discovered that they use a device known as the Sonic Screwdriver. This device, built on the modified firmware of a Thunderbolt-to-Ethernet adaptor, can change the boot path of a MacBook while injecting keylogging malware into system files in order to harvest encryption credentials.
We need to talk about self-encrypting SSDs…
The term “self-encrypting disk” (SED) has to be the biggest misnomer in the data storage world ever! SEDs basically use an AES processor to enable encryption: each disk is automatically encrypted with a disk encryption key (DEK). For users such as governments and corporate entities, this means disks can be erased by simply deleting the key, facilitating easier asset decommissioning and disposal. And while “self-encrypting” drives are indeed encrypted, for most SSD manufacturers any sort of authentication protocol is disabled by default. This means that while users are reassured by having a “self-encrypting disk”, the reality is that if the disk were lost or stolen, any dog on the street could connect it to a standard PC system and all their files would be accessible. Moreover, even if authentication on a self-encrypting drive is enabled, many S-ATA SEDs can be subject to what are known as “hot-plugging attacks”. This involves an adversary or investigator disconnecting the S-ATA data connector of a disk and connecting the data cable of another system without cutting the disk’s power. In a substantial number of cases, this grants access to the data because the SED, even with authentication enabled, still thinks it is connected to the original host. The main condition for this approach to work is that the second system, to which the disk is being connected, must have a hot-swap compatible motherboard.
Another problem with self-encrypting drives is the unknowns involved with Vendor Specific Commands (VSCs). Basically, every SSD manufacturer has its own command set for its disk models. These commands can be used for diagnostics, maintenance and firmware repair. They are also proprietary, and therefore not very open to public scrutiny. And, as with any proprietary software, this opaqueness presents a security problem. In fact, security researchers from the Netherlands have successfully used SSD VSCs to access encrypted data on some models of Crucial MX, Samsung T3 and T5 SSDs. It is also rumoured that the NSA’s Equation Group made extensive use of Seagate and Western Digital VSCs in designing their HDD firmware rootkits. These vulnerabilities remind us of the importance of projects such as the OpenSSD Project, which advocates for SSD firmware to be open-source and fully transparent.
A WD My Passport 4TB external disk. This series of disks has a number of critical hardware vulnerabilities. And if these can’t be exploited to access the data, the ROM of these drives can simply be patched using commercial data recovery equipment.
WD My Passport disks provide a classic example of the weaknesses of hardware encryption. This line-up of portable disks has encryption keys which can be brute-forced. Some of these models use a very leaky random number generator for key protection, while other My Passport models use hard-coded AES-256 credentials. Moreover, their ROM can be “patched” by data recovery systems.
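To illustrate why a leaky random number generator is fatal to hardware encryption, consider a toy model (this is purely illustrative, and not the actual WD key-derivation scheme): if the key bytes come from a PRNG seeded with something guessable, like a 32-bit timestamp, an attacker who can roughly date the drive’s initialisation only has to search a small window of seeds.

```python
import random

def derive_key(seed, length=16):
    """Toy key derivation: key bytes come straight from a PRNG seeded
    with a timestamp -- this guessable seeding is the weakness."""
    rng = random.Random(seed)
    return bytes(rng.randrange(256) for _ in range(length))

def brute_force_key(known_key, seed_window):
    """Try every candidate timestamp until the derived key matches."""
    for seed in seed_window:
        if derive_key(seed) == known_key:
            return seed
    return None

# Hypothetical scenario: the drive was initialised at some point in a
# known period, so only that window of seeds needs to be searched.
secret_seed = 1_600_000_123
leaked_key = derive_key(secret_seed)
assert brute_force_key(leaked_key, range(1_600_000_000, 1_600_001_000)) == secret_seed
```

A properly generated 256-bit key has an astronomically large search space; a key derived from a time-seeded PRNG collapses that space to a few million candidates, which is trivial to exhaust.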
Practical Prevention: To protect highly confidential information using BitLocker, it is essential that the application is configured correctly. BitLocker should always be set up with pre-boot authentication using an alphanumeric PIN. Make sure you have Secure Boot enabled, which helps prevent devices with unsigned firmware code from booting. A BIOS password is recommended. In standby or hibernation state, some Windows systems will store the BitLocker encryption key in RAM, so it is recommended that you disable standby and hibernate modes on the systems you wish to protect. To enable the IOMMU on Windows systems, you will need to access the BIOS, where the setting will be listed as “IOMMU”, “I/O Memory Management”, “Intel VT-d” or “AMD-Vi”. For the protection of external storage devices, you might want to give hardware encryption a wide berth; instead, you can use an open-source encryption application like VeraCrypt for whole-disk encryption.
Drive Rescue, Dublin, Ireland provides a full hard disk recovery service for disks encrypted with BitLocker, FileVault, VeraCrypt and many other leading encryption applications. We also provide a recovery service for WD My Passport external disks, including the My Passport Slim, My Passport Ultra and My Passport for Mac.