Having your Apple MacOS stuck in a dreaded boot loop can be an exasperating experience. (For those of you lucky enough not to know what a boot loop is, it occurs when an operating system cannot successfully boot to the desktop screen. Instead, on system power-up, the OS goes through the familiar boot-up process but halts at a certain point. If you’re lucky, you’ll get an error message which might give a hint of what the problem might be). In MacOS, boot loops can occur out-of-the-blue due to OS corruption or they can typically occur after the user has attempted to install a fresh version or updated version of their operating system.
Recently, we had a client who experienced this very problem. They tried to upgrade their operating system from Catalina to Big Sur. However, their 256GB SSD did not have enough space. The installation of the OS update files never completed, but now on start-up of their system they would receive a message that “An error occurred preparing the software update”. As a result, they were unable to access their desktop and they had no recent backup.
Luckily, we had heard about this problem before. The earlier versions of the MacOS Big Sur (11.6.1, 11.6.2) installer files have a bug in them. Namely, the installer setup file does not check the size of the disk before the installation process begins proper. Therefore, if you don’t have the pre-requisite of 35GB of free space needed to store the temporary install files, this re-boot loop problem manifests itself. This bug also interferes with FileVault 2 encryption hence making the APFS volume invisible to Target Disk Mode (TDM). TDM will see “Macintosh HD” but not “Macintosh HD – Data” which is the folder you want! And if you’re thinking some bootable Linux tool could image the disk – because of the problem with FileVault 2, that avenue is also closed off.
Thankfully, there is a solution to this problem, albeit convoluted, which goes beyond the scope of this blog. But the long and short of it is this; we got all the data back for our delighted client. The lessons of this case are simple. Always have a complete backup before you start upgrading your MacOS system (or any OS for that matter). And secondly, always try to avoid deploying the first iterations of an operating system because, even with MacOS, these versions can be more bug prone.
Why is SSD firmware super-important to the running of your disk?
The host system does not directly interface with the NAND containing your data. Instead, it interfaces with the firmware. The firmware holds the Flash Translation Layer (FTL), which maps logical blocks to physical blocks. The firmware also performs crucial tasks like data scrambling, bad block management, interleaving, wear levelling and TRIM.
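To make the mapping idea concrete, here is a minimal Python sketch of a translation layer. It is a toy model and not how any particular controller works: the class name ToyFTL, the page counts and the "stale page" bookkeeping are all assumptions made for illustration.

```python
class ToyFTL:
    """Toy translation layer: logical block address -> physical page (illustrative only)."""

    def __init__(self, physical_pages):
        self.mapping = {}                            # logical block address -> physical page
        self.free_pages = list(range(physical_pages))
        self.stale_pages = set()                     # pages holding superseded data
        self.nand = {}                               # physical page -> stored bytes

    def write(self, lba, data):
        # NAND cannot be overwritten in place, so every write goes to a fresh page.
        if not self.free_pages:
            raise RuntimeError("no free pages (a real FTL would garbage-collect here)")
        page = self.free_pages.pop(0)
        old_page = self.mapping.get(lba)
        if old_page is not None:
            self.stale_pages.add(old_page)           # old copy is now awaiting erase
        self.nand[page] = data
        self.mapping[lba] = page
        return page

    def read(self, lba):
        # The host only ever asks for a logical address; the map finds the page.
        return self.nand[self.mapping[lba]]


ftl = ToyFTL(physical_pages=8)
first_page = ftl.write(lba=5, data=b"v1")
second_page = ftl.write(lba=5, data=b"v2")
print(first_page, second_page, ftl.read(5))  # 0 1 b'v2'
```

The host wrote to logical block 5 twice, yet the data landed on two different physical pages. If the map is lost or corrupted, the data may still be on the NAND but the disk no longer knows where.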
Isn’t firmware the code that’s also used in personal printers, toasters and fitness monitors?
Yes, but in storage devices such as HDDs and SSDs it tends to be more multi-faceted and much more complex. For example, Travis Goodspeed, giving his talk “Implementation and implications of a stealth hard-drive backdoor” at Sec-T (2014), revealed how it took him “10 man months” to reverse engineer a Seagate Barracuda hard disk. He and his team also had to “kill” 15 hard disks in the process. So yes, the firmware found in your HDD or SSD is in a different ballpark than the firmware found in your Fitbit.
So, why bother updating the firmware on your SSD?
Well, if a potential problem is discovered it can often be remedied by a pre-emptive firmware update. Now you might be thinking that it’s the disk manufacturers themselves who discover these faults, right? Well, in most cases, it’s their customers, such as gamers, PC enthusiasts and sys admins, who discover them. Such problems could be related to TRIM, ECC, bad block management or write amplification. When a problem is discovered, and assuming the disk model in question has a sufficiently large user base, it is generally expected that the manufacturer will release a firmware update to remedy the issue.
Could a firmware update for my SSD brick my drive?
Quite frankly, yes. This is why you should avoid the temptation of hastily applying recently released firmware updates from manufacturers. It’s not unknown for a vendor to release a firmware update which provokes undesirable side-effects (such as dramatic slow-downs of the disk) or, in worst-case scenarios, turns your SSD into a doorstop. This can happen if, for example, the PMIC (power management IC) or flash translation layer (FTL) gets corrupted. Of course, you’re also looking at potential data loss. This is why you should always perform a complete disk backup before attempting any firmware update on your SSD.
So, I’ve backed up my data. Now, I can’t apply the firmware update using the manufacturer’s SSD utility (such as Samsung Magician, Crucial Storage Executive, Kingston SSD Manager etc.). What now?
Ok, truth be told, updating your SSD’s firmware, even with the manufacturer’s dedicated utility software, is rarely a click-and-go process. Some questions to ask before even starting include: are you using the latest version of the utility? Are you running the tool as an administrator? Have you performed a re-boot of your system after installing the SSD utility for the first time? Have you tried disabling your anti-virus or other end-point security software? Is your disk attached directly to your motherboard via an S-ATA or M.2 connection?
I’ve tried all of the above but still can’t apply the firmware update to my SSD. What do I do now?
If all of the above suggestions fail, you may need to create a bootable ISO tool provided by your manufacturer. Such a tool can avoid the layers of abstraction presented by an operating system such as Windows. It can also make the firmware update process run more smoothly. So, after you’ve downloaded the ISO file, you need to make it bootable. You can do this using a tool such as the excellent Rufus USB creator. Once your bootable USB SSD utility has been created, boot up your system with it. It should allow you to update your disk’s firmware without the operating system getting in the way.
I think my SSD is failing, will a firmware update fix it?
Applying a firmware update to a failing SSD might actually exacerbate your problem. Writing new firmware to a disk often means that the existing firmware gets wiped. However, if your disk is failing and the new firmware module cannot be written to your SSD, this leaves you in a sort of firmware no man’s land with potentially irreversible data loss. Professional data recovery companies such as Drive Rescue circumvent this problem by using a firmware “loader”. This basically means that the new firmware is loaded onto one of our host systems first and this is then used as a “translator” to read the NAND whilst leaving the original firmware intact.
Drive Rescue, Dublin offer a complete data recovery service for faulty or inaccessible SSDs. Popular models we recover from include SK Hynix PC300, PC401, PC601, PC711, Micron 1100 M.2, Micron 1100 S-ATA, Micron 2200, Micron 2300, Samsung Mzvlb256hbhq-000l7, Mzvlb512hajq, PM853T, PM871, PM883, PM991, Kingston A400, Kingston SSDNow SV300, SSDNow V300 and Toshiba Thnsnk256gvn8.
Having data lost due to hard disk failure can be gut-wrenching. But having your data encrypted or wiped by remote hackers can be equally so. You might have heard that QNAP NAS devices have recently been subjected to yet another ransomware attack. And you might remember that during the summer WD My Book Live devices were subjected to remote hackers running data-wiping software on them.
In the most recent QNAP case, the attackers used 7-Zip to move files on QNAP devices into malicious password-protected archives and encrypt QNAP NAS devices worldwide. Frenzied users across the globe reported how even though their devices were using updated versions of firmware and QTS (NAS Operating System), they still got hacked.
The problem with NAS devices
One of the problems with NAS devices is that ease-of-use is prioritised over security. This problem has been compounded by manufacturers prioritising features over security. Some NAS devices now come bundled with more apps than a teenybopper’s smartphone. While more apps might sound great, every extra app significantly increases your NAS device’s attack surface.
How do I prevent my own NAS or my client’s devices from getting hacked?
First of all, your NAS shouldn’t be connected to the internet at all. However, some users will still want to connect their NAS devices to the internet for remote access, so we’ve included some tips anyway.
Change default usernames and passwords. Do not, for example, use “admin” as the default username. This is exactly why so many QNAP users got caught out by the QSnatch botnet, which was first spotted in 2019. It was programmed to launch a brute-force attack against devices using the default “admin” as a username. As for choosing a password, make sure it’s complex and uncommon.
For example, “liverp@@lfc” is not considered a secure password. While it’s complex, it is too common to be secure. Would a hacker’s brute-force password database have this? Probably. Use the online Kaspersky Password checker to test the robustness and strength of your password.
Avoid the temptation of using remote NAS access services such as MyQnapCloud, Synology’s Quick Connect service or LaCie’s MyNAS service. While these services are very convenient, they poke a hole in your router, which makes your device, internal network and data more exposed to external attacks.
While many NAS boxes now come equipped with onboard VPN services, such as OpenVPN, you might also want to give these services a wide berth. Just one firmware zero-day attack on your NAS makes it more porous than Swiss cheese.
Instead, if you really need to access your NAS remotely, access it using a VPN connection provided by your firewall device (SonicWall, Fortinet etc). If you don’t have a firewall, you can use VPN services such as WireGuard coupled with Tailscale. Or, you could try accessing your NAS remotely using a service such as ZeroTier.
Disable UPnP port forwarding on your NAS and router to prevent brute-forcing attacks from external attackers.
Make sure FTP access to your NAS is disabled. FTP is an old and insecure file transfer protocol that should never be enabled on your NAS. In fact, if remote access is not required, disable all internet services on your NAS except for DNS and NTP.
Limit repeated failed login attempts to your NAS (called Auto Block on Synology devices).
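To show what the last tip is doing behind the scenes, here is a rough Python sketch of failed-login blocking. It is not the code of any NAS vendor: the class name LoginThrottle, the limits and the IP addresses are assumptions chosen for illustration.

```python
import time
from collections import defaultdict, deque


class LoginThrottle:
    """Block an IP address after too many failed logins inside a time window."""

    def __init__(self, max_failures=5, window_seconds=300):
        self.max_failures = max_failures
        self.window_seconds = window_seconds
        self.failures = defaultdict(deque)   # ip -> timestamps of recent failures
        self.blocked = set()

    def record_failure(self, ip, now=None):
        now = time.time() if now is None else now
        recent = self.failures[ip]
        recent.append(now)
        # Forget failures that are older than the window.
        while recent and now - recent[0] > self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_failures:
            self.blocked.add(ip)

    def is_blocked(self, ip):
        return ip in self.blocked


throttle = LoginThrottle(max_failures=3, window_seconds=60)
for second in (0, 10, 20):                   # three bad passwords in 20 seconds
    throttle.record_failure("203.0.113.7", now=second)
print(throttle.is_blocked("203.0.113.7"))    # True
```

A brute-force attack needs thousands of guesses, so even a generous limit like this makes it impractical from a single address.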
I have permissions set on my NAS so that only the “administrator” can write to it?
A lot of malware in circulation these days uses “privilege escalation” to bypass read/write and erase permissions. So unfortunately this does not afford you a great deal of protection against ransomware or “wiper” malware.
I have set up my backup application to run snapshots to my NAS, will that protect me?
Not always. Snapshots can get wiped by disk-wiping malware.
So, is it only Synology and QNAP?
No. In March 2020, a new variant of the Mirai botnet was scanning TCP ports looking for Zyxel NAS devices. The password brute-forcing attacks would then force vulnerable Zyxel NAS devices offline by using a DDoS attack. In February 2019, D-Link NAS devices were subject to Cr1ptT0r Ransomware. In fact, the situation got so bad, D-Link even began issuing firmware updates for end-of-life NAS boxes.
If I follow all these guidelines, will my NAS be secure now?
No! A zero-day exploit could be discovered tomorrow, which makes your NAS vulnerable. Always follow the 3-2-1 backup methodology. 3 copies of data. 2 on different mediums and 1 backup off-site.
Never forget that a NAS device is not a backup in itself if you don’t have the data stored elsewhere. Some users buy a second NAS for the purposes of backup. This is an option which is well worth considering.
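The 3-2-1 rule is simple enough to check mechanically. The Python sketch below is only an illustration of the rule as stated above; the BackupCopy structure and the medium labels are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class BackupCopy:
    name: str
    medium: str      # hypothetical labels, e.g. "nas", "usb-hdd", "cloud"
    offsite: bool


def meets_3_2_1(copies):
    """True if there are 3+ copies, on 2+ different media, with 1+ off-site."""
    enough_copies = len(copies) >= 3
    enough_media = len({copy.medium for copy in copies}) >= 2
    has_offsite = any(copy.offsite for copy in copies)
    return enough_copies and enough_media and has_offsite


nas_only = [
    BackupCopy("live data", "nas", offsite=False),
    BackupCopy("snapshot", "nas", offsite=False),
]
full_plan = nas_only + [
    BackupCopy("weekly USB copy", "usb-hdd", offsite=False),
    BackupCopy("cloud backup", "cloud", offsite=True),
]
print(meets_3_2_1(nas_only), meets_3_2_1(full_plan))  # False True
```

Note how a NAS with snapshots fails the check on its own: both copies sit on the same medium in the same building.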
Can data be recovered from a NAS which has been subjected to a ransomware or malware attack?
Sometimes cyber criminals will deploy their malicious encryption software with an inadvertent bug in it. This allows some software vendors (such as Emsisoft) to release a “fix” or “decryption” tool which can mean successful restoration of data.
When it comes to data-wiping malware, sometimes it will only delete file system (EXT3, EXT4, NTFS, XFS etc.) metadata. This then makes a raw data recovery (recovery without original file structure) possible.
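As a rough illustration of raw recovery, the Python sketch below “carves” JPEG files out of a block of bytes by looking for their start and end signatures, ignoring the file system entirely. It is a deliberately naive example: professional carving tools also have to cope with fragmented files, embedded thumbnails and false matches, none of which are handled here.

```python
JPEG_START = b"\xff\xd8\xff"   # start-of-image signature
JPEG_END = b"\xff\xd9"         # end-of-image marker


def carve_jpegs(raw):
    """Return every byte run that starts and ends like a JPEG (naive, no validation)."""
    found = []
    position = 0
    while True:
        start = raw.find(JPEG_START, position)
        if start == -1:
            break
        end = raw.find(JPEG_END, start + len(JPEG_START))
        if end == -1:
            break
        found.append(raw[start:end + len(JPEG_END)])
        position = end + len(JPEG_END)
    return found


# Simulated wiped volume: no file names or folders, just leftover bytes.
image = b"\x00" * 16 + JPEG_START + b"\xe0fake-picture-bytes" + JPEG_END + b"\x00" * 16
recovered = carve_jpegs(image)
print(len(recovered), recovered[0].startswith(JPEG_START))  # 1 True
```

The file comes back without its name, folder or timestamps, which is exactly what “recovery without original file structure” means in practice.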
Drive Rescue, Dublin, Ireland offer a complete NAS data recovery service for Synology (DS120, DS414, DS416, DS718), QNAP (TS-219, TS-251, TS-451, TS-453), WD My Book, WD My Cloud, WD My Cloud EX2, WD My Cloud EX2 Ultra, Buffalo (LinkStation and TeraStation) and LaCie (2Big, 4Big, 6Big and 8Big).
Here are some of the top reasons for SSD (S-ATA, PCIe NVMe and m-SATA) failure which we came across in the Drive Rescue lab last year:
Flash Translation Layer corruption
Failure of solder-joints on printed circuit board
Failure of Power Management IC
Read Disturb Failures
Wear-out of System Area containing firmware
Complete NAND Chip Failure
The above list covers failure modes across all brands and interface types of solid state disk including Samsung, Micron, SK Hynix, WD, Toshiba, HP, Kingston and Apple models. You can find out more about our SSD data recovery service here.
One of the great drawbacks of the electro-mechanical disks is their propensity to develop bad sectors. And unfortunately, SSDs don’t escape this problem.
Bad sectors are a problem for storage devices simply because they can result in inaccessible or lost data. Moreover, bad sectors often have a happy knack of developing in the same areas of your disk where your most important data is stored.
Typical Symptoms of Bad Sectors on an SSD include:
Your S-ATA or PCIe SSD (such as Samsung, Micron, SK Hynix etc) is causing your computer to intermittently freeze.
In the Windows Event Viewer, you see evidence of “bad blocks” being reported.
Your S-ATA, PCIe (NVMe) or USB (3.1 / USB-C) SSD is not being recognised by your computer
You can see your SSD’s folders and files in Finder (MacOS) or Explorer (Windows) but cannot copy them to another medium.
You receive an “access is denied” error message when you try to access your Micron SSD in Windows.
In MacOS, you see error messages like “First Aid found corruption” after running in-built disk repair utilities. Or, you see messages like “The disk you inserted was not readable by this computer”
You’ve tried running a data recovery program like EaseUs or Recuva but it keeps on freezing.
Checkdisk (Chkdsk) freezes at a particular point.
So, why do bad sectors or bad blocks develop on SSDs?
Well, there are a number of reasons. First of all, like with HDDs, SSDs actually leave the factory with some factory-marked bad blocks. This is because the manufacturing process for NAND is not perfect. Imperfections in the NAND wafer, from which NAND dies are cut, are almost inevitable.
As the SSD gets used, grown bad blocks (sometimes known as runtime bad blocks) start to develop. These can occur for a number of reasons including:
Wear and Tear – The insulation layer of the tunnel oxide in NAND cells begins to degrade due to the Fowler-Nordheim tunnelling process which occurs during P/E (Program/Erase) cycles. Although the wear levelling (WL) algorithms are designed to evenly distribute block usage across the volume, WL is not a perfect process. And don’t forget that some types of NAND have lower endurance than others. On one end of the spectrum, you have high-endurance SLC NAND (which is actually rarely used even in industrial-class SSDs) while at the other end you have QLC NAND which is considered low-endurance NAND. Or, to put it into perspective, a 1TB TLC SSD would typically have an endurance rating of 1 DWPD (Drive Writes Per Day) while a 1TB QLC SSD would typically have an endurance rating of just 0.1 DWPD.
Trapped Charge – Sometimes after prolonged usage, electronic charges can get trapped in the nitride layer between the NAND cells. This makes the voltage threshold for program/read or erase operations too high resulting in unreadable or unerasable sectors. The trapped charge problem can also be caused by improper shutdowns of the host system or by power supply issues with the SSD.
Prolonged Storage – If flash-based storage devices such as SSDs have been left powered-off for a while, they can lose charge. This retention loss can result in blocks becoming unreadable and being marked bad by the disk’s Status Register. These bad blocks are also added to the Bad Block Table. Some SSD manufacturers include “refresh” algorithms in their controllers which are designed to recharge cells when the device is connected.
Disturb Failure – NAND cells can get “disturbed” when a bit is unintentionally programmed from a “1” to “0” or vice versa. This occurs when the voltage applied to the cells being programmed creates an electric field which interferes with neighbouring cells.
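The DWPD figures quoted under Wear and Tear are easier to compare when converted into total terabytes written (TBW). The Python sketch below uses the usual conversion of DWPD × capacity × 365 × warranty years. The five-year warranty period is an assumption for the example; check the datasheet for your own disk.

```python
def terabytes_written(dwpd, capacity_tb, warranty_years=5):
    """Convert a DWPD rating into total terabytes written over the warranty period."""
    return dwpd * capacity_tb * 365 * warranty_years


tlc_tbw = terabytes_written(dwpd=1.0, capacity_tb=1.0)
qlc_tbw = terabytes_written(dwpd=0.1, capacity_tb=1.0)
print(f"1TB TLC at 1 DWPD:   {tlc_tbw:.1f} TB written")
print(f"1TB QLC at 0.1 DWPD: {qlc_tbw:.1f} TB written")
```

Under that assumption the TLC disk is rated for roughly 1,825 TB of writes and the QLC disk for roughly 182.5 TB, a tenfold difference for the same capacity.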
Bad blocks or bad sectors can become very problematic when they start to develop in the System Area of an SSD. This can result in unreadable firmware or unreadable boot initialisation code. The latter scenario can result in your SSD failing to be recognised by your computer.
Bad blocks occurring in the user addressable area of the disk can be managed. Most SSDs have a Bad Block Management (BBM) feature which marks blocks as bad (unreadable). BBM then uses “good” cells from the reserved section of the disk to substitute for the bad ones.
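Bad block substitution can be pictured with a small Python sketch. This is a simplified model only: the class name BadBlockManager and the block counts are invented for the example, and real controllers keep this table inside the firmware.

```python
class BadBlockManager:
    """Toy bad block table: retired user blocks are redirected to reserved spares."""

    def __init__(self, user_blocks, spare_blocks):
        # Spare blocks sit after the user-addressable area in this toy layout.
        self.spares = list(range(user_blocks, user_blocks + spare_blocks))
        self.remap = {}   # bad user block -> spare block standing in for it

    def mark_bad(self, block):
        if block in self.remap:
            return self.remap[block]
        if not self.spares:
            raise RuntimeError("spare area exhausted: no good blocks left to substitute")
        self.remap[block] = self.spares.pop(0)
        return self.remap[block]

    def resolve(self, block):
        # Healthy blocks map to themselves; retired ones map to their substitute.
        return self.remap.get(block, block)


bbm = BadBlockManager(user_blocks=100, spare_blocks=4)
bbm.mark_bad(17)
print(bbm.resolve(17), bbm.resolve(18))  # 100 18
```

Once the spares run out, new bad blocks can no longer be hidden from the host, which is when symptoms like freezing and unreadable files start to appear.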
Fixing Bad Sectors on SSDs
Over the years commercial products have been patented and developed to cure bad sectors using methods such as hysteresis. But most of these solutions never really resolved the bad sector problem. Just as with HDDs, there is no real way to fix bad sectors on an SSD. However, an experienced data recovery technician can work around bad sectors and try to recover as much of your data as possible using specialised equipment.
Examples of specialised data recovery equipment include:
Slow Sector Reading
Equipment which performs slow reads of sectors. The read timeout parameters on a standard operating system are configured for healthy disks. Data recovery equipment allows the technician to read the disk using modified read timeout settings. This means that sectors which a standard operating system (such as MacOS, Windows or a Linux-based OS) would report as “unreadable” are actually readable by the equipment.
Smaller Sector Sizes
Equipment which uses variable sector sizes. For example, an Apple MacOS system will typically read disks in increments of 4096 bytes. Professional-level data recovery equipment allows the technician to read data in increments as low as 16 bytes. This sort of granularity, along with delayed-reads, allows for successful data recovery from bad sector areas.
Data recovery companies can use equipment which can change the voltage supply to an SSD. This means that an S-ATA or PCIe (NVMe) SSD which is unreadable to a standard computer can be successfully read.
If the System Area of your SSD has become damaged due to bad sectors, a firmware emulator can be used by a data recovery company to substitute for the original. This can result in previously inaccessible data being made accessible again.
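The benefit of retries and smaller read sizes can be shown with a short Python sketch. It is a software simulation only, not a description of any recovery hardware: FlakyDevice is a hypothetical stand-in for a failing disk, and read timeouts (which need hardware-level control) are not modelled.

```python
import io


class FlakyDevice(io.BytesIO):
    """Hypothetical failing disk: any read touching the bad byte range raises an error."""

    def __init__(self, data, bad_range):
        super().__init__(data)
        self.bad_range = bad_range

    def read(self, size=-1):
        start = self.tell()
        if start < self.bad_range[1] and start + size > self.bad_range[0]:
            raise OSError("simulated unreadable sector")
        return super().read(size)


def rescue_read(device, total_bytes, block_size=512, retries=3):
    """Copy a device block by block, retrying failures and zero-filling what stays unreadable."""
    image = bytearray()
    unreadable = []
    for offset in range(0, total_bytes, block_size):
        length = min(block_size, total_bytes - offset)
        chunk = None
        for _ in range(retries + 1):
            try:
                device.seek(offset)
                chunk = device.read(length)
                break
            except OSError:
                continue
        if chunk is None:
            chunk = bytes(length)        # zero-fill so later offsets stay aligned
            unreadable.append(offset)
        image += chunk
    return bytes(image), unreadable


data = bytes(range(256)) * 16            # 4096 bytes of sample data
device = FlakyDevice(data, bad_range=(1000, 1010))
big_image, big_lost = rescue_read(device, len(data), block_size=2048)
small_image, small_lost = rescue_read(device, len(data), block_size=16)
print(len(big_lost) * 2048, len(small_lost) * 16)  # 2048 32
```

The same ten damaged bytes cost 2,048 bytes of data when read in large blocks but only 32 bytes when read in 16-byte increments.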
Data Recovery from a Micron 2300 SSD
Here at Drive Rescue we recently came across a prime example of how bad sectors can affect a disk. The Micron 2300 512GB M.2 disk was taken from a Dell laptop. In the BIOS, the system reported SMART predictive failure. The disk was being recognised by the BIOS but not by Windows Explorer. The disk used 96-layer TLC NAND coupled with an in-house Micron controller. Initial diagnostics revealed that several firmware modules could not be read. Therefore, we used a firmware emulator to substitute for the damaged firmware. However, the disk was still reporting extensive bad blocks. We set our data recovery equipment to use a read-timeout of over 20,000 milliseconds. We also set the sector retry rate to 3. Moreover, we used a read block size of just 64 sectors. These parameters gave substantially healthier disk-reads. After almost 24 hours on our recovery bench, the results were very pleasing. The most important files for our project manager client were .XLSX, .PDF and .MPP (MS Project). These were all successfully recovered. The only files which were not recovered were some .MOV files which the client could download again anyway. Case closed, and our project manager could get back to managing projects instead of the painful and time-consuming task of reconstructing files.
Drive Rescue, Dublin, Ireland offer a complete SSD data recovery service for failed Micron SSDs including models such as Micron C300, Micron C400, Micron 1100 256GB, Micron 1100 512GB, Micron 2210, Micron 2200s, Micron 2200v, Micron 2300 NVMe, Micron 5100 Pro M.2, Micron 5200, Micron 5300, Micron M550, Micron mtfdhba512qfd, Micron mtfddav256tbn and Micron mtfddak512tbn. We recover from Micron SSDs that are not being detected or not recognised by your computer. We also recover from Bitlockered Micron SSDs. Excellent success rates and fast service.
The WD My Passport external hard drive is an extremely popular type of external storage device in Ireland. Made by Western Digital Corporation, these portable USB (2.0, 3.0, 3.1, 3.2) drives come in a variety of colours and sizes. Popular capacities include 1TB, 2TB, 4TB and 5TB. However, like any type of storage media, My Passport disks can fail.
Here are the main reasons:
1) Bad Sectors
Your WD My Passport may fail due to bad sectors. These occur when areas of the disk platter become unreadable. While almost all disks have some bad sectors, which can be managed by the disk’s firmware, some bad sectors cannot be remedied by the disk’s firmware. If these sectors contain user data – it can result in the data becoming inaccessible. Or, if bad sectors develop in the System Area of the drive (where firmware modules are stored) or where MFT (Master File Table) information is stored – this can also result in inaccessible data.
The Fix: The bad sector problem can be mostly solved by using specialised data recovery equipment which is designed to read and re-read damaged sectors at an extremely slow speed and in very small sector sizes.
2) Lost in Translation
Like all hard disks, your WD My Passport uses a process known as File Layer Translation to translate logical addresses to physical addresses. (Basically, your file system stores data logically and uses FLT tables to translate these logical areas to actual physical sectors on your hard drive. Hard drives use this process because it makes file storage more efficient.) However, sometimes, due to underlying disk problems, the FLT table goes corrupt which means your disk can’t find the data.
The Fix: Any underlying disk problems such as bad disk-heads or bad sectors must be resolved before the FLT can be read properly.
3) Oops…Accidental Deletion
If you’ve accidentally deleted data from your WD My Passport disk, you’re not alone. Every year, scores of computer users in Ireland accidentally delete data from their disks. This is often due to the distractions of multi-tasking. Confusing one disk for another is more common than you think.
The Fix: Assuming you’ve not over-written the data with fresh data, your data should be recoverable. This is because, like with any HDD, when you delete data from a WD My Passport, it is not actually deleted. The area of the disk is simply marked as “free” but its data is not actually deleted until you write new data to the disk.
4) Accidental Drop of your WD My Passport
One of the top reasons why WD My Passport disks fail prematurely is because the user drops them. Even a small drop from a coffee table can result in your drive’s disk-heads incurring damage. In the worst-case scenario, the heads can scrape against the drive platters causing irreversible damage.
The Fix: In most cases, the only fix for this type of problem is to bring the disk into a clean-room and insert a new head-disk assembly. In a small minority of cases, the disk-heads can be remapped by manipulating the disk’s firmware, but this methodology will not always be successful.
5) Accidental Liquid Spillage on your WD My Passport
You’re having a nice relaxing cup of coffee. When reaching over your desk to pick up yesterday’s unread newspaper, that cup of Java decides to capsize, spilling its contents all over your desk and onto your hard disk.
The Fix: Any liquid like coffee, water, beer or tea getting into contact with your disk’s PCB (printed circuit board – the electronic board just inside the plastic casing of your disk) can cause corrosive damage or pre-amplifier failure. This means that the components (such as diodes and resistors) on the disk’s PCB can get corroded by the liquid – a process which sometimes takes weeks. If you’ve been very unlucky, the liquid spill might have caused a power surge to occur inside your disk causing its pre-amplifier chip to fail. The first problem can be fixed by fitting a new PCB or by component-level repair. A transplant of the EEPROM chip from the old PCB is needed. If it’s the pre-amplifier chip which has failed, this usually means a new head-disk assembly. Both fixes are usually successful in getting your WD disk working again.
6) Spindle Damage
The spindle motor plays a crucial role in spinning your disk platters at 5400 RPM. Most modern My Passport disks use a Fluid Dynamic Bearing (FDB). This is a highly sophisticated mechanism which has to spin the platters at a constant rate but also in a way to minimise NRRO (non-repeatable runout) errors. If the spindle motor is even a nano-metre off kilter, it can result in bad reads. However, sometimes, after a knock or fall, the spindle motor will seize. This is because a) the herringbone bearing inside the motor will seize or b) the lubricating oil inside the spindle motor chamber leaks out due to shock damage. The latter process is usually invisible to the naked eye.
The Fix: A special hard disk spindle replacement tool has to be used to extract the old spindle and replace it with a new mechanism. This is a delicate procedure which has to be performed in a clean-room. In most cases, it results in complete data recovery of your WD My Passport disk.
Drive Rescue, Dublin, Ireland offer a complete data recovery service for My Passport disks which are not showing up in Windows or Mac, which are appearing as not initialised, which are generating an “access denied” error message or disks which are not mounting. We recover from all My Passport models including Passport for Mac, My Passport Ultra, My Passport Slim and WD My Passport Go SSD.
Predicting or detecting SSD failure is much harder than predicting HDD failure. If an HDD is failing, it can become slow or cause a computer to freeze. Or, it can trigger a kernel panic or blue screen of death on the host system. And in some cases, the user will hear a clicking, grinding, beeping or chirping noise. A failing SSD, however, does few of these things. In fact, failing flash-based storage can be quieter than the proverbial church mouse.
That is worrying because a lot of users are not prepared for sudden-death failure of their disk. At least with an HDD, the user sometimes gets a bit of leeway to perform an emergency backup. Your SSD could fail in the morning without even giving a peep of warning. SSD manufacturers have brought over a legacy technology called SMART (Self-Monitoring, Analysis and Reporting Technology) to monitor and help predict failure. Designed by IBM primarily for ATA and SCSI disks, it monitors disk parameters such as the Read Error Rate, Reallocated Sectors Count, Power-On Hours, Temperature and Uncorrectable Error Count. And for the SSD-era, parameters such as flash program fail, wear level count and wear-out indicator have been added to the SMART attribute set. But even taking these newly bolted-on features into account, SMART is still an old technology designed for electro-mechanical disks.
How Accurate is SMART?
SSDs are first and foremost electronic devices. And SMART does not take into account failure or impending failure of electronic components. Failing DRAM chip? A problem with write amplification? A problem with LBA mapping tables? SMART, alas, does not have you covered. SMART will continue to merrily push out disk attributes, sometimes with little salience to the operation of a modern SSD.
While power-up and power-down events are recorded, SMART gives us no information as to whether these power events were clean or dirty. An SSD could fail with its DRAM cache full to the brim just before a data-corrupting power event, but SMART will be blissfully unaware of it.
SMART is a very siloed tool. It takes into account individual disk performance parameters but does not view them holistically.
SMART is not standardised. While the NVM Express working group is endeavouring to change this, SMART has been implemented by SSD manufacturers on a non-standardised basis. This means that a sector reallocation event for a Samsung Evo SSD might be defined totally differently by a SanDisk Plus SSD.
And because SMART has been implemented by manufacturers on their terms, it has invariably been driven by a commercial imperative. Let’s face it, manufacturers do not want a deluge of RMA’ed SSDs being sent back to them at the slightest hint of malfunction. Therefore, most manufacturers have set their SMART failure thresholds high.
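The effect of the manufacturer’s chosen threshold can be seen in a few lines of Python. The sketch follows the common ATA convention where each attribute has a normalised value that counts down towards a threshold. The attribute names and all the numbers below are made-up sample data, not readings from a real disk.

```python
def failing_attributes(attributes):
    """Return the names of attributes whose normalised value has reached its threshold."""
    return [
        attr["name"]
        for attr in attributes
        if attr["thresh"] > 0 and attr["value"] <= attr["thresh"]
    ]


# Hypothetical readings from a heavily worn disk.
drive = [
    {"name": "Reallocated_Sector_Ct", "value": 60, "thresh": 10, "raw": 1800},
    {"name": "Wear_Leveling_Count", "value": 35, "thresh": 5, "raw": 1300},
]
print(failing_attributes(drive))       # []

# The same readings judged against a stricter, equally hypothetical threshold.
stricter = [dict(attr, thresh=50) for attr in drive]
print(failing_attributes(stricter))    # ['Wear_Leveling_Count']
```

The disk has not changed between the two checks. Only the pass mark has, and the pass mark is whatever the manufacturer decides it should be.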
Why SMART is a problem for the end-user, computer technicians or system administrators
SMART provides a false sense of security to users. They might have an SSD which is on its last legs, but it will pass a SMART test. Here at Drive Rescue, we’ve seen this sort of scenario play out countless times.
The problem of SMART and third-party SSD Diagnostic Tools
Most SSD diagnostic tools such as CrystalDiskInfo and SNMP monitoring tools like PRTG rely on SMART information to perform their tests. While these tools can be extremely useful, they can also provide inaccurate information. This is because many SSD manufacturers have designed their disks’ firmware so that its telemetry cannot be fully interrogated by third-party tools. These tools sometimes only scratch the surface of what is really going on inside your SSD.
Perform regular backups of your important data. Throw away any notions that SSDs don’t fail or that you’re going to get some warning. Sometimes SSDs fail out of the blue. Backup strategies such as performing 3-2-1 backups are as relevant with SSDs as they were even with the creakiest spinning disks.
Try to use manufacturer-based tools for diagnosing SSD problems. For example, Samsung Magician for Samsung SSDs or Crucial Storage Executive for Crucial SSDs. These tools tend to be slightly more accurate because they are typically allowed more privileged access to your disk’s telemetry data.
Unbelievably, some SSD manufacturers still don’t provide diagnostic tools for their disks. If this is the case, you can use an SSD diagnostic tool like Smart Disk Checker. This will not only read the SMART logs of your disk but will also perform a time-sensitive sector analysis of your disk. This can give you a much better picture of your SSD’s health. This tool is also bootable from USB meaning you don’t have to remove the HDD or SSD from the system.
Let’s face it, some SSD models belch out more heat than a small nuclear power station. For some SSD models, running hot is their normal mode of operation. In fact, with some S-ATA-based SSDs, their metal chassis is not only designed to protect the electronics of the disk, but to also act as a passive heat-sink. For a standard computer, a typical temperature for an SSD under load is between 30°C and 50°C (86°F and 122°F) but this can vary a little between manufacturers. It is also normal to have spikes of heat when your SSD goes from being idle to performing an intensive task, such as a large data transfer.
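If your monitoring tool reports temperatures in only one unit, the conversion and range check are easy to script. The Python sketch below simply restates the 30°C to 50°C figure from the paragraph above; the function names are arbitrary and the range should be adjusted to your manufacturer’s specification.

```python
def celsius_to_fahrenheit(celsius):
    return celsius * 9 / 5 + 32


def in_typical_load_range(celsius, low=30, high=50):
    """True if the reading falls inside the typical under-load range quoted above."""
    return low <= celsius <= high


for reading in (30, 50, 72):
    print(reading, celsius_to_fahrenheit(reading), in_typical_load_range(reading))
# 30 86.0 True
# 50 122.0 True
# 72 161.6 False
```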
SSDs use NAND flash memory. This type of storage is non-volatile, which means it doesn’t require a continuous power supply to retain data. The floating-gate transistor (aka FGT, a metal-oxide semiconductor) is a popular type of NAND that is used in SSDs (such as those produced by Intel). Another semiconductor used in NAND memory is the Charge Trap Flash (CFT), but its thermal properties are similar to FGT, so for the purposes of this blog, the impact of heat on FGT-based SSDs will be discussed.
The FGT is composed basically of two types of gates, the floating gate (FG) and control gate (CG). The procedure of removing the electric charge from the FG is the Erase process (erase data), whereas the procedure of storing is the Program operation (write data). This operation requires power, and the temperature can increase significantly when the SSD is subjected to large workloads.
The “electron tunnelling” process used during Program/Erase (write/erase) cycles can damage the cell (FGT). The tunnel oxide, a layer that composes the FGT (as presented in Figure 1), wears out over time, when it is exposed to high temperatures. This wear-out results in electron leakage and bit-errors.
When an SSD is overheating, the controller can malfunction leading to all sorts of erratic disk behaviour such as:
Your SSD is not recognised by Windows.
Your computer can’t see your SSD.
Your SSD appears as unformatted.
When you try to copy files off your SSD, your computer keeps on freezing.
You cannot copy files off your SSD.
Some files seem to have disappeared off your SSD for no particular reason.
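The Catch-22 of SSDs and Heat
Be careful here! Many internet commentators mention that read/write operations in SSDs perform better at higher temperatures. This is correct; NAND programming has always worked optimally at higher temperatures. Put simply, when your SSD is hot, the read, write and erase operations will be quicker and smoother compared to a cooler disk. Degradation of the cell oxide layers is also reduced because the heat causes less stress. The catch is that the same heat makes it easier for electrons to leak out of cells that have already been written, and it puts the controller under strain, which is where the bit-errors and erratic behaviour described above come from.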
The M.2 Form Factor and Heat
User demand for lighter and thinner devices is not helping the situation. For example, the M.2 “stick of chewing gum” sized form factor has a relatively small surface area coupled with high data densities. M.2 disks can draw up to 7 watts of power and can reach temperatures of up to 100°C (212°F). (At least the SATA-based SSDs have a larger surface area for heat dissipation and can use their chassis, which is often metal, as a heat-sink).
Enter Thermal Throttling to Cool Things a Bit but also Slow Them Down…
Many SSD manufacturers use a function known as Thermal Throttling to prevent their devices from overheating. This monitors the temperature of the SSD via a built-in sensor. When the disk temperature reaches a pre-defined threshold, the thermal management function slows down the SSD’s performance to prevent it exceeding its maximum temperature. This results in fewer bits flipping due to heat and ultimately prevents premature failure. A simplified process of the Thermal Throttling technique is presented in Figure 2. It can be seen that the temperature of operation is above 70°C (158°F) which is “normal” for an M.2. However, to ascertain the normal operating temperature of your SSD, refer to the manufacturer’s specification sheet.
Each manufacturer will implement thermal throttling differently. For example, Samsung SSDs use Dynamic Thermal Guard (DTG). If a disk exceeds a threshold temperature, DTG will reduce the power to the NAND and MCU (controller). This disk self-preservation mechanism usually kicks in at around 75°C (167°F). For a lot of their SSD models, such as the 950 Pro, 960 Pro and 970 Pro, thermal throttling can be a fairly common occurrence under sustained workloads, such as heavy video editing or when the disk is being used in a busy VM server.
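The general idea of thermal throttling can be sketched in a few lines of Python. This is only an illustration and not any manufacturer's actual firmware logic, which is proprietary. The 75°C trigger comes from the figure quoted above; the 70°C resume temperature, the performance percentages and the use of a gap between the two thresholds (so the disk does not flip rapidly between modes) are all our assumptions.

```python
from dataclasses import dataclass


@dataclass
class ThrottleState:
    """Remembers whether the (hypothetical) controller is currently throttling."""
    throttled: bool = False


def next_performance(temp_c, state, throttle_at=75.0, resume_at=70.0,
                     full_speed=100, reduced_speed=40):
    """Return the performance level (percent) for one temperature reading.

    Throttling starts at or above throttle_at and only ends once the
    disk has cooled to resume_at or below (assumed values).
    """
    if temp_c >= throttle_at:
        state.throttled = True
    elif temp_c <= resume_at:
        state.throttled = False
    # Between the two thresholds, the previous state is kept.
    return reduced_speed if state.throttled else full_speed


# Simulated sensor readings in degrees Celsius during a sustained workload.
readings = [60, 72, 76, 73, 69, 71]
state = ThrottleState()
levels = [next_performance(t, state) for t in readings]

for temp, level in zip(readings, levels):
    print(f"{temp} C -> {level}% performance")
```

Notice that the reading of 73°C is still throttled because the disk has not yet cooled to the resume threshold, which is why a hot SSD can stay slow for a while after the heavy workload ends.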
Data Recovery from an Intel SSD 660p PCIe M.2 Disk
Last week, we were dealing with an Intel SSD 660p which was proving toasty even after only being connected for ten minutes. This was making sector reads very difficult. We first had to bring the core temperature of the disk down. For this, we used a custom cooling device made for failing SSDs. This uses a heat sink with a very high surface area, which maximises the dissipation of heat. It also uses a high-velocity fan which cools the disk further using convection. This enabled us to bring the disk’s temperature down from 80 to 52 degrees Celsius. Once the Intel 660p’s temperature had stabilised, we were able to connect it to our PCIe data recovery system. Normal reads were proving impossible. Therefore, we had to use a special PCIe disk reader with adjustable read timeout settings, controller power settings and disk reset functions. At a glacial speed of only 64 sectors per read, the disk took around two days to image. Even after this process, the disk’s NTFS partition needed some repair to its MFT. However, the effort was worth it: most of the client’s files (DOC, PDF, XLSX, PPTX) were successfully recovered.
Drive Rescue Dublin, Ireland offers an advanced data recovery service for failed SSDs such as the Intel 660p, Intel 7600p, Intel H10 SSD M.2, Micron 1100, 1300, 2200, 2300, 5100, WD SN550, SN750 and SK Hynix PC601, HFM256GDJTNG, HFM512GDJTNG. Serving satisfied customers in Dublin since 2007.
The SanDisk Cruzer Blade is a popular model of USB 2.0 memory stick on the Irish market. It uses a monolith NAND (usually TSOP48) TLC chip and an in-house controller designed by SanDisk. The Cruzer Blade range comes in capacities of 8GB (SDCZ50-008G), 16GB (SDCZ50-016G), 32GB (SDCZ50-032G), 64GB (SDCZ50-064G) and 128GB (SDCZ50-128G).
However, as with any USB memory device, it is liable to corruption and events where your data is rendered inaccessible. For example, when you connect your Cruzer USB disk to your computer, you may receive an error message such as:
“You need to format the disk in drive E: before you can use it”.
“USB device not recognised”
“The parameter is incorrect”
Alternatively, your SanDisk Cruzer memory stick may appear to be totally dead when connected to your laptop or desktop computer.
Reasons why SanDisk Cruzer Blade USB devices fail.
There are several reasons why your memory stick may fail to be recognised in Windows or on MacOS. These include:
Its bootloader has failed. The bootloader is the microcode needed for your memory stick to initialize. When this fails to load, your disk becomes unrecognisable.
There are two main components of a USB flash drive: the NAND chip (where your data is stored) and the controller chip. The controller chip is like the brain of your memory stick. It controls the read, write and erase processes. It also controls processes such as Error Correction Code (ECC) and wear-levelling. If your controller goes corrupt, the data on your stick may become inaccessible.
The NAND cells on your SanDisk Cruzer Blade may have degraded or have developed uncorrectable bit errors.
The partition table or file system (FAT32, NTFS, exFAT or HFS) on your Cruzer Blade USB stick may have gone corrupt.
Your SanDisk USB device might have been subject to an over-voltage event. This can occur if a USB port such as on your computer, smart TV or NVR delivered too much voltage to your disk and caused damage to a component such as a diode or resistor.
Recovering Data from your SanDisk Cruzer USB memory stick.
Make sure your Cruzer USB memory stick is assigned a drive letter in Windows. You can check this by going into Disk Management (Control Panel > Administrative Tools > Computer Management > Disk Management).
Try using another computer. It is always possible that a glitch on your Windows or MacOS computer is preventing your Cruzer USB stick from being read.
Connect your memory stick directly to your computer. Do not use a USB hub as an interface between your computer and your USB memory stick. This is because a USB hub can sometimes create device recognition issues.
Mini SanDisk Cruzer Blade Data Recovery Case Study
We recently had a case where an employee of a Dublin-based investment company had a problem with their 8GB SanDisk USB Cruzer drive (SDCZ50C-008G). When they connected it to their Windows computer system, it would not appear in Windows Explorer. They had an extensive collection of research reports (PDF) and financial projections (Excel) stored on it which they badly needed to retrieve. The device was encrypted with McAfee Endpoint Encryption for Removable Media. They had assumed this encryption software was causing the issue. However, their IT support department examined their Cruzer USB disk and discovered that the device was not being recognised by any of their systems. They recommended Drive Rescue.
We connected the inaccessible disk to one of our data recovery systems designed to read flash-based storage at a very low level. We performed a test read. However, after being connected for less than five minutes, we discovered that the USB drive had already disconnected! This was not looking good. A look at our system’s log files showed that the device had disconnected (virtually) from our systems after only 3.49 minutes. We surmised that, even though the disk was being read at a very low level, our recovery system was dropping the disk because of too many read instability issues. To circumvent this problem, we would have to use a second tool in our armoury to maintain the connection between our recovery system and the failing Cruzer disk. This specialised USB reader is designed especially for reading data from failing USB devices. It uses an Arm processor, which acts as an intermediary between the recovery system and the problem disk. Because the disk is no longer interfacing directly with the operating system, we can control read-timeout and disk-reinitialisation parameters. In this particular case, the Cruzer USB had multiple unreadable NAND cells. So, we changed the read timeout to 10,000 milliseconds and then controlled the disk initialisation rate when our equipment encountered bad cells. Our data recovery systems were now able to read the data in a much more stable and predictable way.
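For readers curious about the principle, the "keep going past bad areas" approach described above can be sketched in Python. This is not the software running on our recovery equipment; the read timeouts and disk re-initialisation are handled by the hardware reader itself. The sketch only shows the imaging loop: each block is retried a few times, and a block that never reads is filled with zeros and logged so the rest of the image can still be captured. The block size, retry count, function names and the simulated flaky reader are all hypothetical.

```python
from typing import Callable, List, Tuple

BLOCK_SIZE = 512  # bytes per block (assumed for this sketch)


def image_with_retries(read_block: Callable[[int], bytes],
                       total_blocks: int,
                       max_retries: int = 3) -> Tuple[bytes, List[int]]:
    """Copy every block in order, retrying failures and logging unreadable blocks."""
    image = bytearray()
    bad_blocks: List[int] = []
    for lba in range(total_blocks):
        data = None
        for _ in range(max_retries):
            try:
                data = read_block(lba)
                break  # the read succeeded, stop retrying
            except OSError:
                continue  # read error or timeout, try again
        if data is None:
            # Give up on this block: keep the image aligned with a zero-filled block.
            data = bytes(BLOCK_SIZE)
            bad_blocks.append(lba)
        image.extend(data)
    return bytes(image), bad_blocks


def make_flaky_reader() -> Callable[[int], bytes]:
    """Simulate a failing stick: block 2 never reads, block 4 fails once."""
    attempts = {}

    def read_block(lba: int) -> bytes:
        attempts[lba] = attempts.get(lba, 0) + 1
        if lba == 2:
            raise OSError("unreadable block")
        if lba == 4 and attempts[lba] == 1:
            raise OSError("transient timeout")
        return bytes([lba]) * BLOCK_SIZE

    return read_block


image, bad_blocks = image_with_retries(make_flaky_reader(), total_blocks=6)
print(f"imaged {len(image)} bytes, unreadable blocks: {bad_blocks}")
```

Working from an image like this, rather than from the failing device, means that any later file system repair cannot make the original disk worse.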
Successful Recovery: All files recovered.
After about seven hours on our bench, the 8GB Cruzer disk finally imaged to an SSD. Connecting the SSD to a standard Windows 10 workstation presented us with a dialogue box requesting an encryption key. A very welcome sight! The client provided us with their McAfee encryption key. This granted us access to the drive’s data immediately. Our client could now be reunited with their data. The prospect of having to re-do hours and hours of painstaking work was gone!