August 2018
The Perils of Problem Fixation: What End-Users and Sys Admins Can Learn from Aviation Accidents

The tail fin of Air France flight 447. Problem fixation can have tragic consequences.

On the 29th of December 1972, Eastern Air Lines flight 401 takes off from a bitterly cold New York en route to Miami. There are 163 passengers on board, most of them hoping to celebrate the New Year in the sun. Approaching Miami, the crew lowers the landing gear. Normally, a green light illuminating in the cockpit indicates successful deployment. In this case, however, no green light illuminates. The crew removes the bulb and blows on it to remove any dust. They screw it back tightly. Meanwhile, the autopilot has become disengaged and the aircraft is gradually losing altitude. The crew is so fixated on the bulb that they fail to notice the descent. It being night time, they have few visual cues. The aircraft flies into the ground in the Florida Everglades. Only 75 of the 176 people on board (163 passengers and 13 crew) survive the crash.

Fast forward to 2009: Air France flight 447 leaves Rio bound for Paris. Less than four hours into the flight, the aircraft’s pitot tubes (used to measure airspeed) ice over and malfunction. This causes the autopilot to disengage. The cockpit becomes a cacophony of sirens and alarms. The pilots fixate on raising the plane’s nose when they should be lowering it. The aircraft stalls. Tragically, according to aviation experts, the stall was recoverable. However, the pilots are so fixated on raising the plane’s nose that they fail to recover the aircraft, and it plunges into the Atlantic.

Problem Fixation and IT Troubleshooting

These two cases illustrate how problem fixation can have tragic consequences. You might be wondering how this relates to data loss. Unfortunately, problem fixation is also an issue in the world of information technology, and hard disk failure provides a classic example. When a disk is failing, end-users and sys admins can easily mistake the symptoms for those of an unrelated problem. This is understandable, because a failing disk often manifests itself in unexpected ways. For example, it can cause an operating system such as macOS or Windows to throw up all sorts of spurious error messages. This can lead to some frantic googling of symptoms which have little or no relation to the real problem.

Likewise, the symptoms of a failing hard disk can mirror those of other failing hardware components such as RAM or graphics cards. And then, of course, there is the issue of viruses and malware, the symptoms of which also closely resemble those of a failing disk. Applications may fail to start or may run painfully slowly. Some users will run virus and malware scans in an effort to remove a non-existent infection.

The potential for problem fixation does not end there. When a failing external hard disk is connected to a Windows computer, error messages like “Data Error – Cyclic Redundancy Check” or “You need to format the disk in drive E: before you can use it. Do you want to format it?” can be thrown up. As for NAS devices, these can be a veritable Pandora’s box of cryptic error messages which users can end up fixating on. Take, for example, Buffalo’s error coding system, in which codes begin with “E”. You might get an “E13” message, or “E14”, “E15”, “E16”, “E22”, “E23” or “E30”. These might look like something written on the back of a packet of Skittles, but they are all basically reporting the same thing: “there is something seriously wrong with one of your disks”.

However, if a user goes down the Google rabbit hole with these error messages, valuable time can be lost, all while the disk’s condition might be deteriorating rapidly.
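
One way to avoid the rabbit hole is to treat certain messages as a cue to check the disk first, rather than as a puzzle to be solved. The following is a minimal Python sketch of that idea. The patterns are taken only from the messages quoted above; the function name `suggests_disk_check` and the pattern list are illustrative assumptions, not an exhaustive catalogue or any vendor's official mapping.

```python
import re

# Patterns based only on the messages quoted in this article.
# Illustrative, not exhaustive: a real list would be much longer.
DISK_HINTS = [
    re.compile(r"cyclic redundancy check", re.IGNORECASE),
    re.compile(r"you need to format the disk", re.IGNORECASE),
    # Buffalo NAS codes mentioned above; only meaningful on such a device.
    re.compile(r"\bE(13|14|15|16|22|23|30)\b"),
]


def suggests_disk_check(message):
    """Return True if the message matches one of the known disk-related hints."""
    return any(pattern.search(message) for pattern in DISK_HINTS)
```

A match does not prove the disk is failing. It only says that disk diagnostics should come before any further searching on the message itself.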

Don’t Fixate on Error Messages or Symptoms

The philosopher William of Ockham (of Ockham’s Razor fame) is often paraphrased as saying “All things being equal, the simplest explanation tends to be the right one”. And there is some truth to this. If a computer system or storage device is acting strangely, going back to basics is often a worthwhile strategy. Start by running disk diagnostics to find out whether the disk(s) are healthy or not. Fixating on particular error messages wastes valuable time, especially when the underlying problem might be a failing disk. Finding this out quickly gives you the opportunity to make a complete backup of the disk, possibly removing the need for a data recovery service!
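
As a rough illustration of a “diagnostics first” check, here is a minimal Python sketch that reads a disk's S.M.A.R.T. report with `smartctl` (from the smartmontools package) and gives a coarse verdict. It assumes smartmontools is installed, that the disk is an ATA/SATA drive which prints the usual attribute table, and that the script runs with sufficient privileges. The function names, the verdict labels and the choice of watched attributes are assumptions made for this example, not a standard.

```python
import re
import shutil
import subprocess

# Attributes that commonly point to failing media (names as printed by `smartctl -A`).
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")


def read_report(device):
    """Run `smartctl -H -A` on a device and return its text output."""
    if shutil.which("smartctl") is None:
        raise FileNotFoundError("smartctl not found; is smartmontools installed?")
    # smartctl uses non-zero exit codes to flag problems, so do not raise on them.
    result = subprocess.run(
        ["smartctl", "-H", "-A", device],
        capture_output=True, text=True, check=False,
    )
    return result.stdout


def parse_smart_report(text):
    """Return (overall_result, raw_counters) parsed from smartctl text output."""
    match = re.search(r"overall-health self-assessment test result:\s*(\S+)", text)
    overall = match.group(1) if match else None
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows have ten columns: ID, NAME, FLAG, VALUE, WORST,
        # THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE.
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] in WATCHED:
            if fields[9].isdigit():
                counters[fields[1]] = int(fields[9])
    return overall, counters


def assess(text):
    """Coarse verdict: 'failing', 'suspect', 'ok' or 'unknown'."""
    overall, counters = parse_smart_report(text)
    if overall is None and not counters:
        return "unknown"  # output not recognised (e.g. a different drive type)
    if overall is not None and overall != "PASSED":
        return "failing"
    if any(value > 0 for value in counters.values()):
        return "suspect"  # drive reports PASSED but has bad-sector counters
    return "ok"
```

On a Linux machine this might be called as `assess(read_report("/dev/sda"))`, where the device path is only an example. A verdict of “suspect” or “failing” is the signal to stop troubleshooting and back up the disk straight away. An “ok” verdict does not guarantee a healthy disk, since S.M.A.R.T. does not catch every failure.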

