Fifty years ago, because of “technical issues” on the previous flight, Apollo 10, there were thoughts of postponing Apollo 11 and humankind’s first visit to the lunar surface. Those technical issues focused attention on Minneapolis and the quality of Honeywell’s “rate gyros.”
Rate gyros are devices that indicate the change of a vehicle’s angular orientation over time. For example, if you were driving northbound on a freeway and exited under an overpass to join the crossing freeway, a gyro would indicate your heading changing from 0 degrees (north) to 90 degrees (east) to 180 degrees (south) to 270 degrees (west). If the exit ramp had been built with perfect symmetry and you maintained a constant speed, a rate gyro would show a constant rate of change during the turn.
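To make the freeway example concrete: a rate gyro reports angular rate, and heading is recovered by adding up that rate over time. The following is a minimal sketch, assuming the 270-degree ramp is driven in 30 seconds at constant speed. Both figures and the function name are illustrative and do not come from any real gyro interface.

```python
def integrate_heading(rates_deg_per_s, dt_s, start_heading_deg=0.0):
    """Accumulate rate-gyro samples into a heading, wrapped to [0, 360)."""
    heading = start_heading_deg
    history = [heading]
    for rate in rates_deg_per_s:
        heading = (heading + rate * dt_s) % 360.0
        history.append(heading)
    return history


# Assumed ramp: 270 degrees of turn in 30 seconds, a constant 9 deg/s.
TURN_RATE_DEG_PER_S = 9.0
DT_S = 1.0
samples = [TURN_RATE_DEG_PER_S] * 30  # the rate gyro output never changes

headings = integrate_heading(samples, DT_S)
# Heading at 0, 10, 20 and 30 seconds: north, east, south, west.
print(headings[0], headings[10], headings[20], headings[30])  # 0.0 90.0 180.0 270.0
```

The rate stays fixed at 9 degrees per second throughout, while the heading sweeps from north through east and south to west.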
This type of gyro is very important in stabilizing and controlling a launching rocket. It can detect small drifts caused by aerodynamic or thrust asymmetries, which are then counteracted by adjusting the thrust nozzles. Any change in pitch or yaw of the rocket is quickly sensed and corrected.
Honeywell had begun building rate gyros in the 1950s and soon became a leading supplier, leveraging Minneapolis’s superior capability in precision machining to produce the most accurate rate gyro for its size and weight. Honeywell labeled the device the GG440, but the industry knew it as the “Golden Gnat.” The cylinder, one inch in diameter and two inches long, was electroplated with a gold finish for protection in harsh environments.
A Saturn V rocket used nine rate gyros in its instrumentation unit (achieving triple redundancy on pitch, roll and yaw) and two in each rocket stage, totaling 15 per Apollo launch.
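The article does not say how the three gyros on each axis were combined. One common way to use triple redundancy is mid-value selection: take the median of the three readings, so that a single bad sensor cannot steer the output. The sketch below assumes that scheme. The function name and sample readings are hypothetical. It also checks the gyro count given above.

```python
def mid_value_select(a, b, c):
    """Return the median of three redundant readings."""
    return sorted((a, b, c))[1]


AXES = ("pitch", "roll", "yaw")
GYROS_PER_AXIS = 3   # triple redundancy in the instrumentation unit
STAGES = 3           # the Saturn V had three rocket stages
GYROS_PER_STAGE = 2

total_gyros = len(AXES) * GYROS_PER_AXIS + STAGES * GYROS_PER_STAGE
print(total_gyros)  # 15

# Hypothetical pitch-rate readings in deg/s; the third gyro has failed high.
pitch_readings = (0.10, 0.12, 5.00)
print(mid_value_select(*pitch_readings))  # 0.12
```

With only two gyros per axis, a disagreement could be detected but not resolved. The third reading is what lets the system outvote a single faulty unit.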
During the Apollo 10 flight, which carried three astronauts around the moon and back, “the crewmen stated that the FDAI indicated excessive drift in the pitch and yaw axes,” according to a NASA technical note. Honeywell had also provided the Flight Director Attitude Indicator (FDAI).
This apparent fault spawned an intense study into its cause. Apollo 11 was scheduled to launch in less than six weeks. Instead of denying that its product was part of the problem, Honeywell opened its factory and redirected its engineering in an effort to accelerate the investigation. Any fault was one too many.
The Minneapolis factory already had designed and implemented one of the highest-quality clean-room assembly areas in the country, knowing that even a speck of dust could lead to inaccuracies. The nine rate gyros that returned to Earth were sent to Honeywell for fault analysis. While the factory was testing and inspecting each of these gyros, a floor full of engineers above the factory worked with NASA analysts to scrutinize the design of the sensors and the entire control system for possible faults.
Determining a root cause when a fault can be observed is straightforward. But in this case none of the returned rate gyros showed any irregularity. The entire floor of engineers identified every conceivable reason for the Apollo 10 error.
After two intense weeks of proposing faults and predicting their impact, no cause for the Apollo 10 error could be determined. NASA concluded the case had been thoroughly studied and closed the investigation.
The remaining seven Apollo missions flew with no faults in the guidance and control system.
These “no-fault-found” errors inspired the aerospace industry to establish a “technique to assist in the rapid identification of single-point failures. The Apollo method required many engineers to search diagrams for problems, but this technique is not altogether successful for complex systems,” again according to NASA technical documents.
But once a single-point failure is identified, it can be avoided by adding redundancy in the system. The aerospace industry continuously pushed for higher reliability, higher safety and higher quality.
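As a toy illustration of that idea, a system can be modeled as a set of required functions, each served by one or more interchangeable components. Any function served by exactly one component is a single-point failure, and adding a second component removes it. The model, names and component lists below are hypothetical. They are not drawn from the NASA technique quoted above.

```python
def single_point_failures(system):
    """Return components whose lone failure would disable a required function."""
    return sorted(
        providers[0]
        for providers in system.values()
        if len(providers) == 1
    )


# Hypothetical system: each required function maps to its redundant providers.
system = {
    "pitch_rate": ["gyro_p1", "gyro_p2", "gyro_p3"],
    "yaw_rate": ["gyro_y1", "gyro_y2", "gyro_y3"],
    "attitude_display": ["display_1"],
}

print(single_point_failures(system))  # ['display_1']

# Adding a redundant display removes the single-point failure.
system["attitude_display"].append("display_2")
print(single_point_failures(system))  # []
```

Real systems are far harder to analyze than this, because components interact and failures are not always independent, which is why manual searches of diagrams fell short for complex designs.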
Quality in engineering and manufacturing in that era was measured by how well a product met its requirements. Product development followed a well-defined, step-by-step process. A familiar example of a “stage-gate process” is obtaining a driver’s license.
The first “gate” is getting a permit. The second gate is passing the written exam. The third gate is passing the driving test. The requirements for each gate are defined and effort is staged to pass each gate, one at a time. This process created a highly safe driving system (at least until mobile devices showed us how boring driving was).
Why does it seem we’re not using this process in industry today? In the mid-1990s, business schools adjusted the definition of quality; it became the degree to which a commodity meets the requirements of the customer at the start of its life. This definition was widely adopted as companies and organizations switched from being operationally driven to being financially managed. Companies and business schools focused increasingly on how to become intimate with “customers.” (There was no mention that companies might kill those customers in the process.)
About the same time, software was becoming integral to many products and services. Software has many advantages for innovation. It is easy to modify; it is applicable to many different hardware environments (think of apps for phones); it can customize a product for specific applications; and it can offer solutions that work around hardware limitations.
Such was the case with Boeing’s 737 Max 8 aircraft.
In order to incorporate more fuel-efficient engines in the upgrade of the 737-800 to the 737 Max 8, it was necessary that the larger engines be mounted differently to avoid the need for major structural changes to the airframe. Airframe changes would require the aircraft to complete new FAA flight certification involving years of testing and millions of dollars.
However, mounting the engines differently changed the aerodynamics, creating a higher probability of a stall on takeoff. Boeing developed a software feature, called the Maneuvering Characteristics Augmentation System, to address this hardware issue.
But as we know now, Boeing downplayed the fact that there remained an additional risk of malfunction in angle-measuring sensors.
Boeing blamed the first deadly crash of the 737 Max on pilot error. It took a second deadly crash under nearly identical circumstances to prove to Boeing that there was a safety issue with the plane.
How could this happen?
Since the Apollo program, there has been a cultural shift — from suppliers having to prove to customers that a product wouldn’t fail, to customers now having to prove to suppliers that it did fail.
Honeywell was under the spotlight after a reported failure by the Apollo 10 crew and engaged every possible resource to prove the accuracy and safety of its equipment. Would Honeywell today act any differently than Boeing? Probably not.
As was learned during the Apollo program, diagnosing single-point failures in complex systems is extremely difficult and time-consuming. Businesses today, especially software companies, depend on customer learning before, during and after product launch to determine requirements and identify faults.
We consumers have to renew our demand for quality, built on a foundation of excellence. It is possible to design and build safe systems. We shouldn’t settle for anything less.
James Lenz worked for Honeywell in Minneapolis from 1980 to 2002. He is a visiting scholar at the University of Illinois Gies School of Business (firstname.lastname@example.org).