Fault Tolerance

Carl Ludewig
Wednesday, March 2, 2012

A year after Japan's devastating earthquake and tsunami, an independent investigation reveals how much worse the nuclear disaster could have been. When designing fault-tolerant systems, the critical point of failure is often not identified until something goes wrong.

In this case, the last line of defense consisted of diesel generators, which were flooded by the incoming rush of water. In a much less life-threatening situation, diesel generators were also the cause of failure at the 365 Main data center in 2007. But we can't just blame diesel engines. The truth is that nature and complexity will conspire to defeat any man-made plan.

Reactor 3 in the days following the tsunami

The first step in designing a fault-tolerant system is to ask, "What happens if all countermeasures fail?" If the answer is the evacuation and potential abandonment of Tokyo, then you need to question the wisdom of building the thing in the first place.

There is some good work being done in the area of passive safety systems. These start from the assumption that no one is around to take action, and ask how the laws of physics alone can control the outcome, without relying on external power, moving mechanical parts, fluid, etc.

However, there are no fail-safe systems, only systems that respond to problems better or worse than others. After all my years of engineering, I'm still slightly surprised when a backup system actually functions as intended. Assuming everything will go according to plan is folly.

May 19, 2012 San Francisco
Topic: Fukushima