The number glitch that can lead to catastrophe

A surprisingly simple bug afflicts computers controlling planes, spacecraft and more – they get confused by big numbers. As Chris Baraniuk discovers, the glitch has led to explosions, missing space probes and more.

Tuesday, 4 June 1996 will forever be remembered as a dark day for the European Space Agency (Esa). The first flight of the crewless Ariane 5 rocket, carrying with it four very expensive scientific satellites, ended after 39 seconds in an unholy ball of smoke and fire. It’s estimated that the explosion resulted in a loss of $370m (£240m).

What happened? It wasn’t a mechanical failure or an act of sabotage. No, the launch ended in disaster thanks to a simple software bug. A computer getting its maths wrong – essentially getting overwhelmed by a number bigger than it expected.

Why is the number 2,147,483,647 important?

How is it possible that computers get befuddled by numbers in this way? It turns out such errors are responsible for a series of disasters and mishaps in recent years, destroying rockets, making space probes go missing, and sending missiles off-target. So what are these bugs, and why do they happen?

Imagine trying to represent a value of, say, 105,350 miles on an odometer that has a maximum value of 99,999. The counter would “roll over” to 00,000 and then count up to 5,350, the remaining value. This is the same species of inaccuracy that doomed the 1996 Ariane 5 launch. More technically, it’s called “integer overflow”, essentially meaning that numbers are too big to be stored in a computer system, and sometimes this can cause malfunction.
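The odometer analogy can be sketched in a few lines of Python. This is only an illustration of the rollover arithmetic; the names `ODOMETER_STATES` and `odometer_reading` are invented for the example.

```python
# Illustrative sketch: a five-digit odometer can only show 00,000 to 99,999.
ODOMETER_STATES = 100_000  # number of distinct readings the display can hold


def odometer_reading(true_miles: int) -> int:
    """Return what a five-digit odometer would display for a true mileage."""
    # Anything beyond the display's capacity is silently discarded.
    return true_miles % ODOMETER_STATES


print(odometer_reading(99_999))   # 99999 (the last value that fits)
print(odometer_reading(105_350))  # 5350  (rolled over once)
```

The counter itself never "knows" it has rolled over, which is why software that trusts the reading can go wrong.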

Such glitches emerge with surprising frequency. It’s suspected that the reason why Nasa lost contact with the Deep Impact space probe in 2013 was an integer limit being reached.

And just last week it was reported that Boeing 787 aircraft may suffer from a similar issue. The control unit managing the delivery of power to the plane’s engines will automatically enter a failsafe mode – and shut down the engines – if it has been left on for over 248 days. Hypothetically, the engines could suddenly halt even in mid-flight. The Federal Aviation Administration’s directive on the matter states that a counter in the control unit’s software will “overflow” after this specific period of time, causing an error. Although scant details have been released – the FAA and Boeing declined to comment for this article – some amateur observers have pointed out that 248 days, when counted in 100ths of a second, comes to roughly 2,147,483,647 – which is significant.
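The observers' arithmetic is easy to check. The sketch below assumes, as they did, that the counter ticks once every hundredth of a second and is held in a 32-bit signed value; neither detail has been confirmed by the FAA or Boeing.

```python
# Back-of-the-envelope check of the "248 days" observation.
INT32_MAX = 2**31 - 1            # 2,147,483,647
TICKS_PER_SECOND = 100           # assumption: the counter counts 100ths of a second
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400

days_until_overflow = INT32_MAX / TICKS_PER_SECOND / SECONDS_PER_DAY
print(round(days_until_overflow, 2))  # 248.55
```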

How so? It just so happens that 2,147,483,647 is the maximum positive value that can be stored by a “32-bit signed register”, commonly installed on many computer systems. On Ariane, by comparison, the software was using a “16-bit” space, which is much smaller and only capable of storing a maximum value of 32,767.
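Both limits follow from the same rule: a signed store of n bits tops out at 2 to the power of (n - 1), minus 1. The Python sketch below shows the rule, along with a hypothetical range-checked conversion called `to_int16`. The function and its sample inputs are illustrative assumptions, not a reconstruction of the Ariane flight software.

```python
def signed_max(bits: int) -> int:
    """Largest positive value a signed integer of the given width can hold."""
    return 2 ** (bits - 1) - 1


def to_int16(value: float) -> int:
    """Hypothetical checked conversion: refuse values that do not fit in 16 bits."""
    result = int(value)  # drop the fractional part
    if not -(2**15) <= result <= signed_max(16):
        raise OverflowError(f"{value} does not fit in a signed 16-bit integer")
    return result


print(signed_max(16))      # 32767
print(signed_max(32))      # 2147483647
print(to_int16(12_345.6))  # 12345

try:
    to_int16(40_000.0)     # sample value chosen only because it exceeds 32,767
except OverflowError as err:
    print("conversion failed:", err)
```

Raising an error is only half the job: something still has to decide what the system should do when the value does not fit.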

Puzzling limit

Numbers are infinite, so why choose such limited storage spaces for them? The answer is that computers have traditionally demanded efficiency in all things. Storage space used to be much more costly than it is today and processing larger values took longer. If you kept to certain limits, software was expected to run more smoothly. Rocket guidance systems do a lot of critical number crunching very quickly, so these overheads certainly matter. The problem with that, as the Ariane 5 proved, is that such limitations aren’t always foreseen as problematic.

“We have to recognise that in software we are always approximating reality,” explains Bill Scherlis, a software expert at Carnegie Mellon University. “There’s always an engineering trade-off between the cost of having a more precise representation and the benefit of the efficiency.”

Not all rollover glitches are as destructive as these examples, but they do frequently create unexpected effects. For example, in the video game Civilization, an unanticipated bug in this vein caused the peaceful character Gandhi to become uncharacteristically hostile. When players chose a certain mode to play in, the value which defined Gandhi’s aggressiveness rolled backwards past zero to the maximum. Consequently, he would threaten players with nuclear weapons at every turn – to the great amusement of many players.
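Rolling "backwards past zero" is the same effect in reverse. The sketch below assumes an unsigned 8-bit store (0 to 255), a starting score of 1 and a reduction of 2; all of these figures are assumptions made for illustration, and the name `wrap_uint8` is invented.

```python
def wrap_uint8(value: int) -> int:
    """Mimic an unsigned 8-bit store, which can only hold 0 to 255."""
    return value % 256


aggression = 1                            # assumed starting score
aggression = wrap_uint8(aggression - 2)   # assumed reduction of 2
print(aggression)  # 255, the maximum rather than -1
```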

Psy’s Gangnam Style is credited with ‘breaking’ video-sharing website YouTube (Credit: Getty Images)

Scherlis notes that the previous limit on YouTube’s view counter reveals the expectations of the original programmers who built the site. “Certainly, when YouTube’s software was first developed I think it was probably hard for any developers or designers to imagine that they would overflow [this number],” he says.

It’s often this sort of assumption, which initially may seem reasonable, that causes problems years down the line. The most talked about overflow bug in history, which many will remember, was the much-hyped Millennium Bug. Although largely considered a damp squib, the Y2K problem did cause some headaches.

Fears of a global meltdown from the ‘Millennium Bug’ turned out to be unfounded (Credit: Getty Images)

About 15 years ago programmer William Porquet had the idea of thinking ahead to yet another crucial date – 3.14.07am GMT on Tuesday 19 January 2038. This is the moment when the number of seconds since 1 January 1970 will reach the maximum value that many of today’s computers can hold in their date and time registers. Like the Millennium Bug, failure to prepare for this could result in computer crashes.
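The date can be reproduced by adding 2,147,483,647 seconds to the start of 1970. The sketch below assumes the clock is a signed 32-bit count of seconds; `wrap_int32` is a made-up helper that mimics what such a counter holds one tick later.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT32_MAX = 2**31 - 1  # largest value a signed 32-bit counter can hold


def wrap_int32(value: int) -> int:
    """Mimic a signed 32-bit store by wrapping out-of-range values."""
    return (value + 2**31) % 2**32 - 2**31


last_moment = EPOCH + timedelta(seconds=INT32_MAX)
print(last_moment.isoformat())  # 2038-01-19T03:14:07+00:00

# One second later the counter wraps to its most negative value.
wrapped = wrap_int32(INT32_MAX + 1)
after_wrap = EPOCH + timedelta(seconds=wrapped)
print(wrapped)                 # -2147483648
print(after_wrap.isoformat())  # 1901-12-13T20:45:52+00:00
```

A system that wraps in this way would suddenly believe it was December 1901, which is the kind of jump that can upset anything comparing or sorting timestamps.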

“It was in 1999 that I first wrote about this,” comments Porquet. “I acquired the domain name 2038.org and at first it was very tongue-in-cheek. It was almost a piece of satire, a kind of an in-joke with a lot of computer boffins who say, ‘oh yes we’ll fix that in 2037…’ But then I realised there are actually some issues with this.”

Will a January morning in 2038 see computers crashing all over the world? (Credit: Getty Images)

Porquet is concerned about old bits of software that nobody tends to anymore – on long-established networks, or on old hardware being used in remote parts of the world. How many of them will still be in use 23 years from now and what consequences that could have is anybody’s guess.

“A lot of computer systems,” notes Porquet, “can be caused to fail in a predictable manner. But this is failure in an unpredictable manner.”

Markus Kuhn, a computer scientist at the University of Cambridge, explains that time-related bugs create interest partly because their consequences are unpredictable, but also because they are “not unexpected”, so people are able to speculate about what will happen when the fateful date arrives.

Kuhn thinks that the 2038 problem will be less significant than Y2K because the Millennium Bug has prepared the computer industry to make the necessary fixes. Indeed, that’s all part of William Porquet’s plan. “I hope it’s something that will take me out of semi-retirement for a very large sum of money,” he says, only half joking.

The speed of Earth’s rotation may also cause a slight time change that could crash computers (Credit: Getty Images)

For Kuhn, the most interesting time problem facing computers at the moment is not an overflow glitch per se, but one that arrives this coming June. The year 2015 will be a second longer than 2014 thanks to a move to correct the discrepancy between astronomical time (the time on Earth based on our planet’s rotation) and atomic time (the most accurate known time-keeping method, in terms of counting seconds). While atomic time, which is to be adjusted with this year’s leap second, is mind-bogglingly precise, it is actually slightly out of sync with astronomical time because the Earth’s rotation is very gradually decelerating. Geological events such as earthquakes can cause changes in the speed of this rotation, meaning that the addition of leap seconds, unlike leap years, is variable. The last one was in 2012 and crashed many computers. Hopefully, says Kuhn, we’ll be better prepared this time than we were in the past.
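One reason an extra second can trip software up is that many date and time types have no slot for it. The sketch below uses Python's standard `datetime` type as one example and assumes the leap second is written as 23:59:60 on 30 June; the helper `can_represent_leap_second` is invented for illustration, and other systems handle the extra second in their own ways.

```python
from datetime import datetime


def can_represent_leap_second() -> bool:
    """Check whether datetime accepts a seconds value of 60."""
    try:
        datetime(2015, 6, 30, 23, 59, 60)  # assumed timestamp of the leap second
    except ValueError as err:
        print("rejected:", err)
        return False
    return True


print(can_represent_leap_second())  # False
```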

It seems like no matter what we do, certain numbers and calculations will always confuse computers, causing malfunction – or worse. “We’ve learned a lot from the Y2K experience and other similar events,” notes Scherlis. “But the reality in which we are always making approximations and having to navigate an engineering trade-off? That is with us forever.”
