Cosmic rays from space can disrupt the data stored in computers by flipping bits.
Computers are part of our daily lives. We use them for everything from entertainment and gaming to finance and accounting, and even for solving the complex mathematical equations used to model how galaxies form or how biological systems behave.
But are computers perfect?
Computers can crash, become infected with viruses, be slowed down by bloatware, or be locked up by ransomware. Even the natural world has ways to interfere with computers.
An illustrative image of what happens when a Bit Flip occurs.
Introduction to Bit Flip
A Bit Flip is an unintended change to data held in memory. Computers store data as bits, each with a value of 0 or 1. When a bit experiences a ‘Bit Flip’, its value is inverted: 0 becomes 1, and 1 becomes 0.
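To make the idea concrete, here is a minimal Python sketch of what a single flipped bit does to a stored value. The helper name `flip_bit` and the choice of the letter "A" are illustrative assumptions, not something taken from real memory hardware.

```python
def flip_bit(value, position):
    """Return `value` with the bit at `position` inverted (0 = least significant bit)."""
    # XOR with a mask that has a single 1 turns 0 into 1 and 1 into 0 at that position.
    return value ^ (1 << position)


original = 0b01000001               # 65, the ASCII code for "A"
corrupted = flip_bit(original, 1)   # one flipped bit gives 0b01000011

print(chr(original), "->", chr(corrupted))   # prints: A -> C
```

A single inverted bit is enough to turn one letter into another, which is why even rare flips matter.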
A Bit Flip can occur when a high-energy charged particle strikes the memory hardware. These particles can be alpha particles or particles produced by cosmic rays originating from space. When such a particle passes through a memory cell, it can deposit enough electric charge to disturb the charge that represents the stored bit, causing the bit to flip.
Bit Flips are classified as “soft errors.” A soft error corrupts the stored data but does not damage the hardware, so it can be fixed in software by detecting the wrong bit value and rewriting the correct one. This is different from hard errors, which are usually caused by faulty or damaged hardware. When a hard error occurs, the hardware itself needs to be replaced.
As mentioned, cosmic rays are one of the causes of Bit Flips in memory devices.
A general illustration of what happens when a cosmic ray enters the Earth’s atmosphere. The resulting cascade of pions, muons, and other secondary particles is known as a “cosmic ray shower.”
How Do Cosmic Rays Cause Bit Flips?
Cosmic rays are high-energy particles originating from outer space. They consist primarily of protons, along with a smaller share of helium nuclei and a small amount of heavier nuclei and other subatomic particles.
When these cosmic rays reach the upper layers of Earth’s atmosphere, they collide with the nuclei of atoms in the air. These collisions produce mainly pions, and the charged pions then decay into muons. Muons interact little with matter, so many of them reach the Earth’s surface.
RAM and flash memory use transistors as one of their main components. These modern memory devices use metal-oxide-semiconductor field-effect transistors (MOSFETs). Each bit is represented by a stored charge or voltage level associated with these transistors.
An illustration of metal-oxide-semiconductor field-effect transistors. These types of transistors are widely used in memory storage devices.
A Bit Flip occurs when an energetic charged particle, such as one produced by a cosmic ray, passes through a MOSFET and deposits charge in it, changing the voltage at the transistor’s terminals enough to switch the stored value.
Computers on the Earth’s surface are largely shielded from cosmic rays, because most primary cosmic rays are absorbed or converted into secondary particles such as muons before they reach the ground. As a result, ground-based computers experience bit flips only rarely. This is not the case for spacecraft traveling in outer space, which are bombarded by cosmic rays without the protection of Earth’s atmosphere and are therefore much more susceptible to Bit Flips.
Bit Flip Corrections
While avoiding cosmic rays may be impossible for spacecraft once they have left Earth’s atmosphere, there are measures we can take to correct a Bit Flip once it has occurred. Sometimes a reboot is enough: reinitializing memory and reloading programs and data overwrites the flipped bit with its original value. However, rebooting is not always effective or practical, so more robust methods are often required.
Error Correction Codes (ECC) are commonly used to handle errors caused by Bit Flips. These codes store a few extra check bits alongside the data. The simplest example is a parity bit, which records whether the number of 1s in the original data is even or odd. When the data is read back, the number of 1s is checked against the parity bit, and a mismatch signals that an error has occurred.
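The simplest check of this kind is a single even-parity bit. The sketch below is an illustration only; the function names `store_with_parity` and `has_error` are hypothetical, and real ECC memory performs this kind of check in hardware rather than in Python.

```python
def store_with_parity(bits):
    """Append one even-parity bit so the stored word always holds an even number of 1s."""
    parity = sum(bits) % 2
    return bits + [parity]


def has_error(stored):
    """An odd number of 1s means some bit no longer matches what was written."""
    return sum(stored) % 2 != 0


word = store_with_parity([1, 0, 1, 1])   # stored as [1, 0, 1, 1, 1]
print(has_error(word))                   # prints: False

word[2] ^= 1                             # simulate a Bit Flip in the third bit
print(has_error(word))                   # prints: True
```

A lone parity bit can only report that something changed. It cannot say which bit flipped, and two flips in the same word cancel out and go unnoticed.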
A diagram showing how cosmic rays can strike MOSFETs to cause Bit Flips.
More complex ECCs, such as Hamming codes, go further: they can pinpoint which bit has flipped so that it can be corrected.
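As an illustration of how a Hamming code locates a flipped bit, here is a minimal sketch of the textbook Hamming(7,4) code, which protects 4 data bits with 3 parity bits. The function names are hypothetical, and this is a teaching example rather than the scheme used by any particular memory chip.

```python
def hamming74_encode(data):
    """Encode 4 data bits as a 7-bit codeword laid out as [p1, p2, d1, p4, d2, d3, d4]."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4   # covers codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # covers codeword positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4   # covers codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(codeword):
    """Return (data_bits, error_position); error_position is 0 if no flip was found."""
    c = list(codeword)  # work on a copy so the caller's list is not modified
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    # The three checks read as a binary number give the 1-based position of the bad bit.
    error_position = s1 + 2 * s2 + 4 * s4
    if error_position:
        c[error_position - 1] ^= 1   # flip the bad bit back
    return [c[2], c[4], c[5], c[6]], error_position


codeword = hamming74_encode([1, 0, 1, 1])
codeword[4] ^= 1                      # simulate a Bit Flip at position 5
print(hamming74_decode(codeword))     # prints: ([1, 0, 1, 1], 5)
```

This code corrects any single flipped bit in the 7-bit word. If two bits flip at once, it will miscorrect, which is why practical schemes usually add at least one more check bit.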
Another approach to handling and correcting Bit Flips is a technique called modular redundancy. Here, the same data is stored, or the same computation performed, several times independently, and the correct result is chosen by majority vote.
For example, if the data is “1,” storing it three times gives “111.” Now suppose a bit flip occurs in one copy and the retrieved data is instead “110.” Since “1” still has the majority, the vote indicates that “1” is the correct value for the bit.
Threefold redundancy of this kind is referred to as triple modular redundancy (TMR).
The computers used in the Shuttle program employed five-fold redundancy, with five flight computers on board. While effective, modular redundancy requires significant volume and power, making its implementation challenging.
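The majority vote over three copies can be sketched in a few lines of Python. The name `tmr_vote` is a hypothetical helper, and the example assumes the three copies are plain integers; in real systems the voter is usually a hardware circuit.

```python
def tmr_vote(a, b, c):
    """Bitwise majority of three copies: each output bit is 1 if at least two inputs have a 1."""
    return (a & b) | (a & c) | (b & c)


# The single-bit example from the text: copies read back as 1, 1, 0.
print(tmr_vote(1, 1, 0))                      # prints: 1

# The same vote works on whole words, one bit position at a time.
good = 0b1011
copy_a = good
copy_b = good ^ 0b1000                        # a flip in the top bit of one copy
copy_c = good ^ 0b0001                        # a flip in the bottom bit of another
print(tmr_vote(copy_a, copy_b, copy_c) == good)   # prints: True
```

The vote recovers the original value as long as no single bit position is corrupted in more than one copy.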
Astronaut Chris Hadfield using a computer aboard the International Space Station.
Just as the atmosphere protects plants and animals on Earth from harmful radiation from outer space, it also shields computers and other electronic devices. However, with the growing number of space missions, including important ones such as those to Mars, cosmic ray Bit Flips are a problem that must be taken very seriously. Outer space is not where you want computers to generate errors and crash at unexpected times.