Side-channel and fault attacks (advanced)

This section first provides an overview of physical attacks on implementations of cryptographic algorithms. The second part discusses various countermeasures and some open research problems.

Physical attacks, mostly side-channel and fault attacks, were originally a major concern for developers of small devices, particularly for protecting smart cards and pay-TV systems. The significance of these attacks and countermeasures is increasing as more electronic devices become easily accessible with the proliferation of IoT deployments.

Attacks (advanced)

Today, cryptographic algorithms provide very strong protection against mathematical and cryptanalytic attacks. This is especially true for algorithms that are standardised or have undergone extensive scrutiny and acceptance in the open research literature. The weakest link is often the implementation of these algorithms in hardware and software. Side-channel and fault attacks exploit information that leaks from these implementations.

Side-channel attacks are typically passive, while fault attacks are active. Another classification can be made based on the attacker’s proximity to the device.

Side-channel attacks: Common side-channel attacks involve passive observation of the computation platform. Through data-dependent variations in execution time and power consumption, or through electromagnetic radiation emitted by the device, an attacker can infer secret information about the internal state of the platform. Observations are typically made near the device while it is operating normally. It is important to note that the normal operation of the device is not disturbed. A key strength of side-channel attacks is that the device is usually unaware that it is under attack.

Side-channel attacks based on variations in power consumption have been widely studied. Typically, the attack takes place near the device, where access to the power supply or its connection pins is available. Subtypes of power-based side-channel attacks include simple power analysis (SPA), differential power analysis (DPA) and its higher-order variants, and template attacks. In SPA, the idea is to first analyse how the behaviour of the target depends on the key. Typical targets (also in timing attacks) are if-then-else branches that depend on key bits. In implementations of public-key algorithms (such as RSA or ECC), the algorithm proceeds sequentially through all key bits. If the if-branch requires more or less computation time than the else-branch, this can be observed externally from the system or computation chip. SPA attacks are not limited to public-key algorithms; they have also been applied to secret-key algorithms and to algorithms used for generating prime numbers. When the internal behaviour of the device is known, SPA requires only one or a few traces for analysis.
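The key-dependent branch described above can be illustrated with a toy model. The following is a minimal sketch, not a real measurement setup: it assumes that the attacker can tell squarings from multiplications in a single trace, and it represents the trace as a list of operation labels. The function names and the small modulus are hypothetical.

```python
def square_and_multiply(base, exponent, modulus):
    """Left-to-right binary exponentiation with a key-dependent branch.

    Returns the result and a log of operations ('S' = square, 'M' = multiply),
    standing in for what an attacker would read from one power trace.
    """
    result = 1
    ops = []
    for bit in bin(exponent)[2:]:
        result = (result * result) % modulus
        ops.append("S")
        if bit == "1":  # this branch depends on a secret key bit
            result = (result * base) % modulus
            ops.append("M")
    return result, ops


def recover_exponent(ops):
    """Rebuild the exponent: 'S' followed by 'M' is a 1 bit, a lone 'S' is a 0 bit."""
    bits = []
    i = 0
    while i < len(ops):
        if i + 1 < len(ops) and ops[i + 1] == "M":
            bits.append("1")
            i += 2
        else:
            bits.append("0")
            i += 1
    return int("".join(bits), 2)


secret_exponent = 0b101100111
value, trace = square_and_multiply(7, secret_exponent, 1000003)
recovered = recover_exponent(trace)
```

In this model a single trace reveals the whole exponent, which matches the observation that SPA needs only one or a few traces once the internal behaviour is understood.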

A DPA attacker collects multiple traces. The number ranges from a few dozen for an unprotected hardware implementation to millions when the implementation is protected. DPA does not proceed directly towards the key; instead, the attacker builds a statistical model of how the same operation, executed with an unknown but fixed key on different data inputs, leads to different power consumption profiles. Statistical processing of the collected traces relies on correlation analysis and other statistical tests that compare measured values against models constructed for different keys (e.g. all 256 possibilities for a given key byte). The correct key value is the one whose model statistically best matches the observations. The analysis then continues with the subsequent parts (e.g. bytes) of the key. Although large datasets and extensive analyses are required, the effort is still orders of magnitude smaller than that of a brute-force attack.
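The hypothesis-testing idea can be sketched with simulated data. The example below is an illustration under stated assumptions, not a description of a real attack setup: the leakage is assumed to be the Hamming weight of an S-box output plus Gaussian noise, the target is a 4-bit key nibble (16 hypotheses instead of 256), and the 4-bit S-box of the PRESENT cipher is used only as a conveniently small nonlinear function. All function names are hypothetical.

```python
import random

# 4-bit S-box of the PRESENT cipher, used here only as a small nonlinear target.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def hamming_weight(x):
    return bin(x).count("1")


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def simulate_traces(key, count, noise_sigma, rng):
    """Assumed leakage model: Hamming weight of the S-box output plus noise."""
    inputs = [rng.randrange(16) for _ in range(count)]
    traces = [hamming_weight(SBOX[p ^ key]) + rng.gauss(0, noise_sigma)
              for p in inputs]
    return inputs, traces


def recover_key(inputs, traces):
    """Score every key hypothesis by how well its model correlates with the traces."""
    scores = {}
    for guess in range(16):
        model = [hamming_weight(SBOX[p ^ guess]) for p in inputs]
        scores[guess] = pearson(model, traces)
    return max(scores, key=scores.get), scores


rng = random.Random(1)  # fixed seed so the simulation is repeatable
inputs, traces = simulate_traces(key=0xB, count=5000, noise_sigma=1.0, rng=rng)
best_guess, scores = recover_key(inputs, traces)
```

The attacker never reads the key from any single trace; the key hypothesis simply stands out once enough noisy observations have been accumulated.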

Side-channel attacks based on electromagnetic radiation were identified early in the context of military communication and radio equipment. NATO and many governments have published TEMPEST specifications to protect devices from electromagnetic leakage, as well as from information leakage through vibration or sound. Electromagnetic radiation can be monitored not only from a distance but also at very close range: a precise sensor placed on top of an integrated circuit, combined with a 2D positioning system, can reveal highly localised information from the chip. (See a later module for more about TEMPEST.)

Timing attacks form another class of side-channel attacks. When the execution time of cryptographic computations or software processing varies depending on sensitive data, an attacker can exploit these timing differences. Even differences in execution time between if- and else-branches depending on a key may be sufficient. A specific subclass is cache attacks, which are based on timing differences caused by whether data is found in the cache.
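As an illustration of a data-dependent execution time, the sketch below models an early-exit comparison and counts loop iterations as a stand-in for elapsed time. It is a simplified model with hypothetical function names; a real attacker would have to measure noisy wall-clock times and average over many attempts.

```python
def leaky_equals(secret, guess):
    """Early-exit comparison; 'steps' models the observable execution time."""
    steps = 0
    if len(secret) != len(guess):
        return False, steps
    for s, g in zip(secret, guess):
        steps += 1
        if s != g:
            return False, steps
    return True, steps


def recover_secret(oracle, length):
    """Recover the secret byte by byte by maximising the observed step count."""
    known = b""
    for position in range(length):
        best_byte, best_steps = 0, -1
        for candidate in range(256):
            guess = known + bytes([candidate]) + bytes(length - position - 1)
            ok, steps = oracle(guess)
            if ok:
                return guess
            if steps > best_steps:
                best_byte, best_steps = candidate, steps
        known += bytes([best_byte])
    return known


SECRET = b"k3y!"
attempts = []


def oracle(guess):
    attempts.append(guess)
    return leaky_equals(SECRET, guess)


recovered = recover_secret(oracle, len(SECRET))
```

Because each byte is confirmed separately, the search needs at most 256 guesses per byte rather than 256 to the power of the secret length.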

In template attacks, the attacker obtains a copy of the target device and builds a statistical model of its signal behaviour, covering a set of inputs and secret data values. By comparing the behaviour measured from one or a few executions of the target device against this model, secret information can be inferred. The key difference from DPA is that, during the actual attack phase, only a very small number of measurements is required. Template attacks therefore remain effective even if the original device prevents repeated executions, for example by using a counter to limit failed attempts. The modelled signals may be based on timing, power, or electromagnetic radiation. With advances in machine learning and AI techniques, profiling side-channel attacks, including template attacks, have become more effective and increasingly practical for attackers.
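The two phases of a template attack can be sketched as follows. This is a strongly simplified model under several assumptions: the secret is a 4-bit value, each bit leaks at its own trace point with low Gaussian noise, and the templates consist only of mean traces, so classification reduces to picking the nearest mean. Templates in practice usually also model the noise covariance. The device model and all names are hypothetical.

```python
import random


def measure(device_rng, value, sigma=0.1):
    """Assumed device model: each bit of a 4-bit secret leaks at one trace point."""
    return [((value >> bit) & 1) + device_rng.gauss(0, sigma) for bit in range(4)]


def build_templates(device_rng, traces_per_class=200):
    """Profiling phase on the attacker's own copy: mean trace per secret value."""
    templates = {}
    for value in range(16):
        traces = [measure(device_rng, value) for _ in range(traces_per_class)]
        templates[value] = [sum(column) / len(column) for column in zip(*traces)]
    return templates


def classify(templates, trace):
    """Attack phase: pick the template closest to the single measured trace."""
    def distance(mean):
        return sum((t - m) ** 2 for t, m in zip(trace, mean))
    return min(templates, key=lambda value: distance(templates[value]))


profiling_rng = random.Random(7)   # noise source of the attacker's copy
target_rng = random.Random(99)     # noise source of the target device
templates = build_templates(profiling_rng)
single_trace = measure(target_rng, 0b1011)
recovered = classify(templates, single_trace)
```

All the expensive measurement work happens on the copy; the target device is measured only once, which is why limiting the number of executions does not stop this kind of attack.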

Microarchitectural side channels: Processor microarchitectures are highly susceptible to timing attacks. The problem of information leakage and the difficulty of isolating programs from each other were recognised early on. Later, variations in success of cache accesses became an important class of timing attacks. More recently, microarchitectural side-channel attacks such as Spectre, Meltdown, and Foreshadow have received significant attention. These also rely on detecting timing differences, but their strength and effectiveness arise from the fact that they can be carried out remotely in software. In addition to caches, modern processors include multiple optimisation techniques to improve performance, such as speculative execution, out-of-order execution, and branch prediction. Although virtualisation and other software techniques isolate data between parties at the architectural level, vulnerabilities can emerge at the microarchitectural level. In such cases, the processor, following these optimisation techniques, may speculatively access memory locations that are not intended for the current process (at the architectural level). Even though such instructions are never completed, they may still affect the microarchitectural state, such as the cache. This can create a side channel, for example observable as variations in access times.

Fault attacks: By disturbing the device’s clock, power supply, or temperature (increasing or decreasing it), faults can be induced in computations or in the program’s control flow. Conclusions can be drawn from incorrect computation results, but sensitive information may also leak if the device fails to produce a result or resets itself. A well-known example is disturbing an RSA implementation so that a signature is correct with respect to one prime factor but incorrect with respect to the other. From this, the private key can be derived. (Background: the prime factors of the modulus are the secret components of an RSA key, and optimised RSA implementations compute separately with respect to each prime factor using the Chinese Remainder Theorem (CRT).) Although fault attacks require proximity to the device, they do not need to physically penetrate it.
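The RSA example can be worked through with toy numbers. The sketch below assumes textbook RSA without padding, very small primes chosen only for readability, and a fault modelled as a single bit flip in the half-computation modulo q. If the faulty signature s' is still correct modulo p, then s'^e - m is divisible by p but not by q, so a greatest common divisor with N reveals p. The variable and function names are hypothetical.

```python
from math import gcd

# Toy parameters; real RSA primes are hundreds of digits long.
p, q = 1009, 1013
N = p * q
e = 65537
d = pow(e, -1, (p - 1) * (q - 1))  # private exponent (Python 3.8+)


def sign_crt(m, fault_in_q_half=False):
    """Textbook RSA-CRT signature; optionally corrupt the mod-q half."""
    sp = pow(m, d % (p - 1), p)
    sq = pow(m, d % (q - 1), q)
    if fault_in_q_half:
        sq ^= 1  # models an induced single-bit fault
    h = (pow(q, -1, p) * (sp - sq)) % p  # Garner recombination
    return (sq + h * q) % N


m = 42
good = sign_crt(m)
faulty = sign_crt(m, fault_in_q_half=True)

# The faulty signature is still correct mod p, so p divides faulty^e - m.
recovered_p = gcd((pow(faulty, e, N) - m) % N, N)
recovered_q = N // recovered_p
```

One faulty signature on a known message is enough in this model, which is why implementations commonly verify a signature before releasing it.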

As memory density increases, new attack surfaces emerge. The RowHammer attack “hammers” memory by repeatedly reading specific DRAM locations and can cause bit flips in physically adjacent memory regions. (:ref:`See operating systems for more detail <11-1-en>`.)

Even more detailed information can be obtained from a microchip by opening its package or etching away layers of silicon. Attacks may use optics or laser techniques, as well as methods originally developed for chip reliability research and fault analysis. Such methods include focused ion beam techniques and scanning electron microscopy.

Countermeasures (advanced)

No single general method protects against all side channels. Countermeasures can be considered according to the abstraction level, and they depend on the threat model and other assumptions.

The most effective defence against timing attacks is constant-time execution, that is, an implementation that completes its tasks in the same time regardless of secret inputs and internal state. The more accurate the attacker’s measurement equipment is assumed to be in the threat model, the more precisely the timing must be balanced. Constant-time behaviour must be achieved at the processor architecture level for instructions, at the RTL level for clock cycles, and at the logic and circuit level for logic depth and the critical path. For instructions, constant-time execution can be achieved by balancing execution paths and adding dummy instructions. Resource sharing, for example through caches, makes achieving constant-time behaviour extremely difficult.
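Balancing execution paths with a dummy operation can be sketched for modular exponentiation. The example is a model only: it shows that the sequence of operations no longer depends on the exponent bits, but Python integers give no real timing guarantees, and the function name and fixed bit length are assumptions made for the illustration.

```python
def balanced_pow(base, exponent, modulus, bit_length):
    """Square-and-multiply-always: the same operations run for every key bit."""
    result = 1
    ops = []
    for i in reversed(range(bit_length)):
        bit = (exponent >> i) & 1
        result = (result * result) % modulus
        ops.append("S")
        product = (result * base) % modulus  # always computed; a dummy when bit == 0
        ops.append("M")
        # Arithmetic selection instead of a key-dependent branch.
        result = bit * product + (1 - bit) * result
    return result, ops


r1, ops1 = balanced_pow(7, 0b00001011, 1000003, 8)
r2, ops2 = balanced_pow(7, 0b11110000, 1000003, 8)
```

Both calls produce the same operation sequence even though the exponents differ. The price is one extra multiplication per zero bit, and the dummy operations can become a target for fault attacks, so this measure does not replace the other countermeasures.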

At the RTL level, it must be ensured that all instructions take the same number of clock cycles. This can be achieved using dummy instructions or, at a finer granularity, dummy gates. A challenge is that both hardware synthesis tools and software compilers tend to remove such instructions or gates added for protection, as they appear unnecessary from a performance perspective.

Since many side-channel attacks rely on a large number of observations, randomisation is a popular countermeasure. It is used to protect against side channels exploiting power consumption, electromagnetic radiation, or timing. Randomisation can be applied at the algorithm level. It is particularly common in public-key algorithms, where the key or message can be blinded with a random factor before the operation and the blinding removed afterwards (in essence, a multiplication followed by a division).
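Blinding can be sketched for textbook RSA. The example assumes toy primes, no padding, and messages coprime to the modulus; the function names are hypothetical. Message blinding multiplies the input by r^e and divides the result by r, while exponent blinding adds a random multiple of φ(N) to the private exponent.

```python
import secrets
from math import gcd

# Toy parameters; real RSA primes are hundreds of digits long.
p, q = 1009, 1013
N = p * q
phi = (p - 1) * (q - 1)
e = 65537
d = pow(e, -1, phi)  # private exponent (Python 3.8+)


def sign_message_blinded(m):
    """Blind the message with r^e, sign, then divide the blinding factor out."""
    while True:
        r = secrets.randbelow(N - 2) + 2
        if gcd(r, N) == 1:
            break
    blinded = (m * pow(r, e, N)) % N       # multiplication by r^e
    s_blinded = pow(blinded, d, N)         # equals m^d * r mod N
    return (s_blinded * pow(r, -1, N)) % N  # division by r


def sign_exponent_blinded(m):
    """Blind the key: d + k * phi gives the same result for m coprime to N."""
    k = secrets.randbelow(2**16) + 1
    return pow(m, d + k * phi, N)


m = 42
reference = pow(m, d, N)
s1 = sign_message_blinded(m)
s2 = sign_exponent_blinded(m)
```

Each call processes different intermediate values or a different exponent, so traces from repeated signatures can no longer be combined as easily, yet the final signature is unchanged.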

Randomisation at the register transfer or gate level is called masking. In this approach, intermediate values of computations are randomised so that, for example, the device’s power consumption can no longer be directly linked to internal secrets. Numerous results have been published at the gate level, ranging from simple Boolean masking to threshold implementations that are provably secure under certain leakage models. Randomisation has proven effective in practice, particularly for protecting public-key algorithms. Protecting secret-key algorithms with masking is more challenging: some masking schemes require a vast amount of random numbers, while others assume leakage models that do not always reflect reality. Unlike post-quantum cryptography, which is resistant to quantum computers and for which standards and implementations already exist, leakage-resilient cryptography is still largely in the research stage. Its goal is inherent resistance to side-channel attacks, but significant challenges remain between theory and practice. (Constant-time cryptography is also mentioned in another module.)
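First-order Boolean masking can be sketched with two shares. The example shows the functional behaviour only: a value x is split into shares whose XOR equals x, XOR is computed share by share, and AND needs fresh randomness, here following the two-share case of the Ishai-Sahai-Wagner construction. Python offers no control over what actually leaks, and the function names are hypothetical.

```python
import secrets


def mask(x, bits=8):
    """Split x into two shares whose XOR equals x."""
    m = secrets.randbits(bits)
    return (x ^ m, m)


def unmask(shares):
    return shares[0] ^ shares[1]


def masked_xor(a, b):
    """XOR is linear, so it can be computed on each share separately."""
    return (a[0] ^ b[0], a[1] ^ b[1])


def masked_and(a, b, bits=8):
    """Two-share masked AND; fresh randomness r keeps each output share random."""
    r = secrets.randbits(bits)
    c0 = (a[0] & b[0]) ^ r
    c1 = (a[1] & b[1]) ^ ((r ^ (a[0] & b[1])) ^ (a[1] & b[0]))
    return (c0, c1)


x, y = 0b10110010, 0b01101110
xs, ys = mask(x), mask(y)
xor_result = unmask(masked_xor(xs, ys))
and_result = unmask(masked_and(xs, ys))
```

The fresh random value consumed by every AND hints at why masking nonlinear operations in secret-key algorithms is costly in random numbers.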

Hiding is another key class of countermeasures. The idea is to weaken the sensitive signal by mixing it with noise. TEMPEST protection is one example. Hiding can also be applied at the gate and block levels by reducing the power consumption or electromagnetic signal of the basic logic. Simple methods include intentional jittering or frequency variation of the clock signal, as well as the use of large bypass and filtering capacitances in the supply voltage. These methods lower the signal-to-noise ratio of the leakage observed by the attacker.

Sometimes leakage at one abstraction level can be mitigated at another level. For example, if the aim is to reduce the likelihood of a cryptographic key leaking from an embedded system, the key can be refreshed sufficiently often at the protocol level.
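Protocol-level key refreshing can be sketched as a one-way key chain. This is a minimal illustration under assumptions not stated in the text: both parties share an initial key, derive a session key for each use with HMAC-SHA-256, and then advance the chain. The labels and function names are hypothetical, and the all-zero initial key is a placeholder.

```python
import hashlib
import hmac


def derive(key, label):
    """One-way derivation step using HMAC-SHA-256."""
    return hmac.new(key, label, hashlib.sha256).digest()


def ratchet(chain_key):
    """Return a session key for one use and the next chain key."""
    session_key = derive(chain_key, b"session")
    next_chain_key = derive(chain_key, b"chain")
    return session_key, next_chain_key


chain = bytes(32)  # placeholder; a real initial key would be random and secret
session_keys = []
for _ in range(3):
    key, chain = ratchet(chain)
    session_keys.append(key)
```

Each session key is used for only a few operations, so the attacker can collect only a few traces per key, which works against attacks that depend on many observations under one fixed key.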

General-purpose processors, such as CPUs, GPUs, and microcontrollers, cannot be modified after manufacturing. Protecting against microarchitectural attacks using software patches and updates is extremely difficult, and the mitigations often come at the cost of degraded performance. Even microcode updates effectively modify only software, although the change may appear to the user as a hardware update. This is because the mapping between code and microcode is a trade secret of the device manufacturer. Providing software updates as a general solution is also a significant security challenge, because it is not known in advance what kinds of applications will be run on the hardware.

Protection against fault attacks is implemented at the RTL and circuit levels. At the RTL level, protection is mainly based on redundancy in time or space and on checks based on encoding. For example, parity checking is spatial redundancy. The cost of redundancy is high, as computations must be repeated. One problem with adding redundancy is that it increases the attack surface for side channels: due to additional computations, the attacker has more traces available for timing, power, or electromagnetic side-channel attacks. Below the RTL level, at the circuit level, monitoring of the clock or power supply can detect deviations from normal operation and trigger an alarm.
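Two of the redundancy checks mentioned above can be modelled in software. The sketch shows redundancy in time (compute twice and compare) and a parity bit stored alongside a data word. The fault model, which flips the lowest result bit on the first call only, and all names are hypothetical.

```python
class FaultDetected(Exception):
    """Raised when a redundancy check fails."""


def run_twice(compute, *args):
    """Redundancy in time: repeat the computation and compare the results."""
    first = compute(*args)
    second = compute(*args)
    if first != second:
        raise FaultDetected("results of redundant computations differ")
    return first


def with_parity(word):
    """Append an even-parity bit to a data word."""
    return (word << 1) | (bin(word).count("1") & 1)


def check_parity(stored):
    """Return the data word, or raise if the number of 1 bits is odd."""
    if bin(stored).count("1") & 1:
        raise FaultDetected("parity mismatch")
    return stored >> 1


def make_glitchy(compute):
    """Assumed fault model: flip the lowest result bit on the first call only."""
    state = {"calls": 0}

    def wrapper(*args):
        state["calls"] += 1
        result = compute(*args)
        return result ^ 1 if state["calls"] == 1 else result

    return wrapper


clean_result = run_twice(pow, 7, 5, 1000)
stored_word = with_parity(0b10110010)
```

The doubled computation shows the cost noted above: the work is done twice, and the attacker gets two executions to observe. A single parity bit detects only an odd number of flipped bits.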

Various circuit-level sensors are added to integrated circuits. Light sensors can detect whether the packaging has been opened. Metal mesh sensors in the outermost metal layers can detect probing attacks. Temperature sensors detect heating and cooling of the circuit, and antenna sensors can detect changes in electromagnetic fields near the device. Other sensors can also be added to detect, for example, manipulation of the power supply or clock signal. However, adding sensors to detect active manipulation can itself increase side-channel leakage.

Countermeasures against side-channel and fault attacks constitute a challenging and active area of research.