- COMP.SEC.100
- 12. Operating Systems and Virtualisation
- 12.2 Attacker model
Attacker model
This module focuses on the technical aspects of security, which means that the threat model ignores insider threats, human behavior, physical attacks, project management, organizational practices, and the like. These are also important, but they are not under the control of operating systems. The threats and attack techniques considered here are of the following kinds, which will be discussed in more detail later—excluding DoS cases (those are covered more extensively in the context of network security).
- Malicious extensions
- The attacker succeeds in persuading the system to load a malicious driver or kernel module (e.g., as a Trojan horse).
- Bootkit
- The attacker compromises the boot process in order to gain control before the operating system starts.
- Memory errors (software)
- Spatial and temporal memory errors allow an attacker (local or remote) to control control flow or leak sensitive information.
- Memory corruption (hardware)
- Vulnerabilities such as Rowhammer in DRAM memory allow a local or remote attacker to modify data they should not have access to.
- Leakage of uninitialized data
- The operating system returns to a user program data that has not been properly initialized and may contain sensitive information.
- Concurrency errors and double fetch
- Example: the operating system fetches the same piece of data from a user process twice for the same purpose, but the data changes between the two fetches. The data may, for instance, indicate the required size of a buffer: memory is first allocated based on the value, and data is later copied into the buffer after the attacker has increased the value. (A similar effect can occur, though less commonly, if the size is not fetched again but the buffer itself is shrunk in the meantime.)
- Side channels (hardware)
- Attackers can measure access times of shared resources such as caches and tables that accelerate memory references to detect that another process has used the resource, potentially leaking sensitive data.
- Side channels (speculative execution)
- Processor performance is improved through controlled deviations from the normal instruction execution order (speculative and out-of-order execution). If a computation turns out to be unnecessary, its results are discarded, but measurable traces may still remain in the processor state.
- Side channels (software)
- Example: when operating systems use memory optimization features such as deduplication, an attacker can use timing measurements to infer that some content in a security domain forbidden to them is identical to content they possess.
- Resource exhaustion (DoS)
- By allocating resources (memory, CPU, buses, etc.), the attacker prevents other programs from making progress, resulting in denial of service.
- Deadlocks (DoS)
- The attacker brings the system into a state where some parts of the software no longer make progress, for example due to deadlocking.
These issues—excluding DoS and initialization errors—are discussed below at varying levels of depth, using loosely indicative headings and with an emphasis on side channels.
(Extensions, bootkit.) The simplest way to compromise a system is to inject a malicious extension into the operating system. For example, in monolithic systems such as Linux and Windows, this may be a malicious driver or kernel module that a user has installed as a Trojan horse. Such an extension has access to all privileged operations. To maintain their foothold in the system covertly and independently of what the operating system or hypervisor might do, the attacker may infect the boot process. This can happen, for example, by overwriting the master boot record or the UEFI (Unified Extensible Firmware Interface) firmware. The malware gains control during the boot process at every reboot, even before the operating system starts, allowing it to bypass all operating-system-level protections. When an attack technique targeting the boot process is implemented as a reusable piece of malware, it is referred to as a bootkit.
(Memory /software.) In addition to the use of Trojans, attackers often break security without user assistance by exploiting vulnerabilities. They have a wide variety of methods. Software vulnerabilities such as memory errors can be used to modify code pointers or data in the operating system, thereby breaking integrity, confidentiality, or availability. By modifying a code pointer, the attacker controls where execution continues after a hijacked call, jump, or return instruction. Modifying data or data pointers opens other possibilities, such as elevating a normal process to a root process (with full administrative privileges) or modifying page tables to grant a process access to arbitrary memory pages. Similarly, attackers can exploit vulnerabilities to leak information from the operating system by changing what data, or how much data, a system call or network request returns.
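The code-pointer hijack described above can be illustrated with a toy simulation. This is purely illustrative Python: the flat byte-addressed memory, the dispatch table standing in for code addresses, and all names are invented stand-ins for what in a real exploit would be raw machine addresses.

```python
# Toy simulation of a spatial memory error: a fixed-size buffer sits
# directly before a "code pointer" slot in a flat byte-addressed memory.
memory = bytearray(16)           # 8-byte buffer at 0..7, pointer slot at 8
BUF, PTR = 0, 8

def benign():
    return "benign"

def evil():
    return "attacker"

# Dispatch table stands in for code addresses in real memory.
functions = {1: benign, 2: evil}
memory[PTR] = 1                  # the pointer initially targets benign()

def unsafe_copy(data: bytes):
    """Copies without a bounds check -- this is the memory error."""
    memory[BUF:BUF + len(data)] = data   # may spill past the buffer

unsafe_copy(b"A" * 8 + bytes([2]))       # 9 bytes: one past the buffer end
hijacked = functions[memory[PTR]]()      # control flow now reaches evil()
```

In a real exploit the overwritten slot would hold a code address, and the out-of-bounds write would redirect a subsequent call, jump, or return instruction exactly as the paragraph describes.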
(Memory /hardware.) Attackers can exploit hardware vulnerabilities such as the Rowhammer bug present in many DRAM chips. Because memory cells are packed very closely together in rows, repeatedly activating one row can cause cells in adjacent rows to leak a small amount of charge from their capacitors, even if those rows belong to different memory pages. When a row is accessed repeatedly at high speed (“row hammered”), these disturbances accumulate until, in some cases, a neighboring bit flips. One cannot predict in advance which bits will flip, but once a bit has flipped, it tends to flip again when the experiment is repeated. If an attacker manages to flip bits in kernel memory, they gain access to attacks similar to those based on software memory corruption, such as corrupting page tables to access other processes’ memory regions.
(Concurrency…) Another class of attacks consists of concurrency errors and double fetches, which share the feature that two related operations are incorrectly combined. Concurrency bugs are also called race conditions and are common in concurrent and distributed computing. From a security perspective, the basic example is a TOCTOU attack (Time Of Check, Time Of Use), in which an attacker changes some attribute to an unsafe value after it has been checked and approved but before it is used. For example, the name of a file to be accessed can be redirected in between (e.g., via a symbolic link) from an ordinary file to one that should only be accessible to a root process (e.g., the password file). A double fetch is another important operating-system issue and resembles the above, but here a value is fetched twice: first the original value, on the basis of which memory is, say, allocated, and then again when data is copied and bounds are checked. The check fails in its purpose if the attacker can increase the value between the two fetches. In TOCTOU cases there is no second fetch; instead, the operating system has granted rights to a process that then uses a value the attacker has modified.
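The double-fetch pattern can be sketched as a small simulation. The request structure, the race callback, and all names below are hypothetical, and the race window is triggered deterministically by a callback rather than won by actual timing.

```python
# Toy simulation of a double fetch: the "kernel" reads a user-controlled
# length field twice -- once to size a buffer, once to bound the copy --
# and the attacker changes it in between.

class UserRequest:
    def __init__(self, payload: bytes):
        self.length = len(payload)   # user-controlled header field
        self.payload = payload

def vulnerable_copy(req: UserRequest, race) -> bool:
    buf = bytearray(req.length)      # fetch #1: allocate the buffer
    race(req)                        # attacker wins the race window here
    n = req.length                   # fetch #2: bound the copy
    copied = req.payload[:n]
    # In C this copy would write past the end of buf; here we just report it.
    return len(copied) > len(buf)

def attacker(req):
    req.payload = b"B" * 64          # enlarge the data ...
    req.length = 64                  # ... and the length field

overflowed = vulnerable_copy(UserRequest(b"A" * 8), attacker)
# A hardened kernel fetches the length once and reuses that local copy,
# so the attacker's later change has no effect.
```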
(Side channels.) Instead of direct attacks, data can be leaked indirectly via side channels. At the hardware level, operating systems are exposed to many kinds of side channels. Consider cache side channels first; these come in many variants. In a common case the attacker fills (primes) a set of cache regions with their own data or code and then periodically accesses those addresses. If some access is significantly slower, the attacker knows that someone else—presumably the victim—has also used data or code mapping to the same region. The leakage is easy to understand in cryptographic cases, where victim code accesses different regions depending on a secret. If a process handles a key bit by bit and executes code from different regions depending on whether the bit is 0 or 1, the attacker quickly learns the key.
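A toy prime+probe model makes the inference step concrete. The two-set cache, the latency values, and the mapping of one secret bit to one cache set are all invented simplifications; only the inference logic reflects the real attack.

```python
# Toy prime+probe: a direct-mapped "cache" with two sets. The attacker
# primes every set, the victim touches one set per secret bit, and the
# attacker infers each bit from which probe access became "slow".

NSETS = 2
cache = ["attacker"] * NSETS         # prime: every set holds attacker data

def access(owner, s):
    """Returns a modeled latency: fast on a hit, slow on a miss."""
    hit = cache[s] == owner
    cache[s] = owner                 # the access installs the owner's line
    return 1 if hit else 100

def victim_step(bit):
    access("victim", bit)            # which set is touched depends on the bit

def probe():
    """The slow set is the one the victim evicted; its index is the bit."""
    times = [access("attacker", s) for s in range(NSETS)]
    return times.index(max(times))

secret = [1, 0, 1, 1, 0]
recovered = []
for b in secret:
    cache[:] = ["attacker"] * NSETS  # re-prime before each victim step
    victim_step(b)
    recovered.append(probe())
```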
Another well-known hardware-based side-channel attack exploits speculative and out-of-order execution. For performance reasons, modern processors may execute instructions deviating from the normal control flow. For example, while waiting for the resolution of a conditional branch, the branch predictor may predict that the outcome is “branch taken” (because it was so many times before) and start executing the corresponding instructions. If the prediction turns out to be wrong, the CPU clears all results of speculatively executed instructions. No data remains in registers or memory. However, traces of execution may remain in the microarchitecture—places other than those explicitly defined by the instruction semantics. Such places include branch predictor state, caches, and tables that accelerate memory references, such as TLBs (translation lookaside buffers), which are themselves a kind of cache. For example, if a speculative instruction in user code reads a sensitive and normally inaccessible byte from memory into a register and later uses it as an offset into a table, that table entry will be in the cache, even though the offset value is cleared from the register as soon as the CPU realizes it should not have processed that byte. The attacker can time accesses to each table entry and observe which one responds quickly (i.e., comes from cache). The offset of that entry is the byte the attacker seeks. In other words, the attacker can recover data used in speculative execution by exploiting a cache side channel via timing.
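The sequence above can be modeled with a drastically simplified "processor". The memory layout, the 256-entry probe table, and the latency values are invented, and speculation is reduced to a function whose architectural result is discarded while a trace remains in a simulated cache.

```python
# Toy model of the speculative-leak gadget: a transient out-of-bounds
# read is architecturally squashed, but the dependent table access
# leaves a trace in the "cache" that timing later reveals.

array1 = bytes([0, 1, 2, 3])         # in-bounds data (size 4)
secret = b"K"                        # byte just past array1 in "memory"
memory = array1 + secret
probe_table = list(range(256))       # one entry per possible byte value
cached = set()                       # which probe entries are cached

def transient_read(x):
    """Bounds check is 'predicted' to pass, so the body runs transiently."""
    value = memory[x]                # out-of-bounds read (x == 4)
    _ = probe_table[value]           # dependent access, indexed by secret
    cached.add(value)                # microarchitectural trace survives
    return None                      # architectural result is squashed

def time_access(i):
    return 1 if i in cached else 100 # modeled hit vs miss latency

transient_read(4)                    # transiently reads the secret byte
leaked = min(range(256), key=time_access)   # fastest entry = secret value
```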
Although vulnerabilities related to speculative or out-of-order execution sound difficult to exploit, they can be quite devastating. You can find examples by searching for “Foreshadow Intel” and “Rogue In-Flight Data (RIDL)”. Mitigating such attacks requires not only hardware changes but also deep and often complex involvement of the operating system. The OS may, for example, need to flush caches and buffers that can leak information, guarantee that no speculation occurs across certain execution boundaries, or pin processes that must be kept separate to different CPU cores.
In addition to caches, hardware side channels can exploit all kinds of shared resources, including TLBs, MMUs, and many other components. The MMU (memory management unit) is the hardware that translates virtual addresses into physical ones; the TLB is a cache of recent translations and thus part of the MMU.
Side channels do not have to be hardware-related at all. For example, memory deduplication and page caches are well-known sources of side channels in operating systems. Consider the former as an example. Deduplication means that the system merges duplicate memory pages: when two virtual pages have the same contents, the system maps both to the same physical page, thus storing the two pages in only one physical page. Only when one of them is written must a copy be made (copy-on-write). Writing to such a shared page therefore takes longer than usual, which an attacker can observe. In this way, the attacker can infer that some other program also holds the same content as the page the attacker just wrote to. This is a side channel that reveals something about the victim’s data. Researchers have shown (2016) that even such coarse side channels can be used to infer highly fine-grained secrets. In many side channels, the root cause is insufficient isolation between software and hardware security domains (e.g., during hardware-implemented speculative execution, isolation may be absent or too weak). It is important to understand that domain isolation issues extend across the hardware/software interface.
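A minimal sketch of the inference step, assuming a toy page store with reference counting in place of a real deduplicating memory manager; the page contents and latency values are invented.

```python
# Toy model of the deduplication side channel: writing to a page that
# has been merged with another domain's identical page forces a
# copy-on-write and is therefore measurably slower.

pages = {}                           # page content -> reference count

def load_page(content: bytes):
    """Deduplication: identical pages share one physical copy."""
    pages[content] = pages.get(content, 0) + 1
    return content

def write_page(content: bytes) -> int:
    """Returns a modeled write latency."""
    if pages.get(content, 0) > 1:    # shared page: copy-on-write first
        pages[content] -= 1
        return 100                   # slow path
    return 1                         # exclusive page: fast path

victim_page = load_page(b"secret config v2")   # victim's data, unseen

# Attacker loads candidate contents, waits for dedup to merge pages,
# then times a write to each candidate.
guesses = [b"secret config v1", b"secret config v2"]
for g in guesses:
    load_page(g)
timings = {g: write_page(g) for g in guesses}
confirmed = max(guesses, key=lambda g: timings[g])   # slow write = match
```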
Especially with respect to confidentiality, data leaks can be subtle and seemingly harmless yet still lead to serious security problems. For example, the physical or even virtual addresses of objects may not appear very sensitive at first glance, until code reuse or Rowhammer attacks are considered. In those, leaked addresses are used to steer control flow to specific locations or to flip specific bits.
(Attack origin.) The source of attacks may be local code running natively on the victim machine in user mode, an operating-system extension, a script fetched from the network and executed locally (e.g., JavaScript in a browser), malicious peripherals, or even remote systems where attackers trigger their exploits over the network. It is clear that remote attacks are harder to carry out than local ones.
In some cases, the operating system itself or the virtual machine hypervisor must also be modelled as an attacker. This may be necessary in cloud-based systems where the cloud provider is not trusted, or in cases where the operating system itself may be compromised. In such cases, the goal becomes protecting a sensitive application (or part of it) from the kernel or hypervisor. The application may then need to run in a special hardware-protected trusted execution environment (such as an enclave) or in isolated execution modes.
(Surface.) A useful metric for assessing system security is the attack surface. It denotes all points at which an attacker can read or write data. For example, the attack surface of locally executed native code includes all system calls the attacker can invoke, their arguments and return values, and all code implementing those system calls that the attacker can reach. The attack surface for remote attackers includes network device drivers, parts of the network stack, and all application code that handles network requests. The attack surface of malicious devices may include all memory the device can access via DMA, or the code and hardware functionality with which the device can interact. How much code is exposed to the attacker depends on code quality and varies greatly; in high-security contexts, the attack surface is minimized and the remaining code is verified, even with formal methods.