Cyber-Physical Systems and Their Security Risks

Cyber-physical systems are integrated systems composed of computational and physical components, whose operation depends on the cooperation of these different types of components. In this material, they are abbreviated as CPS. Automatic control systems have existed for centuries (see centrifugal governor), but only in recent decades has the control of physical infrastructure shifted from analog control to embedded, computer-based systems. Examples of cyber-physical systems include the power grid, water distribution, and various chemical industry processes. Furthermore, developments such as medical implants and self-driving cars increase the role of computers in controlling physical systems. One might even ask whether a human who spends most of their time interacting with computers should count themselves as a cyber-physical system. In this module, however, we focus on mechanical systems whose control and management are automated using computers. We organic beings at least imagine ourselves to be the dominant party when interacting with computers.

While cyber-physical systems create new opportunities for interacting with the physical world, they also enable new types of attacks. Whereas many other topics covered in this course have been research subjects for decades, CPS cybersecurity is a relatively new research area. This module provides an overview of this emerging field.

The youth of this research area is illustrated by the fact that the term CPS itself only came into use in 2006. The term covers the common research problems of embedded systems and communication systems used for automating physical systems. Cyber-physical systems are generally considered to consist of a set of networked agents. These agents include various sensors, actuators, control units, and communication devices (see figure).

Research related to CPS cybersecurity has aimed to understand how it differs from the cybersecurity of ordinary IT systems due to its cross-cutting nature between the physical and digital worlds. Cybersecurity in physical systems has, of course, been considered even before the introduction of the CPS term. The most notable example is the cybersecurity of supervisory software (SCADA or Supervisory Control and Data Acquisition), which was previously addressed mainly by applying best practices from traditional IT system security. CPS cybersecurity research, in contrast, has sought a multidisciplinary perspective that goes “beyond” traditional cybersecurity. For example, whereas classical intrusion detection systems monitor purely cyber events (network packets, operating system events, etc.), CPS cybersecurity incorporates devices that monitor physical-world events, tracking the evolution of a physical device’s state and comparing it to a model of expected events.
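The idea of comparing a device's observed state with a model of expected behavior can be sketched as a simple residual check. This is only an illustration, not a production detector: the tank model, the function names `predict_level` and `detect_anomalies`, and the threshold value are all assumptions made for this example.

```python
def predict_level(level, inflow, outflow, dt=1.0, area=2.0):
    """Hypothetical tank model: level changes with net flow divided by tank area."""
    return level + dt * (inflow - outflow) / area


def detect_anomalies(measurements, inflows, outflows, threshold=0.5):
    """Return the indices where a measurement deviates from the model prediction."""
    alarms = []
    for k in range(1, len(measurements)):
        expected = predict_level(measurements[k - 1], inflows[k - 1], outflows[k - 1])
        residual = abs(measurements[k] - expected)
        if residual > threshold:
            alarms.append(k)
    return alarms


# Net inflow of 0.2 per step, so the level should rise by 0.1 per step.
inflows = [1.0] * 5
outflows = [0.8] * 5
measured = [1.0, 1.1, 1.2, 2.5, 1.4]  # the jump to 2.5 is physically implausible
alarms = detect_anomalies(measured, inflows, outflows)
```

Here the detector flags steps 3 and 4. The second alarm arises because the prediction for step 4 is based on the implausible reading at step 3. A purely cyber-level monitor would see nothing unusual in the packets that carried these values.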

CPS relates to other current and popular terms such as Internet of Things (IoT), Industry 4.0, or Industrial Internet of Things, but it has been said that CPS as a term is more fundamental than the above, as it does not directly address implementation technologies (e.g., Internet) or application domains (Industry). Instead, it focuses on solving fundamental problems that arise when combining engineering traditions from the cyber and physical worlds.

Characteristics of Cyber-Physical Systems

CPSs exhibit several aspects of embedded systems, real-time systems, wired and wireless networks, and control engineering (and more broadly, control theory).

Embedded systems: One characteristic feature of CPSs is that computers integrated into the physical world (sensors, controllers, or actuators) each implement only a few functions. Therefore, they do not require computational power comparable to ordinary computers or even mobile devices, and they are given very limited resources. Some embedded systems do not even run an operating system but operate directly on firmware. Even when an embedded system runs an operating system, it is often a stripped-down version that supports only the features needed by that device.

Real-time systems: Industrial automation and other safety-critical systems often have real-time requirements: the execution time of an operation must be predictable and must not exceed a set limit. This means that traditional security methods cannot always be used; instead, protection often relies on isolating the system and its communication channels. For example, encryption cannot always be used, or a lightweight variant is used that may be weak enough to break. When encryption is used, keys are often not exchanged over the network but set physically in the devices. Real-time programming languages help developers specify timing requirements, and real-time operating systems guarantee task acceptance and completion times.
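Why a security mechanism may not fit into a real-time system can be illustrated with a worst-case time budget. The sketch below assumes a control cycle with a 1000 microsecond deadline; the step names and worst-case execution times are invented for illustration and do not come from any real device.

```python
def fits_deadline(wcet_us, deadline_us):
    """True if the summed worst-case execution times stay within the deadline."""
    return sum(wcet_us.values()) <= deadline_us


# Hypothetical worst-case execution times in microseconds.
control_cycle = {"read_sensor": 300, "compute_control": 400, "write_actuator": 200}
with_crypto = {**control_cycle, "verify_message_authenticity": 500}

plain_ok = fits_deadline(control_cycle, deadline_us=1000)
crypto_ok = fits_deadline(with_crypto, deadline_us=1000)
```

With these numbers the plain cycle fits the deadline but the cycle with the added cryptographic check does not, which is the kind of trade-off that pushes designers toward isolation or lighter-weight protection.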

Network protocols: Another characteristic feature of CPSs is that systems communicate with each other. Communication increasingly occurs over IP networks. Although many critical infrastructures, such as the power grid, have long used serial communication (sending data one bit at a time over a cable) to monitor SCADA systems remotely, the transition to IP networks began only in the late 1990s.

Wireless: Although long-distance communication usually occurs via wired connections, wireless is also common in CPSs. In the early 2000s, so-called sensor networks were an active research topic. The challenge was to build a network on top of low-power and lossy wireless links, where traditional link quality metrics often apply poorly. The first successes were seen in large process control systems with the arrival of WirelessHART, ISA100, and ZigBee. The problem was that these were built on the IEEE 802.15.4 standard, whose original frames were too small to carry IPv6 packets as-is. Since the number of embedded systems connected to the Internet is expected to grow to billions in the coming years, manufacturers understand the need to produce IPv6-compatible embedded systems. Wireless protocols often used in consumer IoT devices include Bluetooth, Bluetooth Low Energy, ZigBee, and Z-Wave. Other wireless technologies include SigFox, LoRa, Narrowband IoT (NB-IoT), and LTE-M.

Control: Most CPSs monitor and attempt to control variables in the physical world. Most control theory literature attempts to model physical processes using differential equations, which can then be used to design controllers that meet required properties such as efficiency and stability.
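As a minimal sketch of this approach, assume the physical process follows the first-order differential equation dx/dt = -a*x + b*u, where x is the controlled variable and u the control signal. The model, the PI (proportional-integral) controller, and all parameter values below are assumptions chosen for illustration.

```python
def simulate(setpoint=1.0, kp=2.0, ki=1.0, a=1.0, b=1.0, dt=0.01, steps=2000):
    """Euler simulation of dx/dt = -a*x + b*u under PI control."""
    x = 0.0         # process variable, e.g. temperature above ambient
    integral = 0.0  # accumulated error for the integral term
    for _ in range(steps):
        error = setpoint - x
        integral += error * dt
        u = kp * error + ki * integral  # control signal sent to the actuator
        x += dt * (-a * x + b * u)      # the physical process responds
    return x


final_value = simulate()
```

After 20 simulated seconds the process variable has settled close to the setpoint. The integral term is what removes the steady-state error that a purely proportional controller would leave.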

These are general characteristics of CPSs, but it must be noted that CPSs are highly diverse. They include modern vehicles as well as medical and industrial devices, all of which have their own standards, requirements, communication devices, and timing constraints. Therefore, general CPS characteristics may not apply as-is to all systems.

Before moving on to cybersecurity issues, let us examine how CPSs have been protected against faults and natural accidents. At the same time, we will see why these protections against accidental harm are insufficient against active attackers who know the protections exist. We first look at protection from the safety perspective and return to security later. The difference between the two concepts is discussed more generally in the introduction module.

Protections Against Natural Events and Accidents

Disturbances in physical infrastructure control devices can cause irreparable harm to people, the environment, and other physical infrastructure. The following sections present five types of protections against damage and natural events.

Safety: The general safety standard for control devices (IEC 61508) recommends a basic principle of deriving requirements from threat and risk analyses that identify disturbances and assess their likelihood and consequences. The system can then be designed so that safety requirements are met when all disturbance-causing factors are considered. This general standard has been the basis for several other industry-specific standards; for example, the process industry (refineries, chemical systems, etc.) uses IEC 61511 when designing so-called Safety Instrumented Systems (SIS). Their purpose is to prevent harm, for example, by closing a fuel valve when a high-pressure sensor raises an alarm. A more general safety analysis can use the defence-in-depth principle familiar from cybersecurity to produce protection layers to mitigate threats: (1) ordinary low-priority alarms sent to the control station, (2) activation of SIS systems, (3) mitigation measures such as physical protection systems (e.g., dams), and (4) organizational response protocols for reacting to/evacuating in plant emergencies. See figure. At its base, “Regulations” refer to environmental and occupational safety issues through laws and standards, and their impact naturally extends beyond that level.
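The first two protection layers can be sketched as threshold logic on a pressure reading. The limits and action names below are hypothetical; in a real plant they would be derived from the threshold and risk analyses described above.

```python
def protection_response(pressure_bar, alarm_limit=8.0, trip_limit=10.0):
    """Return the actions triggered by a pressure reading (limits are illustrative)."""
    actions = []
    if pressure_bar >= alarm_limit:
        actions.append("alarm_to_control_station")  # layer 1: operator alarm
    if pressure_bar >= trip_limit:
        actions.append("close_fuel_valve")          # layer 2: SIS trips the process
    return actions
```

For example, `protection_response(9.0)` only raises the operator alarm, while `protection_response(11.0)` also closes the fuel valve. The outer layers (physical mitigation and emergency response) are organizational and physical measures that this kind of logic does not capture.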

Protection: Protecting power grids relates to safety features and includes the following components:

Generator protection: When the system frequency is too high or too low, the generator is automatically disconnected from the grid to prevent permanent damage.
Under Frequency Load Shedding (UFLS): If the grid frequency is too low, controlled load shedding is activated. This disconnection of parts of the power distribution occurs in a controlled manner to avoid outages in safety-critical locations such as hospitals. UFLS is activated to raise the grid frequency and prevent generator disconnection.
Overcurrent protection: If the current in a cable is too high, a protection relay activates to stop the flow. Connected devices are then not damaged.
Over/under-voltage protection: Same as above, but for voltage.
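The protection functions listed above can be sketched as a decision function for a 50 Hz grid. All thresholds and action names are assumptions for illustration; real relay settings depend on the grid code and the equipment in question.

```python
def grid_protection(frequency_hz, current_a, ufls_hz=49.0,
                    gen_trip_low_hz=47.5, gen_trip_high_hz=51.5,
                    max_current_a=400.0):
    """Return the protective actions for one measurement (thresholds are illustrative)."""
    actions = []
    if frequency_hz <= gen_trip_low_hz or frequency_hz >= gen_trip_high_hz:
        actions.append("disconnect_generator")    # generator protection
    elif frequency_hz <= ufls_hz:
        actions.append("shed_load")               # UFLS acts before the generator trips
    if current_a > max_current_a:
        actions.append("trip_overcurrent_relay")  # overcurrent protection
    return actions
```

The ordering of the thresholds reflects the text: load shedding starts at a milder frequency deviation so that the frequency can recover before generator protection has to disconnect anything.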

Reliability: Whereas safety and protection systems aim to prevent harm, other approaches try to maintain operation after disturbances occur. For example, the power grid is designed to meet the so-called N−1 security criterion. This means that the grid can lose one of its N components (such as a generator or transmission line) and continue operating.
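A heavily simplified version of the N−1 check considers only generation capacity: can demand still be met after losing any single generator? The function name and the numbers are assumptions, and a real N−1 analysis would also model power flows and transmission line limits.

```python
def satisfies_n_minus_1(capacities_mw, demand_mw):
    """True if demand can still be met after losing any single generator."""
    total = sum(capacities_mw)
    return all(total - lost >= demand_mw for lost in capacities_mw)


generators = [300, 300, 400]  # hypothetical generator capacities in MW
ok_at_600 = satisfies_n_minus_1(generators, demand_mw=600)
ok_at_650 = satisfies_n_minus_1(generators, demand_mw=650)
```

With a demand of 600 MW the criterion holds even if the largest unit fails. At 650 MW it does not, because losing the 400 MW unit leaves only 600 MW.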

Fault Tolerance: Fault tolerance means continuing operation despite disturbances. Alongside the electromechanical example mentioned above, there is a data-driven approach known as Fault Detection, Isolation, and Reconfiguration (FDIR). Fault detection is either model-based (cf. intrusion detection systems) or purely data-driven; this part of the process is called Bad Data Detection. Isolation identifies which device is the source of the anomaly, and reconfiguration is recovery from the fault. Reconfiguration can be implemented by shutting down the faulty function, replacing it with an equivalent function, or switching to a degraded mode in which the system operates with restricted functionality. Success requires sufficient redundancy built into the system, comparable to RAID disk systems. A mere backup system would not meet fault tolerance requirements, because recovery would occur only after an interruption.
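The three FDIR steps can be sketched for a set of redundant sensors measuring the same quantity. This is a purely data-driven illustration using the median as a reference; the function name, the tolerance, and the sensor names are assumptions.

```python
from statistics import median


def fdir_step(readings, tolerance=1.0):
    """Detect, isolate, and reconfigure around faulty sensors among redundant ones.

    readings maps a sensor name to its value.
    Returns (value_to_use, names_of_isolated_sensors).
    """
    reference = median(readings.values())
    # Detection and isolation: sensors that disagree with the majority.
    faulty = [name for name, value in readings.items()
              if abs(value - reference) > tolerance]
    healthy = [value for name, value in readings.items() if name not in faulty]
    if not healthy:
        raise ValueError("no consistent sensors left, redundancy exhausted")
    # Reconfiguration: continue with the average of the remaining sensors.
    return sum(healthy) / len(healthy), faulty


value, isolated = fdir_step({"t1": 20.1, "t2": 19.9, "t3": 35.0})
```

Here sensor `t3` is isolated and the system keeps operating on the two remaining sensors without interruption, which is what distinguishes fault tolerance from a backup that is restored afterwards. Note that the scheme relies on the majority of sensors being correct.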

Robust Control: Uncertainties related to the use of the control system require robust control. Sources of uncertainty can include the environment (e.g., wind gusts for aerial devices), sensor noise, external factors ignored in system modeling, or hardware degradation over time. Robust control systems usually assume the most unfavorable operating conditions and design robust control algorithms so that the system operates safely even in the worst possible situation.
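Worst-case design can be illustrated with a deliberately simple discrete-time model x[k+1] = a*x[k] + b*u[k], where the parameter a is only known to lie in an interval. With the feedback u = -gain*x, the closed loop is stable when |a - b*gain| < 1. The model and all numbers are assumptions for this sketch.

```python
def robustly_stable(gain, a_range=(0.8, 1.3), b=1.0):
    """True if the closed loop is stable for every a in the uncertainty interval.

    Checking the two endpoints suffices because |a - b*gain| is convex in a,
    so its maximum over an interval is reached at an endpoint.
    """
    return all(abs(a - b * gain) < 1.0 for a in a_range)


weak_gain_ok = robustly_stable(0.2)
strong_gain_ok = robustly_stable(0.6)
```

A gain of 0.2 stabilizes the system for the nominal-looking value a = 0.8 but fails in the worst case a = 1.3, whereas a gain of 0.6 works across the whole interval.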

Safety mechanisms are not sufficient to provide security. Before CPS security became an established field in the early 2000s, it was unclear to what extent the safety measures discussed above were sufficient to protect CPSs from cyberattacks. The problem is that these measures usually assume that disturbances are independent and non-malicious. An attacker does not behave like a random fault, and exploiting such incorrect modeling assumptions is the easiest way for an attacker to find vulnerabilities. Today, there are already several examples of why safety does not guarantee security. For example, fault detection systems have been bypassed by attackers sending false data that stays within plausible limits but is still erroneous enough to cause problems in the system. Another example is stealth attacks on dynamic systems (i.e., systems whose behavior evolves over time), in which small amounts of false data are injected into sensors so that the fault detection system does not notice them, but which over time can drive the system into a dangerous operating state.
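The difference between an abrupt fault and a stealthy attack can be sketched with a deliberately naive detector that only checks the change between consecutive readings. The detector, the threshold, and the injected values are assumptions for illustration; real detectors and real attacks are considerably more sophisticated.

```python
def first_alarm(measurements, threshold=0.1):
    """Index of the first step-to-step change above the threshold, or None."""
    for k in range(1, len(measurements)):
        if abs(measurements[k] - measurements[k - 1]) > threshold:
            return k
    return None


true_level = 5.0

# Abrupt fault: the reading suddenly jumps by 2.0 at step 100.
abrupt = [true_level] * 100 + [true_level + 2.0] * 100

# Stealthy attack: the bias grows by only 0.01 per step.
stealthy = [true_level + 0.01 * k for k in range(200)]

abrupt_alarm = first_alarm(abrupt)
stealthy_alarm = first_alarm(stealthy)
final_error = stealthy[-1] - true_level
```

The abrupt fault is caught immediately at step 100, while the slowly growing bias never raises an alarm even though the reported value ends up almost 2.0 units away from the truth. A controller trusting that reading would steer the real process correspondingly far from its intended state.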

Security can harm safety: Adding new security features can increase risks. For example, a nuclear power plant in the USA shut down automatically in 2008 because the control system detected a drop in cooling water level. However, the reading was false, caused by the reboot of a business system computer during a software update. After the incident, inappropriate connections between the systems were physically removed. A general observation is that software updates or patches may violate safety standards, but removing connections is not always beneficial for safety either. For example, securely preventing unauthorized users from accessing a cyber-physical system may also prevent emergency personnel from accessing a medical device in a critical situation. Such safety issues should be considered when designing security solutions.

Security and Privacy

CPSs are at the core of healthcare, energy systems, weapons systems, and transportation. Industrial Control Systems, in particular, perform vital functions in critical national infrastructures such as electricity distribution, oil and gas distribution, water supply and purification, etc. Disruptions in these CPSs can cause significant harm to public health and safety, as well as considerable economic damage.

For example, attacks against the power grid can cause outages, which in turn lead to problems in other critical areas such as computer networks or medical systems, potentially resulting in catastrophic economic or health consequences for society. Attacks targeting vehicles could cause severe traffic accidents.

Attacks Against Cyber-Physical Systems

In general, a CPS has a physical process to control and uses a set of sensors that report the process state to a controller, which then sends control signals to actuators (e.g., valves) to keep the system in the desired state. The controller often communicates with a supervisory program (e.g., a SCADA system in the power grid), which monitors the system and can change the controller’s settings. See figure.
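The general architecture can be sketched as a small simulation of a water tank with a sensor, an on/off controller, a valve actuator, and a supervisory system that changes the controller's setpoint. The class and function names, and all numeric values, are assumptions made for this example.

```python
class Controller:
    """On/off controller whose setpoint can be changed by a supervisory system."""

    def __init__(self, setpoint):
        self.setpoint = setpoint

    def command(self, measurement):
        return "open" if measurement < self.setpoint else "close"


def run(controller, level, steps, inflow=0.5, drain=0.2):
    """Simulate the loop: sensor -> controller -> actuator -> physical process."""
    for _ in range(steps):
        measurement = level                      # sensor reports the process state
        valve = controller.command(measurement)  # controller decides
        if valve == "open":                      # actuator acts on the process
            level += inflow
        level -= drain                           # process dynamics: constant drain
    return level


controller = Controller(setpoint=5.0)
level = run(controller, level=0.0, steps=50)
controller.setpoint = 8.0                        # supervisory configuration change
level = run(controller, level=level, steps=50)
```

After the supervisory change the level settles into a narrow band around the new setpoint of 8.0. Every arrow in this loop (the measurement, the command, the actuation, and the setpoint change) is a place where an attacker could interfere.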

Attacks against CPSs can occur at any point in the general architecture. This is illustrated in the figure, which shows the following eight attack points:

1. The attacker has compromised a sensor (e.g., if sensor data is unauthenticated or if the attacker has the encryption key) and sends forged sensor signals, causing the system’s control logic to operate based on the attacker’s data.
2. The attacker has compromised the communication channel between the sensor and the controller and can delay or even completely block information flow from sensors to the controller. In this case, the controller loses visibility of the system (loss of view) and operates on stale data. One such attack is a denial-of-service attack on sensors.
3. The attacker has compromised the controller and sends false instructions to the system.
4. The attacker can delay or block control commands. This is a denial-of-service attack on actuators.
5. The attacker has compromised the actuators and can perform control actions different from what the controller intended. Note that this attack differs from those directly targeting the controller.
6. The attacker can physically attack the system, e.g., destroy part of the infrastructure and combine this with a cyberattack.
7. The attacker can delay or block communication to/from the supervisory device.
8. The attacker can compromise or impersonate the supervisory device and thus send malicious configuration changes to the controller. An example was the attack on Ukraine’s power grid, where attackers gained control of a computer in the SCADA system’s control room (in 2016).
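The first two attack points mention unauthenticated sensor data and stale data. One common countermeasure is to attach a message authentication code and a timestamp to each reading, sketched below with Python's standard `hmac` module. The message layout, the function names, and the two-second freshness limit are assumptions. As the list notes, this does not help if the attacker has obtained the key.

```python
import hashlib
import hmac
import json


def sign_reading(key, sensor_id, value, timestamp):
    """Sensor side: serialize the reading and attach an HMAC-SHA256 tag."""
    payload = json.dumps({"id": sensor_id, "value": value, "ts": timestamp},
                         sort_keys=True).encode()
    tag = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return {"payload": payload, "tag": tag}


def verify_reading(key, message, now, max_age_s=2.0):
    """Controller side: return the reading, or None if it is forged or stale."""
    expected = hmac.new(key, message["payload"], hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, message["tag"]):
        return None  # forged or modified in transit
    reading = json.loads(message["payload"])
    if now - reading["ts"] > max_age_s:
        return None  # delayed or replayed: do not act on stale data
    return reading


key = b"hypothetical-shared-key"
msg = sign_reading(key, "pressure-1", 3.2, timestamp=100.0)
fresh = verify_reading(key, msg, now=100.5)
stale = verify_reading(key, msg, now=110.0)
tampered = verify_reading(key, dict(msg, payload=msg["payload"].replace(b"3.2", b"9.9")),
                          now=100.5)
```

Only the fresh, unmodified message is accepted. The stale and tampered ones are rejected, so the controller can at least detect that it has lost a trustworthy view of the process.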

Generally, most attacks on CPSs have been software-based. However, one characteristic of CPSs is that their integrity can also be compromised without a computer-based attack, through so-called transduction attacks. These involve physically injecting false signals (see #1 above). When the attack targets the way sensors capture physical-world data, the attacker can inject false readings into the sensor or cause incorrect actuator actions by manipulating their environment. For example, an attacker could use a drone’s speakers to affect its gyroscope, exploit unintended receiving antennas in wires between sensors and controllers, use electromagnetic interference to affect an actuator, or inject inaudible voice commands into digital assistants.

Points 2, 4, and 7: Naturally, if the attacker could deliberately modify messages in these channels, rather than only delay or block them, they could cause more harm than the denial-of-service attacks mentioned above.

Note: sensor spoofing is discussed in another module.

In addition to security and safety risks, CPSs also have profound implications for privacy in ways that system designers may not have considered. Warren and Brandeis noted in their 1890 essay The Right to Privacy that they saw a growing threat to privacy in new inventions such as short-exposure photography, which enabled photographing people without their knowledge. CPS technologies, especially the growing field of consumer IoT devices, pose similar privacy challenges.

CPS devices can collect data on a wide range of human activities, such as electricity consumption, location information, driving habits, and bio-sensor data with unprecedented precision. Moreover, the passive way these devices collect information generally leaves people unaware of how much data is being gathered about them. In addition, the public is largely unaware of how such data collection exposes them to potential surveillance or criminal activity when data collected by corporations can be obtained through various legal and illegal means. For example, car manufacturers collect data remotely from their vehicles to improve product functionality and reliability. It is known that the collected data includes speed, odometer reading, interior temperature, and battery status. This creates a highly detailed picture of driving habits that manufacturers, dealers, marketers, insurers, law enforcement, and stalkers could exploit.

On the other hand, many CPS systems do not contain data related to humans and are not physically connected to any individual. In such cases, there is no direct privacy issue. However, indirect effects or combined problems may still exist, for example, through location, if a device carried by a person records a visit within the signal range of such a CPS device.

High-Profile Attacks Against Cyber-Physical Systems

Control systems have been at the core of critical infrastructures and industry for decades. Yet only a few confirmed cyberattacks have occurred. Here are some examples.

Non-targeted attacks are similar to those affecting ordinary computers. An example is the Slammer worm, which randomly targeted Windows servers and in 2003 infected the Davis-Besse nuclear power plant, affecting workers’ ability to monitor the plant’s systems. Another example was the use of a water treatment plant’s control device to send spam (2006). Water was not the only thing contaminated at that plant.

In targeted attacks, the adversary knows they are attacking a CPS and can tailor their strategy to exploit a specific CPS feature.

The first publicly reported attack on a SCADA system occurred in 2000 in the sewage control system of Maroochy Shire (Queensland, Australia). A subcontractor employee who wanted a permanent job as a system maintainer used commercially available radios and stolen SCADA software to make his laptop appear as a pumping station. Over three months, the subcontractor’s actions caused a thousand cubic meters of untreated sewage to spill into parks, rivers, and hotel grounds. Public health was endangered, and marine life died. The incident was costly for the city in repairs, monitoring, cleanup, and extra security. The subcontractor’s costs were even higher, and instead of a job, the perpetrator got a two-year place in prison.

In the two decades since that example, several attacks on CPSs have occurred. The best example of how sophisticated these attacks can be remains the Stuxnet worm, discovered in 2010. It targeted the uranium enrichment program in Natanz, Iran (see another module for who may have done it and how it may have entered). Stuxnet intercepted block read, write, and locate requests on a Programmable Logic Controller (PLC). By intercepting requests, Stuxnet could alter data going to and from the PLC without the operator’s knowledge. A second variant of Stuxnet sent false rotation speeds to centrifuge motors used for uranium enrichment, causing regular centrifuge failures and reducing Natanz’s production.

Other examples of highly sophisticated malware likely developed by state actors and targeting cyber-physical systems include Triton (2017) and Industroyer (likely behind the aforementioned Ukraine attack). Such attacks will profoundly impact the evolution of cyber conflicts in the future and possibly warfare itself.

State involvement is also behind Havex (2013), used for CPS espionage, and the denial-of-service attack tool BlackEnergy (2010). The year 2022 brought plenty more disruptions from the same direction.