From operating systems to databases (advanced)

Given the managing and mediating roles of the operating system and the hypervisor, it is easy to see that similar security problems and solutions arise in other domains as well. The most central of these is the database, where security follows principles very similar to those of operating systems.

The nature of databases (advanced)

Many entities encountered in security contexts are somewhat “scattered”: files, programs, messages, transactions, or even entire organizations. A database, at least from a logical perspective, is a fairly unified and well‑delimited entity. It is managed by a database management system (DBMS), through which users interact with the data, regardless of how the data itself is organized or distributed physically. Thus, we can largely focus on the properties of a secure management system (and on how it should be used).

Database security features also apply partly to data collections not usually thought of as databases. Examples include user account lists, access control files, network management information bases (MIBs), name service data, router configurations, and so on. Some of these belong to directory services, which—as their name suggests—are databases that are mostly queried and only rarely updated (or updated by only a few actors).

The general security goals of databases include:

  • Physical integrity: concerns hardware and recovery from certain types of damage; requires regular backups as part of operating-system procedures.
  • Logical integrity: preservation of structural consistency, especially references and dependencies between data.
  • Integrity and correctness of data elements, e.g. that a date is both well‑formed and correct.
  • Accountability of actors: the ability to trace who accessed or modified objects.
  • Access control (as generally): certain persons may perform certain operations on certain data.
  • User authentication, possibly in addition to that performed by the operating system.
  • Availability: access to the database in accordance with access control; at a basic level this requires that the database is running and reachable.

Database security mechanisms typically need to be more fine‑grained than those provided by the operating system, since OS‑level controls cannot distinguish data within files. In a database, individual records—or even individual fields—may need to be treated separately. Conceptually, however, the problem is similar to that of file protection: regardless of how the protected object is delimited, different types of rights for different actors must be defined and enforced using access control mechanisms, for example through SQL (Structured Query Language) queries.

Queries themselves constitute one of the most common threats when server interfaces (such as web forms) fail to sanitize inputs. Instead of data, an attacker may inject SQL expressions. Such SQL injection can grant unrestricted access to the database, destroy its contents, or even facilitate compromise of the host or network in which the database resides (the sqlmap tool automates injection discovery; see also a later module). In this respect, database access control is significantly more complex than file access control. Moreover, databases and their server software (and vulnerabilities) often appear as embedded components of applications (e.g. photo archives) without installers fully realizing it, much like applications tend to create their own directories and files. Databases may also serve internal program purposes, but very often they contain valuable user data.
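The difference between an injectable query and a safe, parameterized one can be sketched in a few lines. The table and values below are illustrative, not from the text, and SQLite stands in for the server-side database:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.execute("INSERT INTO users VALUES ('aino', 's3cr3t')")

def lookup_unsafe(name):
    # String concatenation: attacker-controlled input becomes part of the SQL.
    return conn.execute(
        "SELECT secret FROM users WHERE name = '" + name + "'").fetchall()

def lookup_safe(name):
    # Placeholder: the driver passes the input as data, never as SQL.
    return conn.execute(
        "SELECT secret FROM users WHERE name = ?", (name,)).fetchall()

payload = "' OR '1'='1"
print(lookup_unsafe(payload))  # the WHERE clause is always true: every secret leaks
print(lookup_safe(payload))    # empty: no user has that literal name
```

The parameterized form is the standard defense; sanitization at the form level is a complementary, not a sufficient, measure.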

In addition to hardware failures, ordinary crashes, and other physical factors, database writes—or their absence—can cause the following integrity problems:

  • accidental incorrect input: collected, calculated, or simply mistyped;
  • unintentional or intentional errors by unauthorized actors;
  • conflicting or duplicate representations of the same data (e.g. when merging data from multiple sources);
  • concurrent modifications from different parties (e.g. updates to a quantity value whose initial value was read by both before modification).
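The last bullet, the lost update, can be made concrete with a small sketch (item names are invented, SQLite stands in for the DBMS):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE stock (item TEXT PRIMARY KEY, qty INTEGER)")
db.execute("INSERT INTO stock VALUES ('widget', 10)")

def read_qty():
    return db.execute("SELECT qty FROM stock WHERE item = 'widget'").fetchone()[0]

# Both parties read the same initial value ...
a = read_qty()
b = read_qty()
# ... then each writes back its own decrement, unaware of the other.
db.execute("UPDATE stock SET qty = ? WHERE item = 'widget'", (a - 3,))
db.execute("UPDATE stock SET qty = ? WHERE item = 'widget'", (b - 2,))
print(read_qty())  # 8: the first decrement was silently lost; correct would be 5

# The fix: let the database perform the read-modify-write atomically.
db.execute("UPDATE stock SET qty = 10 WHERE item = 'widget'")
db.execute("UPDATE stock SET qty = qty - 3 WHERE item = 'widget'")
db.execute("UPDATE stock SET qty = qty - 2 WHERE item = 'widget'")
print(read_qty())  # 5
```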

Ensuring database integrity requires DBMS mechanisms, discussed below. Confidentiality is such an obvious requirement that it was not explicitly listed above; achieving it was assumed to be part of access control and authentication. However, this alone is not sufficient, since confidentiality problems can be quite subtle. We therefore begin by examining them.

Confidentiality (advanced)

Database confidentiality has three perspectives:

  • From above: how to allow access to the database only to authorized users. This is the simplest problem and is solved through standard access control, which may distinguish internal components such as tables. Injection attacks must also be prevented at this level.
  • Internal, discriminating: how to provide certain data (especially aggregates) to a broader set of users than other data.
  • From below: how to ensure that data does not leak at the file-system or operating-system level.

Internal, discriminating perspective

Read access creates confidentiality problems especially in databases where some data is public and some must be protected. If all data were of the same sensitivity, uniform protection would suffice and issues would arise mainly at the OS level.

Sensitive data may appear most simply as confidential records or individual fields. Partial disclosure may still be unacceptable. Information leakage can occur via:

  • value ranges or probability distributions;
  • existence or non‑existence (e.g. criminal records, presence of a field, or even that a specific user does not exist on a system);
  • complementary information (e.g. that an attribute does not have a certain value).

This leads into inference problems: attackers may derive sensitive information by combining multiple less‑sensitive pieces of data. Aggregates such as counts and averages are often classified as less sensitive, as are views listing only one field from all records.

Combining data across different databases is conceptually similar but significantly harder, and is not discussed further here. It is nevertheless crucial in personal-data systems and helps explain the strict GDPR rules. Inference through long‑term data collection also underlies concerns about web tracking and cookies.

How can query results of varying sensitivity be treated differently? Beyond standard access control and authentication, various inference‑control techniques can be applied (some also discussed in the context of personal data obfuscation):

  • Small-set rule: reject queries whose results are based on too few records (e.g. fewer than six in some health databases). A refined version, the (n,k)-rule, rejects queries where at most n records contribute at least k % of the result (typical values: n = 3, k = 70). Such rules are common in statistical reporting.
  • Query overlap control: reject queries whose result sets largely (but not exactly) overlap with previous ones, to prevent subtraction attacks.
  • Random sampling of a large database to generate results.
  • Reporting intervals instead of exact values, e.g. age groups in five‑year bins.
  • Random perturbation, slightly distorting results while preserving overall trends; ideally the perturbation is consistent across queries to prevent averaging away the noise.
  • Query history analysis: the DBMS tracks a user’s prior queries (even across sessions) to determine whether answering new ones would enable unauthorized inference. This is very difficult to implement in practice.
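The first two rules above lend themselves to a short sketch. The thresholds are the typical values quoted in the text; the salary figures are invented:

```python
# Small-set rule and (n,k)-rule as simple predicates over the set of
# contributions to a statistical query result.

def small_set_ok(contributions, minimum=6):
    """Release a count/sum only if enough records contribute."""
    return len(contributions) >= minimum

def nk_rule_ok(contributions, n=3, k=70):
    """Reject if the n largest contributions make up at least k % of the total."""
    total = sum(contributions)
    if total == 0:
        return True
    top_n = sum(sorted(contributions, reverse=True)[:n])
    return 100 * top_n / total < k

salaries = [90000, 5000, 4000, 3000, 2000, 1000]
print(small_set_ok(salaries))  # True: six records contribute
print(nk_rule_ok(salaries))    # False: the top 3 dominate the sum, so refuse the query
```

The example shows why the (n,k)-rule is a refinement: the query passes the small-set rule, yet a single dominant record would still be nearly exposed by the sum.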

For example, the Finnish surname search displays only total counts, not gender distributions, when occurrences are few. Even totals may be obfuscated; for instance, the surnames Bobrikov and Putin both existed in Finland in 2026, but were reported as “under 10”.

Perspective from below the database

Even if database‑level access control restricts who can access which database components, it does not prevent OS‑level access to underlying storage. Therefore, many DBMSs support encryption of sensitive table data (e.g. column values) on disk. Keys may be stored in external modules; in extreme cases, data is fully encrypted and only users possess decryption keys.

Querying encrypted data is difficult. Advanced cryptographic techniques such as homomorphic encryption exist but are expensive, so typically only simple methods are used. Exact‑match queries can be answered directly by comparing ciphertexts, provided the encryption is deterministic. Range queries, however, require decrypting all values for comparison. Querying encrypted databases remains an active research area.
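One simple method is a deterministic "blind index": a keyed hash of each value is stored so that equality can be tested without decryption. The scheme and key handling below are a simplified illustration, not a feature of any particular DBMS; a real deployment would store an actual ciphertext of the value alongside the index:

```python
import hmac, hashlib

INDEX_KEY = b"server-side-secret-key"  # illustrative; real keys live in a key store

def blind_index(value: str) -> str:
    # Deterministic keyed hash: equal plaintexts give equal index values.
    return hmac.new(INDEX_KEY, value.encode(), hashlib.sha256).hexdigest()

# The table stores only the index (plus, separately, an encrypted value).
table = [{"idx": blind_index(v)} for v in ("alice", "bob", "carol")]

def exact_match(query: str) -> int:
    q = blind_index(query)
    return sum(1 for row in table if row["idx"] == q)

print(exact_match("bob"))   # 1: equality is testable without decrypting anything
print(exact_match("dave"))  # 0
# Range queries over such index values are meaningless: hashes preserve no order.
```

Determinism is also the weakness: equal values produce equal indexes, so frequency analysis remains possible.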

Outsourcing data management to external service providers adds further complexity. Beyond confidentiality, integrity and freshness of query results become concerns: can one trust the provider to return correct, complete, and up‑to‑date results? Integrity and authenticity can be enhanced with signatures or Merkle hash trees. However, if signatures are required due to low trust, confidentiality is already lost, since the provider sees both queries and results.
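A Merkle hash tree can be sketched briefly: the client retains (or receives in signed form) only the root hash, and the provider proves each returned record's membership with a logarithmic-size path of sibling hashes. The four-record example below is illustrative:

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    level = [h(x) for x in leaves]
    while len(level) > 1:
        if len(level) % 2:            # duplicate the last node on odd-sized levels
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

records = [b"row1", b"row2", b"row3", b"row4"]
root = merkle_root(records)           # the client stores/signs only this value

# The provider proves membership of records[1] with its sibling path:
proof = [h(records[0]), h(h(records[2]) + h(records[3]))]
node = h(records[1])
node = h(proof[0] + node)             # first sibling sits on the left
node = h(node + proof[1])             # second sibling sits on the right
print(node == root)                   # True: the returned record is authentic
```

Completeness and freshness need further structure (e.g. sorted leaves and a timestamped, signed root), which this sketch omits.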

Integrity (advanced)

The following lists several mechanisms by which the preservation of database integrity can be promoted. It is worth noting that most of these are intended to prevent accidental integrity violations.
  • Integrity checks during data entry: either
    • based on constraints defined per field, e.g. the value must be numeric within a certain range, a date, or satisfy some lexical condition; or
    • based on the overall structure. In the simplest case, a field defined as a key must have a distinct value in each record, or for certain job titles salaries must fall within given limits. In addition to such single‑state constraints, state transitions may also be restricted: data modifications may require certain preconditions and/or trigger other changes (e.g. dropping below a reorder threshold creates an order that must be recorded in the database).
    In these cases, the issue is both how database integrity is defined (that is, a kind of security‑policy question) and how the management system is able to enforce all relevant constraints.
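Both kinds of entry-time constraint can be declared directly in SQL, so the DBMS enforces them on every write. The sketch below uses SQLite's PRIMARY KEY and CHECK constraints with invented column names and limits:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE employee (
        emp_id INTEGER PRIMARY KEY,   -- key field: must be distinct per record
        hired  TEXT CHECK (hired GLOB
                   '[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]'),  -- lexical form
        salary INTEGER CHECK (salary BETWEEN 1000 AND 20000)       -- value range
    )
""")
db.execute("INSERT INTO employee VALUES (1, '2024-05-01', 3500)")  # accepted

for bad in [(1, '2024-05-02', 3600),     # duplicate key
            (2, 'May 2024', 3600),       # malformed date
            (3, '2024-05-02', 999999)]:  # salary out of range
    try:
        db.execute("INSERT INTO employee VALUES (?, ?, ?)", bad)
    except sqlite3.IntegrityError as e:
        print("rejected:", e)
```

State-transition constraints and triggered follow-up changes would be expressed with triggers, which SQL also provides.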

  • Two‑phase update: intent and commitment. Broadly interpreted, this is a very common integrity mechanism.

    In databases, this typically involves grouping a multi‑step operation into a transaction, where interruption or interference by others could break integrity. In the intent phase, provided no other actor has reserved the objects for their own intent,
    • necessary preparations are made: opening files, allocating variable state, ...
    • all required data is collected and computations are performed; intent bits are set on the database objects to be modified, but no actual changes are made yet.
    • if intent bits are successfully written to all required locations, the process proceeds to the commitment phase by upgrading them into lock bits.
    In the commitment phase, new values are written, log entries are made, and locks are released. If an interruption occurs during the intent phase, database integrity is preserved and the phase can be repeated. If the commitment phase is interrupted, problems may arise, but they can be repaired by redoing all writes.

    The intent phase (also called the request phase) describes actions performed by the database management system, not, for example, deliberation by a travel planner at the user interface. At the point where a seat reservation is actually intended, the DBMS can, in the strictest case, use intent bits to prevent others from even reading the same data; if the data allows the reservation (i.e. the seat is indeed still available), the intent phase may proceed. A weaker approach is that intent bits merely prevent others from forming conflicting intent. An even weaker solution is optimistic updating, which is rolled back if another party has modified the record since it was read.

    Locks can, of course, be implemented at different levels: the entire database, a page (the operating system’s storage unit), a table, a record, or a field. As one moves along this list, load increases but flexibility improves.
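At the SQL level, the intent/commitment pattern appears as a transaction. The seat-reservation scenario above might be sketched as follows, with SQLite standing in for the DBMS and BEGIN IMMEDIATE playing the role of the intent lock:

```python
import sqlite3

db = sqlite3.connect(":memory:", isolation_level=None)  # explicit transactions
db.execute("CREATE TABLE seats (seat TEXT PRIMARY KEY, holder TEXT)")
db.execute("INSERT INTO seats VALUES ('12A', NULL)")

def reserve(seat, holder):
    db.execute("BEGIN IMMEDIATE")          # intent: take the write lock up front
    row = db.execute("SELECT holder FROM seats WHERE seat = ?",
                     (seat,)).fetchone()
    if row is None or row[0] is not None:  # seat unknown or already taken
        db.execute("ROLLBACK")             # nothing was written; integrity intact
        return False
    db.execute("UPDATE seats SET holder = ? WHERE seat = ?", (holder, seat))
    db.execute("COMMIT")                   # commitment: the change becomes durable
    return True

print(reserve("12A", "aino"))    # True
print(reserve("12A", "bertil"))  # False: the intent-phase check fails cleanly
```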

  • Change logging, a kind of “online backup”, enabling an undo operation—rolling back changes to a greater or lesser extent into history. The mechanism is familiar from many text editors, but can also be applied more broadly so that rollback remains possible even after major failures—perhaps using backups combined with paper logs.
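A minimal undo log can be sketched in a few lines, with a dictionary standing in for the database:

```python
# Every update appends the old value to the log, so changes can be rolled
# back "to a greater or lesser extent into history".
data = {"qty": 10}
undo_log = []

def update(key, value):
    undo_log.append((key, data.get(key)))  # record the old value before changing
    data[key] = value

def undo(steps=1):
    for _ in range(steps):
        key, old = undo_log.pop()
        data[key] = old

update("qty", 7)
update("qty", 5)
undo()               # back to 7
undo()               # back to the original 10
print(data["qty"])   # 10
```

Real DBMS logs additionally record new values (for redo) and are written to persistent storage before the data itself.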

  • Other redundancy:
    • The original version of a database table or file being modified may be kept as a disk copy until the commit is completed or the operation finishes fully, at which point the new version replaces the old one. This may be called a “careful replacement” method.
    • During operation, periodic checkpoints may be written to a parallel file, which is either discarded or retained at the end of the session. Such redundancy may be called shadow data. Near‑real‑time mirroring to persistent storage is one option, but without delay integrity can be recovered only from hardware failures or similar interruptions. Delay is achieved using logs, whose processed form is later applied to the shadow database, for example once per day.
    • There may be multiple working files under update. Instead of a single shadow, these hold the most recent versions, and only the latest is occasionally merged into the main database. This is a “differential” method in the sense that it functions like a differential backup aware of database structure. Such arrangements can themselves be layered on top of change logging.

  • Write‑permission restrictions may need to be more fine‑grained than for ordinary files: typical fixed operations include inserting or deleting records, modifying existing data, or merely appending new data. Structural changes—such as adding or removing fields—are usually not permitted for ordinary users, nor are modifications to access permissions.

Database‑focused courses (e.g. MIT OpenCourseWare lecture notes on database normalization) also cover proper relational design principles that support integrity. Normalization reduces redundancy at a different level than redundancy introduced for recovery. Secure programming also includes sanitizing data entered into databases.

SQL (advanced)

Database interfaces often operate at a very high level of abstraction, for example as web forms. Behind them usually lies the relational database query language SQL, Structured Query Language. Despite its name, it is not merely a query language, but includes all operations needed to create and manage a database.

The SQL security model contains the standard access control components:

  • actors: identified by user identities, typically based on operating‑system authentication, although the database may also provide its own authentication.
  • operations: SELECT, UPDATE, DELETE, INSERT, and other more specialized operations.
  • objects: tables and views, as well as their columns, i.e. fields (or attributes). There are also other types of objects.

When a user creates an object, they become its owner, and initially only they have access to it. Access (privileges) must be granted to other users as needed, and privileges may include delegation rights. Users can be grouped, and privileges can be assigned to groups.

Granting and revoking privileges is performed using the GRANT and REVOKE operations as follows:

GRANT SELECT, UPDATE (column3), INSERT
   ON TABLE tabula
   TO aino, bertil
   WITH GRANT OPTION

If any of the three privileges granted here is revoked, the system must also revoke the privileges from those to whom user aino or bertil have delegated them (for this purpose, the REVOKE operation has the CASCADE modifier, which is the default). Privileges granted on different occasions are cumulative. The name PUBLIC refers to all users, including those added later. In MySQL, user privileges are stored in the user table of the mysql system database.

Views are derivatives of existing tables (relations). They are created as follows:

CREATE VIEW scenery AS
   SELECT column2, column3 FROM tabula
   WHERE column1 > 0
   WITH CHECK OPTION

Privileges could now be granted on this view just as above, but INSERT privileges would not make sense on their own, since the user cannot supply a value for column1. If the last line, the CHECK requirement, is omitted, a user with INSERT privileges on scenery would indeed be able to create a row in tabula whose column1 fails the WHERE condition (e.g. a default value of 0). This would constitute a blind write, in the sense that the user would no longer be able to see that row themselves.

In this example, column1 can function as a kind of security label. Rows of tabula for which its value is zero (or < 0) are not shown to ordinary users.

Views are therefore convenient from an access‑control perspective, as they enable versatile regulation. Of course, they also make it possible to construct complex configurations that may introduce security holes or degrade performance.

Information and implementations of SQL are readily available, including open‑source software such as MySQL, MariaDB, and PostgreSQL.

The first two are “siblings” (My, like Maria, is a given name); MySQL’s reference manual documents its version of the GRANT syntax. A large part of database‑system security falls under the responsibility of such implementations. Internal database access control and, for example, checks defined for field values can implement only relatively narrow, though important, application‑specific security requirements.

At this point, it is not necessary to learn this in detail, but if the topic is already somewhat familiar, it is worth recalling that challenges at the database‑management‑system level include the ACID properties: Atomicity, Consistency, Isolation, Durability. These parallel the requirements placed on secure operating systems; isolation has been the most prominent theme in this material.
