Overview of digital and other traces (advanced)

The purpose of this section is to illustrate, through a simple overview, how diverse the field of forensics can be. To begin with, information can be obtained from roughly three types of traces. Conclusions are drawn from traces that

1. have merely happened to remain from some event;
2. have been anticipated, so that information of that kind may have been collected in advance; such traces are otherwise similar in nature to type 1;
3. have been deliberately designed to remain from events.

This division is distinct from the earlier distinction between explicit and implicit traces, although explicit traces are more common in types 2 and 3 and implicit traces in type 1. Examples of traces discussed elsewhere in this material include:

Biometric traces: usually type 1, but they may have been recorded from subjects in advance.
Log data and the detection of intrusions and similar events: type 2. See the earlier module.
Watermarking and other means for identifying and tracing copies: type 2.
Tamper resistance of devices: type 3. The detection of forged documents, especially banknotes and similar items, is to some extent of this kind.

The objectives fall into two branches: the aim is either

to reconstruct the source of the traces itself, that is, the traces are fragments of destroyed information:
- a wiped magnetic storage medium;
- shredded paper or other records: scanning and processing with a computer makes this considerably easier than assembling physical jigsaw puzzles. The same techniques can also be applied to archaeological sources (artefacts and their data contents).
or to work out the background of the traces:
- what or whose activity resulted in the traces;
- where or when the traces were created;
- in particular, whether the creation of the traces involves something criminal, such as forgery.

In general, this is an inverse problem, that is, a matter of inverting a function: from the observed traces one reasons back to what produced them. From this perspective, cryptanalysis and steganalysis can also be seen as part of this field.

Below are a few examples of traces of types 1 and 3.

Type 1, that is, traces that arise by chance

By comparing texts, one can attempt to determine
- whether a program or text has been plagiarised;
- whether a departing programmer violated copyright or merely relied on their own memory;
- who wrote a computer virus;
- whether Bacon wrote Shakespeare’s works.
Comparisons can be performed statistically, for example on the basis of the rarest words occurring in texts.
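As a rough illustration of comparing texts by their rarest words, the sketch below lists the words that two texts share and that are rare in a reference corpus. The function names, the `background` frequency table and the `max_background_count` threshold are all assumptions made for this example. Real plagiarism or authorship analysis relies on far larger corpora and on proper statistical tests.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase words (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def shared_rare_words(text_a, text_b, background, max_background_count=0):
    """Return words that occur in both texts and are rare in the background.

    `background` maps a word to its count in some reference corpus
    (hypothetical here). A word is treated as rare if its background
    count is at most `max_background_count`.
    """
    words_a = set(tokenize(text_a))
    words_b = set(tokenize(text_b))
    return {
        word
        for word in words_a & words_b
        if background.get(word, 0) <= max_background_count
    }


# Toy reference corpus, only for demonstration.
background = Counter(tokenize("the cat sat on the mat and the dog sat on the log"))
print(shared_rare_words(
    "The cat sat on the antimacassar",
    "A dog slept on an antimacassar",
    background,
))
```

A large set of shared rare words is only an indication that the texts may have a common origin, not proof of it.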

Linguistic nuances, turns of phrase, and even rounding related to unit conversions can provide indications of a text’s origin or path. Stylometry was already mentioned at the beginning of this module, in the context of program code.

The phenomenon known as Benford’s law can be helpful in detecting forgeries. It describes how the leading digits of numerical data originating from natural sources are distributed, and it applies in different number systems. An observation of the phenomenon was published as early as 1881, and in 1938 Benford gave it a mathematical form: in decimal notation, the proportion \(\log_{10}(1 + 1/d)\) of the numbers begin with the digit \(d\) (\(d = 1, \dots, 9\)); in particular, about 30 % begin with 1. This “law” is relevant, for example, in auditing: if a fraudster invents new numbers for accounts in a completely random manner, the numbers do not follow Benford’s law, and the overall pattern attracts the attention of auditors.
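The formula can be tried out with a short sketch. The code below computes the expected leading-digit proportions and a chi-square statistic that measures how far a data set deviates from them. The function names are invented for this example, and the chi-square test is only one possible way to compare the distributions; it is not claimed to be what auditors actually use.

```python
import math
from collections import Counter


def benford_expected():
    """Expected proportion of each leading digit d = 1..9 under Benford's law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit(x):
    """First significant decimal digit of a nonzero number."""
    x = abs(float(x))
    if x == 0:
        raise ValueError("zero has no leading digit")
    # Scientific notation places the first significant digit first.
    return int(f"{x:.15e}"[0])


def leading_digit_counts(values):
    """Counts of leading digits 1..9, ignoring zeros."""
    counts = Counter(leading_digit(v) for v in values if v != 0)
    return [counts.get(d, 0) for d in range(1, 10)]


def chi_square_vs_benford(values):
    """Chi-square statistic of the observed leading digits against Benford's law."""
    observed = leading_digit_counts(values)
    n = sum(observed)
    if n == 0:
        raise ValueError("no nonzero values to analyse")
    expected = benford_expected()
    return sum(
        (observed[d - 1] - n * expected[d]) ** 2 / (n * expected[d])
        for d in range(1, 10)
    )


# Powers of two are known to follow Benford's law closely.
natural_like = [2 ** k for k in range(1, 201)]
# Every three-digit integer once: leading digits are uniform, not Benford-like.
invented = list(range(100, 1000))

print(chi_square_vs_benford(natural_like))
print(chi_square_vs_benford(invented))
```

With nine digit classes there are 8 degrees of freedom, so a statistic above roughly 15.51 indicates a deviation at the 5 % significance level. Such a deviation is a reason to look more closely, not evidence of fraud in itself, since many legitimate data sets do not follow Benford’s law either.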

Type 3, producing distinctive traces

Fingerprint-like features can be created in various contexts:

Tracing the content of confidential documents can be facilitated by introducing textual variations into different copies that do not alter the meaning.
For physical tracing of documents, small mechanical modifications can be made
- to photocopiers and printers;
- to paper (fibres or physical watermarks as in the past);
- even to paper shredders.

Similar traces can be used for tracing even without deliberate modification, since all of these exhibit natural variation if examined closely enough.
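The idea of textual variations that do not alter the meaning, mentioned in the list above, can be sketched as follows. Each copy number selects a unique combination of synonyms, so a leaked copy can be mapped back to its recipient. The template, the synonym slots and the function names are all invented for this illustration; a practical scheme would also need to survive paraphrasing and partial quotation.

```python
import math

# Hypothetical synonym slots: each tuple lists interchangeable wordings.
SLOTS = [
    ("begin", "start"),
    ("large", "big"),
    ("quickly", "rapidly"),
]
TEMPLATE = "We {0} the {1} migration and must move {2}."


def variant_for_copy(copy_id, template=TEMPLATE, slots=SLOTS):
    """Build the text variant for one copy using mixed-radix digits of copy_id."""
    if copy_id < 0:
        raise ValueError("copy_id must be non-negative")
    remaining = copy_id
    choices = []
    for options in slots:
        remaining, index = divmod(remaining, len(options))
        choices.append(options[index])
    if remaining:
        raise ValueError("not enough slots to encode this copy_id")
    return template.format(*choices)


def identify_copy(leaked_text, template=TEMPLATE, slots=SLOTS):
    """Return the copy_id whose variant matches the leaked text, or None."""
    total = math.prod(len(options) for options in slots)
    for copy_id in range(total):
        if variant_for_copy(copy_id, template, slots) == leaked_text:
            return copy_id
    return None


leaked = variant_for_copy(5)
print(leaked)
print(identify_copy(leaked))
```

Three two-way slots distinguish only eight copies; the number of distinguishable copies is the product of the slot sizes.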

For the investigation of physical crimes, one can (or could) mark
- weapons;
- bullets;
- even lead, so that even self-cast bullets could be traced back to the point of purchase of the lead.

A similar phenomenon occurs in environmental protection when “emission-prone” substances are tagged with chemical fingerprints.

If the above seems old-fashioned, it is worth considering what similar mechanisms occur (or might perhaps be implemented) in the image sensors of digital cameras and smartphones. This is not new either. A fairly accessible article from 2021 refers to studies from 2005. Even earlier, faulty pixels had already been studied.