Definitions and basic information

Bits found in a suitable environment may suffice as evidence that certain processing has taken place or that particular information has been transmitted or stored. Such bits are referred to as forensic traces or digital forensic traces, and the occurrences they attest to are events. Events may be the result of interaction between a human and a computer, such as launching an application, or they may result from the autonomous operation of an information system, for example scheduled backups. The same event may leave traces in several locations, on the same device or across multiple devices. Explicit traces directly record the occurrence of certain types of events as part of the system’s normal operation; the most visible of these are the various timestamped system and application event logs. Implicit traces are diverse and do not themselves “say” what they mean, but instead serve as starting points for inference. For example, if a sufficiently large fragment of a known file is found on a storage device, it can be inferred that the entire file was probably once stored there but was subsequently deleted and partially overwritten. Similarly, if ordinary log files are found to be missing, one may infer that a data breach has likely occurred, during which the perpetrators wiped system logs to cover their tracks.
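
The fragment inference can be illustrated with a small sketch. It is not a forensic tool: it assumes that the storage device is available as a raw image held in memory, that sectors are 512 bytes, and that file content is stored sector-aligned. The names `sector_hashes` and `find_fragments` are made up for this example.

```python
import hashlib

SECTOR = 512  # assumed sector size in bytes


def sector_hashes(known: bytes, size: int = SECTOR) -> set[bytes]:
    """Hash every full sector-sized block of the known file."""
    return {
        hashlib.sha256(known[i:i + size]).digest()
        for i in range(0, len(known) - size + 1, size)
    }


def find_fragments(image: bytes, known: bytes, size: int = SECTOR) -> list[int]:
    """Return the image offsets whose sector equals some block of the known file."""
    wanted = sector_hashes(known, size)
    hits = []
    for offset in range(0, len(image) - size + 1, size):
        if hashlib.sha256(image[offset:offset + size]).digest() in wanted:
            hits.append(offset)
    return hits
```

A long run of matching offsets supports the inference that the file was once stored on the device. Blocks that occur in many files, such as sectors filled with zeros, would match trivially and should be excluded before drawing conclusions.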

The investigation of computer-related crimes is sometimes called forensics, as the word forensic refers to legal argumentation (originally, in Latin, in the forum or marketplace). The term digital forensics is used for various tracing activities that aim to determine what a suspect has done on their own or someone else’s computer. The crime under investigation may, of course, involve matters entirely unrelated to information technology. In this material, for the sake of simplicity, the term forensics is used to mean digital forensic investigation. This includes the collection, preservation, and analysis of evidence. Traditionally, criminal investigation involves the systematic analysis of physical samples to establish causal relations between events and to resolve questions of origin and authenticity. It is important to note that persistent traces do not arise from the processing and transmission of digital information as naturally as they do from physical activity. Of course, various physical traces suitable for criminal investigation can still be found on digital devices, and the word digital itself refers to fingers, but let us leave touchscreen smudges aside for now.

When forensic traces are found in information technology interactions, they are often by-products of features designed into information systems without forensic intent. Although log data are collected to reconstruct events retrospectively, the collection mechanisms are not necessarily designed with court proceedings in mind. This fact can have a significant impact when assessing the origin and authenticity of digital evidence. As technological development has made memory technology and data communication cheaper, applications, operating systems, and protocols alike have begun to produce ever more ancillary data such as logs. As a result, the focus of forensics has been able to shift towards explicit traces. At the same time, operating systems have become so diverse and complex that an ordinary user (criminals included) generally lacks the skills to clean up all traces. Someone who breaks into another machine over a network may, of course, have networking expertise and often good exploitation tools (such as Metasploit), with which forensic traces can be minimised.

Investigating data on a computer can be hampered by encryption or concealment mechanisms, which are in principle the same means by which a lawful user protects their data against theft and intrusion. While lawful protections weaken the effectiveness of forensics, forensic methods are in turn also useful for illegal purposes. Examining a running machine is usually easier than examining a powered-off one, especially if the investigator or attacker is logged in. In that case it is possible to examine the memory, whose contents disappear when power is lost. One can find open files, network connections, recently used (not yet overwritten) passwords for network destinations, and, of particular interest, keys used for file encryption. Such information can also be accessed shortly after shutdown if the memory has first been cooled to a very low temperature. After shutdown, the machine is rebooted from a live CD that also contains software suitable for copying memory. This is the so-called cold boot attack (Halderman et al. 2008).

Even if examining a device’s memory is successful, it may be problematic from a forensic perspective in terms of maintaining the integrity of the chain of custody. Examination should not alter the target, but this is difficult with a running machine. For this reason, the mass storage of the machine under investigation is usually removed and copied on another machine.
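
One common way to show that such a copy matches the original is to compare cryptographic hashes of the two. The following is a minimal sketch, assuming that both the original medium and the copy can be read as ordinary files; the function names and any paths passed to them are hypothetical.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 digest of a file without loading it all into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_is_faithful(original_path: str, copy_path: str) -> bool:
    """Return True if the copy has the same SHA-256 digest as the original."""
    return sha256_of(original_path) == sha256_of(copy_path)
```

Recording the digest at the time of copying also makes it possible to show later that the copy has not changed since then.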

Even when suspicions are well founded, criminal investigation may reveal private matters unrelated to the case. This is, of course, also true in other criminal investigations, but in the case of a general-purpose computer, different kinds of information may be closer to one another than usual. If something illegal is found, the data must naturally be destroyed, and then it is difficult to preserve any data at all and return storage media to the offender. Seemingly innocuous other data may contain a steganographic (“hidden in bits”) copy of illegal material.

Forensics is both science and technology, but the use of its results in legal proceedings may become difficult if novel methods are employed that the legal process does not sufficiently understand. The nature of admissible evidence changes over time, just as the concepts of computer crime become more precise. For example, in the United States the so-called Frye standard (1920s) concerning forensics has been replaced by the so-called Daubert standard (1990s), which concerns the scientific basis of evidence: in particular, forensic methods must have theoretical foundations and must produce testable predictions by means of which the theory can also be falsified. In addition, legal proceedings may pose challenges similar to those faced by the polygraph: a method may be considered not only technically unreliable as evidence but also unreasonable, given that a witness has the right to remain silent. Naturally, different legal systems may also take different views on what forensics is allowed to investigate and what qualifies as evidence. In Finland, one can begin by consulting the Coercive Measures Act, which defines, among other things, search of data contained in a device (in Finnish simply ‘laite-etsintä’), technical surveillance of devices, telecommunications interception, audio monitoring, technical observation, and technical tracking. It should be noted that forensic techniques also apply in situations where no crime is suspected.

Café forensics

Teemu and Teija jointly own a café where they offer customers wireless internet access. They notice that some suspiciously behaving customers have visited the café, and that something unusual has occurred on the wireless network at the same times. They hire a cybersecurity team to investigate the case.

The cybersecurity team begins by examining the logs of the café’s network devices and indeed finds suspicious events in the café network: the network equipment has clearly been scanned, as traffic has been directed at ports that an ordinary user would never communicate with. This is an explicit trace. The logs also reveal that the attacker has discovered the internal network address of the café’s server computer and has directed dubious but encrypted traffic at it.
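
The kind of scan the team noticed can be spotted with a simple count. This is a rough sketch, assuming the device logs have already been parsed into pairs of source address and destination port. The function name `find_scanners` and the threshold of 20 distinct ports are arbitrary choices for illustration.

```python
from collections import defaultdict


def find_scanners(events, threshold=20):
    """Return sources that contacted at least `threshold` distinct ports.

    `events` is an iterable of (source_address, destination_port) pairs.
    The result maps each flagged source to its number of distinct ports.
    """
    ports_by_source = defaultdict(set)
    for source, port in events:
        ports_by_source[source].add(port)
    return {
        source: len(ports)
        for source, ports in ports_by_source.items()
        if len(ports) >= threshold
    }
```

An ordinary customer typically talks to a handful of ports, so a source that touches many distinct ports in a short time stands out. A real investigation would also take the time window and the target addresses into account.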

No explicit traces are found directly on the server computer, and the logs appear to be intact. The cybersecurity team suspects that the attacker has cleaned up traces of the attack, and so they examine the contents of the hard drive using forensic software. They manage to recover a deleted file, which turns out to be part of an intrusion tool. In addition, they recover an earlier version of the firewall log file, which reveals that firewall rules have been tampered with and that the current log file is a forgery put in place to mislead forensics. These are implicit traces. The recovered log file shows that the attacker exploited a vulnerability in the server computer’s remote management protocol to download the intrusion tool onto the machine. The attack method is thus becoming clear, but some forensic trace of its possible scope still needs to be found.
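
Comparing a recovered log against the one currently on disk can be sketched as follows. The example assumes that both logs are available as lists of text lines and that identical entries appear as identical lines; the function name `compare_logs` is hypothetical.

```python
def compare_logs(recovered_lines, current_lines):
    """Compare a recovered log version with the current log.

    Returns a pair (removed, added): entries present only in the recovered
    version, and entries present only in the current version. The original
    order of the lines is preserved in both lists.
    """
    recovered = set(recovered_lines)
    current = set(current_lines)
    removed = [line for line in recovered_lines if line not in current]
    added = [line for line in current_lines if line not in recovered]
    return removed, added
```

Entries that exist only in the recovered version are candidates for deliberate removal. Entries that exist only in the current version may be forged, but they may also simply be newer than the recovered version, so timestamps need to be checked before drawing conclusions.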

From the network logs it is observed that soon after the attack the server machine sent a substantial amount of data over the Tor network. The volume of data amounts to the vast majority of the total size of the data stored on the hard drive. It appears that the attacker has compressed almost the entire contents of the hard drive and transferred it over the network to their own device. It is therefore reasonable to suspect that the personal data of the café’s subscription customers has leaked into the wrong hands. In addition, the attacker is now in possession of Teija’s award-winning cheesecake recipe as well as other trade secrets.
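
The volume comparison behind this inference is simple arithmetic. The following sketch assumes that the network logs have been parsed into flow records with hypothetical fields `src`, `time`, and `bytes`; the function name `outbound_share` is likewise made up for the example.

```python
from datetime import datetime


def outbound_share(flows, server_ip, since, stored_bytes):
    """Return outbound traffic from the server as a fraction of the stored data.

    `flows` is an iterable of dicts with the keys "src" (source address),
    "time" (datetime of the flow), and "bytes" (bytes sent). Only flows sent
    by `server_ip` at or after `since` are counted.
    """
    sent = sum(
        flow["bytes"]
        for flow in flows
        if flow["src"] == server_ip and flow["time"] >= since
    )
    return sent / stored_bytes
```

A share close to one suggests that nearly everything on the disk may have been sent out. Because compression reduces the transferred volume, a smaller share does not rule out a full copy.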

The recovered part of the intrusion program gives the cybersecurity team clues about the possible perpetrator. Using stylometry they become fairly convinced that the attack was carried out by an international megacorporation that has previously been caught spying.

Teemu and Teija have 72 hours to notify the Office of the Data Protection Ombudsman of the personal data breach, from the moment they become aware that personal data has been leaked.