A reverse perspective on forensics

Learning advanced forensic techniques is not the topic of this course. Mastering them requires in-depth knowledge of, among other things, operating systems, file systems, and networking technology. In this section, the issue is viewed from the reverse perspective of an ordinary user: what should they clean up on their computer?

Deleting a file

The loss of important files due to user actions, hardware failures, or attacks is a serious problem. It is mitigated by various physical and software-based means, especially backups. There is, however, a reverse side to the issue. Often, files are intended to be destroyed, but this is not entirely straightforward.

Deleting files does not mean that the information actually disappears. First, operating systems may move deleted files to a “holding area” from which they must be explicitly removed (for example, the recycle bin). Deletion can also be automatic, taking place after a certain period of time or when a storage threshold is reached, starting with the oldest items.

Second, even the actual deletion of a file usually does not remove its bits. Typically, file management merely changes its view of the file’s existence, removing information such as its name and address from its directory. After this, the file manager no longer recognises or finds the file and considers the storage space it once occupied to be free. Until something else is written over the data, it remains as it was, and even after that some parts may remain intact. Such data is called residual information. Although many programs can no longer access even their own files once bit errors occur, this does not mean that fragments of files cannot still yield information whose disclosure could be harmful. (Note, data that the operating system still recognises but that an application program has “forgotten” are often called orphaned or stale data, application-level artefacts or leftover temporary files.)
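The directory-level nature of deletion can be illustrated with a toy model of a file table. The class and names below are invented purely for illustration: deleting a file removes only its directory entry, while its data blocks stay readable until something overwrites them.

```python
class ToyDisk:
    """Minimal model of a file system: a directory maps names to
    block lists; the raw data blocks live in a separate store."""

    def __init__(self):
        self.directory = {}   # name -> list of block indices
        self.blocks = []      # raw data blocks

    def write(self, name: str, data: bytes, block_size: int = 4) -> None:
        indices = []
        for i in range(0, len(data), block_size):
            indices.append(len(self.blocks))
            self.blocks.append(data[i:i + block_size])
        self.directory[name] = indices

    def delete(self, name: str) -> None:
        # Only the directory entry disappears; the blocks stay put.
        del self.directory[name]

disk = ToyDisk()
disk.write("report.txt", b"confidential")
disk.delete("report.txt")
# The file manager no longer finds the file...
assert "report.txt" not in disk.directory
# ...but the residual data is still on the "disk".
assert b"".join(disk.blocks) == b"confidential"
```

Real file systems are far more complicated, but the principle is the same: the space is marked free while the bits linger.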

Residual information is not harmful in the sense of old versions interfering with current documents. The file manager no longer recognises the deleted file, and other ordinary programs, such as office software, rely on the file manager's services.

Residual information is removed by overwriting. More precisely, if information is to be prevented from ever becoming unmanaged residual information, it should be removed by overwriting while it is still ordinary information. Overwriting can naturally also be applied to the unused areas of a storage medium, in which case residual information there disappears. Overwriting usually requires separate programs, many of which are available, sometimes bundled with other tools, for example in some versions of PGP. Ideally, overwriting would be a built-in feature of operating systems, as it is with shred in Linux; for Windows, SDelete can be obtained from Microsoft's own website. Since many people use their computers mainly through certain applications, it would be equally important for cleanup to be available through those applications as well. In particular, the deletion of browsing history, cache, and cookies ought to work in this way.

High security levels impose requirements on how and how many times overwriting must be performed. From a magnetic surface that has been overwritten even several times, it has sometimes been possible to find traces of the original bits. At this level, bits are not entirely binary; the strength of the magnetic field representing a bit varies slightly, as does its precise location. Traditionally, seven overwrite passes have been considered sufficient, but in reality, especially with today’s very high-density storage, a single pass is enough: although bits may still be detectable afterwards, they are too sparse to form meaningful information.
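The overwrite-then-delete idea can be sketched in a few lines of Python. This is only an illustration of the principle: on journaling or copy-on-write file systems, and on SSDs with wear levelling, writing "in place" gives no guarantee that the old blocks are physically destroyed, which is precisely why dedicated tools such as shred and SDelete exist.

```python
import os

def wipe_file(path: str, passes: int = 1) -> None:
    """Overwrite a file's contents, then delete it.

    Sketch only: journaling file systems and SSD wear levelling
    may leave the old blocks untouched on the physical medium,
    so this is no substitute for tools like shred or SDelete.
    """
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        for _ in range(passes):
            f.seek(0)
            f.write(os.urandom(size))  # random bytes over the old data
            f.flush()
            os.fsync(f.fileno())       # push the new bytes to the device
    os.remove(path)
```

As the text notes, a single pass is considered sufficient on modern high-density media; the `passes` parameter merely shows where a stricter policy would plug in.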

Manual wiping of files may be necessary for the same reasons as encrypting stored files, namely

  • in case a computer or storage medium is stolen;
  • in case of malware;
  • in case some other lawful user gains access to the data, for example an administrator performing some otherwise necessary and permitted task;
  • in case of other intrusions.

However, wiping files that were stored encrypted and subsequently deleted is not necessary. The reason is that the encryption is sufficiently strong to protect the stored data. Deleted data is in any case harder to access than data that is still stored. The most serious challenge to the strength of file encryption is probably that it is usually based on a password that a human is able to remember.

It may not always be possible to wipe files that have been manually deleted, especially if there is no knowledge of where copies of the file are located. Sometimes the operating system handles this automatically, but on the other hand, advanced file (or storage media) management may create numerous temporary or partial copies, which then form residual information (versioning, wear levelling of media, defragmentation, and so on).

A single storage medium may have multiple users, possibly even simultaneously. Before a memory area is assigned to another user, the operating system must clear it by overwriting. The same applies to main memory, where clearing may take place by other means. Depending on the technology and requirements, RAM cannot simply be assumed to have been cleared. There are also precise rules governing how data media must be permanently decommissioned. It is not sufficient to let an external entity carry out physical destruction in a prescribed way. The data owner must ensure that this actually happens. The same applies to sensitive paper documents.


Deleting information

The problem of deleting files is relatively simple compared to the more general goal of erasing some information altogether. Of course, this does not refer to the “residual information” between the ears of a company’s former employee. Only forgetting over time can help with that (and the matter may also become obsolete). The same applies to things held in one’s own memory, which can be harmful in many ways.

Fairly obvious places where one must remember to erase information include

  • storage media and computers that are taken out of use or recycled for others;
  • backups.

Here we consider some less obvious places where remnants of information may be forgotten and could be harmful if disclosed. Web browsers are discussed separately below.

  • Programs that process information often store temporary files that contain the data being processed itself, not only technical information such as logs or, say, a video editor’s metadata about edit points and filters. Such temporary files may remain undeleted if the program is faulty, its execution is abnormally interrupted, or the storage medium is moved out of the program’s reach.
  • Operating systems and versatile document-handling programs also store various kinds of other information about the data. In particular, references to recently processed files are common. If the file and even its directory path are descriptively named, references to the information may reveal something about the information itself. Such disclosures usually need to be guarded against only on one’s own machine, that is, when someone else may gain control of it.
  • In another way, one must be careful with auxiliary information (metadata) in document files, which may automatically store, for example, the first line of text and the author’s name or user ID. If such a file is used as the basis for a new document, the old auxiliary information may remain unchanged. Sending such a document poses an obvious security risk. A similar issue arises with photographs and videos, whose files may contain location data (cf. Exif) and even information identifying people appearing in them (sending the latter kind of material should, of course, not occur at all if the associated textual data is sensitive).
  • A table, chart, or similar object created in a spreadsheet program can be copied into a word processor in the original program’s format (for example, an Excel table object embedded in MS Word). At the same time, cells other than those selected (and even other worksheets) may also be copied. The editor may then unwittingly publish more than intended. A similar situation can occur with images from which something has been cropped out in the text editor, since the image may still be present in full within the file.
  • For documents sent elsewhere, there is yet another similar issue: change data or recovery data, which enables “undo” functionality (sometimes merely log data). Such data appears in many application programs, such as word processors or databases, and may also be intended for reviewing changes. The data is stored in some way alongside the current content in files. If only the final result is to be sent to a business partner or similar recipient, one must ensure that the version being sent contains only what is intended. For this purpose, various read-only formats (such as PostScript or PDF) or otherwise stripped-down formats (from plain ASCII text to, for example, RTF, Rich Text Format) can be used. Good application programs should provide their own tools for this.
  • In general, if information has been transmitted by email or otherwise between different locations, the number of places where it may reside can be unpredictably large. If information has ever been published on the Internet, attempts to erase it are rather useless. The only thing that may perhaps be done is to obscure the meaning of questionable information by publishing much more material of a similar kind …
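As a small illustration of the first point above, one can inspect what lingers in the system's temporary directory; on a long-used machine this often reveals leftovers from interrupted or careless programs. The age threshold below is an arbitrary choice for the sketch.

```python
import os
import tempfile
import time

def stale_temp_files(max_age_days: float = 7.0):
    """List files in the system temp directory older than
    max_age_days (the threshold is chosen arbitrarily here)."""
    cutoff = time.time() - max_age_days * 86400
    stale = []
    for entry in os.scandir(tempfile.gettempdir()):
        try:
            if entry.is_file() and entry.stat().st_mtime < cutoff:
                stale.append(entry.path)
        except OSError:
            continue  # file vanished or is inaccessible
    return stale
```

Whether such files are harmless debris or residual copies of sensitive documents can only be judged by looking at their contents, which is exactly what a forensic investigator would do.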

Metadata and copies

An example of extra information in Windows: a photograph is scanned, processed with an image program better than the scanner software, and finally burned onto a CD-ROM, for example for long-term archiving. From which location on one’s own machine should traces be cleaned if even the name given to the image is not to be revealed? Automatically named temporary files may be created both by the scanner software and the image processing program, and references to recent names may remain in the File menu of both programs. A reference may also remain in the Windows Start menu. For burning a CD to make sense, it is probably done at the same time as processing several other files, meaning that the image has meanwhile been saved and named. Also more modern archiving methods than CDs may include temporary storage.

Another example: software for secure file transfer over SSH (SFTP), implemented for example by WinSCP, has high usability, like many modern programs. From a home computer, one can handle files located on a remote machine (for example at the workplace) almost as if they were local. If, instead of an actual transfer, one merely wants to view the contents of a text file, the software brings it to the local machine and opens it in the user’s preferred editor. When the user then closes the editor without saving the file, they may believe that no traces were left on their machine. In reality, the file is stored in the SSH client’s temporary directory, fairly deep in the directory hierarchy, and is not deleted when the client is closed.


Case: browser

Web browsers offer six interesting sources of information for the forensic investigator:

  • URL and search history. At present, there are no practical obstacles to maintaining a complete browsing history, that is, a log of visited websites. It is an important usability feature, and most users delete this information only rarely. Service providers such as Google and Facebook have a separate commercial interest in this data and facilitate sharing the browsing log across multiple devices. Combined with the local file cache, browsing history allows the forensic investigator to almost look over the user’s shoulder as they browse the web. In particular, analysing users’ search engine queries is one of the most commonly used techniques. A search query is encoded as part of the URL and can often provide very clear and targeted clues about what the user was trying to achieve.
  • Form data. The browser automatically fills in passwords and other information needed in forms, such as addresses. This can be extremely useful for the forensic investigator, especially if the user is less security-conscious and does not use a master password to encrypt this data.
  • Temporary files. The local file cache provides its own chronology of web browsing. It includes stored versions of pages that were retrieved and displayed to the user. These may no longer be available on the web. Although the forensic significance of caches has diminished due to the increased use of dynamic content, this is compensated for by the large growth in storage capacity. There are very few, if any, practical limits on the amount of data stored in caches.
  • Downloaded files are not deleted by default, which provides another valuable source of information for reconstructing user actions. Download here refers to the download function, which is different from a page’s ordinary retrieval operation. The latter is what the cache discussed above relates to. Traces of both operations can, with an unwary user, extend far back in history.
  • HTML5 provides web applications with a standard way to store data locally on the browser machine. This can be used, for example, to support offline functionality or to provide persistence for user inputs. Correspondingly, the forensic investigator can use the same interface to reconstruct a user’s web activities.
  • A cookie is a record that a web server can send to a browser as auxiliary data when fulfilling an HTTP request, and which the browser stores alongside others of the same kind. The next time, the server receives the previous information from the browser and can thus select new advertisements, tailor services to the customer’s preferences, continue filling a shopping cart, or simplify possible login. A browser does not provide any server with information about cookies from other servers, but a single page may include cookies from several servers, which enables a certain level of tracking of browsing activities. From the forensic investigator’s perspective, cookies are usually opaque as such, but information can be obtained from them via the server. Some cookies, within their time limits, provide access to online accounts. Some have a structured format and may offer additional information.
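The observation that search queries are encoded as part of the URL is easy to demonstrate with the Python standard library. The parameter name `q` is used by several major search engines, but it varies by service, and the URL below is invented for the example.

```python
from urllib.parse import urlsplit, parse_qs

def extract_search_terms(url: str, param: str = "q"):
    """Pull the search query out of a URL found in browsing
    history. The parameter name 'q' is common but not universal."""
    query = parse_qs(urlsplit(url).query)
    return query.get(param, [])

# A URL of the kind found in a history log (invented example):
terms = extract_search_terms(
    "https://www.example-search.test/search?q=how+to+wipe+a+disk&hl=en")
print(terms)  # ['how to wipe a disk']
```

A single line in a history database can thus reveal, in plain words, what the user was trying to find.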

Most local data is stored in SQLite databases, which should be examined if the sought information is not found directly. In particular, records that appear to have been deleted may persist and be recoverable until the database is explicitly cleared.
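Firefox, for example, keeps its browsing history in a SQLite file named places.sqlite, which Python's built-in sqlite3 module can open directly. The sketch below queries a simplified stand-in table; the table and column names imitate Firefox's moz_places but the schema here is reduced for illustration.

```python
import sqlite3

def recent_urls(db_path: str, limit: int = 5):
    """Read the most recently visited URLs from a history
    database. The schema (moz_places with url and
    last_visit_date columns) is a simplified imitation of
    Firefox's, not the real thing."""
    con = sqlite3.connect(db_path)
    try:
        rows = con.execute(
            "SELECT url, last_visit_date FROM moz_places "
            "ORDER BY last_visit_date DESC LIMIT ?", (limit,))
        return [url for url, _ in rows]
    finally:
        con.close()
```

Note that, in line with the point above, rows deleted from such a database may still sit in the file's free pages until the database is vacuumed, so an investigator would examine the raw file as well as run queries.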
