Data vs control (struct and enum types)

This section will discuss data driven programming. Before that, you will learn two C++ data types you need in order to understand the explanation.

Record type (struct in C++)

Many programming languages allow the programmer to define their own data types, which can be used to group data elements of different types. In a way, such a type is a primitive version of a class type: there is no function interface, and all the fields can be accessed directly. A data type like this is often called a record or, in C++, a struct.

In C++, a record is defined using the reserved word struct:

struct Product {
    string product_name;
    double price;
};

After the definition above, the new type Product can be used pretty much the same way as the other data types:

Product item = {"soap", 1.23};

You can access the individual fields of a Product type variable with the . operator:

item.price = 0.9 * item.price;  // Discount 10 %
cout << item.product_name << ": " << item.price << endl;
item = { "orange", 0.45 };

You can store a struct as a value in an STL container, pass it to a function (as a value or a reference parameter), and use it as a return value.
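The three uses mentioned above can be sketched as follows. This is only an illustration: the function names apply_discount, make_product, and total_price are hypothetical and do not come from the course material.

```cpp
#include <string>
#include <vector>

struct Product {
    std::string product_name;
    double price;
};

// Reference parameter: the function modifies the caller's Product.
// (Hypothetical helper, named here only for illustration.)
void apply_discount(Product& product, double percent)
{
    product.price = product.price * (1.0 - percent / 100.0);
}

// Return value: the function builds a new Product and returns it.
Product make_product(const std::string& name, double price)
{
    return {name, price};
}

// Container element: the vector stores Product values, and the
// const reference parameter avoids copying the whole container.
double total_price(const std::vector<Product>& products)
{
    double sum = 0.0;
    for (const Product& p : products) {
        sum += p.price;
    }
    return sum;
}
```

For example, a std::vector<Product> can be filled with the results of make_product, one of its elements can be passed to apply_discount, and the whole vector can be passed to total_price.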

We can describe a person in a struct:

struct Person {
    string forename;
    string surname;
    string address;
    string email;
};

All fields of the above struct are of the same type, so the data could also have been stored in a vector of four elements. However, a struct is the better (and the correct) solution here, because it lets us use named fields instead of numeric indices. Do not store data items with different meanings in the same vector, even if they have the same type.
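The difference can be seen by writing the same small function in both ways. This is a sketch under assumptions: the function names contact_line and contact_line_from_vector are hypothetical, and the vector version assumes the fields are stored in the order forename, surname, address, email.

```cpp
#include <string>
#include <vector>

struct Person {
    std::string forename;
    std::string surname;
    std::string address;
    std::string email;
};

// With a struct, the field names tell the reader what is being used.
std::string contact_line(const Person& person)
{
    return person.forename + " " + person.surname + " <" + person.email + ">";
}

// With a vector, the reader has to know that index 3 means the email
// address (assumed order: forename, surname, address, email).
std::string contact_line_from_vector(const std::vector<std::string>& fields)
{
    return fields.at(0) + " " + fields.at(1) + " <" + fields.at(3) + ">";
}
```

Both functions produce the same text, but only the first one stays readable and correct if someone later reorders or adds fields.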

Enumeration type

Sometimes you need a variable that can only have certain values. For example, let us say that the status of a book in the library software can only be either on the shelf, borrowed, reserved, or lost. Naturally, it is possible to use the types string or int, but they would allow many other values besides the aforementioned ones. That is why they are not the best options for storing the status of a book.

There is a data type called enumeration type, for which all the possible values (elements) are defined by the programmer.

In C++, the enumeration type is defined by the reserved word enum:

enum Type_name { element_1, ... , element_n };

Above, we defined a new type called Type_name, and it contains the elements element_1, …, and element_n. A more concrete example:

enum Book_status { ON_THE_SHELF, BORROWED, RESERVED, LOST };

The elements of the enumeration type are integer constants, and the C++ compiler defines their values automatically: element_1 = 0, element_2 = 1, etc. If the programmer wishes, they can define the values of the elements:

enum Type_name { element_1 = value_1, ... , element_n = value_n };
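As a small sketch of both numbering styles: the Book_status type below gets its values automatically, whereas Coin_value is a hypothetical type, made up for this illustration, whose values are given explicitly. The function can_be_borrowed is likewise only an assumption about how the status might be used.

```cpp
// Automatic values: ON_THE_SHELF = 0, BORROWED = 1, RESERVED = 2, LOST = 3
enum Book_status { ON_THE_SHELF, BORROWED, RESERVED, LOST };

// Explicit values chosen by the programmer (hypothetical example type)
enum Coin_value { FIVE_CENTS = 5, TEN_CENTS = 10, TWENTY_CENTS = 20 };

// A variable of type Book_status is meant to hold only the four listed elements.
// (Hypothetical helper: a book can be borrowed only if it is on the shelf.)
bool can_be_borrowed(Book_status status)
{
    return status == ON_THE_SHELF;
}

// An element of this kind of enum converts implicitly to an integer.
int coin_sum(Coin_value first, Coin_value second)
{
    return first + second;
}
```

For example, can_be_borrowed(BORROWED) is false, and coin_sum(FIVE_CENTS, TWENTY_CENTS) is 25.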

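Sometimes it is practical for the enumeration type to express its own size, in other words, the number of its values. Because the elements are numbered starting from 0, an extra element added at the end automatically gets a value equal to the number of elements before it. Below, MONTHS_IN_YEAR has the value 12: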

enum Month { JANUARY, FEBRUARY, MARCH, APRIL, MAY, JUNE, JULY,
             AUGUST, SEPTEMBER, OCTOBER, NOVEMBER, DECEMBER, MONTHS_IN_YEAR
};
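One possible use for such a size element is to dimension a table and to control a loop over all the values. The sketch below assumes a non-leap year; the names DAYS_IN_MONTH and days_in_year are hypothetical and only serve the illustration.

```cpp
#include <array>

enum Month { JANUARY, FEBRUARY, MARCH, APRIL, MAY, JUNE, JULY,
             AUGUST, SEPTEMBER, OCTOBER, NOVEMBER, DECEMBER, MONTHS_IN_YEAR
};

// MONTHS_IN_YEAR (= 12) sets the size of the table, so the table and
// the enum cannot get out of step without the compiler noticing
// when too many values are listed.
const std::array<int, MONTHS_IN_YEAR> DAYS_IN_MONTH = {
    31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31  // Non-leap year assumed
};

// Loops over every month without writing the number 12 anywhere.
int days_in_year()
{
    int total = 0;
    for (int month = JANUARY; month < MONTHS_IN_YEAR; ++month) {
        total += DAYS_IN_MONTH.at(month);
    }
    return total;
}
```

The elements also work directly as indices: DAYS_IN_MONTH.at(FEBRUARY) reads the second element of the table.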

Data driven programming

A computer program consists of data and the commands that operate on it. In reality, the division is not that clear-cut, because in some situations data can replace some of the commands and vice versa. So-called data driven (or data directed) design and programming make use of this property.

The idea of data driven programming is to design and implement data structures that reduce the number of data processing commands in a program, that is, to replace certain commands with data. This becomes clearer in the following example, which uses the aforementioned types struct and enum. (The complete program code of the example is located in Git: examples/04/datadrivenprogramming).

Let us first look at a snippet of code that does not follow the philosophy of data driven programming:

#include <string>

enum PostalAbbreviation {AL, AK, AZ, AR, CA, CO, ERROR_CODE};  // The remaining 44 elements are omitted

// Version 1: First idea
PostalAbbreviation name_to_abbreviation(const std::string& name)
{
    if(name == "Alabama"){
        return AL;
    } else if (name == "Alaska"){
        return AK;
    } else if (name == "Arizona"){
        return AZ;
    } else if (name == "Arkansas"){
        return AR;
    } else if (name == "California"){
        return CA;
    } else if (name == "Colorado"){
        return CO;
    } else {   // The other 44 "else if" blocks are omitted
        return ERROR_CODE;
    }
}

The if structure above would be a lot longer in reality, with the remaining 44 states written out. If the same piece of code is written according to the principles of data driven programming, we can get rid of the long if statement like this:

#include <string>
#include <vector>

enum PostalAbbreviation {AL, AK, AZ, AR, CA, CO, ERROR_CODE};  // The remaining 44 elements are omitted

struct StateInfo {
    std::string name;
    PostalAbbreviation abbreviation;
};

const std::vector<StateInfo> STATES = {
    { "Alabama", AL },
    { "Alaska", AK },
    { "Arizona", AZ },
    { "Arkansas", AR },
    { "California", CA },
    { "Colorado", CO }  // Excluded 44 lines
};

// Version 2: Better solution
PostalAbbreviation name_to_abbreviation(const std::string& name)
{
    for(const auto& s : STATES) {  // Reference avoids copying each StateInfo
        if(name == s.name){
            return s.abbreviation;
        }
    }
    return ERROR_CODE;
}

The difference between these two examples is clear: after going through a little bit of trouble to design and initialize a suitable data structure, the actual function name_to_abbreviation became significantly shorter. In one way or another, the technique of data driven programming is always the use of pre-processed information consisting of initial values and the results derived from them.

The benefits of data driven programming include:

  1. The programs created are usually shorter, but still clear.
  2. The possibility of making errors is smaller.
  3. The programs are easier to expand and maintain.
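The third benefit can be illustrated with a sketch that reuses the table from Version 2 for a lookup in the opposite direction. The function abbreviation_to_name is hypothetical (it is not claimed to be part of the course example), and returning an empty string for an unknown value is just one possible design choice.

```cpp
#include <string>
#include <vector>

enum PostalAbbreviation {AL, AK, AZ, AR, CA, CO, ERROR_CODE};  // Only six states, as above

struct StateInfo {
    std::string name;
    PostalAbbreviation abbreviation;
};

const std::vector<StateInfo> STATES = {
    { "Alabama", AL },
    { "Alaska", AK },
    { "Arizona", AZ },
    { "Arkansas", AR },
    { "California", CA },
    { "Colorado", CO }
};

// Hypothetical reverse lookup: the same table serves a new function,
// so no second if structure with one branch per state is needed.
// Returns an empty string if the abbreviation is not in the table.
std::string abbreviation_to_name(PostalAbbreviation abbreviation)
{
    for (const StateInfo& s : STATES) {
        if (s.abbreviation == abbreviation) {
            return s.name;
        }
    }
    return "";
}
```

Adding a new state now means adding one line to STATES, after which both lookup directions work without any change to the functions.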

A clear sign that a program needs data driven programming is an if structure that grows out of proportion (or a switch structure, which we have not studied so far).