Syntax and compilation

Attention

In this section (and elsewhere in the early parts of the material), Python and C++ programs are compared with each other. Python was chosen as the comparison language because most students have studied it in the Programming 1 course. If you do not know Python, you can skip the Python parts of the comparisons and concentrate on the C++ parts.

This section first shows in detail what a simple C++ program looks like. Let’s compare the following two programs, written in Python and C++, which are functionally equivalent:

#
#  Version 1: Python
#

def main():
    print("Hello world!")
    print("How are you?")

main()

//
//  Version 2: C++
//

#include <iostream>

using namespace std;

int main() {
    cout << "Hello world!" << endl;
    cout << "How are you?"
         << endl;
}

What similarities and differences do the programs have?

  • A C++ program automatically starts from the function named main.

  • In Python, naming the starting function main was just a convention. We followed it because that is how many programming languages do it anyway. Python doesn’t require this, but C++ does.

  • Most C++ commands end with a semicolon (;). Unfortunately, this rule is not universal, and one can’t just stick a semicolon everywhere. This is a major source of angst and programming errors for an inexperienced C++ programmer who hasn’t yet figured out where a semicolon belongs and where it doesn’t.

  • Because C++ commands generally end with a semicolon, the language is very flexible about dividing a single command across multiple lines without any special notation. Python, on the other hand, usually requires the character \ at the end of a continued line, unless the line break falls inside parentheses, brackets, or braces.

  • Program lines belonging together (blocks) are written inside curly braces { } in C++, whereas Python uses indentation for the same purpose. However, it is good practice to use indentation in C++, too, because it makes programs more readable and understandable.

  • Comments in C++ start with //. Python uses the # sign instead.

  • The output command is not part of the C++ language itself; it is implemented in a library instead. If you want to print something on the screen or read user input from the keyboard, you have to use the iostream library:

    #include <iostream>

    The #include <...> structure is analogous to the import command in Python. If you need program code written in another file, you can use the #include directive. If you want the compiler to search for the file among the C++ library files, write the name of the library between angle brackets, as above. If you want the compiler to search among the files written by the programmer, write the name of the file between quotation marks ("").

    Lines beginning with # are directives for the preprocessor. The preprocessor is a part of the compiler, and its task is to prepare source code files for the actual compilation. The preprocessor performs textual substitution: for example, the directive #include <iostream> copies the content of the iostream library header in place of the directive.

  • When #include <iostream> has been added at the beginning of a source code file, information can be printed on the screen using cout with the << operator. One cout command can print as many values as required, simply by adding a << operator in front of each value. If the special value named endl is printed, the cursor moves to the beginning of the next line on the screen.

  • The C++ code has one extra line that the Python version seemingly does not have:

    using namespace std;

    This line makes it possible to use names from the iostream library as such, like cout and endl, instead of std::cout and std::endl.

    Python actually has a very similar mechanism for libraries (the example above just did not use one). When using libraries in Python, one option is to write:

    import math
    ...
    math.sqrt(2.0)
    

    The library name must then be used as a prefix whenever a service of the library is used. This can be avoided by writing:

    from math import *
    ...
    sqrt(2.0)
    

    The using namespace command makes it possible to refer to identifiers declared in a named namespace without the namespace name as a prefix. On the other hand, such prefixes can be useful, too. In a way, a namespace defines a set of identifiers and hides them from code outside the namespace. In this way, the same identifier can be used for different things, as long as the identifiers are defined in different namespaces and it is clear which one is meant. For example, a student register program could use the identifier name to refer to the name of a student as well as to that of a course. If each of these names is defined in a namespace of its own, we can refer to them as Student::name and Course::name.

    At this stage of the course, you need not worry much about namespaces. It is enough to realize that if an identifier belongs to a namespace other than the global one, the compiler will not find it unless you have written a using namespace command or prefixed the identifier with the name of the namespace:

    name_of_the_namespace::identifier
    

Attention

The larger the programs you write, the more important it is to avoid "useless" names. If you use large libraries and bring in all of their identifiers, the number of identifiers may grow so large that it becomes difficult to invent new names.

For that reason, it is usually better to write the prefixes, as in std::cout and std::endl, than to bring in the whole namespace with the command using namespace std;, even though the latter allows the shorter and simpler forms cout and endl.

Without the using namespace command, the program above looks as follows:

//
//  Version 3: C++ with better programming style
//

#include <iostream>

int main() {
    std::cout << "Hello world!" << std::endl;
    std::cout << "How are you?"
              << std::endl;
}

Although the above code follows a better programming style, it is more common to use the version with the line using namespace std, because Qt generates such a line automatically.

Interpretation vs compilation

Python is an interpreted programming language and C++ a compiled one.

Executing a program written in an interpreted programming language requires a utility program every single time the program is run. This utility is called an interpreter (e.g. the Python interpreter). If the interpreter is uninstalled or otherwise deleted, programs written in its language can no longer be run. The interpreter’s job is to transform the commands in the source code into machine language as the program’s execution progresses. This is required because the CPU of the computer can directly run only programs written in its own machine language.

In the case of compiled programming languages, a utility program is still required before we can execute our program. This program is called a compiler (e.g. a C++ compiler). The compiler processes the source code completely before it is ever executed.

The compilation process includes error checking, and it produces a complete machine language program that is usually stored on the hard drive in a separate file (e.g. with the .exe suffix in Windows). Once compilation is complete, the compiler is no longer needed, since the fully compiled machine language commands have been stored in the executable file, which can be run over and over again without any utility program (i.e. the compiler). If the source code is ever modified, it must, of course, be compiled again so that the modifications end up in the machine language version and we can run the new version of the program.

There are many programming languages which are somewhere between the two extremes described above. Programs written in these languages are translated into some kind of simpler form (bytecode) which is then interpreted. In a way, this is a hybrid form between interpreting and compiling.

Strictly speaking, compilation consists of (at least) two phases. The program is first compiled into a so-called object file. In this form, the program is in machine code, but it cannot be executed yet. The reason is that the final code still needs parts from the library, and thus the program must be linked.

Linking means that the separately compiled parts are joined together into a single executable program. Compilers typically hide the linking phase, but with large programs the two phases are performed separately, so that a small modification does not force everything to be recompiled: only the changed parts are compiled again before linking. This benefit is significant if the program is longer than 100,000 or a million lines, but it can be noticeable even with much smaller programs.