Characteristics of basic data types

At first glance, the following two programs appear to do exactly the same thing:

def main():
    number = int(input("Input a number: "))
    if number < 0:
        print("Must be non-negative!")
    else:
        result = 1
        while number > 0:
            result = result * number
            number -= 1
        print("Factorial is:", result)

main()

#include <iostream>

using namespace std;

int main() {
    cout << "Input a number: ";
    int number = 0;
    cin >> number;

    if ( number < 0 ) {
        cout << "Must be non-negative!" << endl;
    } else {
        int result = 1;
        while ( number > 0 ) {
            result = result * number;
            --number;
        }
        cout << "Factorial is: " << result << endl;
    }
}

However, if you run the programs and enter, for example, 17 as the input, the Python program prints 355687428096000, whereas the C++ program prints -288522240. The result of the C++ program is clearly wrong, because a factorial (a product of positive integers) can never be negative. What causes this problem?

In one respect, Python is an unusual programming language: the size of its integers is not limited. Most other programming languages represent integers (and other numeric types) at the machine level with a fixed number of bits. This number usually depends on the processor architecture, but sometimes it also depends on the chosen compiler. For integers, the typical number of bits is 32, but in some cases it is 16 or 64.

If the processor of the computer represents integers with 32 bits, it can only handle integers between -2147483648 and 2147483647 (2^32 different integers altogether). If the result of a calculation on that processor is an integer that cannot be represented with 32 bits, an overflow occurs. An overflow naturally makes the result incorrect.

[Figure: a car odometer (speed_odometer.jpg)]

One could compare overflow to the moment when the odometer (the distance gauge) of your car reaches its highest number and "spins around" back to the smallest number it is capable of showing. (Image: Ant75)

A similar situation may arise when calculating with real numbers, although the numbers of bits and the limit values differ, because real numbers are represented in a different form than integers. Underflow is the situation where the absolute value of the result of a real-number calculation is so small that the processor cannot distinguish it from zero.

When programming in C++, the programmer has some control over how numbers are represented. For example, C++ has several data types for representing integers, whereas Python has only one.

Since these details are not vital at this point, the data types we will be using in this course are as follows:

  • int - the normal integer type, suitable for both positive and negative values
  • unsigned int - an integer type suitable only for non-negative values (natural numbers)
  • long int - a type that may offer a larger range of values than the usual int type, but only if the processor architecture of the computer supports it
  • unsigned long int - a rather logical combination of the two types above.

It is vital to note that the C++ language does not define the exact number of bits used to represent numeric data types. Therefore, you should not rely on values of type int always being represented with 32 bits, even though this is the case in the working environment of our course.