Basic Java containers

The Java standard class library offers similar container classes and sorting functions as e.g. C++ and Python libraries. We will review some of them in this section. We will not even try to go through all the details. You are recommended to also look at the Java Collections framework documentation for more information. Java Collections framework is a part of the Java class library dedicated to storing and manipulating collections.

It is good to note one important thing right in the beginning: Java containers can store only objects. If you wish to store primitive values as such, you are pretty much limited to using plain arrays. Java containers are commonly used with primitive values: the actually stored values will then just need to be boxed into wrapper class objects that can be later unboxed when read into primitive variables. This is convenient enough from a programmer’s point of view due to how Java performan the (un)boxing automatically.

The generic class java.util.ArrayList provides a similar dynamic array as e.g. vector in C++ or list in Python. We will discuss generics a little later, but the term refers to the fact that we give an item type as a type parameter when using the ArrayList class. By a dynamic array we mean an array whose size can be changed e.g. by adding items to its end.

A list?

Not that the term “list” is not confined to mean just e.g. a linked list: a list is in general a collections of items that have a linear order, that is, where we can unambiguously refer to e.g. the first or the last item, or in general to the item at index i. As the name suggests, ArrayList is a list that stores its items in an array (and is reminiscent of Python’s list). We do not discuss it further in this section, but Java also contains e.g. the class LinkedList which, as the name suggests, is a list that stores its items in a linked list.

Java’s generic classes use similar syntax as C++: the type parameter is given inside angle brackets after the class name. E.g. ArrayList<Integer> is an ArrayList array for storing Integer objects, and ArrayList<ArrayList<String>> is an ArrayList array whose items are ArrayList arrays for storing String objects (so it would in effect be a two dimensional String array). It is also possible to store plain arrays into an ArrayList array: e.g. also ArrayList<String[]> would be a two dimensional String array (although less dynamic than the preceding one). The ArrayList class contains e.g. the following member functions:

  • add(item): add item to the end of this array (and the array grows by one).

  • addAll(container): add all items of container to the end of this array (as if they were added individually using add).

  • get(i): return the item at index i.

  • set(i, item): store item to the (existing) index i.

The following example code stores the 10 first Fibonacci numbers into an ArrayList array.

// Note how the angle brackets on the right side are empty. Java allows to leave out the
// type parameters when using the new operator if the Java compiler can infer them from
// the target variable type (here the type of fibonacci, that is, ArrayList<Integer>).
// It would of course be legal to use the full form new ArrayList<Integer>().
// The array will be initially empty.
ArrayList<Integer> fibonacci = new ArrayList<>();

// Add the first 2 Fibonacci numbers to the beginning of the array.
fibonacci.add(0); // The first Fibonacci number is 0.
fibonacci.add(1); // The second Fibonacci number is 1.
for(int i = 1; i < 10; i++) {
  // Add the rest: the next Fibonacci number is always the sum of the previous two.
  fibonacci.add(fibonacci.get(i-1) + fibonacci.get(i));
}

// Prints out "[0, 1, 1, 2, 3, 5, 8, 13, 21, 34]".
System.out.println(fibonacci);

The generic classes HashSet and TreeSet are set classes similar to unordered_set and set in C++ and set in Python. They store a collection of items without duplicates: no two items in a set are similar (compare to be equal). Set classes are much more efficient that arrays especially in situations where we want to check whether a set contains a certain item. The two Java set classes have the same difference as the corresponding C++ classes: HashSet is based on a hash table and TreeSet on a balanced search tree. In terms of practice, one sometimes relevant consequence is that HashSet maintains its items in random order (due to how hashing works) whereas TreeSet maintains its items in ascending order.

Both HashSet and TreeSet offer e.g. the following member functions:

  • add(item): add item to this set unless the set already contains a similar item.

  • addAll(container): add all items of container to this set (as if they were added individually using add).

  • contains(item): returns a boolean that tells whether this set contains an item similar to item.

  • remove(item): removes item from this set (if the set contains an item similar to item).

TreeSet in addition has e.g. the following member functions that rely on item order:

  • first(): returns the smallest item of this set.

  • last(): returns the largest item of this set.

  • pollFirst(): returns and removes the smallets item of this set.

  • pollLast(): returns and removes the largest item of this set.

The next code example demonstrates basic use of set containers. Only some of the above mentioned functions are used here.

import java.util.HashSet;
import java.util.TreeSet;

class Sets {
  public static void main(String[] args) {
    HashSet<String> hashSet = new HashSet<>();
    TreeSet<String> treeSet = new TreeSet<>();
    for(String arg : args) {
      hashSet.add(arg);
      treeSet.add(arg);
    }
    System.out.print("Iterating over hashSet:");
    for(String s : hashSet) {
      System.out.print(" " + s);
    }
    System.out.println();
    System.out.print("Iterating over treeSet:");
    for(String s : treeSet) {
      System.out.print(" " + s);
    }
    System.out.println();
  }
}

Running the code example e.g. as java Sets one two one three one four five two might print out:

Iterating over hashSet: four one two three five
Iterating over treeSet: five four one three two

The result illustrates how iterating over the hash-based set lists the items in a more or less random order, but iterating over the tree-based set listes them in ascending alphabetical order.

The generic classes HashMap and TreeMap are dictionary classes similar to unordered_map and map in C++ and dict in Python. They store key-value pairs and offer efficient lookup function based on the key of the key-value pairs. The key and value types can be different and are given as two comma separated type parameters when using a dictionary class. E.g. HashMap<String, Double> would store String keys and Double values, and TreeMap<Integer, ArrayList<String>> would store Integer keys and ArrayList<String> values. In the latter case the values would thus be dynamic arrays, and this kind of a dictionary structure might be useful e.g. if we wish to store several values under the same key.

The dictionary classs have the same difference as the corresponding set classes: HashMap is based on a hash table and maintains the key-value pairs in random order, whereas TreeMap is based on a balanced tree and maintains the key-value pairs in ascendingorder with respect to the keys. Both HashMap and TreeMap offer e.g. the following member functions:

  • put(key, item): adds the key-value pair, that is item under key, into this dictionary. If the key already had a value stored under it, item will replace the previous value.

  • putAll(map): adds all key-value pairs of the dictionary map into this dictionary (as if they were added individually using put).

  • containsKey(key): returns a boolean that tells whether this dictionary contains a value stored under key.

  • containsValue(item): returns a boolean that tells whether this dictionary contains a value item under any key. Note: this may be slow as it probably scans key-value pairs one by one.

  • get(key): return the value stored under key (or null if the dictionary does not contain a value stored under key).

  • remove(key): remove the key-value pair whose key is key (if such exists).

  • keySet(): returns a set that contains the keys of all stored key-value pairs.

  • entrySet(): returns a set that contains all stored key-value pairs.

    • The key-value pairs are stored internally in objects of type Map.Entry<K, V>, where K and V are the key and value types. This is similar to pair in C++.

    • Map.Entry has the member functions getKey() and getValue() for extracting the key and value stored in it.

TreeMap also offers e.g. the following functions that rely on the order of the key-value pairs:

  • firstEntry(): returns a Map.Entry containing the key-value pair with the smallest key.

  • lastEntry(): returns a Map.Entry containing the key-value pair with the slargest key.

  • pollFirstEntry(): returns and removes a Map.Entry containing the key-value pair with the smallest key.

  • pollLastEntry(): returns and removes a Map.Entry containing the key-value pair with the largest key.

The following example code demonstrates basic use of dictionaries by “indexing” the command line parameters based on their first characters: all command line parameters sharing the same first character are stored into a list that is stored using that first character as the key. We again use only some of the functions mentioned above.

import java.util.TreeMap;
import java.util.TreeSet;

class Index {
  public static void main(String[] args) {
    TreeMap<Character, TreeSet<String>> index = new TreeMap<>();
    for(String arg : args) {
      // Use the first character as a key, converting it always to upper case.
      Character key = Character.toUpperCase(arg.charAt(0));
      // Lookup the word set corresponding to this key. If it does not exist yet,
      // (get returned null), create and store a new empty set under this key.
      TreeSet<String> keyValues = index.get(key);
      if(keyValues == null) {
        keyValues = new TreeSet<>();
        index.put(key, keyValues);
      }
      // Add the current word to the set that now must exist in the dictionary.
      keyValues.add(arg);
    }

    // Iterate over the dictionary contents with a nested loop where the outer loop
    // iterates over the keys and the inner loop iterates over the words stored in
    // the set under that key.
    System.out.println("Iterating over keys and their values:");
    for(Character key : index.keySet()) {
      System.out.print("  " + key + ":");
      for(String s : index.get(key)) {
        System.out.print(" " + s);
      }
      System.out.println();
    }

    // Iterate over the dictionary again, but this time with a simple loop that
    // iterates over the key-value pairs. Here we might consider using the inferred
    // var type as the type of the iterated key-value pairs is fairly complicated
    // and long: Map.Entry<Character, TreeSet<String>>
    System.out.println("Iterating over key-value-pairs:");
    for(var keyValue : index.entrySet()) {
      // We may extract the key and value from the key-value pair by calling getKey()
      // and getValue(), respectively. We could also have simply printed the key-value
      // pairs as such, because Map.Entry offers a reasonable toString implementation.
      System.out.print("  " + keyValue.getKey() + ": ");
      System.out.println(keyValue.getValue());
    }
  }
}

Running the example code as

java Index The Java programming language is related to C and C++ but is organized rather differently with a number of aspects of C and C++ omitted and a few ideas from other languages included

will print:

Iterating over keys and their values:
  A: a and aspects
  B: but
  C: C C++
  D: differently
  F: few from
  I: ideas included is
  J: Java
  L: language languages
  N: number
  O: of omitted organized other
  P: programming
  R: rather related
  T: The to
  W: with
Iterating over key-value-pairs:
  A: [a, and, aspects]
  B: [but]
  C: [C, C++]
  D: [differently]
  F: [few, from]
  I: [ideas, included, is]
  J: [Java]
  L: [language, languages]
  N: [number]
  O: [of, omitted, organized, other]
  P: [programming]
  R: [rather, related]
  T: [The, to]
  W: [with]

The preceding examples created the containers to be initially empty, without giving a constructor parameter. The container classes usually have also a constructor that takes an existing compatible container object as a parameter and initializes the new container with its contents (as if we had used addAll or putAll right after constructing an initially empty container).

You are implementing a program to follow the process of chocolate tempering. You hence need to have a container that can store temperature values as they come, maintaining the order the values come in. The container you choose is:

You need a key word index that stores key words and all the pages of the book where the key word can be found. The key words are stored alphabetically and the page numbers first to last i.e. in an ascending order. The Java container to collect this data is:

There are several completion methods on a course. You need the student numbers of the students enrolled on the course so that it is easy to check if the student is already enrolled and if not add them to the course. The best Java container for handling the student numbers would be:

Container vs. hash code / comparison function

Containers based on a hash table compute hash values to the items (or dictionary keys) by calling a member function hashCode(). All Java objects have such a function, at least as a default implementation inherited from Object class (this is discussed later). The default implementation is based on object identity only (and not the value represented by an object). Most fundamental Java classes, such as Integer, String and so on, have a reasonable own implementation of hashCode. If you define a class that you want to use with hash based containers, you should implement the hashCode function for that class. This can be even quite straight-forward to do, because a hash value can usually be computed by combining the hash values of some suitably selected object member variables. If the member variable types already have reasonable hashCode implementations, a combined has value can be computed by the function java.util.Objects.hash. This function takes one or more objects as parameters and returns a hash value based on combining their individual hash values. We will provide an example of this later when discussing class inheritance.

Tree based containers maintain items (or dictionary keys) in ascending order. This requires that the items must either readily support value comparison or that we provide a separate comparison function as a constructor parameter when creating the container. This will be discussed later. But note that Java’s fundamental value types (Integer, String, etc.) readily support comparison based on “typical” ascending order.