Data Structures for Pre-College Programmers: Non-linear Data Structures
beginner data data structure dictionary hash set unordered set ordered set
Many of the beginners on the site are pre-college students. Often beginners will learn by reading tutorials on the Internet, copying code from books, and trying out things that they find interesting.
This article is part of a series that focuses on giving pre-college developers the basics they need to understand data structures.
This article is about the non-sequential container types
These data structures are different from arrays and lists. Arrays and lists are sequential containers. Things have an order. When you add a bunch of items in a specific order, they will remain in that order.
Non-linear data structures do not necessarily remain in the order you add them. When you add or remove something it can cause other items to shuffle around.
Internally these are made from trees and heaps, which were covered in the previous article.
There are many variations of these data structures. The most basic are the data dictionary and the ordered and unordered set.
The Data Dictionary
A regular dictionary contains a bunch of words (the key) and a definition (the value). Since the keys are in alphabetical order you can find any item very quickly. Imagine if the dictionary were not sorted, looking up words would be extremely difficult.
There are two common ways to sort the items in the dictionary: a comparison, or a hash.
A traditional comparison ordering is usually the most intuitive. This is like the real-world dictionary where everything is sorted alphabetically, or they could be sorted numerically. When storing items this way you need to provide a comparison function.
Usually this function defaults to the less than operator, such as a < b
The second way to sort items is with using a hash. A hash is just a way to convert a block of data into a single number. For example, the string ‘blue’ might have a hash of 0xa66b370d, and the string ‘red’ might have a hash of 0x3a72d292. When a data dictionary uses a hash it is usually considered unsorted. In reality it is still sorted, just sorted by hash rather than by something that is human usable.
A data dictionary works the same way. There are slight performance differences between using a traditionally-sorted dictionary or a hash-sorted dictionary. The differences are minor enough that you generally don't need care about them.
In C++ the family of containers are a map or a mutimap, or an unordered_map or unordered_multimap. In Java the family is a HashMap, TreeMap, or LinkedHashMap. In C# it is a Dictionary or SortedDictionary. Each one has its own implementation details such as if they use a hash or a comparison and if they allow duplicates, but the overall concept is the same.
Note that in each of the standard libraries they provide an ordered version (where you specify the comparison) and an unordered version (where they use a hash function).
Once you have added items to a data dictionary you can change the values but you cannot change the key. Back to the real-world word dictionary analogy, you can modify the word’s definition without moving the word in the book; if you change the spelling of the word then you need to remove the first spelling and re-insert the word with the new spelling.
The details for how they work is best left to a college level. It is enough to know that they are very fast for looking up data, and they can be pretty slow when adding or deleting values.
The Ordered and Unordered Set
An ordered set is almost exactly the same as a dictionary. Instead of having a key and a value, it only has a key.
Instead of a traditional dictionary with words and definitions, it is just the words.
These are useful when you want to store just store items directly without any additional data.
In C++ the family of structures are called a set or multiset, or an unordered_set or unordered_multiset. In Java they are HashSet, TreeSet, or LinkedHashSet. In C# they are the HashSet and SortedSet.
Just like above, there are ordered versions (where you specify the comparison) and an unordered version (where they use a hash function). You also cannot modify a key when it has been added. Instead you must remove the old object and insert a new object.
Often times they are implemented exactly the same as a data dictionary above, they simply store nothing for the value.
Since they are implemented the same way they have the same characteristics. They are very quick for searching and looking up values, but they are slow when it comes to adding and removing items.
The data dictionary, ordered set, and unordered set container classes are very useful for fast data lookup. They are often implemented as trees or hash tables, which are very efficient for that. Use them when you need to construct the data once and reference it many times.
They are not efficient at adding and removing items. Change to the container can cause items to shift and shuffle internally. If you need to follow that use pattern, a sorted linked list may be a better option.
NEXT: Choosing the Right Data Structure