Jump to content

  • Log In with Google      Sign In   
  • Create Account


Like
13Likes
Dislike

Data Structures for Pre-College Programmers: Non-linear Data Structures

By frob | Published Apr 07 2013 07:23 PM in General Programming
Peer Reviewed by (Michael Tanczos, Dragonsoulj, AllEightUp)

beginner data data structure dictionary hash set unordered set ordered set

Introduction


Many of the beginners on the site are pre-college students. Often beginners will learn by reading tutorials on the Internet, copying code from books, and trying out things that they find interesting.

This article is part of a series that focuses on giving pre-college developers the basics they need to understand data structures.

This article is about the non-sequential container types

Overview


These data structures are different from arrays and lists. Arrays and lists are sequential containers. Things have an order. When you add a bunch of items in a specific order, they will remain in that order.

Non-linear data structures do not necessarily remain in the order you add them. When you add or remove something it can cause other items to shuffle around.

Internally these are made from trees and heaps, which were covered in the previous article.

There are many variations of these data structures. The most basic are the data dictionary and the ordered and unordered set.

The Data Dictionary


A regular dictionary contains a bunch of words (the key) and a definition (the value). Since the keys are in alphabetical order you can find any item very quickly. Imagine if the dictionary were not sorted, looking up words would be extremely difficult.

There are two common ways to sort the items in the dictionary: a comparison, or a hash.

A traditional comparison ordering is usually the most intuitive. This is like the real-world dictionary where everything is sorted alphabetically, or they could be sorted numerically. When storing items this way you need to provide a comparison function.

Usually this function defaults to the less than operator, such as a < b

The second way to sort items is with using a hash. A hash is just a way to convert a block of data into a single number. For example, the string ‘blue’ might have a hash of 0xa66b370d, and the string ‘red’ might have a hash of 0x3a72d292. When a data dictionary uses a hash it is usually considered unsorted. In reality it is still sorted, just sorted by hash rather than by something that is human usable.

A data dictionary works the same way. There are slight performance differences between using a traditionally-sorted dictionary or a hash-sorted dictionary. The differences are minor enough that you generally don't need care about them.

In C++ the family of containers are a map or a mutimap, or an unordered_map or unordered_multimap. In Java the family is a HashMap, TreeMap, or LinkedHashMap. In C# it is a Dictionary or SortedDictionary. Each one has its own implementation details such as if they use a hash or a comparison and if they allow duplicates, but the overall concept is the same.

Note that in each of the standard libraries they provide an ordered version (where you specify the comparison) and an unordered version (where they use a hash function).

Once you have added items to a data dictionary you can change the values but you cannot change the key. Back to the real-world word dictionary analogy, you can modify the word’s definition without moving the word in the book; if you change the spelling of the word then you need to remove the first spelling and re-insert the word with the new spelling.

The details for how they work is best left to a college level. It is enough to know that they are very fast for looking up data, and they can be pretty slow when adding or deleting values.

The Ordered and Unordered Set


An ordered set is almost exactly the same as a dictionary. Instead of having a key and a value, it only has a key.

Instead of a traditional dictionary with words and definitions, it is just the words.

These are useful when you want to store just store items directly without any additional data.

In C++ the family of structures are called a set or multiset, or an unordered_set or unordered_multiset. In Java they are HashSet, TreeSet, or LinkedHashSet. In C# they are the HashSet and SortedSet.

Just like above, there are ordered versions (where you specify the comparison) and an unordered version (where they use a hash function). You also cannot modify a key when it has been added. Instead you must remove the old object and insert a new object.

Often times they are implemented exactly the same as a data dictionary above, they simply store nothing for the value.

Since they are implemented the same way they have the same characteristics. They are very quick for searching and looking up values, but they are slow when it comes to adding and removing items.

Conclusion


The data dictionary, ordered set, and unordered set container classes are very useful for fast data lookup. They are often implemented as trees or hash tables, which are very efficient for that. Use them when you need to construct the data once and reference it many times.

They are not efficient at adding and removing items. Change to the container can cause items to shift and shuffle internally. If you need to follow that use pattern, a sorted linked list may be a better option.

NEXT: Choosing the Right Data Structure



License


GDOL (Gamedev.net Open License)




Comments
I still have to understand the meaning of these articles, they are sterile, copy and pasted from books introductions , barely scratching the surface.

are a map or a mutimap

-- Should say multimap, correct? Nitpicking for the sake of education. ;-)

 

I like the series so far. It gives decent enough summaries we could use in a lexicon of sorts.

Show some pictures of these data structures!

I still have to understand the meaning of these articles, they are sterile, copy and pasted from books introductions , barely scratching the surface.

I suspect there is underlying purpose to the descriptive nature of things.  The second (third?) article got into memory/optimization stuff which I thought was a bit early for that.  But the structure of the articles has mostly been to describe the items and how they work (yeah, text book like but very simplistic) which can lead into a full "here is what you learned and why it is important to differentiate" set of additions.

 

I think this is a solid description as it stands though.

I still have to understand the meaning of these articles, they are sterile, copy and pasted from books introductions , barely scratching the surface.

 

This comment is neither positive nor constructive.  Member has been warned for a second offense and currently has a history of negatively commenting on member articles.

When will the next article come? :)


Note: Please offer only positive, constructive comments - we are looking to promote a positive atmosphere where collaboration is valued above all else.




PARTNERS