First I would like to bring up the project I am working on because that is the inspiration behind this multi part hashtable blog entry set. I am working on a game called Orbis. Orbis came to thought a few years ago. It is basically an asteroids clone that has a twist of pattern matching added in to bring a little complexity and immersion into the game play. I chose pattern matching because it is not to difficult or complicated to dull and frustrate the game play but instead it adds some challenge to the asteroids game type.
When I originally chose to do Orbis I was going to do it in C#/XNA or Python, however, that was years ago and I found that I like to use games as a way to feed my need for the knowledge of how things work internally at a lower level. This is not lower level as in language but low level as in API/functionality. So coming back to my original point I chose to return to Orbis and develop it using pure ANSI-C and SDL. This really gives me a chance to dive into the heart of algorithms which is a real peek into the interests that I have as well as give me a chance to develop software in a non OOP manner. So this brings us to the topic of HashTables.
When designing Orbis I realized I need a way to cache media resources such as images and music so that I can minimize the amount of memory that needs to be allocated as there is no need to keep around duplicates of such resources as you can just reuse them. It just so happens that HashTables are a great way to do this and lets explore why.
What is a HashTable?
A HashTable is a data structure that maps values to keys. If you are a C++ programmer you would know that when you want to have this functionality you would use std::unordered_map. When you tell a HashTable you want a value for some key it goes into the table finds the value and returns that value to the caller.
The main Pro to HashTables is that they are ridiculously fast. In a perfect situation a HashTable has O(1) access. This means that when it looks in the table it is guaranteed to find the value instantly.
The main con is that O(1) access is not guaranteed. If you pass in a key that does not exist the HashTable has to iterate from that point forward giving us a O(I - N) lookup time. Conflicting Keys can also increase lookup time but lookup time will always be better then a typical linear array search of O(N).
So how do they work?
HashTables are a very interesting data structure. When someone uses std::unordered_map, a Dictionary, or some other HashTable type in another language without knowing the inner workings you would assume that the HashTable is storing the actual key and the value side by side and that the key is the actual index into the "list". This is actually quite false in the way the work. The general idea behind a HashTable is to actual store the key and the value as a node and index that node with a standard integer index.
So in reality a HashTable is a linked list where each node is indexed. The index is generated from the key to determine where the node should be placed in the chain. In order to do this internally every HashTable uses a hash function to figure this out. The rule is that a HashTable is only as good as it's hash function and how well it handles duplicate keys or avoids duplicate keys.
In this post I want to focus on the basics of my HashTable implementation. So I want to try to cover the my creation function, hash function, and lookup function.
The first thing we need in my case are 2 structures. The first is the actual node which is a single linked list which stores the key, the value, and a pointer to the next node in the chain. This is what the definition of that looks like.
/* single linked hashtable node */
char *key; /* the node key */
void *value; /* the node value */
struct htnode *next; /* the next node in the chain */
Next is the structure to the actual HashTable. It contains a fixed size array to node pointers and a function pointer to a node disposal function because some nodes may contain dynamically allocated memory like a SDL_Surface which needs to be freed in a special way.
/* a hashtable containing an array pointers to its nodes */
void (*disp)(HashNode *); /* function pointer to dispose func */
Now the structures are out of the way we can look at the creation function for the actual HashTable. This function is designed to allow us to dynamically allocate a HashTable. This technically is not necessary. We could easily just use the actual HasTable structure as a variable somewhere and do the assignment manually but because this HashTable is dealing with dynamically allocated memory for my purposes I feel it is better to design this way so that we can make sure everything gets cleaned up. With this function present the caller will know that the HashTable is dynamic and they should call the appropriate cleanup function for the table which will ensure the nodes get disposed of properly.
* Creates a new hashtable returning said hashtable.
* The function pointer argument is for a dispose function
* if no dispose function is needed for the stored nodes pass NULL.
HashTable *create_hashtable(void (*disp)(HashNode *))
HashTable *tbl = malloc(sizeof(HashTable));
tbl->disp = &disp;
I now want to look at the most important function which is the hash function. The hash function I am using is not 100% full proof but it is simple to understand and hopefully it will be efficient enough for my purposes. I will do some extensive testing on it as the game progresses to make sure things stay sound. It is easy to modify if the need arises but I find it best to start simple and avoid premature optimization. This hash function generates a index based off of the key string which mangles the characters by a seed value multiplied by the previous index size. Once the final index is present the function with return the final index which is the generated index modulus the size of the HashTable. I am hoping this is plenty enough to prevent collisions for my purposes but if they do occur I will have to make some modifications because my intentions are to discard collisions when they occur. The hash function is very important because it is the core of how the actual lookup, insertion, and removal function operate.
* A hash function for turning a null terminated string to a unsigned
* index set.
unsigned hash(char *key)
/* generate a salted index */
for (index = 0; *key != '\0'; key++)
index = *s + 31 * index;
return index % HASHSIZE;
Finally for this article we are going to look at the actual lookup function. Out of the lookup, insertion, and removal functions the lookup function is the most straight forward. We take in the HashTable and the key as arguments and we iterate through the linked list starting at the index they key corresponds to. The iteration is very straight forward linked list navigation. If the key is found we return a pointer to the node and if the key is not found we return NULL.
* Look up a node in given hashtable.
HashNode *lookup(HashTable *tbl, char *key)
for (nptr = tbl->table[hash(key)]; nptr != NULL; nptr = nptr->next)
if (strcmp(key, nptr->key) == 0)
If you noticed yes there is a lot of pointers being thrown around. Welcome to the world of ANSI-C. Without references all we have is pointers and various pointer arithmetic tricks. In this case no pointer arithmetic is really being thrown around so hopefully this post is very straight forward to understand.
In the next post I will cover the rest of the implementation and may do a third about how to put the code to use. When this series is all said and done we will be able to thank all of the glorious programmers that implemented various complete standard library implementations of this data structure so we don't have to do it all the time.