Simple Closed HashTable Program

Started by
2 comments, last by Chad Smith 11 years, 2 months ago

I wanted to get some input on a very basic simple Closed HashTable program I wrote as we just started to study the very basics of them in class. Because of this I decided I'd go ahead and try to implement a very basic one.

I got it and I am getting the output that I would get when I worked it out on pen and paper so it seems I did get it to work. Though before I expand on this program more to make the HashTable an actual class that I could use (yes I know their are ones already out there that I could use, but it is just for practice and make sure I understand the bare bone basics of them).

So here are the basics of what I wanted. I wanted a Closed HashTable that used a basic hash function. I then wanted it to hash the set and then display the hash table set. Nothing fancy, nor it is even using classes yet. Though how is the code so far for it? Anything that I should watch out for?

Here's the code, it's bare bones basic just enough to get theory into code


#include <iostream>

// Method Prototypes
int* CreateSet(int buckets, int defaultValue = -1);
void GetSet(int* set, int buckets);
void DisplaySet(int* set, int buckets);
void DeleteSet(int* set);
int HashFunction(int* set, int x, int buckets);
int* ClosedHashSet(int* set, const int buckets);

int main()
{
	int buckets;
	int* set;
	int* hashSet;

	// ask the user how big the set is
	std::cout << "Please enter the size of the set: ";
	std::cin >> buckets;

	set = CreateSet(buckets);
	GetSet(set, buckets);
	std::cout<< "\nOrinal Set..." <<std::endl;
	DisplaySet(set, buckets);

	hashSet = ClosedHashSet(set, buckets);
	std::cout << "\nClosed Hashing Set..." << std::endl;
	DisplaySet(hashSet, buckets);

	DeleteSet(set);
	DeleteSet(hashSet);

	return 0;
}

// Creates a set with the desired default value
// defaults to -1 for the defalt value
int* CreateSet(int buckets, int defaultValue)
{
	int* set = new int[buckets];

	for(int i = 0; i<buckets; i++)
	{
		set[i] = defaultValue;
	}
	std::cout << std::endl;

	return set;
}

// Gets input from the user on what value to give each index in the set
// sets each index in the set to that value
void GetSet(int* set, int buckets)
{
	for(int i = 0; i<buckets; i++)
	{
		std::cout << "Index " << i << ": ";
		std::cin >> set[i];
	}
}

// Takes in a set and displays the set
void DisplaySet(int* set, int buckets)
{
	for(int i = 0; i<buckets; i++)
	{
		std::cout << i << ": " << set[i] << std::endl;
	}
}

//  Takes in a set and frees the memory that was created of that set
void DeleteSet(int* set)
{
	delete[] set;
	set = nullptr;
}

// Takes in the set you're hashing from, element to hash and how many buckets in the hash function
// returns the index using the hash function
int HashFunction(int* set, int x, int buckets)
{
	// the index of the hash function
	int result = (x*x)%buckets;
	// what index of the hash function we are on
	int hashedIndex = 1;

	// while the element we are hashing has already been hashed
	while(set[result] != -1)
	{
		// try to get another index using closed hashing
		result = ( ((x*x)%buckets)+hashedIndex) %buckets;
		// increase the hashedIndex
		hashedIndex++;
	}
	return result;
}

// Takes in the set to hash from and the number of buckets and returns the set hashed
int* ClosedHashSet(int* set, const int buckets)
{
	int* hashSet = CreateSet(buckets);

	int result = 0;

	for(int i = 0; i<buckets; i++)
	{
		result = HashFunction(hashSet, set[i], buckets);
		hashSet[result] = set[i];
	}

	return hashSet;
}

Here is the hash function that i am using for this program

h(x)=x2 % b

where b is the number of buckets wanted for the set

Anything to watch out for code wise? Should had been implemented better (though note it will be put in a class and template it later on)?

Again this is just to try to implement the bare bone basics. I am not trying to do something better than what the professionals have already done for it. Just practice and would not try this for an actual project.

Thanks!

Advertisement

Code review time!

This screams out to be made a class. You are passing the same values to each function. The table, the number of buckets, and the default value should all be made in to data members of a class.

Use of raw pointers and manual allocation is outdated, but understandable for a beginner. You should prefer stack-managed objects like shared_ptr, weak_ptr, and so on. This will help with many issues, such as proper cleanup when handling exceptions, and proper pointer lifetime management. You should also consider using standard library containers such as std::vector<> rather than your own pointers to arrays.

You have no input validation. What if they enter a number of buckets like -1? This is a good habit to get in to, even with practice programs.

Avoid using names that exist in the standard library. You use "set" as a variable name. There is a std::set class. Prefer to use a different name.

Your function names are confusing. You are doing two things with one group of functions --- you effectively have a set class, and you have a function that creates an instance of that class which has had an operation run against it.

Each bucket can only hold exactly one item, and that one item is pre-initialized with a default value. What happens if they need the same value as the default, how can they differentiate between a valid value and the default placeholder?

Several key bits of functionality are missing to make it general purpose. Add() would be nice to help with a single responsibility principle. Remove() is obviously missing -- once a slot is full it can never be released. Lookup() is obviously missing --- how will you find your result, especially in the case of a hash collision? Also it is often good to have a Reset() or Clear() function, although you could implement it with a DeleteSet() followed by CreateSet().

int* CreateSet(int buckets, int defaultValue = -1);

Default parameters cause problems in legacy code. Even though it is functionally equivalent to providing a second function definition that calls the first, in practice there are subtle gotchas as the code base grows. Provide two function signatures with the first explicitly calling the second:

int* CreateSet(int buckets) { return CreateSet(buckets, -1); }

int* CreateSet(int buckets, int defaultValue) { ... }

int* CreateSet(int buckets, int defaultValue){

int* set = new int[buckets];

In the real world people will misuse your code. Add assertions to verify that the parameters are sane. Consider validating every function's parameters like so:

assert(buckets > 0 && "Positive number of buckets expected");

(Side note: If you don't have your own assertion class that provides descriptive strings, use the format above to display a message in the assertion.)

In this case you could require an unsigned int rather than a signed int, since a negative number of buckets would be problematic.

void GetSet(int* set, int buckets)

//Gets input from the user on what value to give each

Violates the Single Responsibility Principle. Have a function to stores a value in the hashed container. Have a different function that reads input from the user.

HashFunction() violates the single responsibility principle. It should compute the hash, and that's it. It should not search for an empty bucket, that is part of a separate Add() function as mentioned earlier.

//while the element we are hashing has already been hashed


while(set[result] != -1)

This should not be a constant. It should be the defaultValue parameter passed to CreateSet().

When you create your closed hash set you are passing in all values, including those set to the default value. Is that intentional?

This looks confusing.... post title says "hash table", yet the function names are all about "hash set". Are you trying to implement set(mathematics) with hash table? Or are you trying to implement a hash table?

Either way, I think It's better if you just make a class for it, and user I/O can wrap around your class instead of mixing them like this.

This screams out to be made a class. You are passing the same values to each function. The table, the number of buckets, and the default value should all be made in to data members of a class.

You're right, I've already gotten started on designing the bare bone basics of this class

Use of raw pointers and manual allocation is outdated, but understandable for a beginner. You should prefer stack-managed objects like shared_ptr, weak_ptr, and so on. This will help with many issues, such as proper cleanup when handling exceptions, and proper pointer lifetime management. You should also consider using standard library containers such as std::vector<> rather than your own pointers to arrays.

Thanks. The reason I didn't really use them was because I have a feeling I will have an assignment that builds on the theory that is learning the class. With that I know the professor will want use to use raw pointers as she believes they are very important to know. So on most assignments she wants us to use our own ways of doing things to learn instead of "relying" on some of the standard library. So just in case if this was turned into an assignment later on I would be able to translate/use the code I have written. I think I'll go ahead and do another version to get more practice and try to do it using the standard library containers.

You have no input validation. What if they enter a number of buckets like -1? This is a good habit to get in to, even with practice programs.

Point taken and noted for future programs and practice programs. I usually always do input validation though never did on practice programs. It would be a nice habit to get into though.

Avoid using names that exist in the standard library. You use "set" as a variable name. There is a std::set class. Prefer to use a different name.

Noted, thanks.

Your function names are confusing. You are doing two things with one group of functions --- you effectively have a set class, and you have a function that creates an instance of that class which has had an operation run against it.

Any suggestions that you could suggest here? I really only spent about 10 minutes typing this up last night so not much went into the design of each function knowing that this would later be put into a class. While creating some of the functions I did notice I found myself saying what you said though wasn't entirely sure on what to do at the time.

Each bucket can only hold exactly one item, and that one item is pre-initialized with a default value. What happens if they need the same value as the default, how can they differentiate between a valid value and the default placeholder?

I'm a little confused by what you mean here?

Do you mean that, lets say the default placeholder is -1, that that would not be a valid value for a bucket? If so then thanks for noting that and was something that I decided on really fast since I needed a way to see if the hashed index had already been used. At the time it felt ok to just use a default value. I was thinking of fixing this in my new design with maybe an enum to note if a index has been used or not. Is their a better idea?

Several key bits of functionality are missing to make it general purpose. Add() would be nice to help with a single responsibility principle. Remove() is obviously missing -- once a slot is full it can never be released. Lookup() is obviously missing --- how will you find your result, especially in the case of a hash collision? Also it is often good to have a Reset() or Clear() function, although you could implement it with a DeleteSet() followed by CreateSet().

Noted. I didn't think of some of these functions though some of them were already being designed for the basics of a class I am currently designing.

int* CreateSet(int buckets, int defaultValue = -1);
Default parameters cause problems in legacy code. Even though it is functionally equivalent to providing a second function definition that calls the first, in practice there are subtle gotchas as the code base grows. Provide two function signatures with the first explicitly calling the second:
int* CreateSet(int buckets) { return CreateSet(buckets, -1); }
int* CreateSet(int buckets, int defaultValue) { ... }

Did not know of gotchas with default values. Don't remember reading anything in particular about them, but thanks for the tip on that! Noted for sure!

int* CreateSet(int buckets, int defaultValue){

int* set = new int[buckets];
In the real world people will misuse your code. Add assertions to verify that the parameters are sane. Consider validating every function's parameters like so:
assert(buckets > 0 && "Positive number of buckets expected");
(Side note: If you don't have your own assertion class that provides descriptive strings, use the format above to display a message in the assertion.)
In this case you could require an unsigned int rather than a signed int, since a negative number of buckets would be problematic.

Noted! I should be doing more error checking.

void GetSet(int* set, int buckets)

//Gets input from the user on what value to give each
Violates the Single Responsibility Principle. Have a function to stores a value in the hashed container. Have a different function that reads input from the user.

Thanks. Again an issue with me typing it up with no design in mind. Will see what I can come up with in the newer design.

HashFunction() violates the single responsibility principle. It should compute the hash, and that's it. It should not search for an empty bucket, that is part of a separate Add() function as mentioned earlier.

My original function I wrote just has a simple return statement that computed the hash using the hash function. Then when working on computing the hash when their was a collision I saw I needed a way to check which hash function to use. Any suggestions on what to do then when a collision arises, when/where I should get that hash?

//while the element we are hashing has already been hashed

while(set[result] != -1)
This should not be a constant. It should be the defaultValue parameter passed to CreateSet().

This is fixed in the newer design I am working on. Thanks!

When you create your closed hash set you are passing in all values, including those set to the default value. Is that intentional?

I wanted to pass in those set to default values also so I would know which buckets have and have not been used. Do you have a suggestion to not do this? Better idea?

Thanks a ton frob for your time and your input! I greatly appreciate this and is exactly what I wanted.

Thanks!

This topic is closed to new replies.

Advertisement