Sign in to follow this  
bakery2k1

Hash-table-like data structure

Recommended Posts

My application needs to store key-value pairs, for 32-bit keys, in a data structure and look them up later. In order to do this, I am using a hash table (with chaining) with a prime number of "slots", and am hashing the key by simply taking its modulus with the number of slots. However, I now have the need to remove all entries from the hash table where (key & 0xfffff000) has a certain value (call it removeKey). At the moment, I am doing this by:
for(int i = 0; i < 0x1000; ++i)
{
    key = removeKey | i;
    HashTableRemove(key);
}
However, profiling shows that my app is spending 50% of its entire processing time executing this loop. Can anyone suggest a change to the data structure which will allow the removal of this block of keys (and their respective values) to be performed more quickly? Please note that I am coding on a low-level, using pure C and even with no standard library, so "use stdext::hash_map" is not a very useful answer! Thanks

Share this post


Link to post
Share on other sites
It seems that a simple method would be a heiarchial hash table with the first level using "key >> 12" as the key and the second level using "key & 0xFFF" as the key. This way, you can simply delete an entry in the first level and have all the appropriate keys destroyed automatically.

Of course, if there are other masks that need similarly quick removal, it might become complicated (unless all the masks start with a string of all 1 bits and end with a string of all 0s, in which case a tree-like hashtable where each level looks at a single bit would be appropriate).

Share this post


Link to post
Share on other sites
I assume 0x1000 = 1000 and is just an example for a hashtable size. To traverse through somany elements that is empty is truly a waste! So, I would recon that you find a way to identify which elements are used and which are free, like having an array list showing the indexes or something. I dunno.

Share this post


Link to post
Share on other sites
Quote:
Original post by Extrarius
It seems that a simple method would be a heiarchial hash table with the first level using "key >> 12" as the key and the second level using "key & 0xFFF" as the key. This way, you can simply delete an entry in the first level and have all the appropriate keys destroyed automatically.

Of course, if there are other masks that need similarly quick removal, it might become complicated (unless all the masks start with a string of all 1 bits and end with a string of all 0s, in which case a tree-like hashtable where each level looks at a single bit would be appropriate).


I'm not sure I completely understand this. The first level hash table will be indexed by hash1(key >> 12), and its entry will be another hash table, indexed by hash2(key & 0xFFF)? Then, when a block deletion needs to occur, an entire second-level table is deleted?

This is the only mask (or maybe one of two) that I will be using. So, the tree-like structure, whilst sounding cool, should not be necessary.

Quote:

I assume 0x1000 = 1000 and is just an example for a hashtable size. To traverse through somany elements that is empty is truly a waste! So, I would recon that you find a way to identify which elements are used and which are free, like having an array list showing the indexes or something. I dunno.


Firstly, 0x1000 = 4096.
Secondly, the keys are arbitrary 32-bit values, so a bitmap holding whether or not a certain key has been used will require (2^32)/8 bytes = 512MB.

Share this post


Link to post
Share on other sites
Hey,

As far as I see it you have 2 options [that I can see]. The first is a smaller hash table. The second is a less brute method to storing your data. That is, unless you're wanting to put your objects into a list every time you mark them for removal.

I'd have to reccomend using a tree instead of a straight array to store your data. You can even partition easily enough [each node gives the address of the child node that has binary code 1 or 0 for each bit or a 0 pointer on no children]. The 32nd node would then give the address of the actual object. That way every reference is 32 binary compares. I think that there's a google hash map project somewhere on sourceforge which uses some tiny amount of memory [AFAIR, it was something like 2 bits per entry] to access elements in a tree fashion like I describe. You may want to check it out.

Of course, this method really breaks down [memorywise] if you're going to be putting a lot of keys into the hashmap. But you can still shortcut somewhat the lookup by storing the address to the node which has resolved key&0xfffff???.

Sorry if I'm not explaining this particularly well, but that's more or less the approach that I'd use.

--CJM

Share this post


Link to post
Share on other sites
Guest Anonymous Poster
Quote:
Original post by bakery2k1
This is the only mask (or maybe one of two) that I will be using. So, the tree-like structure, whilst sounding cool, should not be necessary.
Use 2 hash tables. When putting stuff to the container, if (value & mask != 0), put stuff in hash table 1. Otherwise put it in table 2. Likewise when you get stuff from container, you need one extra if-clause to know which hash table you should be using for look-up. Then when you need to clear stuff with this particular mask, clear table 1 entirely.

Share this post


Link to post
Share on other sites
Quote:
Original post by bakery2k1
My application needs to store key-value pairs, for 32-bit keys, in a data structure and look them up later. In order to do this, I am using a hash table (with chaining) with a prime number of "slots", and am hashing the key by simply taking its modulus with the number of slots.

However, I now have the need to remove all entries from the hash table where (key & 0xfffff000) has a certain value (call it removeKey). At the moment, I am doing this by:


for(int i = 0; i < 0x1000; ++i)
{
key = removeKey | i;
HashTableRemove(key);
}


However, profiling shows that my app is spending 50% of its entire processing time executing this loop.

Can anyone suggest a change to the data structure which will allow the removal of this block of keys (and their respective values) to be performed more quickly?

Please note that I am coding on a low-level, using pure C and even with no standard library, so "use stdext::hash_map" is not a very useful answer!

Thanks
First I'l tell you what I tell everyone trying to make their own hash tables, then I have a speedup to try.
I wouldn't bother with prime-number hash-table sizes. The only reason to do this is when your hash function isn't very good. CRC makes an excellent hash function, and then you can use a power of two and bitmasking for the table size.

Secondly, you could just write a function to iterate over every chain in the data and delete from the list, those that match your criteria. (The iterator approach I suppose). No need to evaluate the hash function and expensive modulus operator.

The other thing is that if it's taking that much time, then it must be called very often. So maybe you could set a flag to say if one of these things matching your criteria has even ben inserted into the hash-table, and if not, don't bother to call the remove function.

Share this post


Link to post
Share on other sites
Didn't read everything, so someone might already have mentioned it.

You might consider using stl, in particular the std::map or std::hash_map.
Stl is usually very fast (if want to test that with profiling, make sure
you profile in a release-build!)

Regards

Share this post


Link to post
Share on other sites
Didn't read everything, so someone might already have mentioned it.

You might consider using stl, in particular the std::map or std::hash_map.
Stl is usually very fast (if want to test that with profiling, make sure
you profile in a release-build!)

Regards

Share this post


Link to post
Share on other sites
Quote:
Original post by CJM
Hey,

As far as I see it you have 2 options [that I can see]. The first is a smaller hash table. The second is a less brute method to storing your data. That is, unless you're wanting to put your objects into a list every time you mark them for removal.

I'd have to reccomend using a tree instead of a straight array to store your data. You can even partition easily enough [each node gives the address of the child node that has binary code 1 or 0 for each bit or a 0 pointer on no children]. The 32nd node would then give the address of the actual object. That way every reference is 32 binary compares. I think that there's a google hash map project somewhere on sourceforge which uses some tiny amount of memory [AFAIR, it was something like 2 bits per entry] to access elements in a tree fashion like I describe. You may want to check it out.

Of course, this method really breaks down [memorywise] if you're going to be putting a lot of keys into the hashmap. But you can still shortcut somewhat the lookup by storing the address to the node which has resolved key&0xfffff???.

Sorry if I'm not explaining this particularly well, but that's more or less the approach that I'd use.

--CJM


With the tree method, to delete a certain (key & 0xfffff000) value then, I simply delete an entire subtree...Thanks, that might well work.

Quote:

Use 2 hash tables. When putting stuff to the container, if (value & mask != 0), put stuff in hash table 1. Otherwise put it in table 2. Likewise when you get stuff from container, you need one extra if-clause to know which hash table you should be using for look-up. Then when you need to clear stuff with this particular mask, clear table 1 entirely.


Sorry, you clearly did not understand the problem. I need to delete all keys for which (key & mask) has a certain value, and that value could be anything. I don't need to simply delete all keys for which (key & mask != 0).

Quote:

The other thing is that if it's taking that much time, then it must be called very often. So maybe you could set a flag to say if one of these things matching your criteria has even ben inserted into the hash-table, and if not, don't bother to call the remove function.


Thanks, I'm actually doing this already.

Quote:

Didn't read everything, so someone might already have mentioned it.

You might consider using stl...


If you're not going to read everything, at least read the OP!

Share this post


Link to post
Share on other sites
Guest Anonymous Poster
Quote:
Original post by bakery2k1
Sorry, you clearly did not understand the problem. I need to delete all keys for which (key & mask) has a certain value, and that value could be anything. I don't need to simply delete all keys for which (key & mask != 0).
I made a small typo, I meant (key & mask == removeKey) instead of (key & mask != 0).

Also, why weren't you satisfied with Extrarius' answer? I could think of two interpretations for your explanation. That the removeKey can change, for which Extrarius' answer is very good, and that removeKey is constant, for which my solution should be fast.

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

Sign in to follow this