#1 Sandman

Posted 19 April 2013 - 04:03 AM

To judge the suitability of any given container, it's quite useful to understand what is going on under the hood.

 

The hash table is basically an array of 'buckets'. When you retrieve an item, it generates a hash code from your key and uses that to decide which bucket the item is in. Each bucket holds a short chain of entries (in Java's HashMap it's a linked list), so if multiple items end up in one bucket it must do a linear search through them to find the correct one.
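
To make that concrete, here's a minimal sketch of a chained hash table in Java. It isn't HashMap's real code (there's no resizing or duplicate-key handling), it just shows the hash-to-bucket-to-linear-scan sequence described above:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal chained hash table: hash the key, pick a bucket, scan that bucket linearly.
class SimpleHashTable<K, V> {
    private static class Entry<K, V> {
        final K key;
        final V value;
        Entry(K key, V value) { this.key = key; this.value = value; }
    }

    // Each bucket is a small list of entries whose keys hashed to the same index.
    private final List<List<Entry<K, V>>> buckets;

    SimpleHashTable(int capacity) {
        buckets = new ArrayList<>(capacity);
        for (int i = 0; i < capacity; i++) {
            buckets.add(new ArrayList<Entry<K, V>>());
        }
    }

    void put(K key, V value) {
        buckets.get(indexFor(key)).add(new Entry<>(key, value));
    }

    V get(K key) {
        // The more keys that collide into this bucket, the longer this scan gets.
        for (Entry<K, V> e : buckets.get(indexFor(key))) {
            if (e.key.equals(key)) {
                return e.value;
            }
        }
        return null;
    }

    private int indexFor(K key) {
        // Mask off the sign bit so the modulo never produces a negative index.
        return (key.hashCode() & 0x7fffffff) % buckets.size();
    }

    public static void main(String[] args) {
        SimpleHashTable<String, String> table = new SimpleHashTable<>(16);
        table.put("12.0,-3.5", "some chunk");
        System.out.println(table.get("12.0,-3.5"));  // prints: some chunk
    }
}
```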

 

A lot depends, then, on the hash function that generates the hash code from your key. I'd assume that Java's default string hashing function is pretty decent, so you can probably expect a reasonable spread of values, and therefore not too many entries going into the same bucket (a poor hash that dumps most keys into the same bucket, on the other hand, would push you towards the worst-case performance of O(n) per lookup).
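
If you want to check the spread for yourself, it's easy enough: run a few representative keys through the same hash-and-bucket step and count the distinct buckets they land in. The coordinate strings below are made up, and the modulo is a simplification of how HashMap really picks a bucket, but it's close enough for a sanity check:

```java
import java.util.HashSet;
import java.util.Set;

public class HashSpreadCheck {
    public static void main(String[] args) {
        int bucketCount = 16; // HashMap's default capacity

        // Made-up coordinate-style keys, standing in for whatever your real keys look like.
        String[] keys = { "12.0,-3.5", "13.0,-3.5", "12.0,-4.5", "64.0,128.0", "0.0,0.0" };

        Set<Integer> usedBuckets = new HashSet<>();
        for (String key : keys) {
            usedBuckets.add((key.hashCode() & 0x7fffffff) % bucketCount);
        }

        // A number close to keys.length means the keys are spreading out nicely;
        // a 1 here would mean every lookup degenerates into a scan of one bucket.
        System.out.println(usedBuckets.size() + " distinct buckets used for " + keys.length + " keys");
    }
}
```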

 

Another thing to consider is the load factor. The default value is 0.75, which basically means that once the number of items reaches three quarters of the total number of buckets (the default capacity is 16), the structure doubles the number of buckets, so you can expect a certain amount of memory reallocation and other work when this happens. You probably don't want this to happen too often, especially if the hash table is large and you're mid game, so you need to consider how much your table is likely to grow, if at all.
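
If you can estimate how big the table will get, you can sidestep most of that mid-game rehashing by sizing the map up front. A sketch, where the 4096 figure and the Chunk placeholder are just stand-ins for your real numbers and types:

```java
import java.util.HashMap;
import java.util.Map;

public class PresizedChunkMap {
    // Placeholder for whatever your map actually stores.
    static class Chunk { }

    public static void main(String[] args) {
        int expectedChunks = 4096;   // your own estimate goes here
        float loadFactor = 0.75f;

        // Pick a capacity big enough that expectedChunks never crosses capacity * loadFactor,
        // so the map never hits its resize threshold while the game is running.
        int initialCapacity = (int) Math.ceil(expectedChunks / loadFactor);

        Map<String, Chunk> chunks = new HashMap<>(initialCapacity, loadFactor);
        System.out.println("Map pre-sized for roughly " + expectedChunks + " entries");
    }
}
```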

 

So far we've only considered the performance of the container itself, but don't forget that we're doing other work here as well. Converting the floating-point coordinates to a string involves a whole load of work (probably a fair bit of fp maths to work out each decimal place), plus memory allocation for the string, and then the string's hashing algorithm has to turn that into a hash code. This additional work is probably significant compared to the lookup itself; I'd guess the time taken to convert the coordinates to a string dwarfs the time spent in the rest of the algorithm.
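
One way to skip the string building entirely, assuming your chunks really do sit on an integer grid (divide the float position by the chunk size to get grid indices), is to use a small integer-pair key with a cheap hashCode. The chunk size, field names and values below are illustrative guesses, not anything from your code:

```java
import java.util.HashMap;
import java.util.Map;

public class ChunkKeyDemo {
    // Immutable integer-pair key: no string formatting or float-to-text work,
    // just a couple of integer multiplies/adds for the hash.
    static final class ChunkKey {
        final int x, z;
        ChunkKey(int x, int z) { this.x = x; this.z = z; }

        @Override public boolean equals(Object o) {
            if (!(o instanceof ChunkKey)) return false;
            ChunkKey k = (ChunkKey) o;
            return k.x == x && k.z == z;
        }

        @Override public int hashCode() {
            return 31 * x + z;
        }
    }

    public static void main(String[] args) {
        Map<ChunkKey, String> chunks = new HashMap<>();

        float chunkSize = 16.0f;                 // assumed chunk width
        float worldX = 37.5f, worldZ = -12.25f;  // some world-space position

        // Reduce the float position to grid indices once, then key on those.
        int cx = (int) Math.floor(worldX / chunkSize);
        int cz = (int) Math.floor(worldZ / chunkSize);

        chunks.put(new ChunkKey(cx, cz), "chunk at (" + cx + ", " + cz + ")");
        System.out.println(chunks.get(new ChunkKey(cx, cz)));
    }
}
```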

 

If you're looking for a fast algorithm that you can call many thousands of times a frame, this probably isn't it. There's a lot of pointless work going on with all the string conversion and hashing, so if you're doing a lot of lookups, you're going to suffer. Unless your chunk table is actually quite large, I wouldn't be too surprised if it turned out that a dumb linear search just based on the raw coordinates was quicker.
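
For comparison, the 'dumb linear search' version is about as simple as it gets: keep the chunks in a plain array and compare raw coordinates, with no hashing or string work at all. The Chunk fields here are assumptions rather than your actual class:

```java
public class LinearChunkSearch {
    static class Chunk {
        final float x, z;   // assumed chunk origin coordinates
        Chunk(float x, float z) { this.x = x; this.z = z; }
    }

    // Straight scan over the array; for a handful of chunks this is hard to beat.
    static Chunk find(Chunk[] chunks, float x, float z) {
        for (Chunk c : chunks) {
            // Exact float comparison is fine as long as both sides store the very
            // same values they were created with (no arithmetic in between).
            if (c.x == x && c.z == z) {
                return c;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Chunk[] chunks = { new Chunk(0, 0), new Chunk(16, 0), new Chunk(0, 16) };
        Chunk hit = find(chunks, 16, 0);
        System.out.println(hit != null ? "found the chunk at (16, 0)" : "not found");
    }
}
```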

 

However, if my understanding of your usage is correct, you're not actually searching this table that often. It's probably nowhere near being a performance bottleneck, in which case don't worry about it.

