Strings are the interesting case when it comes to containers indexed by them.
Depending on various factors, all sorts of different data structures can come out fastest.
For example, if the strings are quite long in length and don't tend to have common prefixes (unlike directory paths for example) then a binary search can be significantly faster due to the fact that the string compares stop upon reaching the first non-matching char, and even repeating that log(n) times can be quicker than hashing a long string.
Another really fast method for longer strings is to use multi-way tries.
Hashing is the obvious choice when the strings aren't very long though, and there are more than about 10 of them. You need a good and fast hash though.
I always recommend and use a CRC function, with which you practically can't go wrong. It's blazingly fast, very few lines of code, and produces extremely good results.[cool]
Check out this page for more info.
Why is it that your hash function ONLY looks at the last character in the string? That's absolutely awful![sick]
[Edited by - iMalc on January 5, 2008 10:00:47 PM]
fast hash map
Quote:now i get 12ms vs 13.3ms... mine is faster... but my benchmarks aren't very accurate... anyone willing to do a better test?
what more can i do to make it faster?
The reason you're faster right now is because unlike hash_map, you're not hashing entire string, but only one character.
Your current improvement comes from simple constant factor.
There is no better test. Hash maps are very sensitive to type and amount of data. While it's somewhat possible to theoretically analyze them, it's mostly about hash functions.
Quote:Another really fast method for longer strings is to use multi-way tries
This would be "fastest" possible way if strings must truly be arbitrary. If I absolutely had no choice, this is what I'd go with. If strings however are known in advance, a 1-to-1 mapping using ints will result in only a few cycles per lookup, or, depending on compiler, might be completely evaluated in advance.
Quote:Why is it that your hash function ONLY looks at the last character in the string? That's absolutely awful!
Might not be. Depending on actual strings, if they're actual words, scanning from back will likely yield unique/diverse values after reading less characters than scanning from start.
But as always, it's impossible to say without actual dataset.
Quote:Original post by AntheusNo really it is that awful! I know there's a loop, but he's not scanning the string as such. The code literally only calculates a hash based on reading the very last character on the string repeatedly.Quote:Why is it that your hash function ONLY looks at the last character in the string? That's absolutely awful!
Might not be. Depending on actual strings, if they're actual words, scanning from back will likely yield unique/diverse values after reading less characters than scanning from start.
But as always, it's impossible to say without actual dataset.
For starters he'll never exceed 256 different hash values, though with text he'll most likely never get anywhere near half that value.
Quote:Original post by iMalc
Strings are the interesting case when it comes to containers indexed by them.
Depending on various factors, all sorts of different data structures can come out fastest.
For example, if the strings are quite long in length and don't tend to have common prefixes (unlike directory paths for example) then a binary search can be significantly faster due to the fact that the string compares stop upon reaching the first non-matching char, and even repeating that log(n) times can be quicker than hashing a long string.
Another really fast method for longer strings is to use multi-way tries.
Hashing is the obvious choice when the strings aren't very long though, and there are more than about 10 of them. You need a good and fast hash though.
I always recommend and use a CRC function, with which you practically can't go wrong. It's blazingly fast, very few lines of code, and produces extremely good results.[cool]
Check out this page for more info.
Why is it that your hash function ONLY looks at the last character in the string? That's absolutely awful![sick]
its not... its a setting u can set "Spread" to a larger value and it will take more characters...
in what way would CRC be better?
soo... is it good or bad?
Quote:Original post by Dragon_StrikeQuote:Original post by iMalc
Strings are the interesting case when it comes to containers indexed by them.
Depending on various factors, all sorts of different data structures can come out fastest.
For example, if the strings are quite long in length and don't tend to have common prefixes (unlike directory paths for example) then a binary search can be significantly faster due to the fact that the string compares stop upon reaching the first non-matching char, and even repeating that log(n) times can be quicker than hashing a long string.
Another really fast method for longer strings is to use multi-way tries.
Hashing is the obvious choice when the strings aren't very long though, and there are more than about 10 of them. You need a good and fast hash though.
I always recommend and use a CRC function, with which you practically can't go wrong. It's blazingly fast, very few lines of code, and produces extremely good results.[cool]
Check out this page for more info.
Why is it that your hash function ONLY looks at the last character in the string? That's absolutely awful![sick]
its not... its a setting u can set "Spread" to a larger value and it will take more characters...
No it won't. In the GetIndex() function, you only ever take the last character.
I am not sure you understand the bit shift operator. I can't see why you are using it the way you are.
Quote:Original post by Dragon_Strike
im experimenting with my own implementation of a hash map... im not worried about memory size and the key will always be a string... insertion speed is also not imoprtant... whats important for me is the "find" performance
In that case, you might be interested in a trie instead of a hash table, as a trie provides better performance for searching strings than binary search trees or imperfect hash tables.
Quote:
No it won't. In the GetIndex() function, you only ever take the last character.
yea... i had a typo there.. fixed now...
index += int(name[(name.size()-1)>>n]);
Quote:Original post by Dragon_StrikeQuote:
No it won't. In the GetIndex() function, you only ever take the last character.
yea... i had a typo there.. fixed now...
index += int(name[(name.size()-1)>>n]);
Tell me what you think that line does. In detail please, with an example string perhaps.
This topic is closed to new replies.
Advertisement
Popular Topics
Advertisement