Fastest string hash algorithm

Started by
18 comments, last by _Sigma 16 years, 10 months ago
Quote:Original post by _Sigma
[....]
Quote:So why not just resolve it to a function ID in the first place, and store that instead of the string?
Because we have no way of knowing what function goes with a given ID. It has to be a declaration-int pair.
You obviously know which function goes with a given ID at some point or you wouldn't be able to look it up ever. If you're actually loading and parsing the script inside your time-critical area, that is your problem right there. If not, you can look up the ID when it first becomes available and then you don't need to look it up again.
"Walk not the trodden path, for it has borne it's burden." -John, Flying Monk
Advertisement
Quote:but I worry that it is a little slow


So you've not actually profiled you're just optimizing early for no good reason?

Now, as Sneftel said, you have no valid or good reason why the function Engine.RegisterKeyDown() cannot, given the string "void left()" turn that into (say) an int and then store that in std::map<std::pair<int, int> > instead of storing the string and calculating the ID each iteration. I don't think you get what we're saying as I see NO problem here like you're describing if you just look at our solutions.

You may not know it until runtime but who cares. My entire engine is data driven and nothing is known until it starts up. But once you register the left arrow key to be handled by a function you now KNOW what the function name is and can calculate the ID and store it. No need to recalculate this each iteration of the engine.

I think you're just hung up on hashing when none is needed :)

"Those who would give up essential liberty to purchase a little temporary safety deserve neither liberty nor safety." --Benjamin Franklin

Quote:Original post by Mike2343
Quote:but I worry that it is a little slow

So you've not actually profiled you're just optimizing early for no good reason?
Well I have, yet I have nothing to compare it too...Therefore I worry that it is not the best solution, and I'm looking for something faster. I'm pulling about 130fps with a debug build, and there is no rendering that is taking place, so something isn't running as fast as it should.

Quote:
Now, as Sneftel said, you have no valid or good reason why the function Engine.RegisterKeyDown() cannot, given the string "void left()" turn that into (say) an int and then store that in std::map<std::pair<int, int> >

Yes this seems like a good sugestion. Does this not require some form of hashing though?

Quote:
and calculating the ID each iteration.
I'm not...

Quote:If you're actually loading and parsing the script inside your time-critical area, that is your problem right there.

I'm not doing this, because, as you say, its slow.
Quote:
If not, you can look up the ID when it first becomes available and then you don't need to look it up again.

Well not quite. I need to look up the ID whenever a script funciton is called, so computing it is done once, but I need to hold onto it, else I have to recompute it which is very slow...

Quote: Quote:
Original post by _Sigma

Quote:
So why not just resolve it to a function ID in the first place, and store that instead of the string?

Because we have no way of knowing what function goes with a given ID.


I'm still a little confused by your situation, then.

I miss read your reply. I think that this is a good solution...

For the moment, it sounds like you guys are saying storing a string,int pair is dumb. So i'll move to a int,int pair. Yet, is there an even faster solution? Ontop of that, what is a good function for creating the int for the string?


Quote:
I'm pulling about 130fps with a debug build,...so something isn't running as fast as it should.

Metrics taken on unoptimized (e.g., debug) code are worthless, so that's not useful information in the slightest.

Quote:
Well not quite. I need to look up the ID whenever a script funciton is called, so computing it is done once, but I need to hold onto it, else I have to recompute it which is very slow... For the moment, it sounds like you guys are saying storing a string,int pair is dumb. So i'll move to a int,int pair. Yet, is there an even faster solution? Ontop of that, what is a good function for creating the int for the string?

It's not that slow.

It's also possible to get constant-time actual lookup of the data you're after if you take some precautions and appropriately cook the data (you'll have to be willing to accept the use of a proxy object, though, that is, you only use the string when you specify the mapping initially, and the process of creating the mapping produces something like a FunctionIdentifier which is used in all subsequent lookups; this is usually sufficient as there is often no need to look up via the actual string more than once, initially at start up time).
Quote:Original post by _Sigma
I'm pulling about 130fps with a debug build, and there is no rendering that is taking place, so something isn't running as fast as it should.


It's a debug build. It's designed not to run as fast as it should. If you wish to measure the performance of your program, turn optimizations on.

Quote:
Yes this seems like a good sugestion. Does this not require some form of hashing though?


This hashing would occur at script loading time in a dozen different ways (such as incrementing UIDs).

Quote:Well not quite. I need to look up the ID whenever a script funciton is called, so computing it is done once, but I need to hold onto it, else I have to recompute it which is very slow...


Stop representing your functions with strings, and start representing them witha scriptFunction type, which you can then implement using either a string or an ID.

Quote:Yet, is there an even faster solution?


Yes, there is an even faster solution. Use an integer index to represent your functions, making comparisons and lookups O(1) with a simple array. Then, use the time you saved (because the solution is extremely simple to implement) doing this to optimize a more critical operation.
Quote:Original post by _Sigma
[...]I miss read your reply. I think that this is a good solution...

For the moment, it sounds like you guys are saying storing a string,int pair is dumb. So i'll move to a int,int pair. Yet, is there an even faster solution? Ontop of that, what is a good function for creating the int for the string?
You don't need a map at all.
First, you associate "somestring" with KEY_LEFT which is some kind of constant.
Next, you make a vector<ID>, size it appropriately, then do KeyFuncs[KEY_LEFT] = Lookup("somestring") and then you have constant time access very easily.

Also, as others have said, using debug mode for anything other than actually debugging is worthless. STL will be always excessively slow in debug mode.
"Walk not the trodden path, for it has borne it's burden." -John, Flying Monk
Damn, I gotta quite posting when I'm in a bit of a rush. I appologize for not giving more concise replies above.

In regards to the debug build, yes, I understand it is un-optomized, and slow. Its just that it seems *quite* slow? Guess it doesn't matter. I'll rebuild with release and take a look there.

Quote:You don't need a map at all.
First, you associate "somestring" with KEY_LEFT which is some kind of constant.
Next, you make a vector<ID>, size it appropriately, then do KeyFuncs[KEY_LEFT] = Lookup("somestring") and then you have constant time access very easily.
Yes, this dawned on me a little while ago. I'm glad its a good approach! :) I will do this. Thanks.

Quote:It's also possible to get constant-time actual lookup of the data you're after if you take some precautions and appropriately cook the data (you'll have to be willing to accept the use of a proxy object, though, that is, you only use the string when you specify the mapping initially, and the process of creating the mapping produces something like a FunctionIdentifier which is used in all subsequent lookups; this is usually sufficient as there is often no need to look up via the actual string more than once, initially at start up time).
Ok, sounds like a good solution. I'll look into this. What would be a good way of making the Identifier from the string?

Cheers
Quote:In regards to the debug build, yes, I understand it is un-optomized, and slow. Its just that it seems *quite* slow? Guess it doesn't matter. I'll rebuild with release and take a look there.


If it were not optimized, it would be great. It contains completely different debug-grade implementation. Which means, every memory allocation is checked, safeguards are inserted before and after memory allocations, all STL concepts use different implementations that perform multiple sanity checks, everything is interlaced with exceptions and exception handlers.

Even if you were to compile this code with release settings, it would be considerably slower. But fool-proof code, along with no optimizations is disaster.

If I consider the performance tests that include STL classes, the performance drop between release and debug mode is around 100-times. This doesn't mean 5 fps in debug, 500 in release, but the SC++L dependant part is insanely slow.
Quote:
Ok, sounds like a good solution. I'll look into this. What would be a good way of making the Identifier from the string?

Essentially the way others have suggested. An intermediary object takes the string and the value the string represents, stores the value in a dynamic array (vector, probably) and returns a "handle" object that is a wrapper around the index that the object was stored at.
Alright, I'll give this a shot! Thanks

This topic is closed to new replies.

Advertisement