**0**

# unique hashed pointers and statistical probability?

###
#1
Members - Reputation: **168**

Posted 30 August 2012 - 04:39 PM

For the sake of this question let's say I have a Car object made up of various Make, Model, Color, Package objects. That Car object is put into a Vector and sorted so that all the given catagories are lumped together appropriately.

If I only have a total of 8 Makes, and the Make is our most important catagory to sort by, then I use the 4 most significant bits of our key. If I have 100 models per make thats the next 7 bits. 20 colors per model is the next 5 bits. So on.

Where my question comes in is this. If I'm using a 32 pointer for my Make, Model, Color, Package objects, I need to somehow reduce that size down to the size of associated bits in the key. So for my Make bits I need to reduce the size of my Make pointer down to 4 bits and still maintain some level of integrity in regards to it's uniqueness. For obvious reseaons I can't have my Ford and Chevy come out the same at 4 bits.

The method I've been playing with so far has looked something like this:

I take bottom 16 bits of my pointer and AND them together with my top 16 bits. So:

TempID = MakeID & 0x000fff

TempID <<= 16

MakeId &= TempID

At this point it seems to me (and please correct me if I'm wrong) that there is only one other pointer that would give me the same result and the statistical probability of getting that pointer would be incredible minute. But I need it to be 4 bits and not 16. So I can do that two more times. What I'm worried about is the ability to maintain the unique-ness and how important that unique-ness is in terms of the statistical possibility of getting identicle results from another pointer.

I hope this all makes sense. It's giving me a headache just thinking about it. lol.

###
#2
Crossbones+ - Reputation: **8530**

Posted 30 August 2012 - 05:44 PM

If you mean how many different pointers which point to the same (Make, Model,Color, Package) information would produce the same resulting "hash", then there are much more than 2 possible pointers, more along the lines of 2^16 / sizeof(Car). Since you throw away the high 16 bits of each pointer, they could be anything, so with the same Car data you could have such pointers:

0x00008181 0xFFFF8181 0xABCD8181 0xDEAD8181

They will all produce a low word of 0x8181, and thus produce the same "hash" if they point to similar car data. Of course some of these pointers are impossible, but it really depends on the underlying virtual memory manager and how it pages memory - which you cannot reliably assume does what you think it will do.

What you can do is hash the entire pointer value using some uniform hash function, and exclusive-or it with the hash of the car data (yes, even if the car data is only 16 bits large, hash it to a 32-bit value). If the hash function is any good, the probability of a collision would be around 2^(-32), and that's the best you will get since the key is only 32 bits wide. To me, this is a horrifying collision rate, but it could be enough for you, I don't know what you need it for.

If you want a more specific analysis, then firstly, your information is 16 bits large, so it might be worthwhile to store it unchanged in the key, this will guarantee that no Car objects with different specs come out the same (so no Ford and Chevy producing the same key). Then, you have 16 bits left - and here you hit an information theory roadblock, you are trying to pigeonhole 32 bits of data (your pointer) into a 16-bit value. If the pointer was uniformly distributed, you are basically screwed. However, this is where you look at different pointers, and you should have noticed they are generally clumped at the 0x000FFFFF range, thus looking at the low-order bits of the pointer is the right thing to do. But again, if the memory manager suddenly decides to start allocating memory differently, this may break your code, making it very unsafe.

Remember though that hashing pointers can make your code non-portable on platforms with different pointer widths. Though the way you describe your issue sounds like it isn't a very good idea in the first place. Remember - if you can't explain your design simply, it's not good design.

**Edited by Bacterius, 30 August 2012 - 05:57 PM.**

The slowsort algorithm is a perfect illustration of the multiply and surrender paradigm, which is perhaps the single most important paradigm in the development of reluctant algorithms. The basic multiply and surrender strategy consists in replacing the problem at hand by two or more subproblems, each slightly simpler than the original, and continue multiplying subproblems and subsubproblems recursively in this fashion as long as possible. At some point the subproblems will all become so simple that their solution can no longer be postponed, and we will have to surrender. Experience shows that, in most cases, by the time this point is reached the total work will be substantially higher than what could have been wasted by a more direct approach.

- *Pessimal Algorithms and Simplexity Analysis*

###
#3
Moderators - Reputation: **15084**

Posted 30 August 2012 - 06:08 PM

If you have a pointer, you already have a

**guaranteed unique**identifier for the data item in question. No other data item could possibly share the same pointer.

So just reserve 32 (or 64) bits for your "key" depending on platform, and you're done. No need to complicate things :-)

[Work - ArenaNet] [Epoch Language] [Scribblings] [Journal - peek into my shattered mind]

###
#4
Crossbones+ - Reputation: **2003**

Posted 30 August 2012 - 06:10 PM

Hashing is normally used to turn an object of some kind into a key for efficiently finding related data. A hash function generally takes data of a known well-understood format, summarises the most unique or significant attributes of it, and then transforms it in a way that looks pseudo-random. At lookup time to avoid collision problems, the full key object is often compared with the key found in the bucket. That's why the pigeonhole problem... is not a problem. Multiple keys hashing to the same bucket only reduce performance, they don't introduce unpredictable behaviour. Collision in terms of two different keys ending up in the same bucket is not a problem. Looking up key A in a hashtable and getting data returned for key B is a disaster.

You're talking about a key for sorting... that doesn't sound like hashing at all. On top of that, you're using pointers, numbers which are intrinsically meaningless and are only partly under your control.

###
#5
Members - Reputation: **168**

Posted 30 August 2012 - 06:34 PM

I'll try to explain this in regards to my needs:

I'm writing the graphics handling code for my project. I have a vector of RenderObjects. Each RenderObject has pointer to a Model, Texture, Material and Shader object which are all stored in another container. It also has a key that the vector is sorted by. The Key is made up of some form of unique identifier from each of those 4 objects with the Model identifier in the most significant bits followed by Material, Texture then Shader. The example I was following (which I cannot find now, should have book marked it) used various bit manipulations of the pointer to generate this key. The result, when the vector is sorted using this key, has all like models/textures/materials/shaders lumped together to minimize OpenGL calls. No need to load a model or bind a texture that's already loaded/bound, right?

###
#6
Moderators - Reputation: **15084**

Posted 30 August 2012 - 06:46 PM

[Work - ArenaNet] [Epoch Language] [Scribblings] [Journal - peek into my shattered mind]

###
#9
Moderators - Reputation: **29497**

Posted 30 August 2012 - 11:04 PM

Regarding the hashing of make pointers to get a (small) unique make ID... I'd just use indices to begin with instead of pointers. Then they'll start from 0 and increment up to numMakes.

A bit of trivia: the bottom 2 bits are (I take bottom 16 bits of my pointer and AND them together with my top 16 bits.

*almost*) always going to be 0 -- the reason being that your compiler will (

*probably*) be using 4-byte aligned structures.

**Edited by Hodgman, 31 August 2012 - 12:14 AM.**

###
#10
Moderators - Reputation: **15084**

Posted 30 August 2012 - 11:36 PM

Sorting by an integer key can be done in O(n) instead of O(n^2), so if you've got a really large data set it can make a huge difference.

Correct me if my big-O analysis is failing me (it's been a long week) but sorting n items on k criteria should be O(k * n * log n), not O(n^2). You're going to need a serious metric shitload of items before you realistically notice the difference between O(n log n) and [O(n) + whatever overhead your combination indexing mechanism has]. Unless you're pushing many millions of items per frame, chances are the gains in not context-switching the graphics hardware are going to totally drown the micro-optimization of the sorting itself.

[Work - ArenaNet] [Epoch Language] [Scribblings] [Journal - peek into my shattered mind]

###
#11
Moderators - Reputation: **29497**

Posted 31 August 2012 - 12:00 AM

Sorry, yes I meant n * log n, not n*n. I left k out because it's just a small constant.Correct me if my big-O analysis is failing me (it's been a long week) but sorting n items on k criteria should be O(k * n * log n), not O(n^2).

In my engine, I found that a simple 8-bit radix sort started to beat std::sort after sizes of about 200 integers (

*when inputs initially in random order*). On lists of about 10k items in random order, the radix sort was twice as fast -- but yes you're right, unless we're dealing with millions of items, we're talking about saving less than 1ms here (

*at 1M items with 3 predicates, I measured radix at 0.5ms and std::sort at 1.7ms*).

So it might be a good idea just to code it in the simplest way first, and then optimise later if it's actually a performance problem.

**Edited by Hodgman, 31 August 2012 - 12:08 AM.**

###
#12
Moderators - Reputation: **15084**

Posted 31 August 2012 - 12:44 AM

However, the other important factor when comparing against a radix sort here is semantics.

Suppose we measure that changing pixel shader parameter values is roughly 1/N the cost of changing vertex buffers. (I'm pulling this out of thin air because frankly it's been years since I did any performance measurements firsthand on what's expensive to context-switch on hardware, so my information may be out of date.)

If this is the case, we can change pixel shader parameter sets N times before we accrue the cost of a single vertex buffer change. Therefore, we should sort on all of our criteria such that we order the shader parameter changes together and the VB changes relatively far apart, insofar as this is possible with the data set we're trying to render.

*Just*using a radix sort with an arbitrary "permutation" index isn't guaranteed to give us these semantics by any means. For instance, if we have a data set of size 20, with 19 shader parameter changes and a single VB change, a naive radix sort has a strong probability of placing the VB change in the

*middle*of the parameter changes, possibly leading to extra pipeline stalls/flushes. If, on the flip side, we arrange for the VB change to come at the beginning or end of the batch, and group the less significant state changes together, we get a net perf win.

Of course it's entirely possible to select your permutation indices such that the radix sort

*does*give these semantics. However, at that point you're back to encoding the permutation space in some way in a small number of bits, versus just doing a multi-criteria sort on the 3-4 different grouping parameters that might come into play.

It's also worth noting that, if we limit ourselves to 64,000-odd maximum "kinds" of model, shader, texture, etc., we can encode the OP's 4 criteria in 64 bits. If we're on a 64-bit platform, poof, instant radix sort opportunity,

*plus*maintaining the grouping semantics we need.

So really I guess for me the bottom line is: do the algorithmically important thing first (group your state changes to minimize pipeline idle time and stalling) and then look at trivial ways to improve on that once the dominant factors are taken care of (do a 64-bit radix sort for instance). Both techniques can assuredly have great results, but the judicious combination is going to go much further than just naively applying one approach or the other in vacuum.

[Work - ArenaNet] [Epoch Language] [Scribblings] [Journal - peek into my shattered mind]

###
#13
Moderators - Reputation: **29497**

Posted 31 August 2012 - 02:48 AM

I don't get it - the order that you create the fields in your radix key should be the same as the order in which you chain your regular sorting criteria. Using the same order in both sorting algorithms should give the same output (Just using a radix sort ... isn't guaranteed to give us these semantics

...

Of course it's entirely possible ... that the radix sort does give these semantics...

*assuming both sorts are stable*). There's no difference, except in how you express your criteria -- comparison sorts only require that the criteria can be evaluated in relative terms when looking at two objects, whereas integer sorts require that each object can evaluate the criteria in isolation as an absolute value.

N.B. Even on a 64-bit platform, you'd probably still only use an 8 to 16 bit radix sort, as the cost of the partitioning step of the algorithm is based on the bit-width used per pass. e.g. 8passes*256 vs 4passes*65536 vs 1passes*10If we're on a 64-bit platform, poof, instant radix sort opportunity

^{^19}. I've seen 11-bit radix sorts used commonly on 32-bit values as a kind of sweet-spot.

There's two sides to the costs -- there's the CPU-side cost, where the GL driver takes your arguments, validates that you're not going to crash the GPU, and writes them into a command buffer. Sorting is an optimisation to reduce the number of GL calls -- in your example, the total number of GL calls is basically the same. On the GPU-side, most state changes are basically 'free' as long as there's enough pixels drawn in each 'group' of states to avoid pipeline bubbles forming. Sorting by alike states helps to maximize the amount of work done in each 'group'.Suppose we measure that changing pixel shader parameter values is roughly 1/N the cost of changing vertex buffers. (I'm pulling this out of thin air because frankly it's been years since I did any performance measurements firsthand on what's expensive to context-switch on hardware, so my information may be out of date.)

As above, this is irrelevant to the type of sorting algorithm used.

However, if we keep in mind that we're just sorting as an optimisation, it's not the end of the world if the sort results are invalid due to a hash collision. If one draw-call ends up in the wrong order, and forces us to switch textures before/after it, then our program still works, it's just sub-optimal.

**Edited by Hodgman, 31 August 2012 - 02:50 AM.**

###
#14
Crossbones+ - Reputation: **2663**

Posted 31 August 2012 - 09:52 AM

Regarding the actual problem, maybe the Car object shouldn't use pointers to its read-only components: Makes, Models etc. can be objects in an array, sorted as desired and identified by their index (unsigned integer). Cars would contain several of these indices, which unlike pointers could be used satisfactorily both for accessing the read-only objects (by an array lookup) and for sorting Car collections.

Compacting full-size integer indices into a bitfield is a variation that reduces storage space (good for file storage of Cars) and makes sorting faster but lookup slower; it might or might not be a performance improvement, and constraining the number of supported Makes, Models etc. might be unacceptable.