Draw item, sort key hashing


Hello,

How do you correctly generate a sort key for sorting your draw items? Referring to Hodgman's slides once again (with a similar system in mind):

https://www.dropbox.com/sh/4uvmgy14je7eaxv/AABoM4vmng4ORXlJOL_Ma0vla/Designing%20a%20Modern%20GPU%20Interface.pdf?dl=0

So say I want to generate a key like that:


struct Key
{
     uint64_t pipelineStates : 8;
     uint64_t effect : 8;
     uint64_t iaConfig : 16;
     uint64_t textures : 32;
};

Where textures is most significant to sort. Now for generating the key, I have the inputs:


Key buildKey(uint16_t effectID, uint8_t rasterizerID, uint8_t blendID, uint8_t depthID, uint16_t iaConfigID, ArrayView<const uint16_t> resourceListIDs)
{
}

So, what hash functions can I use here? Basically I have to hash the effect from 16 bits down to 8, the pipeline states from 3 x 8 bits down to a single 8-bit value, the IA config can stay the same for now, and all the 16-bit resource list IDs (0-8 of them) go into one 32-bit value.

Also for doing hybrid-depth sorting, I have to build another key out of 32-bit depth and the old key:


Key hybridKey(float depth, uint64_t oldKey)
{
}

What's generally used here? For the resource lists going into 32 bits, I can simply use std::hash with a custom hash-combine, but how do I go about the other ones (and even for the resource lists there is probably a better way)? Especially since I might tune the key at some point, assigning different bit counts to the various slots. Is there a general-purpose variable-bit hashing algorithm, or do you have to build separate hashing functions for each of those?


If you search through the forum for "render queue", I'm pretty sure you'll find a post or two on generating sort keys. My memory could be playing tricks on me, though.

Sorry if I'm off base in regards to what you're asking.

-potential energy is easily made kinetic-

I use FNV-1a for most things as it's super cheap and produces great distribution.

Check out these for a good collection of options though:
http://aras-p.info/blog/2016/08/02/Hash-Functions-all-the-way-down/
http://aras-p.info/blog/2016/08/09/More-Hash-Function-Tests/
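
For reference, here is a minimal sketch of the standard 32-bit FNV-1a over a byte range -- just the textbook algorithm, not anyone's production code:

#include <cstddef>
#include <cstdint>

// Textbook 32-bit FNV-1a over an arbitrary byte range.
inline uint32_t Fnv1a32(const void* data, size_t len)
{
	const uint8_t* bytes = static_cast<const uint8_t*>(data);
	uint32_t hash = 2166136261u;      // FNV offset basis
	for (size_t i = 0; i < len; ++i)
	{
		hash ^= bytes[i];             // xor in the next byte
		hash *= 16777619u;            // multiply by the FNV prime
	}
	return hash;
}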

To reduce a 32-bit hash to a 16-bit hash (or 16 bits to 8), I usually just XOR the top half of the value over the bottom half:
uint16_t hash16 = (uint16_t)(((hash32>>16)^hash32)&0xffff);
uint8_t hash8 = (uint8_t )(((hash16>> 8)^hash16)&0xff );

However, if you've got a good hashing algorithm, then you should even just be able to slice part of the result off and get decent results, e.g.

uint16_t hash16 = hash32 & 0xffff;

If you've got a 64bit value where significance matters and you need to reduce it to a 32bit value, the xor trick won't work, because it combines, for example, bit #31 and bit #63. In these situations I've sometimes just thrown out every second bit like this:


u32 KeepEverySecondBit(u64 key)
{
	key &= 0x5555555555555555;//throw out every second bit -- 0a0b0c0d0e0f0g0h0i0j0k0l0m0n0o0p0A0B0C0D0E0F0G0H0I0J0K0L0M0N0O0P
	key = (key | (key >> 1)) & 0x3333333333333333;//          00ab00cd00ef00gh00ij00kl00mn00op00AB00CD00EF00GH00IJ00KL00MN00OP
	key = (key | (key >> 2)) & 0x0f0f0f0f0f0f0f0f;//          0000abcd0000efgh0000ijkl0000mnop0000ABCD0000EFGH0000IJKL0000MNOP
	key = (key | (key >> 4)) & 0x00ff00ff00ff00ff;//          00000000abcdefgh00000000ijklmnop00000000ABCDEFGH00000000IJKLMNOP
	key = (key | (key >> 8)) & 0x0000ffff0000ffff;//          0000000000000000abcdefghijklmnop0000000000000000ABCDEFGHIJKLMNOP
	key = (key | (key >>16)) & 0x00000000ffffffff;//          00000000000000000000000000000000abcdefghijklmnopABCDEFGHIJKLMNOP
	return (u32)key;
}

If you want to do this without actually throwing away any entropy, I guess you could use this to XOR together pairs of bits (e.g. bit #62 with bit #63):

u32 hash32 = KeepEverySecondBit(hash64)^KeepEverySecondBit(hash64>>1);
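
Pulling those reductions together, a rough sketch of the buildKey() from the original question could look like this -- just one possible arrangement, using the Fnv1a32 sketch above, assuming ArrayView exposes data()/size(), and returning the raw packed 64-bit value rather than the bitfield struct:

uint64_t buildKey(uint16_t effectID, uint8_t rasterizerID, uint8_t blendID, uint8_t depthID,
                  uint16_t iaConfigID, ArrayView<const uint16_t> resourceListIDs)
{
	// 3 x 8-bit state IDs -> one 8-bit value: hash, then xor-fold 32 -> 16 -> 8.
	const uint8_t stateIDs[3] = { rasterizerID, blendID, depthID };
	uint32_t state32 = Fnv1a32(stateIDs, sizeof(stateIDs));
	uint16_t state16 = (uint16_t)(((state32 >> 16) ^ state32) & 0xffff);
	uint8_t  state8  = (uint8_t )(((state16 >>  8) ^ state16) & 0xff);

	// 16-bit effect ID -> 8 bits by xor-folding.
	uint8_t effect8 = (uint8_t)(((effectID >> 8) ^ effectID) & 0xff);

	// 0-8 16-bit resource list IDs -> one 32-bit value.
	uint32_t textures32 = Fnv1a32(resourceListIDs.data(), resourceListIDs.size() * sizeof(uint16_t));

	// Pack with textures in the most significant bits, matching the struct in the question.
	return ((uint64_t)textures32 << 32)
	     | ((uint64_t)iaConfigID << 16)
	     | ((uint64_t)effect8    <<  8)
	     |  (uint64_t)state8;
}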

If you know your depth values are always positive (distances should be), then their bitwise representations will always be nicely ordered, so you can just bitwise cast/reinterpret those floats to an int, and then put them in the most significant bits of your key. Doing a bitwise negation on that value reverses the ordering.

e.g.

u32 intDepth = *(u32*)&floatDepth;

Back to front:
key = ((~(u64)intDepth)<<32) | KeepEverySecondBit(key);
Front to back:
key = (( (u64)intDepth)<<32) | KeepEverySecondBit(key);
Alternatively, you can normalize your depth values into a zero to one range, and then quantize them.
float normalized = depth / far_clip_plane;
int quantized_8bit = Clamp( (int)(normalized * 255 + 0.5), 0, 255 );
And then put these in the most significant bits of your key. e.g. you can just shift the key down to lose the least significant sorting data:
key = (((u64)quantized_8bit)<<56) | (key>>8);
For transparent passes you typically want to do an accurate (low/no quantization) back to front sorting.
For opaques that occur after a Z-pre-pass, you don't care about depth -- just sort by state changes.
For opaques that occur without a Z-pre-pass, do a very coarse front-to-back sorting (e.g. quantize depth to 4 bits!).
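
Putting the depth part together with the earlier snippets, a sketch of the hybridKey() from the original question might look like this, split per pass type (Clamp, far_clip_plane and the exact quantization are assumptions, not a definitive implementation):

#include <cstdint>
#include <cstring>

// Transparent: precise 32-bit depth on top, back to front; the old key is
// compacted with the KeepEverySecondBit() function from above.
uint64_t hybridKeyTransparent(float depth, uint64_t oldKey)
{
	uint32_t intDepth;
	std::memcpy(&intDepth, &depth, sizeof(intDepth));   // positive floats order like their bit patterns
	return ((uint64_t)(~intDepth) << 32) | KeepEverySecondBit(oldKey);
}

// Opaque without a Z-pre-pass: very coarse front-to-back, depth quantized to 4 bits.
uint64_t hybridKeyOpaqueCoarse(float depth, float far_clip_plane, uint64_t oldKey)
{
	float normalized = depth / far_clip_plane;
	int quantized_4bit = Clamp((int)(normalized * 15 + 0.5f), 0, 15);
	return ((uint64_t)quantized_4bit << 60) | (oldKey >> 4);
}

// Opaque after a Z-pre-pass: just use oldKey unchanged (sort by state only).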

Thanks a lot, that's all I needed to know :)

EDIT: Well, one more thing, just to make sure I understood the order to sort by, can you confirm that this is how you mean it / how it's done:

(least significant - most significant)

key: pipeline - shader - ia - textures

transparent: old-key (if even) - depth

opaque w/o z-prepass: coarse_depth - old-key

I was already a little confused by the slides about whether textures or shaders are the most important to sort by - most information I could find online seemed to indicate that textures were the slowest to change, but in your example key, textures are in the most significant bits (0xXX00), as far as I can see?

The same here: in your example code you put depth in the most significant bits, but e.g. for opaques I assume you should still prioritize the state ordering, or am I mistaken in that assumption (since in the code you posted it's the other way around)?

I used MurmurHash3A, then FNV-1a, but lately switched to xxHash for the fastest possible lookup speed and, on top of that, a better bit-flip distribution.

https://github.com/Cyan4973/xxHash

I still use a compile-time and runtime MurmurHash function for assets, because asset name hashes should collide as little as possible.
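
As an illustration of the compile-time side (the poster uses MurmurHash; this sketch uses 64-bit FNV-1a simply because it is trivial to write as constexpr):

#include <cstdint>

// Compile-time string hash for asset names, sketched with 64-bit FNV-1a.
constexpr uint64_t HashString(const char* str, uint64_t hash = 14695981039346656037ull)
{
	return *str ? HashString(str + 1, (hash ^ (uint8_t)*str) * 1099511628211ull) : hash;
}

static_assert(HashString("") == 14695981039346656037ull, "evaluates at compile time");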

I was already a little confused by the slides about whether textures or shaders are the most important to sort by - most information I could find online seemed to indicate that textures were the slowest to change, but in your example key, textures are in the most significant bits (0xXX00), as far as I can see? The same here: in your example code you put depth in the most significant bits, but e.g. for opaques I assume you should still prioritize the state ordering, or am I mistaken in that assumption (since in the code you posted it's the other way around)?

There was supposed to be a video recording of that presentation, but there were technical difficulties :(

In terms of GPU cost, changing pipeline is the most expensive thing to do, and changing resources is cheap. In terms of CPU cost, changing resources is expensive in D3D11/GL, but cheap in D3D12 (because 12 doesn't do any reference counting / hazard checking for you). In some articles on this topic they only measure the CPU cost, but IMHO a good sorting order should consider the GPU timeline to be more important.

The GPU-side cost changes every generation... On some prev-gen GPUs, a texture/constant/shader change would force the end of a processing segment, but certain pipeline state changes didn't... These days resources are more likely to be bound by a single "root pointer" register, and multiple draws can be in flight at once as long as they share the same PSO but use different root pointers.

So I put PSO at the top (MSB), then IA, then textures/buffers, then constants.

For transparent, I shift that key down 32bits and put precise depth in the most significant 32 bits.

For coarsely sorted opaque, I shift that key down N bits, and put quantized normalized depth in the most significant N bits (where N=5 currently IIRC).
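
As a sketch of that layout (the individual bit widths other than the depth fields are made up for illustration; fields are listed LSB-first, same convention as the struct in the original question):

#include <cstdint>

struct OpaqueSortKey
{
	uint64_t constants : 24;
	uint64_t textures  : 16;  // textures/buffers
	uint64_t ia        : 8;
	uint64_t pso       : 16;  // most significant -> sorted first
};

// Transparent: shift the key down 32 bits, precise depth in the top 32 (back to front).
inline uint64_t TransparentKey(uint32_t intDepth, uint64_t opaqueKey)
{
	return ((uint64_t)(~intDepth) << 32) | (opaqueKey >> 32);
}

// Coarsely sorted opaque: shift down N = 5 bits, quantized depth in the top 5 (front to back).
inline uint64_t CoarseOpaqueKey(uint32_t depth5bit, uint64_t opaqueKey)
{
	return ((uint64_t)depth5bit << 59) | (opaqueKey >> 5);
}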

