1 hour ago, mark_braga said:
Sorry, but what are these rules?
That the root signature has a fixed maximum size of 64 DWORDs on every GPU, that a descriptor table costs 1 DWORD, a root constant costs 1 DWORD per 32-bit value, and a root descriptor costs 2 DWORDs.
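As a quick illustration of how the budget adds up, here's a minimal sketch using the d3dx12.h helper structs -- the parameter layout and register assignments are made up for the example:

```cpp
#include "d3dx12.h"

// Hypothetical root signature layout and its DWORD cost:
CD3DX12_ROOT_PARAMETER params[4];
params[0].InitAsConstants(4, 0);        // 4 root constants at b0 -> 4 DWORDs
params[1].InitAsConstantBufferView(1);  // root CBV at b1         -> 2 DWORDs
params[2].InitAsShaderResourceView(0);  // root SRV at t0         -> 2 DWORDs

CD3DX12_DESCRIPTOR_RANGE range;
range.Init(D3D12_DESCRIPTOR_RANGE_TYPE_SRV, 8, 1);  // 8 SRVs, t1-t8
params[3].InitAsDescriptorTable(1, &range);  // descriptor table  -> 1 DWORD
// Total: 4 + 2 + 2 + 1 = 9 of the 64 available DWORDs.
```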
If you really want a particular parameter to go into a HW register instead of being spilled to memory, put it earlier in the root signature.
e.g. Let's say that AMD can support 6 root descriptors in hardware -- if you use 12 root descriptors, the first 6 will go into HW registers (fast for the shader to read, fast for the driver to update), and the other 6 will spill into memory (a very slight penalty when the shader reads them, more cost for the driver to update). So, if some of those descriptors change frequently, put the frequently-changing ones first in the root signature and the rest afterwards. No matter the actual HW root register count, this ordering ensures that your most dynamic resources are the most likely to land in HW registers.
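To make the ordering concrete, here's a sketch (again with hypothetical registers and update frequencies, using the d3dx12.h helpers):

```cpp
#include "d3dx12.h"

// Most frequently updated parameters first, so they're the most likely to
// land in HW root registers; rarely-changing data last, where a spill to
// memory hurts the least.
CD3DX12_ROOT_PARAMETER params[3];
params[0].InitAsConstantBufferView(0);  // per-draw CBV: changes every draw
params[1].InitAsShaderResourceView(0);  // per-material SRV: changes per batch
params[2].InitAsConstantBufferView(1);  // per-frame CBV: changes once a frame
```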
After that, if you want to micro-optimize for different GPUs... yeah, there is no official API to help with this. You're down to asking individual vendors for advice and applying that advice for every different GPU model out there...