MxADD

DX11 mul24 / mul_hi replacements


Hi
 
I need to optimize a nasty part of a vertex shader :)
 
The original code:
 
uint4 Ctrl; // x - divider [1, 2, 3, or 4]
            // y - Mask
            // z - unused
            // w - unused
 
uint VtxInstc; // InstanceIndex (SV_InstanceID) [will be in range 1 - 256]
 
...
 
uint InstanceIndex  = VtxInstc / Ctrl.x;
uint ShadowMapIndex = VtxInstc % Ctrl.x;  
     ShadowMapIndex = (Ctrl.y >> (ShadowMapIndex * 2)) & 0x3;
 
...
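
For reference, the block above can be written as a small C++ function. This is only a sketch: `InstanceSplit` and `split_instance` are hypothetical names, not part of the shader, and it assumes the mask packs one 2-bit shadow map index per remainder slot, which is what the `(Ctrl.y >> (ShadowMapIndex * 2)) & 0x3` line suggests.

```cpp
#include <cstdint>

// Hypothetical helper type: the two values the shader block produces.
struct InstanceSplit {
    uint32_t instanceIndex;
    uint32_t shadowMapIndex;
};

// Plain divide/modulo version, mirroring the original HLSL.
// divider plays the role of Ctrl.x, mask the role of Ctrl.y.
InstanceSplit split_instance(uint32_t vtxInstc, uint32_t divider, uint32_t mask) {
    uint32_t instanceIndex = vtxInstc / divider;
    uint32_t slot = vtxInstc % divider;
    // Assumption: each slot owns two bits of the mask.
    uint32_t shadowMapIndex = (mask >> (slot * 2)) & 0x3u;
    return {instanceIndex, shadowMapIndex};
}
```

With a divider of 3 and a mask of 0x24 (slots 0, 1, 2 map to 0, 1, 2), instance 7 ends up with InstanceIndex 2 and ShadowMapIndex 1.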
 
Looking at the ISA (on GCN 1.1, Radeon HD 7870), I was shocked at how many instructions this block takes (around 42 :/)
So after a while ...
 
In C++, add these to Ctrl:
 
static const uint32_t OPSGInfoLockZ[] =
{
    1,
    1,
    0xAAAAAAAB,
    1,
};
 
static const uint32_t OPSGInfoLockW[] =  
{
    0,
    1,
    31, // 33 if in C++ since no mul_hi ...
    2,
};
 
Ctrl.z = OPSGInfoLockZ[Ctrl.x-1];
Ctrl.w = OPSGInfoLockW[Ctrl.x-1];
 
And in PSSL (PS4):
 
uint InstanceIndex  = mul24(VtxInstc, Ctrl.z)  >> Ctrl.w;       // For divide by 3 this cancels out to 0
     InstanceIndex |= mul_hi(VtxInstc, Ctrl.z) >> 1;            // For divide by 3 this gives the result, since Ctrl.z is large enough
uint ShadowMapIndex = VtxInstc - mul24(InstanceIndex, Ctrl.x);  // VtxInstc % Ctrl.x;
     ShadowMapIndex = (Ctrl.y >> (ShadowMapIndex * 2)) & 0x3;
 
... ok ... 12 instructions ... not bad
 
Now time for DX11 (Shader model 5)
 
... and here is the problem and my question: is it possible to port the code above to HLSL (DX11)?
I could live without mul24 (multiply 2 ints and return the lower 24 bits of the result; this takes 1 cycle, compared to 2 cycles for an ordinary multiply)
but not without mul_hi (multiply two 32-bit ints and return the upper 32 bits of the 64-bit result)
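
To make the mul_hi requirement concrete: the upper half of a 32x32 product can be built from 16-bit halves using only 32-bit arithmetic. The sketch below is C++ rather than HLSL, `mul_hi_u32` is a hypothetical helper name, and whether this shape would compile to fewer instructions than the original divide on any given GPU is untested here.

```cpp
#include <cstdint>

// Upper 32 bits of the 64-bit product a * b, using only 32-bit operations.
// Hypothetical helper; it shows what mul_hi computes, not how any hardware does it.
uint32_t mul_hi_u32(uint32_t a, uint32_t b) {
    uint32_t a_lo = a & 0xFFFFu, a_hi = a >> 16;
    uint32_t b_lo = b & 0xFFFFu, b_hi = b >> 16;

    // Four 16x16 partial products, each of which fits in 32 bits.
    uint32_t lo_lo = a_lo * b_lo;
    uint32_t hi_lo = a_hi * b_lo;
    uint32_t lo_hi = a_lo * b_hi;
    uint32_t hi_hi = a_hi * b_hi;

    // Sum the middle column; the terms are chosen so this cannot overflow 32 bits.
    uint32_t cross = (lo_lo >> 16) + (hi_lo & 0xFFFFu) + lo_hi;

    return hi_hi + (hi_lo >> 16) + (cross >> 16);
}
```

For the divide-by-3 constant, `mul_hi_u32(VtxInstc, 0xAAAAAAAB) >> 1` matches `VtxInstc / 3`, which is the same form as the PSSL line that uses mul_hi.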
 
Basically, I want to write in HLSL something like this C++:
 
uint InstanceIndex  = (uint64(Ctrl.z) * VtxInstc) >> Ctrl.w;
 
but there is no uint64 type, and using the double type is overkill, not an optimization :/
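
A quick C++ check of the 64-bit form above, assuming the shift for divide by 3 is 33, as the table comment notes for C++. `kMagic`, `kShift`, `divide_by_ctrl` and `remainder_by_ctrl` are hypothetical names that mirror OPSGInfoLockZ / OPSGInfoLockW; this is a sketch of the arithmetic only, not shader code.

```cpp
#include <cstdint>

// Per-divider multiplier (the Ctrl.z role), indexed by divider - 1.
static const uint32_t kMagic[4] = {1u, 1u, 0xAAAAAAABu, 1u};
// Per-divider shift (the Ctrl.w role); 33 for divide by 3 because the
// full 64-bit product is shifted here instead of going through mul_hi.
static const uint32_t kShift[4] = {0u, 1u, 33u, 2u};

// VtxInstc / divider without a divide, for divider in [1, 4].
uint32_t divide_by_ctrl(uint32_t vtxInstc, uint32_t divider) {
    uint64_t product = uint64_t(kMagic[divider - 1]) * vtxInstc;
    return uint32_t(product >> kShift[divider - 1]);
}

// VtxInstc % divider, rebuilt from the quotient as in the PSSL version.
uint32_t remainder_by_ctrl(uint32_t vtxInstc, uint32_t divider) {
    return vtxInstc - divide_by_ctrl(vtxInstc, divider) * divider;
}
```

Looping over every divider from 1 to 4 and every VtxInstc from 1 to 256 and comparing against plain `/` and `%` is a cheap way to confirm the tables before moving them into the constant buffer.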
