mul24 / mul_hi replacements


Started by MxADD February 22, 2015 07:29 PM



I need to optimize a nasty part of a vertex shader :)

The original code:

uint4 Ctrl;     // x - divider [1, 2, 3, or 4]
                // y - mask
                // z - unused
                // w - unused
uint VtxInstc;  // InstanceIndex (SV_InstanceID) [will be in range 1 - 256]
...
uint InstanceIndex = VtxInstc / Ctrl.x;
uint ShadowMapIndex = VtxInstc % Ctrl.x;
ShadowMapIndex = (Ctrl.y >> (ShadowMapIndex * 2)) & 0x3;
...
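Before optimizing, it helps to have a CPU reference of this block to compare any replacement against. A minimal C++ sketch, assuming Ctrl.x is 1 to 4 and that Ctrl.y packs one 2-bit shadow map index per slot (slot = VtxInstc % Ctrl.x); `ShadowSelect` and `select_reference` are hypothetical names, not part of the original code:

```cpp
#include <cstdint>

// Hypothetical result type: the two values the shader block computes.
struct ShadowSelect {
    uint32_t instanceIndex;
    uint32_t shadowMapIndex;
};

// Reference version of the original shader block, using ordinary / and %.
// divider = Ctrl.x (assumed 1..4), mask = Ctrl.y (2 bits per slot).
ShadowSelect select_reference(uint32_t vtxInstc, uint32_t divider, uint32_t mask) {
    ShadowSelect r;
    r.instanceIndex = vtxInstc / divider;
    uint32_t slot = vtxInstc % divider;             // 0 .. divider-1
    r.shadowMapIndex = (mask >> (slot * 2)) & 0x3u; // 2-bit field for this slot
    return r;
}
```

For example, with Ctrl.x = 3 and Ctrl.y = 0x24, VtxInstc = 7 gives InstanceIndex 2 and slot 1, so ShadowMapIndex = (0x24 >> 2) & 3 = 1.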

Looking at the ISA (on GCN 1.1, Radeon HD 7870), I was shocked at how many instructions this block takes (around 42 :/)

So after a while ...

In C++, add these to Ctrl:

static const uint32_t OPSGInfoLockZ[] =
{
    0xAAAAAAAB,
};

static const uint32_t OPSGInfoLockW[] =
{
    31, // 33 in C++, since there is no mul_hi ...
};

Ctrl.z = OPSGInfoLockZ[Ctrl.x-1];

Ctrl.w = OPSGInfoLockW[Ctrl.x-1];
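The 0xAAAAAAAB entry is ceil(2^33 / 3), the usual rounded-up reciprocal that turns a division by 3 into a multiply and a shift of 33. As a hedged sketch of where such constants come from (the names `DivMagic` and `find_div_magic` are hypothetical, not the author's code), this C++ searches for the smallest shift s for which m = ceil(2^s / d) is guaranteed to give x / d == (x * m) >> s over a given input range, using a 64-bit product like the C++ form:

```cpp
#include <cstdint>

// Hypothetical helper: multiplier/shift pair such that
// x / d == (uint64_t(x) * multiplier) >> shift for every x in [0, xmax].
struct DivMagic {
    uint64_t multiplier;
    uint32_t shift;
};

// Rounded-up reciprocal construction:
//   m = ceil(2^s / d), e = m*d - 2^s (rounding error, 0 <= e < d).
// x*m / 2^s = x/d + x*e/(d*2^s), so the floor stays exact whenever
// xmax * e < 2^s. Returns the smallest s passing that (sufficient) test.
// Assumes 1 <= d < 2^31 and xmax < 2^32, so nothing below overflows.
DivMagic find_div_magic(uint32_t d, uint32_t xmax) {
    for (uint32_t s = 0; s < 64; ++s) {
        uint64_t pow2 = uint64_t(1) << s;
        uint64_t m = (pow2 + d - 1) / d; // ceil(2^s / d)
        uint64_t e = m * d - pow2;       // rounding error of the reciprocal
        if (uint64_t(xmax) * e < pow2) {
            return {m, s};
        }
    }
    return {0, 0}; // not reached for the assumed input ranges
}
```

For d = 3 over the full 32-bit range this returns 0xAAAAAAAB with a shift of 33, matching the table entry and the "33 in C++" comment; for inputs up to 256 it returns the much smaller pair 171 and 9. Because xmax * e < 2^s is only a sufficient condition, the chosen shift is safe but not necessarily the smallest one that happens to work.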

And in PSSL (PS4):

uint InstanceIndex = mul24(VtxInstc, Ctrl.z) >> Ctrl.w;        // For divide by 3 this cancels out to 0
InstanceIndex |= mul_hi(VtxInstc, Ctrl.z) >> 1;                // For divide by 3 this produces the quotient, as Ctrl.z is large enough
uint ShadowMapIndex = VtxInstc - mul24(InstanceIndex, Ctrl.x); // VtxInstc % Ctrl.x
ShadowMapIndex = (Ctrl.y >> (ShadowMapIndex * 2)) & 0x3;

... ok ... 12 instructions ... not bad
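The mul_hi half of this trick can be checked on the CPU. This sketch emulates mul_hi with a 64-bit product, assuming it returns the upper 32 bits of the unsigned 32x32 product as described later in the post, and shows the divide-by-3 path plus the remainder-by-subtraction step. The function names `mul_hi_ref`, `div3_via_mul_hi` and `mod_via_sub` are hypothetical:

```cpp
#include <cstdint>

// CPU stand-in for mul_hi: upper 32 bits of the 64-bit unsigned product.
uint32_t mul_hi_ref(uint32_t a, uint32_t b) {
    return uint32_t((uint64_t(a) * b) >> 32);
}

// Divide-by-3 path of the PSSL snippet: mul_hi supplies >> 32 and the
// extra >> 1 brings the total shift to 33 (the value the table comment gives for C++).
uint32_t div3_via_mul_hi(uint32_t x) {
    return mul_hi_ref(x, 0xAAAAAAABu) >> 1;
}

// Remainder recovered by subtraction, as in
// ShadowMapIndex = VtxInstc - InstanceIndex * Ctrl.x
uint32_t mod_via_sub(uint32_t x, uint32_t q, uint32_t d) {
    return x - q * d;
}
```

Since ceil(2^33 / 3) overshoots 2^33 / 3 by only 1/3, the quotient is exact for every 32-bit x, not just the 1 - 256 instance range.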

Now it's time for DX11 (Shader Model 5).

... and here is the problem and my question: is it possible to port the code above to HLSL (DX11)?

I could live without mul24 (multiply two ints and return the lower 24 bits of the result; it takes 1 cycle, compared to 2 cycles for an ordinary multiply),

but not without mul_hi (multiply two 32-bit ints and return the upper 32 bits of the 64-bit result).
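If a true mul_hi is needed, one option is to build it from 32-bit multiplies on 16-bit halves. This sketch is written in C++ so it can be checked against a 64-bit product, but it deliberately uses only 32-bit unsigned arithmetic, so the same statements should carry over to HLSL `uint`. The name `mul_hi_u32` is hypothetical. It costs four multiplies plus shifts, masks and adds, so whether it actually beats the original / and % would have to be confirmed in the ISA.

```cpp
#include <cstdint>

// mul_hi built from 32-bit operations only (no 64-bit type). Each operand is
// split into 16-bit halves so every partial product fits in 32 bits.
uint32_t mul_hi_u32(uint32_t a, uint32_t b) {
    uint32_t aLo = a & 0xFFFFu, aHi = a >> 16;
    uint32_t bLo = b & 0xFFFFu, bHi = b >> 16;

    uint32_t ll = aLo * bLo; // weight 2^0
    uint32_t hl = aHi * bLo; // weight 2^16
    uint32_t lh = aLo * bHi; // weight 2^16
    uint32_t hh = aHi * bHi; // weight 2^32

    // Bits 16..31 of the full product: at most 3 * 0xFFFF, so no overflow.
    uint32_t mid = (ll >> 16) + (hl & 0xFFFFu) + (lh & 0xFFFFu);

    // Upper word: hh, the high halves of the cross terms, and the carry out of mid.
    return hh + (hl >> 16) + (lh >> 16) + (mid >> 16);
}
```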

Basically, I want to write the HLSL equivalent of this C++:

uint32_t InstanceIndex = uint32_t((uint64_t(Ctrl.z) * VtxInstc) >> Ctrl.w);

but Shader Model 5 HLSL has no uint64 type, and using double is overkill, not an optimization :/
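Since VtxInstc is said to stay in 1 - 256, a 64-bit product may not be needed at all. With a small input range, the rounded-up reciprocal for 3 shrinks to 171 with a shift of 9 (171 * 3 = 513 = 2^9 + 1, which keeps floor(x * 171 / 512) equal to floor(x / 3) for every x < 512). A hedged C++ sketch of that idea, assuming VtxInstc < 512; `kDivMagic9`, `ShadowSelect9` and `select_small_range` are hypothetical names, and on the HLSL side the multiplier would presumably live in Ctrl.z:

```cpp
#include <cstdint>

// Hypothetical per-divider multipliers for a fixed shift of 9, assuming
// VtxInstc < 512. Indexed with divider - 1 (divider = Ctrl.x, 1..4).
//   d = 1, 2, 4: exact (512, 256, 128 are 2^9 / d)
//   d = 3: 171 = ceil(2^9 / 3); exact for x < 512
static const uint32_t kDivMagic9[4] = {512u, 256u, 171u, 128u};

struct ShadowSelect9 {
    uint32_t instanceIndex;
    uint32_t shadowMapIndex;
};

// Same result as the original block, with / and % replaced by a multiply,
// a shift, a multiply and a subtract. For x < 512 every product stays below
// 2^18, so plain 32-bit math is exact and no mul_hi is required.
ShadowSelect9 select_small_range(uint32_t vtxInstc, uint32_t divider, uint32_t mask) {
    uint32_t q = (vtxInstc * kDivMagic9[divider - 1]) >> 9;
    uint32_t slot = vtxInstc - q * divider; // VtxInstc % divider
    return {q, (mask >> (slot * 2)) & 0x3u};
}
```

The products also stay well inside 24 bits, so a mul24 could be used where available. Whether the compiler actually emits fewer GCN instructions this way is not something this sketch establishes.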
