Happy SDE

Reducing byte transfer between C++ and HLSL.


My idea is simple: all the colors I have are in RGB(A) format [0-255, 0-255, 0-255, 0-255].

To me it seems like a good idea to pass only 4 bytes as one unsigned value and somehow (automatically) decode it in the shader to unorm float[3].

Instead of the way I always do it: passing 12 bytes as float[3] directly from C++.

Is it possible to reinterpret 4 bytes as float4 inside a shader?

 

C++:

struct RdAmbientLight
{
  unsigned colorUNorm = 0xFFFFFFFF; // RGBA packed into 4 bytes
  //Other data
};

void DefPass2::createPsAmbientColorCb()
{
  // Dynamic constant buffer holding the packed ambient light data.
  // Note: a constant buffer's ByteWidth must be a multiple of 16 bytes.
  CD3D11_BUFFER_DESC constantBufferDesc(sizeof(RdAmbientLight), D3D11_BIND_CONSTANT_BUFFER, D3D11_USAGE_DYNAMIC, D3D11_CPU_ACCESS_WRITE);
  m_device->CreateBuffer(&constantBufferDesc, nullptr, &m_psAmbientCb);
}

HLSL:

cbuffer AmbientLight: register(b1)
{
   unorm float4 Color; //< Some automatic decoding
};

Yes, it is possible, but...

You'll do more work unpacking the unsigned on the GPU than you would just passing in the three floats, by several orders of magnitude.

fastcall22, on 03 Jan 2016 - 06:17 AM, said:

Yes, it is possible, but...

Is it possible via some HLSL syntax, or should I unpack it manually (e.g. use << and divide each color value by 255)?


Hodgman, on 03 Jan 2016 - 06:55 AM, said:

Yes, you can just pass it as an int (4 bytes), then shift and mask to get the 3 bytes out, and then multiply by 1/255.0 to convert to normalized floats.

This may be faster or slower than sending full floats, depending on the situation.

Thank you, Hodgman!

I just thought there was a native way for the video card to do this conversion for free that I was not aware of.
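
For reference, here is a minimal HLSL sketch of the shift-and-mask route Hodgman describes. The cbuffer member name, the helper name, and the 0xAABBGGRR byte order are assumptions; adjust them to match how the C++ side packs the channels.

cbuffer AmbientLight : register(b1)
{
   uint ColorUNorm; // assumed packing: 0xAABBGGRR
   // Other data
};

// Manual unpack: shift and mask each byte, then scale to [0, 1].
float4 UnpackColorUNorm(uint c)
{
   return float4( c        & 0xFF,
                 (c >>  8) & 0xFF,
                 (c >> 16) & 0xFF,
                 (c >> 24) & 0xFF) / 255.0;
}

In the pixel shader you would then use UnpackColorUNorm(ColorUNorm) wherever the wished-for unorm float4 Color was used.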



This is one of the places where OpenGL is ahead of D3D.

OpenGL has unpackUnorm for this. It's cumbersome but gets the job done. On most modern hardware, this function maps directly to a native instruction. Unfortunately, as far as I know HLSL has no equivalent.

However, you do have f16tof32, which is the next best thing.

 

Edit: Someone already wrote some util functions. With extreme luck the compiler recognizes the pattern and issues the native instruction instead of lots of bitshifting, masking and multiplication / division. You can at least check the results on GCN hardware using GPUPerfStudio's ShaderAnalyzer to see if the driver does indeed recognize what you're doing (I don't think it will though...).
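
To illustrate the f16tof32 route: a sketch under the assumption that the C++ side packs each channel as a 16-bit half (e.g. with DirectXMath's XMConvertFloatToHalf), two halves per uint, so the color costs 8 bytes instead of 4. The names and the channel order here are hypothetical.

cbuffer AmbientLight : register(b1)
{
   uint2 ColorHalf4; // x holds R (low 16 bits) and G (high 16 bits); y holds B and A (assumed layout)
};

// f16tof32 converts the half float stored in the low 16 bits of a uint.
float4 UnpackHalf4(uint2 p)
{
   return float4(f16tof32(p.x), f16tof32(p.x >> 16),
                 f16tof32(p.y), f16tof32(p.y >> 16));
}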



With extreme luck the compiler recognizes the pattern and issues the native instruction instead of lots of bitshifting, masking and multiplication / division.

 

I'm not even sure GCN has an instruction for what he wants to do. The best I can figure out it would be 4 v_cvt_f32_ubyte[0|1|2|3] and then 4 v_mul_f32 by 1/255.0f.


I'm not even sure GCN has an instruction for what he wants to do. The best I can figure out it would be 4 v_cvt_f32_ubyte[0|1|2|3] and then 4 v_mul_f32 by 1/255.0f.

Maybe yes, maybe not, but my point is that it's still very far from doing 4 loads, 4 bitshifts, 4 'and' masks, 4 conversions to float, and then the 1/255 multiply.

Edit: Checked, you're right about the instructions. "fragCol = unpackUnorm4x8(val);" outputs (irrelevant ISA code stripped):

  v_cvt_f32_ubyte0  v0, s4                                  // 00000000: 7E002204
  v_cvt_f32_ubyte1  v1, s4                                  // 00000004: 7E022404
  v_cvt_f32_ubyte2  v2, s4                                  // 00000008: 7E042604
  v_cvt_f32_ubyte3  v3, s4                                  // 0000000C: 7E062804
  v_mov_b32     v4, 0x3b808081                              // 00000010: 7E0802FF 3B808081
  v_mul_f32     v0, v4, v0                                  // 00000018: 10000104
  v_mul_f32     v1, v1, v4                                  // 0000001C: 10020901
  v_mul_f32     v2, v2, v4                                  // 00000020: 10040902
  v_mul_f32     v3, v3, v4                                  // 00000024: 10060903

Edit 2: Well, that was disappointing. I checked the manual and GCN does have a single instruction for this conversion; if I'm not mistaken, it should be:

tbuffer_load_format_xyzw v[0:3], v0, s[4:7], 0 idxen format:[BUF_DATA_FORMAT_8_8_8_8,BUF_NUM_FORMAT_UNORM]
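
For context, that is GCN's typed-buffer load. From D3D11 you would normally reach that conversion path by reading the color through a typed Buffer SRV created with DXGI_FORMAT_R8G8B8A8_UNORM rather than through a constant buffer. A minimal sketch (the resource name and register slot are hypothetical):

Buffer<float4> AmbientColors : register(t0); // SRV created with DXGI_FORMAT_R8G8B8A8_UNORM

float4 LoadAmbientColor(uint index)
{
   // The hardware converts the four UNORM bytes to normalized floats on load.
   return AmbientColors[index];
}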
