Reducing byte transfer between C++ and HLSL.

12 comments, last by Matias Goldberg 8 years, 3 months ago

My idea is simple: all the colors I have are in RGB(A) format [0-255, 0-255, 0-255, 0-255].

It seems like a good idea to me to pass only 4 bytes as one unsigned value and somehow (automatically) decode it in the shader to unorm float[3].

That would replace the approach I always use: passing 12 bytes as float[3] directly from C++.

Is it possible to reinterpret 4 bytes as float4 inside a shader?

C++:


struct RdAmbientLight
{
  unsigned colorUNorm = 0xFFFFFFFF;
  //Other data
};

void DefPass2::createPsAmbientColorCb()
{
  CD3D11_BUFFER_DESC constantBufferDesc(sizeof(RdAmbientLight), D3D11_BIND_CONSTANT_BUFFER, D3D11_USAGE_DYNAMIC, D3D11_CPU_ACCESS_WRITE);
  m_device->CreateBuffer(&constantBufferDesc, nullptr, &m_psAmbientCb);
}

HLSL:


cbuffer AmbientLight: register(b1)
{
   unorm float4 Color; //< Some automatic decoding
};
Yes, it is possible, but...

You'll do several orders of magnitude more work unpacking the unsigned on the GPU than you would by just passing in the three floats.
fastcall22, on 03 Jan 2016 - 06:17 AM, said:

Yes, it is possible, but...

Is it possible via some HLSL syntax, or should I unpack it manually (e.g. use << and divide each color value by 255)?

Yes you can just pass it as an int (4 bytes), and then shift and mask to get the 3 bytes out, and then multiply by 1/255.0 to convert to normalized floats.

This may be faster or slower than sending full floats, depending on the situation.
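
For reference, a minimal HLSL sketch of that manual unpack (the function name and the assumption that red sits in the lowest byte are not from this thread):


float4 UnpackColorUNorm(uint packed)
{
  // Shift and mask each byte out, then scale to [0, 1].
  // Assumes red is in the least significant byte of the packed value.
  float4 c;
  c.r = float( packed        & 0xFF) / 255.0;
  c.g = float((packed >> 8)  & 0xFF) / 255.0;
  c.b = float((packed >> 16) & 0xFF) / 255.0;
  c.a = float((packed >> 24) & 0xFF) / 255.0;
  return c;
}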
Hodgman, on 03 Jan 2016 - 06:55 AM, said:

Yes you can just pass it as an int (4 bytes), and then shift and mask to get the 3 bytes out, and then multiply by 1/255.0 to convert to normalized floats.

This may be faster or slower than sending full floats, depending on the situation.

Thank you, Hodgman!

I just thought there was a native way for the video card to do this conversion for free that I wasn't aware of.

If you're desperate, you could create a standard ID3D11Buffer of 4 bytes and a ShaderResourceView of format DXGI_FORMAT_R8G8B8A8_UNORM. That'll auto-unpack for you on the GPU side, but it won't be a constant buffer any more, so it might cost you a tiny bit of GPU performance on some hardware.
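
The shader side of that approach could look something like this sketch (the names are made up, and it assumes the 4-byte buffer's SRV is bound to slot t0):


Buffer<float4> AmbientColorBuf : register(t0); // SRV created with DXGI_FORMAT_R8G8B8A8_UNORM

float4 LoadAmbientColor()
{
  // The UNORM format conversion happens in the buffer load path,
  // so this already returns floats in [0, 1].
  return AmbientColorBuf.Load(0);
}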

Adam Miles - Principal Software Development Engineer - Microsoft Xbox Advanced Technology Group

It depends on how you're sending the data: in buffers or textures, you can specify the R8G8B8A8_UNORM format.
Constant buffers only support 32-bit integers, not 8-bit, so you have to unpack manually.
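
In other words, the cbuffer from the original post would keep the packed value as a uint and unpack it by hand, something like this sketch (member name and byte order are assumptions):


cbuffer AmbientLight : register(b1)
{
  uint ColorUNorm; // packed 8-bit channels, written as-is from the C++ struct
};

float4 AmbientColor()
{
  // Pull each byte out and normalize; assumes red is in the lowest byte.
  uint4 bytes = uint4(ColorUNorm, ColorUNorm >> 8, ColorUNorm >> 16, ColorUNorm >> 24) & 0xFF;
  return float4(bytes) / 255.0;
}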

This is one of the places where OpenGL is ahead of D3D.

OpenGL has unpackUnorm for this. It's cumbersome, but it gets the job done. On most modern hardware this function maps directly to a native instruction. Unfortunately, as far as I know, HLSL has no equivalent.

However, you do have f16tof32, which is the next best thing.
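
If you go the f16tof32 route, the packing changes: the C++ side sends two 16-bit halves per 32-bit constant, so an RGBA color costs 8 bytes instead of 16 (though not as small as the 4-byte unorm packing). A sketch, with a made-up helper name:


float4 UnpackHalf4(uint2 packed)
{
  // Each uint holds two half-precision floats; f16tof32 reads the low 16 bits.
  return float4(f16tof32(packed.x), f16tof32(packed.x >> 16),
                f16tof32(packed.y), f16tof32(packed.y >> 16));
}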

Edit: Someone already wrote some util functions. With extreme luck the compiler recognizes the pattern and issues the native instruction instead of lots of bitshifting, masking and multiplication / division. You can at least check the results on GCN hardware using GPUPerfStudio's ShaderAnalyzer to see if the driver does indeed recognize what you're doing (I don't think it will though...).

Edit: Someone already wrote some util functions. With extreme luck the compiler recognizes the pattern and issues the native instruction instead of lots of bitshifting, masking and multiplication / division. You can at least check the results on GCN hardware using GPUPerfStudio's ShaderAnalyzer to see if the driver does indeed recognize what you're doing (I don't think it will though...).

I'm not even sure GCN has an instruction for what he wants to do. The best I can figure, it would be 4 v_cvt_f32_ubyte[0|1|2|3] and then 4 v_mul_f32 by 1/255.0f.

Adam Miles - Principal Software Development Engineer - Microsoft Xbox Advanced Technology Group

I'm not even sure GCN has an instruction for what he wants to do. The best I can figure, it would be 4 v_cvt_f32_ubyte[0|1|2|3] and then 4 v_mul_f32 by 1/255.0f.

Maybe yes, maybe not, but my point is that it's still very far from doing 4 loads, 4 bit shifts, 4 'and' masks, 4 conversions to float and then the 1/255 multiply.

Edit: Checked, and you're right about the instructions. "fragCol = unpackUnorm4x8(val);" outputs (irrelevant ISA code stripped):


  v_cvt_f32_ubyte0  v0, s4                                  // 00000000: 7E002204
  v_cvt_f32_ubyte1  v1, s4                                  // 00000004: 7E022404
  v_cvt_f32_ubyte2  v2, s4                                  // 00000008: 7E042604
  v_cvt_f32_ubyte3  v3, s4                                  // 0000000C: 7E062804
  v_mov_b32     v4, 0x3b808081                              // 00000010: 7E0802FF 3B808081
  v_mul_f32     v0, v4, v0                                  // 00000018: 10000104
  v_mul_f32     v1, v1, v4                                  // 0000001C: 10020901
  v_mul_f32     v2, v2, v4                                  // 00000020: 10040902
  v_mul_f32     v3, v3, v4                                  // 00000024: 10060903

Edit 2: Well, that was disappointing. I checked the manual and GCN does have a single instruction for this conversion; if I'm not mistaken it should be:


tbuffer_load_format_xyzw v[0:3], v0, s[4:7], 0 idxen format:[BUF_DATA_FORMAT_8_8_8_8,BUF_NUM_FORMAT_UNORM]

