QNAN

HLSL (SM2/3): How do I transfer less-than-32 bit variables?

QNAN    232

I want to transfer an instancing vertex that, in addition to the matrix, holds four very small integers. Two of them are indexes into a texture map shared by several other objects: a texture map holding many different kinds of grass and flowers. I call each of these a "frame" and the texture a "frame map". The last two integers define the dimensions (x/y) of the frame map that the indexes point into.

From the dimensions and the indexes, the shader can calculate the offset into the frame map. I figured this should be cheaper to transfer than the float offsets themselves, provided I can use a compact data type.
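The offset calculation described here could look roughly like the sketch below. It is written as plain C++ so it can be run on the CPU; a shader would do the same arithmetic. The names `Float2`, `frameOffset` and `frameUV` are made up for illustration, and the sketch assumes frames are laid out on a regular grid with frame (0, 0) at the UV origin.

```cpp
#include <cassert>

// Hypothetical helper type, not part of any D3D header.
struct Float2 { float x, y; };

// UV origin of frame (ix, iy) in a frame map holding framesX x framesY frames.
Float2 frameOffset(int ix, int iy, int framesX, int framesY) {
    assert(framesX > 0 && framesY > 0);
    assert(ix >= 0 && ix < framesX && iy >= 0 && iy < framesY);
    return { static_cast<float>(ix) / framesX, static_cast<float>(iy) / framesY };
}

// Maps a UV coordinate local to one frame (0..1) into frame-map space.
Float2 frameUV(Float2 local, int ix, int iy, int framesX, int framesY) {
    Float2 origin = frameOffset(ix, iy, framesX, framesY);
    return { origin.x + local.x / framesX, origin.y + local.y / framesY };
}
```

For example, the centre of frame (1, 2) in a 4x4 frame map would land at UV (0.375, 0.625).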

The integers are so small that I can get away with only 4 bits each, as I allow the frame map a maximum of 16x16 frames. Combined, the index and dimension variables occupy 4x4=16 bits.
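As a sketch of that 16-bit layout, the packing could be done like this on the CPU side. The function and struct names are hypothetical. One assumption that is not in the post: since 4 bits only hold 0..15, a dimension of 16 does not fit directly, so the sketch stores each dimension minus one.

```cpp
#include <cassert>
#include <cstdint>

// Packs two frame indexes and two frame-map dimensions into one 16-bit word.
// Assumption: dimensions are stored as (dimension - 1) so that 16 fits in 4 bits.
std::uint16_t packFrame(unsigned ix, unsigned iy, unsigned framesX, unsigned framesY) {
    assert(framesX >= 1 && framesX <= 16 && framesY >= 1 && framesY <= 16);
    assert(ix < framesX && iy < framesY);
    return static_cast<std::uint16_t>(
        ix | (iy << 4) | ((framesX - 1) << 8) | ((framesY - 1) << 12));
}

// Hypothetical unpacked form, used only for this illustration.
struct Frame { unsigned ix, iy, framesX, framesY; };

// Reverses packFrame, restoring the dimensions to the range 1..16.
Frame unpackFrame(std::uint16_t packed) {
    return { packed & 0xFu,
             (packed >> 4) & 0xFu,
             ((packed >> 8) & 0xFu) + 1u,
             ((packed >> 12) & 0xFu) + 1u };
}
```

Frame (3, 5) in a 16x8 frame map packs to 0x7F53 under this layout.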

However, I cannot find a transfer data type (http://msdn.microsoft.com/en-us/library/windows/desktop/bb172533%28v=vs.85%29.aspx) that takes only 4 bits. In fact, I cannot find anything that takes less than 32 bits. Is that really so?

I could live with packing variables together and unpacking them on the other end, but if I am stuck with a minimum unit of 32 bits, then I'm not sure it is worth the price.

Is there a solution to this? Is there any way I can transfer variables of less than 32 bits?

mhagain    13430

You could go up to exactly 32 bits and transfer them as D3DDECLTYPE_UBYTE4, but this seriously does smell of premature optimization. Try just transferring the full normal texcoords as floats anyway. You'll probably find that you're not really bound at this stage of the pipeline at all, and that any attempt to reduce the data size doesn't make a blind bit of difference.

QNAN    232

Bandwidth is usually a problem when rendering massive amounts of objects (which foliage can easily be), so I assumed that it would be here too, although I have not tested it yet.

If there is no elegant way to do it, I guess I will bump the variables (indexes/boundaries) to 8 bits each and use the UBYTE4 structure. I'm just a bit disappointed that no solution exists for transferring custom-sized data pieces, as it can be a problem when transferring millions of data packets.
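A minimal sketch of what the 8-bit variant could look like on the CPU side: one byte per value, four bytes in total, which is the size of a single D3DDECLTYPE_UBYTE4 element. The struct and function names are hypothetical, and the vertex declaration that would bind this data is left out.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-instance payload: one byte per value, 4 bytes in total.
struct FrameBytes {
    std::uint8_t ix;       // frame column index
    std::uint8_t iy;       // frame row index
    std::uint8_t framesX;  // number of frame columns in the frame map
    std::uint8_t framesY;  // number of frame rows in the frame map
};

static_assert(sizeof(FrameBytes) == 4, "payload must fill exactly one 4-byte element");

// Builds the payload, checking that every value fits into a byte.
FrameBytes makeFrameBytes(unsigned ix, unsigned iy, unsigned framesX, unsigned framesY) {
    assert(framesX >= 1 && framesX <= 255 && framesY >= 1 && framesY <= 255);
    assert(ix < framesX && iy < framesY);
    return { static_cast<std::uint8_t>(ix), static_cast<std::uint8_t>(iy),
             static_cast<std::uint8_t>(framesX), static_cast<std::uint8_t>(framesY) };
}
```

With a full byte per value no offset trick is needed to store a dimension of 16, and no unpacking arithmetic is needed in the shader.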

Even if this may be premature optimization, I thought I would benefit from knowing in detail how data is transferred to the card. And I think it is an interesting problem.

Schrompf    1035

I assume the context to be PC games. It might look different when you're running on a console.

On DX9-level hardware, UBYTE4 is indeed the best you can get. You can do some bit swizzling in the shader to combine two 4-bit values into one of these 4 bytes, but you can't specify anything smaller than 32 bits. Thus this compression scheme is only useful if you can make other use of the remaining 3 bytes.
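To illustrate the swizzling, here is a rough C++ mirror of the arithmetic. It assumes that the byte reaches the shader as a float in the range 0..255 and that SM2/3 shaders have no integer bit operators to rely on, so the split is done with floor and multiply instead of shifts and masks. `packNibbles`, `unpackNibbles` and `Nibbles` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// CPU side: combine two 4-bit values into one byte.
std::uint8_t packNibbles(unsigned hi, unsigned lo) {
    assert(hi < 16 && lo < 16);
    return static_cast<std::uint8_t>((hi << 4) | lo);
}

// Hypothetical result type for the two recovered values.
struct Nibbles { float hi, lo; };

// Shader-style unpacking with float arithmetic only:
// the upper 4 bits are floor(value / 16), the lower 4 bits are the remainder.
Nibbles unpackNibbles(float byteValue) {
    float hi = std::floor(byteValue / 16.0f);
    float lo = byteValue - hi * 16.0f;
    return { hi, lo };
}
```

In HLSL the same two lines would use `floor` on the incoming component. All values involved are small integers, which floats represent exactly, so no rounding fix-up should be needed.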

A warning from experience: it won't help you much.

- It can save a lot of GPU memory, but this is only useful if you're handling literally millions of instances. If you're doing just a few tens of thousands of instances, I wouldn't waste my time on it. 

- It can save quite some memory bandwidth, but I never found this to be the limiting factor. The best I got from compressing my instancing vertex structure from 56 bytes down to 20 bytes was a 30% performance gain.

- It can save you transfer bandwidth in case you're updating the instancing data every frame. In that case I'd say it's worth the hassle, but I'd wager you have other problems then.

- It won't help you in any other case. 
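To put the memory point into numbers, here is a small sketch using the 56-byte and 20-byte structures mentioned above. The instance counts are illustrative picks for "tens of thousands" and "millions", not measurements, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Total size in bytes of an instance buffer.
std::uint64_t instanceBufferBytes(std::uint64_t instances, std::uint64_t bytesPerInstance) {
    return instances * bytesPerInstance;
}

// Bytes saved by shrinking each instance record from `before` to `after` bytes.
std::uint64_t bytesSaved(std::uint64_t instances, std::uint64_t before, std::uint64_t after) {
    assert(before >= after);
    return instances * (before - after);
}
```

At 50,000 instances the saving is 1.8 MB, which hardly matters. At 4,000,000 instances it is 144 MB, which is where the compression starts to pay off in memory terms.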

A few months ago I wrote a voxel renderer that splatted millions of textured quads. I first tried to use instancing, but it was slow as hell: 4 million quads resulted in ~15fps on my GeForce GTX 460. When trying to find the bottleneck, I noticed that all counters of NVPerfHUD together only accounted for 30% of the frame time, and 70% went to "somewhere". Then I tried Visual NSight, which was buggy as fuck but at least could show me the real cause: the Input Assembler. Then I removed all instancing and stored four unique vertex structures per quad, with a total of 80 bytes per quad, and I got to 55fps. That is for the very same geometry, at four times the GPU memory bandwidth. Something is happening on those modern cards that I can't explain. An ATI GPU showed the same behaviour.

mhagain    13430

It's also the case that for huge numbers of objects you're more likely to bottleneck on fillrate (and potentially overdraw, depending on the type of object) than on vertices.  This can be observed with particle systems and would be true of foliage too.

Bacterius    13165

"I'm just a bit disappointed that no solution exists for transferring custom-sized data pieces, as it can be a problem when transferring millions of data packets."

Well, graphics cards can't deal too well with data smaller than 32 bits, especially unaligned data, so what you'd save in memory bandwidth, you would lose in processing efficiency. Really, you should get everything working first, and then benchmark (if you still notice a slowdown once everything is in place).

QNAN    232
Thanks for the excellent input, guys. Knowing that it is impossible to have types smaller than 32 bits is a big plus: at least I don't have to bang my head against an unbreakable wall :). It was also nice to hear about people's performance stories, as from them it sounds like I'm not going to run into a bandwidth problem as the first thing.

Thanks for the feedback, guys.
