Numsgil

Compressing multiple colors into a single shader register


Is there a more-or-less kosher way to compress multiple colors into a single shader register? Let's say, in shader model 3?

I'm sort of hurting for registers right now, so I'm trying to scrounge around for places to compress data into fewer registers, or even combining multiple registers into the same register.

I think SM3 (pretty much) guarantees that you get a full 4x4 bytes per register, i.e. four 32-bit components. But color information doesn't need more than a single byte per channel, even for full 32-bit color. So it seems like it should be possible to pack multiple colors into a single register. I could probably come up with something with a bit of thought, but I'm wondering if anyone has done all the hard work for me already :) All the triangles I'm rendering are flat shaded, so I don't even need to worry about interpolating the compressed register, though a method that lets the compressed version be interpolated and still yields interpolated colors when you uncompress it in the pixel shader would be pretty sweet :)

Bit logic isn't available in SM3.0, but the math can be done using float ops. Bit shifts are pretty much just multiplies/divides by powers of 2, etc.

Be warned, though: it's not a trivial amount of math, and it is hideously expensive if you are doing it per vertex or per pixel. It would likely be faster to sample a point-filtered texture for your extra data.
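To make the "shifts are just divides by powers of 2" idea concrete, here is a minimal CPU-side sketch in Python rather than shader code. It assumes the register holds an integer-valued 32-bit float; the helper names (`f32`, `shift_right`, `low_bits`, `extract_byte`) are made up for illustration and `f32` only emulates float32 rounding on the CPU.

```python
import math
import struct

def f32(x):
    """Round to the nearest 32-bit float, roughly what an fp32 register holds."""
    return struct.unpack("<f", struct.pack("<f", float(x)))[0]

def shift_right(value, bits):
    """Emulate value >> bits with float ops only: divide by 2**bits, then floor."""
    return f32(float(math.floor(f32(value / f32(2.0 ** bits)))))

def low_bits(value, bits):
    """Emulate value & ((1 << bits) - 1) by subtracting everything above the low bits."""
    scale = f32(2.0 ** bits)
    return f32(value - f32(shift_right(value, bits) * scale))

def extract_byte(value, index):
    """Emulate (value >> (8 * index)) & 0xFF."""
    return low_bits(shift_right(value, 8 * index), 8)

# 0x123456 == 1193046, so bytes 0..2 are 0x56, 0x34, 0x12
packed = 1193046.0
bytes_out = [extract_byte(packed, i) for i in range(3)]
```

In a shader the same steps would map onto a divide (or multiply by the reciprocal), a `floor`, and a multiply-subtract per extracted field, which is where the per-pixel cost mentioned above comes from.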

On a side note, in SM3.0 you have 256 separate registers for the vertex shader and pixel shader. How on earth have you managed to blow through that and not kill performance? ;)


A simple way to pack 2 colors into one register is to keep one color in normalized space (0..1), clamped to just below 1 (0.9999) so it never spills into the integer part, and to store the other color in 'byte' space (0, 1, 2, ..., 255) in the integer part. Packing and unpacking look like this:

pack:
// fractional part: first color, kept strictly below 1.0
vec4 packed_color = min(vec4(0.9999), first_color);
// integer part: second color quantized to 0..255
packed_color += floor(second_color * 255.0);

unpack:
vec4 first_color = fract(packed_color);
vec4 second_color = floor(packed_color) / 255.0;
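As a rough sanity check of the scheme above, the following Python sketch emulates one channel of the pack/unpack pair on the CPU. It assumes the register keeps full 32-bit float precision and is not interpolated; `f32`, `pack_channel` and `unpack_channel` are illustrative names, not part of any shader API.

```python
import math
import struct

def f32(x):
    """Round to the nearest 32-bit float, roughly what an fp32 register holds."""
    return struct.unpack("<f", struct.pack("<f", float(x)))[0]

def pack_channel(first, second):
    """first and second are in 0..1; first goes in the fraction, second in the integer part."""
    frac = min(f32(0.9999), f32(first))           # min(vec4(0.9999), first_color)
    whole = math.floor(f32(f32(second) * 255.0))  # floor(second_color * 255.0)
    return f32(frac + whole)

def unpack_channel(packed):
    """Return (first, second) recovered from one packed channel."""
    whole = math.floor(packed)
    first = f32(packed - whole)                   # fract(packed_color)
    second = f32(whole / 255.0)                   # floor(packed_color) / 255.0
    return first, second

# Worst case for precision: both inputs at their maximum.
first_out, second_out = unpack_channel(pack_channel(1.0, 1.0))
```

With an integer part of up to 255, a float32 has about 16 bits left for the fraction, so the first color comes back with far more precision than an 8-bit channel needs. Note that linearly interpolating the packed value would blend the integer and fractional parts together, so this sketch only covers the flat-shaded case.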


[quote]
Bit logic isn't available in SM3.0, but the math can be done using float ops. Bit shifts are pretty much just multiplies/divides by powers of 2, etc.

Be warned, though: it's not a trivial amount of math, and it is hideously expensive if you are doing it per vertex or per pixel. It would likely be faster to sample a point-filtered texture for your extra data.
[/quote]


Texture lookup is an idea, but it'd take a lot of lookups to get the same amount of data as a single register (even assuming floating point textures, you'd need 4 lookups).

If you implement the decompression with divides, it doesn't seem like a bad amount of math in terms of operation count. All the operations are vectorized, after all.


[quote]
On a side note, in SM3.0 you have 256 separate registers for the vertex shader and pixel shader. How on earth have you managed to blow through that and not kill performance? ;)
[/quote]

There are more than 200 constant registers, and I still have plenty of those. But there are only 10 "interpolated" registers. I don't actually need them interpolated, but I do need them to be specific to each triangle, which means passing them from the vertex shader to the pixel shader as texture coordinates. Even then, I haven't quite run out of interpolated registers, but I'm close (I'm at about 9.5 used registers).

My pixel shader is getting pretty beefy (~2K instructions), so I might need to cut features to make it fit on lower-end SM3 cards, etc., but for now I'm just stuffing everything I want it to do into the shader.


[quote]
A simple way to pack 2 colors into one register is to pack one color in normalized space, that is 0..1, clamp 1 to 0.99999 and to use 'byte' space (0, 1, 2, ..., 255) for the other color. Packing and unpacking look like this:

pack:
vec4 packed_color = min(vec4(0.9999), first_color);
packed_color += floor(second_color * 255.0);

unpack:
vec4 first_color = fract(packed_color);
vec4 second_color = floor(packed_color) / 255.0;
[/quote]



That seems reasonable enough for 2 colors. But you're only using part of the mantissa, so you still have a lot of wasted bits, and you can't store 3 colors that way (the mantissa is only 23 bits wide).
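One possible refinement to the bit budget discussed above: a 32-bit float stores 23 mantissa bits explicitly plus an implicit leading 1, so every integer up to 2^24 is exactly representable. Assuming the register stays full fp32 end to end and is never interpolated (which matches the flat-shaded case), three byte-sized channels could in principle share one component. The sketch below checks that arithmetic on the CPU in Python; `f32`, `pack3` and `unpack3` are hypothetical helper names, and nothing here has been verified on actual SM3 hardware.

```python
import math
import struct

def f32(x):
    """Round to the nearest 32-bit float, roughly what an fp32 register holds."""
    return struct.unpack("<f", struct.pack("<f", float(x)))[0]

def pack3(r, g, b):
    """Pack three byte values (0..255) as r + g*2^8 + b*2^16, stored as a float32."""
    return f32(r + g * 256.0 + b * 65536.0)

def unpack3(packed):
    """Recover the three bytes using only divide, floor, multiply and subtract."""
    b = math.floor(packed / 65536.0)
    rest = packed - b * 65536.0
    g = math.floor(rest / 256.0)
    r = rest - g * 256.0
    return int(r), int(g), int(b)

# The largest packed value is 2^24 - 1, still exactly representable in float32.
largest = pack3(255, 255, 255)
round_trip = unpack3(pack3(18, 52, 86))
```

Applied to all four components, that would be up to twelve 8-bit values per register under the stated assumptions, at the cost of the extra unpacking math per pixel.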


[quote]
As a matter of interest, how do you know you are hurting for registers?
[/quote]


I get fun error messages like this: "Problem building "Main.fx", "(1): error X5629: Invalid register number: 12. Max allowed for o# register is 11. ID3DXEffectCompiler: Compilation failed "

I'm guessing that at around 2000 instructions, you aren't too concerned about performance? (That's pretty damn heavyweight for a GPU regardless of model. But you could probably write the shader significantly more easily and cheaply under SM4.0/5.0 with the extra shader types.)

But, regarding texture usage, the lookups themselves have a latency which will probably be hidden by the surrounding instructions (texture fetches typically don't block the GPU until the data is actually required).
