Is there a more-or-less kosher way to compress multiple colors into a single shader register? Let's say, in shader model 3?
I'm sort of hurting for registers right now, so I'm trying to scrounge around for places to compress data into fewer registers, or even combining multiple registers into the same register.
I think SM3 (pretty much) guarantees that you get a full 4x4 bytes per register. But color information doesn't need more than a single byte per color channel, even for full 32-bit color. So it seems like it should be possible to pack multiple colors into a single register. I could probably come up with something with a bit of thought, but I'm wondering if anyone has already done all the hard work for me.
All the triangles I'm rendering are flat shaded, so I don't even need to worry about interpolating the compressed register. Still, a method that allows the compressed version to be interpolated, and gives back interpolated colors in the pixel shader when you uncompress, would be pretty sweet.
Compressing multiple colors into a single shader register
Bit logic isn't available in SM3.0, but the math can be done using float ops. Bit shifts are pretty much just multiplies/divides by powers of 2 (plus a floor to drop the bits shifted out), etc.
Be warned, though: it's not a trivial amount of math, and it's hideously expensive if you are doing this per vertex or per pixel. It would likely be faster to sample a point-filtered texture for your extra data.
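A CPU-side sketch of the float-op approach, written in Python because the shader code can't run on its own. `f32` rounds every intermediate to a 32-bit float to mimic an fp32 register, and the hypothetical `pack_bytes`/`unpack_bytes` helpers emulate `(hi << 8) | lo`, `>> 8` and `& 0xFF` using only multiply, divide, floor and subtract. It assumes whole-number channel values in 0..255 and full fp32 precision at every step.

```python
import math
import struct


def f32(x):
    """Round a Python float to the nearest IEEE-754 single, like an fp32 register."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def pack_bytes(hi, lo):
    """Emulate (hi << 8) | lo; hi and lo are whole numbers in 0..255."""
    return f32(f32(hi * 256.0) + lo)  # shift left by 8 = multiply by 2^8


def unpack_bytes(packed):
    """Emulate packed >> 8 and packed & 0xFF using float ops only."""
    hi = float(math.floor(f32(packed / 256.0)))  # shift right = divide, then floor
    lo = f32(packed - f32(hi * 256.0))           # mask = subtract the high part
    return hi, lo


# Hypothetical data: two RGBA8 colors, one byte of each per register component.
color_a = (200, 17, 255, 0)
color_b = (37, 128, 0, 255)
register = [pack_bytes(a, b) for a, b in zip(color_a, color_b)]
print(register)  # [51237.0, 4480.0, 65280.0, 255.0]
print([unpack_bytes(p) for p in register])
```

Each extracted field costs a divide, a floor and a subtract per component, which is where the "not a trivial amount of math" comes from once you unpack several fields.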
On a side note, in SM3.0 you have 256 separate registers for the vertex shader and pixel shader; how on earth have you managed to blow that and not kill performance? ;)
A simple way to pack 2 colors into one register is to store one color in normalized space (0..1), clamping 1 down to 0.9999 so it never carries into the integer part, and to use 'byte' space (0, 1, 2 .. 255) for the other color. Packing and unpacking look like this:
pack:
vec4 packed_color = min(vec4(0.9999), first_color);
packed_color += floor(second_color*255.0 + 0.5);
unpack:
vec4 first_color = fract(packed_color);
vec4 second_color = floor(packed_color) / 255.0;
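To sanity-check the scheme, here is a CPU-side emulation in Python (not shader code) in which `f32` rounds each step to a 32-bit float. `pack` and `unpack` are hypothetical per-component equivalents of the GLSL above. Like the pack line above, the sketch rounds with `+ 0.5` before `floor` instead of truncating, so a channel that lands a hair below a whole number after the multiply isn't knocked down a level. It assumes both inputs are already in 0..1.

```python
import math
import struct


def f32(x):
    """Round a Python float to the nearest IEEE-754 single, like an fp32 register."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def pack(first, second):
    """One component: first (0..1) in the fraction, second (0..1) as a byte 0..255."""
    frac_part = f32(min(0.9999, first))                      # stay below 1 so it never carries
    byte_part = math.floor(f32(f32(second * 255.0) + 0.5))  # round to the nearest byte
    return f32(frac_part + byte_part)


def unpack(packed):
    """Inverse of pack: fract() recovers first, floor()/255 recovers second."""
    whole = math.floor(packed)
    first = f32(packed - whole)   # GLSL fract()
    second = f32(whole / 255.0)
    return first, second


# Hypothetical colors, processed one vec4 component at a time.
first_color = [f32(v) for v in (0.25, 0.5, 1.0, 0.0)]
second_color = [f32(v) for v in (0.2, 1.0, 0.0, 0.6)]
packed_color = [pack(a, b) for a, b in zip(first_color, second_color)]
for p, a, b in zip(packed_color, first_color, second_color):
    got_a, got_b = unpack(p)
    print(f"{p:.6f}  first {a:.4f} -> {got_a:.4f}  second {b:.4f} -> {got_b:.4f}")
```

Even when the integer part reaches 255, the float spacing there is 2^-16, so the first color keeps roughly 16 bits of fractional precision rather than 8; the cost is the 0.9999 clamp on full-intensity channels.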
Texture lookup is an idea, but it'd take a lot of lookups to get the same amount of data as a single register (even assuming floating-point textures, you'd need 4 lookups).
If you implement the decompression with divides, it doesn't seem like a bad amount of math in terms of operation count. All the operations are vectorized, after all.
There are over 200 constant registers, and I still have plenty of those. But there are only 10 "interpolated" registers. I don't actually need them interpolated, but I do need them specific to each triangle, which means passing them from the vertex shader to the pixel shader as texture coordinates. Even then, I haven't quite run out of interpolated registers, but I'm close (I'm at about 9.5 used registers).
My pixel shader is getting pretty beefy (~2K instructions), so I might need to cut out features to make it fit on lower-end SM3 cards, etc., but for now I'm just stuffing everything I want it to do into the shader.
That seems reasonable enough for 2 colors. But you're only using part of the mantissa, so you still have a lot of wasted bits, and you can't store 3 colors that way (the mantissa is only 23 bits wide).
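For reference on the bit budget: the stored mantissa field of an IEEE-754 single is 23 bits, but normalized values carry an implicit leading 1, so every integer up to 2^24 is exactly representable and 2^24 + 1 is the first one that isn't. This small Python check (rounding through `struct` to mimic a 32-bit register; `f32` is a helper of this sketch) shows the boundary. Three 8-bit channels need exactly 24 bits, so they would fit with no headroom at all, and only if every stage that touches the value, interpolators included, keeps full fp32 precision, which nothing in this thread establishes for SM3 hardware.

```python
import struct


def f32(x):
    """Round a Python number to the nearest IEEE-754 single, like an fp32 register."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# Largest value three 8-bit fields can produce when packed as r*65536 + g*256 + b.
three_bytes_max = 255 * 65536 + 255 * 256 + 255  # 16777215 == 2**24 - 1

for n in (2**16 - 1, three_bytes_max, 2**24, 2**24 + 1):
    print(n, f32(n) == n)
# 65535 True
# 16777215 True
# 16777216 True
# 16777217 False
```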
As a matter of interest, how do you know you are hurting for registers?
I get fun error messages like this: "Problem building "Main.fx", "(1): error X5629: Invalid register number: 12. Max allowed for o# register is 11. ID3DXEffectCompiler: Compilation failed "
I'm guessing at around 2000 instructions you aren't too concerned about performance, though? (That's pretty damn heavyweight for a GPU regardless of model. But you could probably write the shader significantly more easily and cheaply under SM4.0/5.0 with the extra shader types.)
But, regarding texture usage, the lookups themselves have a latency which will probably be hidden by the surrounding instructions (texture fetches typically don't block the GPU until the data is absolutely required for use)
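One way to try the point-filtered texture idea, sketched on the CPU in Python: store each triangle's extra data in one texel of a hypothetical W×1 float texture, and have the pixel shader fetch at that texel's center so a point (nearest) sampler returns the stored values unfiltered. `texel_center_u` and `texel_index` are illustrative helpers, not a D3D API, and the sketch assumes the triangle's index reaches the pixel shader somehow, for example in a spare interpolator component.

```python
import math


def texel_center_u(index, width):
    """Texture coordinate of the center of texel `index` in a row `width` texels wide."""
    return (index + 0.5) / width


def texel_index(u, width):
    """Texel a point (nearest) sampler picks for coordinate u, assuming no wrapping."""
    return math.floor(u * width)


width = 256  # hypothetical texture width: one texel per triangle
for tri in (0, 1, 255):
    u = texel_center_u(tri, width)
    print(tri, u, texel_index(u, width))
```

Sampling at the center rather than at the texel's edge (index / width) keeps the coordinate half a texel away from the neighboring texels, so small precision errors in u can't flip which texel gets picked.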