Digitalfragment, on 23 February 2012 - 08:59 PM, said:
Bit logic isn't available on SM3.0, but the math can be done using float ops. Bit shifts are pretty much just multiplies/divides by powers of 2, etc.
Be warned though, it's not a simple amount of math, and it is hideously expensive if you are doing this per vertex or per pixel. It would likely be faster to sample a point-filtered texture for your extra data.
Texture lookup is an idea, but it'd take a lot of lookups to get the same amount of data as a single register (even assuming floating point textures, you'd need 4 lookups).
If you implement the decompression with divides, it doesn't seem like a bad amount of math in terms of operation count. All the operations are vectorized, after all.
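To make the "shifts are just divides by powers of 2" idea concrete, here is a rough CPU-side sketch in Python, not shader code. It only mimics the float arithmetic. The helper names (`f32`, `shift_right`, `low_bits`, `extract_field`) and the two-byte layout in the demo are made up for illustration, and a real shader would use its own floor/frac style intrinsics instead.

```python
import math
import struct


def f32(x):
    """Round to the nearest 32-bit float, roughly what a shader register holds."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def shift_right(value, bits):
    """Float stand-in for value >> bits: divide by 2**bits, drop the fraction."""
    return float(math.floor(value / 2.0 ** bits))


def low_bits(value, bits):
    """Float stand-in for value & (2**bits - 1): remainder modulo 2**bits."""
    scale = 2.0 ** bits
    return value - scale * math.floor(value / scale)


def extract_field(value, offset, width):
    """Pull `width` bits starting at bit `offset` out of an integer-valued float."""
    return low_bits(shift_right(value, offset), width)


if __name__ == "__main__":
    # Two hypothetical 8-bit fields stored in one float (assumed layout).
    packed = f32(0x12 * 256 + 0x34)
    print(extract_field(packed, 0, 8), extract_field(packed, 8, 8))
```

Each field costs one divide, two floors and a multiply-subtract, and dividing by a power of two is exact in floating point, so the only precision limit is how large an integer the float can hold exactly.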
Quote
On a side note, in SM3.0 you have 256 separate registers for the vertex shader and pixel shader; how on earth have you managed to blow that and not kill performance? ;)
There are more than 200 constant registers, and I still have plenty of those. But there are only 10 "interpolated" registers. I don't actually need them interpolated, but I do need them to be specific to each triangle, which means passing them from the vertex shader to the pixel shader as texture coordinates. Even then, I haven't quite run out of interpolated registers, but I'm close (about 9.5 registers used).
My pixel shader is getting pretty beefy (~2K instructions), so I might need to cut features to make it fit on lower-end SM3 cards, but for now I'm just stuffing everything I want it to do into the shader.
Ashaman73, on 24 February 2012 - 12:13 AM, said:
A simple way to pack 2 colors into one register is to keep one color in normalized space (0..1, with 1 clamped to 0.99999) and to use 'byte' space (0, 1, 2 .. 255) for the other color. Packing and unpacking look like this:
pack:
vec4 packed_color = min(vec4(0.9999),first_color);
packed_color += floor(second_color*255.0);
unpack:
vec4 first_color = fract(packed_color);
vec4 second_color = floor(packed_color) / 255.0;
That seems reasonable enough for 2 colors. But you're only using part of the mantissa, so you still have a lot of wasted bits, and you can't store 3 colors that way (the mantissa is only 23 bits wide).
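For anyone who wants to poke at the precision, here is a small CPU-side Python sketch of the quoted pack/unpack for a single channel, rounded through a 32-bit float. It is only an emulation under my own assumptions: `f32`, `pack` and `unpack` are illustrative names, and both inputs are assumed to already be in 0..1.

```python
import math
import struct


def f32(x):
    """Round to the nearest 32-bit float, roughly what a shader register holds."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def pack(first, second):
    """One channel of the quoted scheme: fraction = first, integer part = second as a byte."""
    packed = min(0.9999, first)           # keep the fraction strictly below 1
    packed += math.floor(second * 255.0)  # second color in 'byte' space
    return f32(packed)                    # store it in a single-precision register


def unpack(packed):
    """Inverse of pack: fract() gives the first color, floor()/255 the second."""
    first = packed - math.floor(packed)
    second = math.floor(packed) / 255.0
    return first, second


if __name__ == "__main__":
    first, second = unpack(pack(0.3, 1.0))
    print(first, second)
```

With the integer part between 128 and 255, adjacent 32-bit floats are 2^-16 apart, so the fractional color keeps roughly 16 bits even though an 8-bit color only needs half of that.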
Crowley99, on 24 February 2012 - 02:18 AM, said:
As a matter of interest, how do you know you are hurting for registers?
I get fun error messages like this: "Problem building "Main.fx", "(1): error X5629: Invalid register number: 12. Max allowed for o# register is 11. ID3DXEffectCompiler: Compilation failed "