# very compact reflective shadow maps format

Old topic!

Guest, the last post of this topic is over 60 days old and at this point you may not reply in this topic. If you wish to continue this conversation start a new topic.

11 replies to this topic

### #1Lightness1024  Members

Posted 31 August 2012 - 05:00 AM

Hello gamedev,

There is something I've been wanting to do for quite some time. I have an RSM that is stored in 3 textures:
one R32F and two RGBA8.
• R32F : depth in meters (from camera that rendered the RSM)
• RGBA8 : normal, stored the classic way, shifted to color space with " * 0.5 + 0.5 "
• RGBA8 : albedo
so we have 3 render targets simultaneously, and a waste of two alpha components.

For optimisation, because 3 RTs can be very heavy for some cards, I thought about compacting ALL of that into ONE RGBA16F texture.
• R : depth, part 1
• G : depth, part 2 (+ sign bit for normal)
• B : normal
• A : color
It must be compatible with DX9, so no integer targets and no bit fiddling in shaders.

For the depth, I thought a simple range split should do the trick.
We decide on some multiple of a distance that the "MSB" channel will store, and the "LSB" channel will store the rest.
Example:

    R = floor(truedepth / 100) * 100;
    G = truedepth - R;
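
To get a feel for what this split costs in precision, here is a small CPU-side sketch in Python (not shader code). It assumes both channels are rounded to half floats when written, which is emulated with the `struct` module's `'e'` format. The names `to_half`, `split_depth` and `merge_depth` are made up for the illustration.

```python
import math
import struct

def to_half(x):
    """Round x to the nearest IEEE 754 half float (what an RGBA16F channel keeps)."""
    return struct.unpack('<e', struct.pack('<e', x))[0]

def split_depth(depth, slice_size=100.0):
    """Split depth into a coarse multiple of slice_size (R) and the remainder (G)."""
    coarse = math.floor(depth / slice_size) * slice_size
    fine = depth - coarse
    return to_half(coarse), to_half(fine)

def merge_depth(r, g):
    """Rebuild the depth from the two stored channels."""
    return r + g

r, g = split_depth(1234.567)
print(r, g, merge_depth(r, g))
```

With a slice of 100, the remainder reaches the 64 to 128 range, where half floats step by 0.0625, so the worst-case error is about 3 cm. Multiples of 100 also stop being exact in R from 8300 on (the half-float step there is 8), so storing the slice index floor(truedepth / 100) rather than the multiplied value may be a safer variant.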


For the normal, we could store x in the 8 most significant bits using the same trick, and y in the 8 least significant bits.
z can be reconstructed using the sign stored in the depth: knowing we are on a unit sphere, there is just a square root to evaluate.
(And when reading depth we just have to remember to always do abs(tex2d(depth).r).)

For the color, it would be a 16-bit color, stored in HLS with the same trick, again, of "floor" and "modulo" to pack the values into 6/5/5 bits.

now,

knowing we have 16-bit IEEE 754 half floats per channel here:
checking Wikipedia, integers are represented exactly up to 2048,
and the step then doubles with each power of two, reaching 32 between 32k and 65k.
Between 0 and 1 the step is at most about 0.00049 (2^-11, in the 0.5 to 1 range).
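
These step sizes can be checked with a few lines of Python (a CPU-side sketch, not shader code). `half_step` is a hypothetical helper that only handles normal positive values, and `to_half` emulates the rounding with the `struct` module's `'e'` format.

```python
import math
import struct

def to_half(x):
    """Round x to the nearest IEEE 754 half float."""
    return struct.unpack('<e', struct.pack('<e', x))[0]

def half_step(x):
    """Gap between consecutive half floats around x (normal range only: 2^-14 <= x <= 65504)."""
    exponent = math.floor(math.log2(x))
    return 2.0 ** (exponent - 10)   # 10 stored fraction bits

for value in (0.75, 1.5, 1000.0, 2047.0, 3000.0, 40000.0):
    print(value, half_step(value), to_half(value))
```

The step is 1 between 1024 and 2048, which is why integers stay exact up to 2048, and it is 32 in the 32768 to 65504 range.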

The issue is: what slice size should we use for the depth divisor?
And would it be better to store a logarithmic depth?
But in that case it would still need slicing, since we need to store the depth in 32 bits, so over two components. I suppose in that case the slicing would be logarithmic too?

About the normal, I feel there is a danger in storing it like this, because some directions will have more precision than others.

The color is not really an issue; RSMs don't need precise albedo.

What do you think?

Edited by Lightness1024, 31 August 2012 - 05:03 AM.

### #2Hodgman  Moderators

Posted 31 August 2012 - 05:41 AM

Interesting idea.
Packing data into F16 components is possible, but it is tricky. It's best to think of F16 as a 1.5.10 format, where you've got a sign bit, an exponent and a fraction.
The sign field can be 0/1. Exponent is an integer from -14 to +15. Fraction is a binary fraction with a hidden/non-stored "1." in front of it, ranging from 1.0 to 1.9990234375 (incrementing at a resolution of 1/1024).
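
As a rough illustration of the 1.5.10 layout, this Python sketch pulls the three fields out of a half float on the CPU. `half_fields` is a hypothetical helper and only handles normal numbers (no zero, denormals, infinities or NaN).

```python
import struct

def half_fields(x):
    """Return (sign, exponent, significand) of x stored as a normal half float."""
    bits = struct.unpack('<H', struct.pack('<e', x))[0]
    sign = bits >> 15                              # 1 bit
    exponent = ((bits >> 10) & 0x1F) - 15          # 5 bits, bias of 15 removed
    significand = 1.0 + (bits & 0x3FF) / 1024.0    # hidden "1." plus 10-bit fraction
    return sign, exponent, significand

print(half_fields(1.0))       # smallest significand
print(half_fields(65504.0))   # largest finite half float
```

The value is recovered as (-1)^sign * significand * 2^exponent, so anything packed into a channel has to survive that rounding.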

As an alternative, you could consider outputting to 2x regular 8888 textures, which is the same output bandwidth. The advantage is that it's easier to deal with simple 8-bit fractions and you've got 8 of these components to split your data over.
e.g. you could store depth over 3 (24 bit), normal over 2 and albedo over the remaining 3 (maybe with normal.z's sign packed into one of the albedo channels).
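
A minimal sketch of the 24-bit depth split, written in Python to check the arithmetic on the CPU. It assumes depth is already normalised to [0, 1). The function names are hypothetical, and a DX9 shader would do the same thing with floor/frac on floats instead of integer division.

```python
def pack_depth24(depth01):
    """Spread a depth in [0, 1) over three 8-bit channels, most significant first."""
    value = min(int(depth01 * 16777216.0), 16777215)   # 2^24 levels
    hi, rest = divmod(value, 65536)
    mid, lo = divmod(rest, 256)
    return hi / 255.0, mid / 255.0, lo / 255.0         # values an RGBA8 target can hold exactly

def unpack_depth24(r, g, b):
    """Rebuild the depth from the three stored channels."""
    hi, mid, lo = round(r * 255.0), round(g * 255.0), round(b * 255.0)
    return (hi * 65536 + mid * 256 + lo) / 16777216.0

d = 0.123456789
print(d, unpack_depth24(*pack_depth24(d)))
```

The round trip loses less than 2^-24 of the depth range, which is what makes three channels comfortable for depth.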

### #3Lightness1024  Members

Posted 31 August 2012 - 04:33 PM

Yes, I thought of the 2xRGBA8 targets, and it seems to have advantages. The only thing: I was afraid of possibly inferior perf, even if theoretically there shouldn't be any.
We verified that this happens on some cards. The example here is the ATI FireGL 5600 (which is a bad card, really): if you plug in 4x 32-bit render targets you get much less perf than with a 2x 64-bit render target setup.
This is pretty crazy and, in my opinion, could be due to a deliberately sub-optimized driver for marketing reasons.
It is not impossible, we have seen examples: NVidia is allegedly doing that with OpenGL pixel readback between the GeForce and Quadro lines; they are also putting some limitations on double-precision compute performance for non-Tesla cards...

Well, while this reasoning could hold between 2 and 4 RTs, since it was verified once, between 1 and 2 it is a bit paranoid.
So I might as well go for that; we can never tell before trying anyway.

24-bit depth: yes, should be more than enough. (A little thinking: a map 1 km thick will have 60 µm precision!)
Also, the normal could be stored as 2 angles, but that requires some acos and atan2. I have no idea of the actual cost of those operations.

Thanks

### #4Lightness1024  Members

Posted 31 August 2012 - 06:28 PM

I just had another crazy idea: we can push it down to one RGBA8. 16 bits for depth is still enough, because we get 1.5 cm precision for a 1 km thick map (provided we store depth by fitting the map's bounding cube into the 0-65535 range).
Then 8 bits for normal, 8 bits for color.
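
The precision figures quoted in this thread come from dividing the map thickness by the number of depth levels. A tiny Python check, with `depth_step` as a made-up helper name:

```python
def depth_step(thickness_m, bits):
    """Smallest depth difference representable over thickness_m with the given bit count."""
    return thickness_m / (2 ** bits)

print(depth_step(1000.0, 16) * 100.0, "cm per step with 16 bits")
print(depth_step(1000.0, 24) * 1e6, "micrometres per step with 24 bits")
```

For a 1 km range this gives roughly 1.5 cm with 16 bits and roughly 60 µm with 24 bits.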

For the color, we know how 8-bit color looks: not great, but once dithered it is not bad, and then you can reconstruct the high-precision color by blurring with neighbor pixels. We lose spatial resolution for colors, but hey, with this trick even 8-bit storage is near perfect. I proved it in Paint Shop Pro: convert an image into dithered 8 bits, reconvert to 24 bits, blur. You get the same thing as the original image blurred! Though this is to be expected, because it amounts to saying "I store my color in 4x8 bits, but I put this info into my neighbor cells".
Dithering in a shader should be feasible with a fixed pattern. I could even make the reconstruction aware of depth/normal differences to preserve high frequencies (same issue as with depth-of-field blurring, and PCF).

Then the problem will be storing the normal in 8 bits.
With Euler angles: 4 bits for alpha, 4 bits for phi, which means only 22° precision.
Or maybe with 9 bits and the hemisphere projection principle, it would mean we slice a hemisphere quadrant into 64 values
-> 16*16 square to project a quadrant -> which means roughly 5° of precision?
But it will not be distributed evenly: the poles will have better precision than the equator.
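
To put a number on the two-angle idea, here is a CPU-side Python sketch that packs a unit normal into 4 + 4 bits and measures the angular error after decoding. The choice of a polar angle plus an azimuth, the bin-centre decoding and all function names are assumptions made for the illustration, not something fixed by the post.

```python
import math

def encode_normal8(n):
    """Quantise a unit normal to 4 bits of polar angle and 4 bits of azimuth."""
    x, y, z = n
    theta = math.acos(max(-1.0, min(1.0, z)))        # polar angle in [0, pi]
    phi = math.atan2(y, x) % (2.0 * math.pi)         # azimuth in [0, 2*pi)
    t = min(int(theta / math.pi * 16.0), 15)
    p = min(int(phi / (2.0 * math.pi) * 16.0), 15)
    return t * 16 + p

def decode_normal8(code):
    """Rebuild a unit normal from the centre of the quantisation bin."""
    t, p = divmod(code, 16)
    theta = (t + 0.5) / 16.0 * math.pi
    phi = (p + 0.5) / 16.0 * 2.0 * math.pi
    s = math.sin(theta)
    return (s * math.cos(phi), s * math.sin(phi), math.cos(theta))

def angle_deg(a, b):
    """Angle between two unit vectors, in degrees."""
    dot = sum(u * v for u, v in zip(a, b))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot))))

n = (0.6, 0.0, 0.8)
print(encode_normal8(n), angle_deg(n, decode_normal8(encode_normal8(n))))
```

The azimuth bins are 22.5° wide, so with bin-centre decoding the error stays below about 17°, and it is indeed smaller near the poles where the azimuth matters less.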

hm..

### #5Hodgman  Moderators

Posted 31 August 2012 - 09:26 PM

> I proved it in Paint Shop Pro: convert an image into dithered 8 bits

Was this converting it to a 256-colour palettized image, or actually a 3.3.2 mode? The former will give much better results than the latter, but will be hard to pull off in a single-pass shader.
You won't be able to generate a 256 colour palette on the fly (as that would require every pixel to be able to inspect every other pixel), so you'd have to use a fixed palette, and even then, choosing which palette entry you should quantize your input to will be difficult -- the naive solution requires you to compare against every palette entry. You could precompute a lookup-table, but it would be a few MB.

Another option; in GPU Pro 2, there's a chapter "Shader Amortization using Pixel Quad Message Passing" which explains that the pixel shader can actually share information with the neighbouring pixels in a 2x2 area, via the ddx/ddy functions. You could use this to share the 4 albedo values, average them so that you're only outputting a single albedo per 2x2 area, and then split the storage of the colour over that whole area (such as top-left writes red, bottom-right writes blue, other two write green).
e.g. In digital cameras, every pixel either captures a red, green OR blue value, and then a demosaicing filter merges them into an RGB image.

I'm not sure about 8-bit normals, but this page here is the bible for 16-bit normal formats. Maybe start with the spheremap transform, but halve the number of output bits.

Edited by Hodgman, 31 August 2012 - 09:28 PM.

### #6MrOMGWTF  Members

Posted 31 August 2012 - 11:18 PM

Hey, sorry for kind of offtopic but,

> RGBA8 : albedo

What is albedo? Is it the flux of the object?

Edited by MrOMGWTF, 31 August 2012 - 11:19 PM.

### #7hupsilardee  Members

Posted 02 September 2012 - 04:59 AM

> Hey, sorry for kind of offtopic but,
>
> > RGBA8 : albedo
>
> What is albedo? Is it the flux of the object?

Albedo = diffuse color (I think)

Lightness1024 - 8-bit normals will look just awful when specular is applied (maybe you could make it a 'low graphics' option?). I recommend you stick with a 64-bit buffer. Also, see this slideshow
http://www.insomniacgames.com/tech/articles/0409/files/GDC09_Lee_Prelighting.pdf
slides 12-14, for why you should store normals as spherical coordinates and not use z = sqrt(1 - x^2 - y^2)

### #8Tournicoti  Prime Members

Posted 02 September 2012 - 06:43 AM

I think the misconception about "view space normals" comes from assuming that the z component is always negative, and so thinking that storing only x and y is enough.
In fact, storing the sign of z is needed.
The normal can be stored (directly in world space) via its x and y components, plus 1 bit for the sign of its z component.
And then: z = sign * sqrt(1 - x^2 - y^2)
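
The reconstruction above can be sketched as follows (Python, CPU side). The function names are hypothetical, and the clamp to zero is an added guard for the case where quantised x and y end up slightly outside the unit circle.

```python
import math

def encode_xy_sign(n):
    """Keep x and y, plus the sign of z as +1 or -1."""
    x, y, z = n
    return x, y, (1.0 if z >= 0.0 else -1.0)

def decode_xy_sign(x, y, sign):
    """z = sign * sqrt(1 - x^2 - y^2), clamped so rounding never feeds sqrt a negative value."""
    z = sign * math.sqrt(max(0.0, 1.0 - x * x - y * y))
    return x, y, z

print(decode_xy_sign(*encode_xy_sign((0.6, 0.0, -0.8))))
```

Without the sign bit, the two normals (0.6, 0, 0.8) and (0.6, 0, -0.8) would decode to the same vector.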

### #9Tasty Texel  Members

Posted 02 September 2012 - 08:29 AM

You don't get negative z values if you use "per-pixel view space normals". The obvious drawback is that you have to compute the required matrix on a per-pixel basis.

Edited by Bummel, 02 September 2012 - 08:43 AM.

### #10Lightness1024  Members

Posted 02 September 2012 - 12:10 PM

Hodgman: ah yes, I've looked at this ddx/ddy article vaguely before, but I feel it should be simpler to do the exploded color storage: just using "floor"/"fmod" will give a pixel index that is enough to select between r, g and b. Thanks for the idea.
I stumbled upon the aras-p article comparing normal storage quality while looking for material on this problem. I recommend it to future googlers of this thread.

Hupsilardee: they may look bad, but it doesn't matter much. It is important to keep nice normals in a deferred GBuffer, but for an RSM we don't really care much. In my case it will be used to initialize cells for light propagation (LPV).

### #11kalle_h  Members

Posted 02 September 2012 - 02:48 PM

You could use the 8 bits of the normal to store an index, just like the MD2 format does. For encoding a normal to an index you need a lookup-table texture: just use x and y as the texture coordinate, and remember to handle the -z cases too. Decoding is a really simple lookup in a table texture or uniform array.
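
A rough CPU-side sketch of the index idea in Python. Note the swap: instead of MD2's own fixed normal table, which is not reproduced here, it assumes a 256-entry table generated with a Fibonacci spiral, and it encodes by brute-force search where a shader would use the lookup texture described above. All names are made up for the illustration.

```python
import math

GOLDEN_ANGLE = math.pi * (3.0 - math.sqrt(5.0))

def build_table(count=256):
    """Roughly uniform directions on the unit sphere (Fibonacci spiral, an assumed stand-in table)."""
    table = []
    for i in range(count):
        z = 1.0 - (2.0 * i + 1.0) / count
        radius = math.sqrt(1.0 - z * z)
        phi = i * GOLDEN_ANGLE
        table.append((radius * math.cos(phi), radius * math.sin(phi), z))
    return table

TABLE = build_table()

def encode_index(n):
    """Index of the table entry closest to n (largest dot product)."""
    return max(range(len(TABLE)), key=lambda i: sum(a * b for a, b in zip(TABLE[i], n)))

def decode_index(i):
    """Decoding is a plain table lookup."""
    return TABLE[i]

def angle_deg(a, b):
    """Angle between two unit vectors, in degrees."""
    dot = sum(u * v for u, v in zip(a, b))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot))))

n = (0.6, 0.0, 0.8)
print(encode_index(n), angle_deg(n, decode_index(encode_index(n))))
```

Compared with two quantised angles, a table like this spreads the 256 codes more evenly over the sphere, at the cost of the extra lookup on the encoding side.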

### #12Lightness1024  Members

Posted 05 September 2012 - 03:54 PM

Ok I tried a few things, I'll make a report:

First, I coded a 16-bit storage scheme for color using this code:

    float bitsnap5(float v)
    {
        return floor(v * 32.) / 32.;
    }
    // same for bitsnap3 with 8

    outrsm.r = albedo_color.g;
    outrsm.g = (bitsnap5(albedo_color.r) * 256. + bitsnap3(albedo_color.b) * 8.) / 255.;


effectively coding G/R/B on 8/5/3 bits.

Decoding goes this way:

    clr.g = rsm.r;
    clr.b = fmod(rsm.g * 256., 8.) / 8.;
    clr.r = bitsnap5(rsm.g);
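
A plausible explanation for whites turning yellow can be reproduced on the CPU. The Python sketch below emulates the posted encode and decode, assuming the target is an 8-bit channel that clamps to 1 (`store8`), and then shows one possible fix that clamps the quantised levels to 31 and 7 so that red and blue fit exactly in one byte. The `*_fixed` functions are an illustration, not the code used in the thread.

```python
import math

def store8(x):
    """What an 8-bit UNORM channel keeps: clamp to [0, 1], round to 1/255 steps."""
    return round(min(max(x, 0.0), 1.0) * 255.0) / 255.0

def bitsnap(v, levels):
    """Same as bitsnap5 / bitsnap3 in the post, with levels = 32 or 8."""
    return math.floor(v * levels) / levels

def encode_posted(rgb):
    r, g, b = rgb
    return store8(g), store8((bitsnap(r, 32) * 256.0 + bitsnap(b, 8) * 8.0) / 255.0)

def decode_posted(ch0, ch1):
    return bitsnap(ch1, 32), ch0, math.fmod(ch1 * 256.0, 8.0) / 8.0

def encode_fixed(rgb):
    r, g, b = rgb
    r5 = min(math.floor(r * 32.0), 31)   # 5-bit level, 1.0 maps to 31 instead of 32
    b3 = min(math.floor(b * 8.0), 7)     # 3-bit level, 1.0 maps to 7 instead of 8
    return store8(g), (r5 * 8 + b3) / 255.0

def decode_fixed(ch0, ch1):
    r5, b3 = divmod(round(ch1 * 255.0), 8)
    return r5 / 31.0, ch0, b3 / 7.0

white = (1.0, 1.0, 1.0)
print(decode_posted(*encode_posted(white)))   # blue is lost
print(decode_fixed(*encode_fixed(white)))
```

In the posted version, a red of 1.0 snaps to 32/32 and the packed value overflows past 255, so after clamping the low 3 bits read back as zero and blue disappears.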


This is a typical 24-bit albedo image from the sun's point of view:

once encoded in 16 bits it gives:

So apart from the bug that makes whites yellow, there are no noticeable differences.
I have tried on a richer image, 24 bits:

once encoded in 16 bits, gives:

We see a bit of a loss in the sky: the gradient now has only 2 shades of blue, but it is barely noticeable.

So to go down to 8 bits we still need to split the storage over 2 pixels.
In the beginning I wanted to avoid favoring one direction (horizontal or vertical), so I went for a "two diagonals" pattern:

So I wrote this code for encoding:

    float2 txc = ScreenPosition.xy;
    if (fmod(txc.x, 2) < 1)
    {   // green component on even columns
        outrsm.albedo.r = albedo_color.g;
    }
    else
    {   // red and blue components on odd columns
        outrsm.albedo.r = (bitsnap5(albedo_color.r) * 256. + bitsnap3(albedo_color.b) * 8.) / 255.;
    }


and this for decode:

    float4 ps_compact_albedo( PS_INPUT Input) : COLOR0
    {
        float c = SampleTex2dLod( Tex2DArg(DiffTexture), Input.Common.Texcoord.xy, 0.0f ).r;
        float4 clr = (float4)0;
        clr.a = 1;
        float gsize = 256;
        float4 globalRegionSize = float4(gsize,gsize,1/gsize,1/gsize);
        float2 txc = Input.Common.Texcoord.xy * globalRegionSize.xy;
        float c2;
        if (fmod(txc.x, 2) < 1)  // even columns
        {
            clr.g = c;  // green here is our green. rest is to lookup:
            if (fmod(txc.y, 2) < 1)
            {   // green component 1 -> look for blue red in down right diagonal
                c2 = SampleTex2dLod( Tex2DArg(DiffTexture), Input.Common.Texcoord.xy + globalRegionSize.zw, 0.0f ).r;
            }
            else
            {   // green component 2 -> look for blue red in up right diagonal
                c2 = SampleTex2dLod( Tex2DArg(DiffTexture), Input.Common.Texcoord.xy + float2(globalRegionSize.z, -globalRegionSize.w), 0.0f ).r;
            }
        }
        else  // odd columns
        {
            c2.r = c.r;  // red blue is our red blue. green is to lookup:
            if (fmod(txc.y, 2) < 1)
            {   // RB comp 2 : green is up left
                clr.g = SampleTex2dLod( Tex2DArg(DiffTexture), Input.Common.Texcoord.xy - globalRegionSize.zw, 0.0f ).r;
            }
            else
            {   // RB comp 1 : green is down left
                clr.g = SampleTex2dLod( Tex2DArg(DiffTexture), Input.Common.Texcoord.xy + float2(-globalRegionSize.z, globalRegionSize.w), 0.0f ).r;
            }
        }
        clr.b = fmod(c2.r * 256., 8.) / 8.;
        clr.r = bitsnap5(c2.r);
        return clr;
    }


and this gave:

which is almost acceptable, because it was going to be used downsampled!
But I decided the unwanted parasitic frequencies were dangerous for the stability of the light injection into the LPV cells.
So I chose to favor vertical resolution, using the same encoding code but a simpler reconstruction:

    float4 ps_compact_albedo( PS_INPUT Input) : COLOR0
    {
        float gsize = 256;
        float4 globalRegionSize = float4(gsize,gsize,1/gsize,1/gsize);
        float2 txc = Input.Common.Texcoord.xy * globalRegionSize.xy;
        float2 halfpixel = globalRegionSize.zw * 0.;  // offset is scaled by 0, so it is currently disabled
        float c = SampleTex2dLod( Tex2DArg(DiffTexture), Input.Common.Texcoord.xy + halfpixel, 0.0f ).r;
        float4 clr = (float4)0;
        clr.a = 1;
        float c2;
        if (fmod(txc.x, 2) < 1)  // even columns
        {
            clr.g = c;  // green here is our green. rest is to lookup:
            c2 = SampleTex2dLod( Tex2DArg(DiffTexture), Input.Common.Texcoord.xy + float2(globalRegionSize.z, 0)+halfpixel, 0.0f ).r;
        }
        else  // odd columns
        {
            c2.r = c.r;  // red blue is our red blue. green is to lookup:
            clr.g = SampleTex2dLod( Tex2DArg(DiffTexture), Input.Common.Texcoord.xy - float2(globalRegionSize.z, 0)+halfpixel, 0.0f ).r;
        }
        clr.b = fmod(c2.r * 256., 8.) / 8.;
        clr.r = bitsnap5(c2.r);
        return clr;
    }


now we get:

which actually seems blurrier to the eye but we have the same amount of information.

however, while it looks totally OK on this image, back to our typical albedo image, we get a serious problem

(I forgot to copy this image, but you will be able to imagine from the next one)
The problem is that in high-frequency regions we get nasty chromatic errors. This is because we are keeping channels from two different colors, which we then recombine and assign to 2 pixels. Imagine a region passing from all white to all black: we get a terrible pink line and a green one at the border.
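
The fringe is easy to reproduce on a single row of pixels. This Python sketch emulates the column interleave on the CPU, assuming rows of even length and leaving out the 5/3-bit quantisation to isolate the effect. `encode_row` and `decode_row` are made-up names.

```python
WHITE, BLACK = (1.0, 1.0, 1.0), (0.0, 0.0, 0.0)

def encode_row(row):
    """Even columns keep green only; odd columns keep (red, blue) only."""
    return [px[1] if x % 2 == 0 else (px[0], px[2]) for x, px in enumerate(row)]

def decode_row(stored):
    """Fetch the missing channels from the horizontal neighbour in the same pair."""
    out = []
    for x in range(len(stored)):
        if x % 2 == 0:
            green, (red, blue) = stored[x], stored[x + 1]
        else:
            green, (red, blue) = stored[x - 1], stored[x]
        out.append((red, green, blue))
    return out

print(decode_row(encode_row([WHITE, WHITE, WHITE, BLACK])))   # edge turns green
print(decode_row(encode_row([BLACK, BLACK, BLACK, WHITE])))   # edge turns pink
```

Whenever an edge falls inside a pair, both pixels of the pair get the green of one color and the red/blue of the other, hence the green or pink line.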
To avoid this issue, I thought of the "Pixel Quad Message Passing" chapter from GPU Pro 2 that was mentioned above in the thread.
We can reconstruct the color of our neighbor pixel using the ddx function:

    float2 txc = ScreenPosition.xy;
    if (fmod(txc.x, 2) < 1)
    {   // green component on even columns
        outrsm.albedo.r = albedo_clr3.g;
    }
    else
    {   // red and blue components on odd columns
        // use ddx to discover the color of our neighbor (we voluntarily lose the color at this pixel). c.f. "Pixel Quad Message Passing" in GPU Pro 2
        float3 colorDiff = ddx(albedo_clr3);
        albedo_clr3 = saturate(albedo_clr3 - colorDiff);
        outrsm.albedo.r = (bitsnap5(albedo_clr3.r) * 256. + bitsnap3(albedo_clr3.b) * 8.) / 255.;
    }
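
The effect of the ddx trick can be emulated on one row in Python. The sketch assumes that ddx returns the same "right minus left" difference for both pixels of a 2x1 pair, which is the behaviour the trick relies on and may not hold on every GPU. Quantisation is left out, rows must have an even length, and the names are made up.

```python
WHITE, BLACK = (1.0, 1.0, 1.0), (0.0, 0.0, 0.0)

def ddx_pair(left, right):
    """Emulated coarse ddx inside a 2x1 pair: right value minus left value, per channel."""
    return tuple(r - l for l, r in zip(left, right))

def encode_row_ddx(row):
    """Even columns store their green; odd columns store the red/blue of their left neighbour."""
    stored = []
    for x in range(0, len(row), 2):
        left, right = row[x], row[x + 1]
        diff = ddx_pair(left, right)
        # saturate(own color - ddx) gives back the left neighbour's color
        neighbour = tuple(min(max(c - d, 0.0), 1.0) for c, d in zip(right, diff))
        stored.append(left[1])
        stored.append((neighbour[0], neighbour[2]))
    return stored

def decode_row(stored):
    """Same horizontal reconstruction as the simpler decode shader."""
    out = []
    for x in range(len(stored)):
        if x % 2 == 0:
            green, (red, blue) = stored[x], stored[x + 1]
        else:
            green, (red, blue) = stored[x - 1], stored[x]
        out.append((red, green, blue))
    return out

print(decode_row(encode_row_ddx([WHITE, WHITE, WHITE, BLACK])))
```

Each pair now carries one consistent color, so the fringe disappears, at the price of dropping the color of every odd column.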


now the resulting image becomes:

You can't tell without the other image, but it looks much better on the building windows. In the image without ddx we got lots of those pink and green lines instead of seeing ANY cyan color, which we get perfectly here.

However, my graphics card being an ATI (FireGL 4800), not all errors are suppressed with this method. It could work better on NVidia; I need to try (cf. the PQA paper).

But this cost me a lot of time and I feel the gains are potentially nil, because most cards are not limited by ROP/output bandwidth. For the moment I have dropped this research and gone for dual render targets with 24-bit color.
