
Member Since 07 Oct 2007
Offline Last Active Yesterday, 10:34 AM

#5224925 GPU Ternary Operator

Posted by AliasBinman on 22 April 2015 - 03:56 PM

That optimization could potentially make things worse. It involves an indirect data lookup, which can be slower than simply doing a couple of predicated moves.

#5222329 What to do when forward vector equals up vector

Posted by AliasBinman on 09 April 2015 - 05:30 PM

This is a good blog which may help you.



#5131093 Old school 3D engines

Posted by AliasBinman on 13 February 2014 - 12:56 PM

Have you read the Michael Abrash series of books? It's about more than just graphics programming, but still a great read.



#5121928 Filmic Tone Mapping Questions

Posted by AliasBinman on 07 January 2014 - 09:52 AM

This bit looks unneeded and should be removed


float exposure = key * L / avgL;


change it to 


float exposure = key  / avgL;


I don't see why you need to multiply by the pixel luminance.

#5060993 Blending Normal Maps

Posted by AliasBinman on 10 May 2013 - 10:15 PM


#5057308 Lighten/Darken a color

Posted by AliasBinman on 27 April 2013 - 01:27 PM

The simplest way is to do a lerp.


newcolor = lerp(color, vec3(1,1,1), x)   to Lighten

newcolor = lerp(color, vec3(0,0,0), x)   to Darken


In both cases x goes from 0 (no change) to 1 (full change)


If you don't know what a lerp is, it's a linear interpolation from one value to the next:


c = a + (b-a)*x
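
As a minimal sketch of the two lerps above, written in Python rather than shader code. The names `lerp3`, `lighten` and `darken` are only illustrative, and colours are assumed to be RGB tuples in the 0 to 1 range:

```python
def lerp(a, b, x):
    """Linear interpolation: returns a at x = 0 and b at x = 1."""
    return a + (b - a) * x


def lerp3(c0, c1, x):
    """Component-wise lerp between two RGB tuples."""
    return tuple(lerp(a, b, x) for a, b in zip(c0, c1))


def lighten(color, x):
    """Move the colour towards white; x = 0 is no change, x = 1 is white."""
    return lerp3(color, (1.0, 1.0, 1.0), x)


def darken(color, x):
    """Move the colour towards black; x = 0 is no change, x = 1 is black."""
    return lerp3(color, (0.0, 0.0, 0.0), x)
```

For example, `lighten((0.5, 0.5, 0.5), 0.5)` moves mid grey halfway to white.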

#5051881 Color correction - 3D LUT

Posted by AliasBinman on 10 April 2013 - 12:07 PM

You really need to ensure you remap the UVs so that they address from pixel centre to pixel centre to get the full range. This will stop the bleeding onto neighbouring charts and give more exact results.

So if XY is RG and B is the chart index then do

UV.rg = (UV.rg * 15.0f/16.0f) + (0.5f/16.0f);
You may not need the (0.5f/16.0f) at the end depending on whether you need the half texel offset.

To filter across charts you'll need to do two taps and a lerp on the fractional distance between charts.
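
A rough sketch of the remap and the two-tap chart selection, in Python for illustration. It assumes a 16x16x16 LUT stored as 16 charts side by side in a 256x16 strip, which may not match your layout; `remap_to_texel_centres`, `chart_taps` and `strip_u` are made-up names:

```python
import math

LUT_SIZE = 16  # assumption: 16 charts, each 16x16 texels


def remap_to_texel_centres(c, size=LUT_SIZE):
    """Map c in [0, 1] so 0 hits the first texel centre and 1 the last."""
    return c * (size - 1) / size + 0.5 / size


def chart_taps(b, size=LUT_SIZE):
    """Return (lower chart, upper chart, blend factor) for b in [0, 1]."""
    slice_pos = b * (size - 1)
    lo = min(int(math.floor(slice_pos)), size - 1)
    hi = min(lo + 1, size - 1)
    return lo, hi, slice_pos - lo


def strip_u(r, chart, size=LUT_SIZE):
    """U coordinate inside the assumed horizontal strip of charts."""
    return (chart + remap_to_texel_centres(r, size)) / size
```

The final colour would then be a lerp between the sample taken in the lower chart and the sample taken in the upper chart, using the returned blend factor.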

#5049806 Depth pre pass worth it ?

Posted by AliasBinman on 03 April 2013 - 08:17 PM

If you have some memory to spare then it's worth building a position-only VB and IB for each mesh. As well as storing just the position component of the original mesh, it can usually contain far fewer vertices.

Think of the case of cubes with hard faces. This requires 24 vertices in total (6 faces * 4 vertices). However, the position-only mesh requires just 8 vertices. This cuts down on the bandwidth for vertex fetching, needs fewer transforms and makes better use of the post-transform cache.
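
To illustrate the cube case, here is a small Python sketch that builds a hard-edged cube and derives a position-only VB/IB from it. The helper names are hypothetical, and positions are merged only on exact equality; a real tool would probably weld with a tolerance. The triangle winding in the test cube is not made consistent since only the vertex counts matter here:

```python
def build_position_only_mesh(vertices, indices):
    """vertices: list of (position, normal) tuples; indices: triangle list.
    Returns (unique positions, remapped indices)."""
    remap = {}
    positions = []
    new_indices = []
    for i in indices:
        pos = vertices[i][0]
        if pos not in remap:
            remap[pos] = len(positions)
            positions.append(pos)
        new_indices.append(remap[pos])
    return positions, new_indices


def make_hard_edged_cube():
    """24-vertex cube: every face has its own 4 vertices and normal."""
    vertices, indices = [], []
    corners = ((-1.0, -1.0), (1.0, -1.0), (1.0, 1.0), (-1.0, 1.0))
    for axis in range(3):
        for sign in (-1.0, 1.0):
            normal = tuple(sign if k == axis else 0.0 for k in range(3))
            u, v = (axis + 1) % 3, (axis + 2) % 3
            base = len(vertices)
            for cu, cv in corners:
                p = [0.0, 0.0, 0.0]
                p[axis], p[u], p[v] = sign, cu, cv
                vertices.append((tuple(p), normal))
            indices += [base, base + 1, base + 2, base, base + 2, base + 3]
    return vertices, indices
```

Running `build_position_only_mesh(*make_hard_edged_cube())` collapses the 24 shaded vertices down to the 8 corner positions while keeping the same 36 indices.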


By using the position only VB and IB the original mesh can then be a standard interleaved format.


For the above, don't forget to optimise for both the post-transform and pre-transform caches, and use a position-only vertex declaration and a VS that only transforms position.


Secondly, when doing your frustum culling pass, use a different frustum for the prepass with a much closer far plane, so that far fewer meshes are accepted and drawn. There is little gain from drawing far-away meshes: firstly they are likely to cover few pixels on screen, secondly they are less likely to occlude many pixels, and thirdly HiZ buffers tend to lose a lot of precision at far distances. It is not uncommon to set the far plane as close as, say, 150m.

You can also cull meshes from the prepass using some heuristics. For example, don't even bother considering meshes which are unlikely to cover many screen pixels.
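
The two culling ideas above could be sketched roughly like this in Python. Everything here is an assumption for illustration: meshes are reduced to a bounding sphere given as (name, distance to camera, radius), the 32 pixel threshold and 60 degree field of view are arbitrary, and a simple distance check stands in for a full frustum test:

```python
import math


def projected_radius_pixels(radius, distance, fov_y, screen_height):
    """Approximate on-screen radius of a bounding sphere, in pixels."""
    return radius / (distance * math.tan(fov_y * 0.5)) * (screen_height * 0.5)


def select_for_prepass(meshes, prepass_far=150.0, min_pixels=32.0,
                       fov_y=math.radians(60.0), screen_height=1080):
    """Keep only meshes that are near and large enough to be useful occluders."""
    selected = []
    for name, distance, radius in meshes:
        if distance - radius > prepass_far:
            continue  # beyond the shortened prepass far plane
        if distance <= radius:
            selected.append(name)  # camera is inside the bounds
            continue
        size = projected_radius_pixels(radius, distance, fov_y, screen_height)
        if size >= min_pixels:
            selected.append(name)
    return selected
```

The thresholds would need tuning per scene; the point is only that the prepass list can be much shorter than the base pass list.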


I usually find it a gain to not draw alpha-tested objects in the prepass but to ensure they get drawn first in the base pass. That way they will benefit from opaque objects in the pre-pass and then update the depth buffer before the opaque objects in the base pass.

#5029238 Alpha Testing: I manage to "scorch" my edges

Posted by AliasBinman on 05 February 2013 - 05:54 PM

float4 PS_light_SHADOW(...) : COLOR
return tex2D(g_texturemap_sampler, a_texcoord0);

Change the body to

float4 col = tex2D(g_texturemap_sampler, a_texcoord0);
return float4(col.rgb/col.a, col.a);

#5029087 Alpha Testing: I manage to "scorch" my edges

Posted by AliasBinman on 05 February 2013 - 11:22 AM

A good trick to fix black borders, especially when storing in DXT1 with 1-bit alpha, is to do the following in the shader.

color.rgb /=color.a;

So as you start filtering towards an edge pixel, you renormalize the colours.
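
A small Python illustration of why the divide helps, assuming the transparent texels are stored as black with zero alpha. The guard against a near-zero alpha is an addition of this sketch (in the shader those pixels would normally be rejected by the alpha test anyway), and the function names are made up:

```python
def filtered_mix(texel_a, texel_b, t):
    """Linear filter between two RGBA texels, as the sampler would do."""
    return tuple(a + (b - a) * t for a, b in zip(texel_a, texel_b))


def renormalize(color, eps=1e-6):
    """Divide rgb by alpha to undo the darkening from transparent black texels."""
    r, g, b, a = color
    if a < eps:
        return (0.0, 0.0, 0.0, a)  # nothing to recover; avoid dividing by zero
    return (r / a, g / a, b / a, a)
```

Filtering halfway between opaque red `(1, 0, 0, 1)` and transparent black `(0, 0, 0, 0)` gives `(0.5, 0, 0, 0.5)`, which is the dark fringe; after the divide the colour is full red again with alpha 0.5.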

#5026927 Geometry jumping around / distorted on close up(Depth problem?)

Posted by AliasBinman on 29 January 2013 - 02:45 PM

When you are 6000 units from the camera you have effectively chopped off up to 13 bits of fractional precision. Typically this shouldn't matter for the rendering as you should have aggregate transforms which work in camera space. Is it possible you are transforming from local to world then world to view? If so then combine them together on the CPU beforehand.
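
To put a number on the precision loss, this Python sketch measures the spacing between adjacent 32-bit floats at a given magnitude. `float32_ulp` and `fractional_bits_lost` are illustrative helpers, and the comparison point of 0.75 (a value just below 1) is an assumption chosen to show the worst case:

```python
import math
import struct


def float32_ulp(x):
    """Gap between x (rounded to float32) and the next float32 above it.
    x must be positive and finite."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    here = struct.unpack("<f", struct.pack("<I", bits))[0]
    above = struct.unpack("<f", struct.pack("<I", bits + 1))[0]
    return above - here


def fractional_bits_lost(far, near=0.75):
    """How many bits coarser the float32 grid is at `far` than at `near`."""
    return round(math.log2(float32_ulp(far) / float32_ulp(near)))
```

At 6000 units the spacing is 2^-11 (about 0.0005), compared with 2^-24 for values just under 1, which is where the figure of up to 13 bits comes from.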

#5026813 About GPU-Memory interaction

Posted by AliasBinman on 29 January 2013 - 11:08 AM

A) Yes, there will usually be a stall here. But rather than letting the GPU sit idle, it will start to work on other pixels/vertices instead. GPUs can have many thousands of pixels/vertices in some stage of execution at any point in time. One of the limiting factors is that each element currently in progress requires some registers to store intermediate values, so optimizing the shader to use fewer registers can help ensure there are enough elements in flight to hide these stalls.

B) Typically RTs are not in cache but they do have local ROP tiles which can cache data. These ROP tiles are flushed to VRAM when they are finished being written to or there is an RT switch.

C) Some render states can be pipelined with the draw call. Some can't and are set in one of many state contexts. Potentially some render state changes could cause the pipeline to flush or partially flush leading to bubbles of the GPU going idle. Which can and can't is very much hardware dependent. Also note that some render state switches could potentially cause a lot of work in the driver on the CPU side if the hardware doesn't directly support the feature or the CPU has to do some kind of processing on the data first.

#5012641 Triangle rasterization troubles

Posted by AliasBinman on 19 December 2012 - 06:09 PM

Typically the way to fix this is to quantize your screen-space vertices to some fixed grid. This grid can be finer than the size of a pixel. Then, when creating your interpolants, give them a finer resolution still, so you can step at a finer resolution than this quantized grid. If done correctly, the scanline interpolation should never undershoot or overshoot the endpoints. I use fixed-point math to store screen-space positions and interpolants, though you can use floating-point math when calculating intermediate values.
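
A minimal Python sketch of the idea, assuming a grid of 1/16 of a pixel and pixel centres at half-integer positions. Note that it evaluates each scanline crossing directly with exact integer math rather than stepping an interpolant incrementally, which is a simplification of what is described above; the names are hypothetical:

```python
import math

SUBPIXEL_BITS = 4            # assumed grid: 1/16 of a pixel
ONE = 1 << SUBPIXEL_BITS     # one pixel in grid units
HALF = ONE >> 1              # offset of a pixel centre


def snap(coord):
    """Quantize a floating-point screen coordinate to the subpixel grid."""
    return math.floor(coord * ONE + 0.5)


def edge_crossings(p0, p1):
    """x position (grid units) of an edge at each pixel-centre scanline it spans.
    p0 and p1 are (x, y) points already snapped to the grid."""
    (x0, y0), (x1, y1) = sorted((p0, p1), key=lambda p: p[1])
    dx, dy = x1 - x0, y1 - y0
    if dy == 0:
        return []  # horizontal edge: no scanline crossings
    first_row = -((HALF - y0) // ONE)   # ceil((y0 - HALF) / ONE)
    end_row = -((HALF - y1) // ONE)     # first row at or past y1
    crossings = []
    for row in range(first_row, end_row):
        t = row * ONE + HALF - y0       # distance down the edge, 0 <= t < dy
        crossings.append((row, x0 + (dx * t) // dy))
    return crossings
```

Because the inputs are integers and the division is exact integer division, every crossing stays between the two endpoint x values regardless of the slope.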

#5006703 gloss mapping?

Posted by AliasBinman on 03 December 2012 - 12:14 PM

It is common in game engines now to store the spec power in log space, even with 8-bit precision.

For example, see how UE4 does it here (slide 30)


Frostbite2 also does it this way (albeit with maybe a slightly different range)
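
As a sketch of what log-space storage means, here is one possible 8-bit encode/decode pair in Python. The range constant `MAX_LOG2_POWER` is hypothetical and is not taken from either engine; each engine picks its own range:

```python
import math

MAX_LOG2_POWER = 13.0  # hypothetical range: spec powers from 1 to 2**13


def encode_spec_power(power):
    """Store log2(power) normalized to 0..255 (power must be > 0)."""
    t = math.log2(power) / MAX_LOG2_POWER
    t = min(max(t, 0.0), 1.0)
    return int(round(t * 255.0))


def decode_spec_power(byte):
    """Recover the spec power from the stored 8-bit value."""
    return 2.0 ** (byte / 255.0 * MAX_LOG2_POWER)
```

Each step of the stored value then multiplies the power by a constant factor, so low and high powers get a similar relative precision instead of most of the 8 bits being spent on the high end.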

Also, as mentioned above, a saturate will be better. It's usually free whereas max is not.

#5005494 Cascade Shadow Map Pixel Shader

Posted by AliasBinman on 29 November 2012 - 05:23 PM

You cannot generally do it in the VS because a triangle can cross over cascade boundaries. If you can guarantee that it doesn't (using frustum checks on the CPU) then you can do it on the VS.

Typically for orthographic projected cascades you only need to do a single vector by matrix transform. Each cascade can then be done via a bias and scale operation which is a single MAD operation per cascade check.
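
A rough Python sketch of the scale-and-bias approach, assuming the position has already been transformed once into a shared light space and that each cascade is described by a hypothetical (scale, bias) pair mapping its bounds to the 0 to 1 range:

```python
def select_cascade(light_pos, cascades):
    """light_pos: (x, y, z) after the single light-space transform.
    cascades: list of (scale, bias) 3-tuples, ordered nearest first.
    Returns (cascade index, shadow-map coordinate) or None if outside all."""
    for index, (scale, bias) in enumerate(cascades):
        # One multiply-add per component: the per-cascade MAD.
        uvw = tuple(p * s + b for p, s, b in zip(light_pos, scale, bias))
        if all(0.0 <= c <= 1.0 for c in uvw):
            return index, uvw
    return None
```

For example, with two cascades covering x and y in [-4, 4] and [-16, 16] respectively, a point at x = 12 falls outside the first cascade and is picked up by the second.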