Ciprian Stanciu

Member Since 12 Jul 2002
Offline Last Active Jun 23 2014 04:18 AM

Topics I've Started

Have you used GL_ARB_shader_subroutine or DX Dynamic Shader Linking ?

28 January 2014 - 07:12 AM



I'm curious about the performance advantages or disadvantages you got from using this instead of a larger number of shader swaps. So far I've only tested it on a shadow map pass where I write only to the Z buffer, with a shader that turns skinning on or off through subroutines. I only have 1 skinned object, though, and 100 other objects. This was the quickest change I could make to observe the performance difference, and to my surprise overall performance is 1% lower. I imagine there's a number of shader swaps beyond which subroutines become faster. I'm also using an AMD HD7850, and I noticed a lower GPU usage ratio when using subroutines, as if the driver is doing more work. This is very similar to what I see with separate shader objects, where I observe a 50% drop in performance and a 50% drop in GPU usage, while driver calls like glBindProgramPipeline take a whole lot longer than glUseProgram, so I'm also questioning driver quality for this feature.

Generating Barycentric Coordinates similar to hardware tessellation ?

16 September 2013 - 06:30 PM

I was thinking about a way to do tessellation under OpenGL ES 3 using instancing. The idea is to store 3 times the vertex data per vertex (the 3 constituent vertices of a triangle, which of course means giving up indexing), set up a uniform block with the barycentric coordinates, and use instancing to render 3*n triangles for each original triangle, interpolating the final values from the three stored vertices. Later on this will be coupled with vertex shader texture sampling from a depth map. It all sounds good in theory, but I'm wondering whether a library already exists that can give me an n-complexity set of barycentric coordinates for the sub-triangles. For example, n=1 (instance) would result in 1,0,0 - 0,1,0 - 0,0,1, i.e. the original triangle, and n=3 (instances) would create a vertex inside the triangle and the 3 related sub-triangles. I know GL4+ hardware does this; I'm looking for already written code that does the same, if you happen to know of any.
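As a rough illustration of the kind of table asked for above, here is a minimal Python sketch that generates barycentric triples for sub-triangles. It is not taken from any existing library, and it does not reproduce the patterns GL4+ hardware tessellators emit. The function names (`centroid_split`, `subdivide_barycentric`, `interpolate`) are hypothetical. `centroid_split` matches the n=3 case described in the post, while `subdivide_barycentric(k)` is a swapped-in uniform scheme that produces k*k sub-triangles, with k=1 returning the original triangle.

```python
from fractions import Fraction


def centroid_split():
    """Three sub-triangles sharing one new vertex at the centroid (the n=3 case)."""
    a, b, c = (1, 0, 0), (0, 1, 0), (0, 0, 1)
    center = (Fraction(1, 3), Fraction(1, 3), Fraction(1, 3))
    return [(a, b, center), (b, c, center), (c, a, center)]


def subdivide_barycentric(k):
    """Uniformly split a triangle into k*k sub-triangles.

    Returns a list of triangles; each triangle is a tuple of three
    barycentric (u, v, w) triples with u + v + w == 1.
    """
    if k < 1:
        raise ValueError("k must be >= 1")

    def point(i, j):
        # Lattice point (i, j) with i + j <= k, expressed as exact weights.
        return (Fraction(k - i - j, k), Fraction(i, k), Fraction(j, k))

    triangles = []
    for i in range(k):
        for j in range(k - i):
            # Triangle with the same orientation as the parent.
            triangles.append((point(i, j), point(i + 1, j), point(i, j + 1)))
            # Flipped triangle filling the gap; none exist on the last diagonal.
            if i + j < k - 1:
                triangles.append(
                    (point(i + 1, j), point(i + 1, j + 1), point(i, j + 1))
                )
    return triangles


def interpolate(bary, p0, p1, p2):
    """Blend three per-vertex attributes with one barycentric triple."""
    u, v, w = bary
    return tuple(u * a + v * b + w * c for a, b, c in zip(p0, p1, p2))
```

The weights are kept as exact fractions so the table can be checked before being converted to floats and uploaded, for instance into the uniform block mentioned in the post. `interpolate` mirrors what the vertex shader would do with the three stored vertices.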

GL_ARB_separate_shader_objects Performance ?

30 August 2013 - 01:34 AM


As soon as I learned about GL_ARB_separate_shader_objects I imagined that I would have huge performance benefits because with this one OpenGL would look and feel more like DirectX9+ and my cross-API code could remain similar while maintaining high performance.

To my surprise, I implemented GL_ARB_separate_shader_objects only to find out that my performance is halved, and my GPU usage dropped from ~95% to 45%. So basically, having them as a monolithic program is twice as fast as being separate. This is on an AMD HD7850 under Windows 8 and OpenGL 4.2 Core.

I originally imagined that this extension was created to boost performance by separating constant buffers and shader stages, but it seems it might have been created for people who want to port DirectX shaders more directly, without regard for any performance hit.

So my question is: if you have implemented this feature in a reasonable scene, what performance difference do you see compared to monolithic programs?

Single vs Multiple Constant Buffers

30 August 2013 - 12:45 AM

So basically after upgrading from DX9 to DX10 I read a lot of docs from Microsoft about how it's better to organize constant buffers by update frequency, so I made 3 types of constant buffers:


PerFrame (view & projection matrix)

PerMaterial (MaterialColor, specular, shininess, etc.)

PerObject ( world matrix )


I didn't really think about performance considerations, but one day it struck me: how about I just make 1 buffer to encapsulate all the data? After I did this I noticed that performance actually increased by ~3-5%, even though I was updating an entire, slightly bigger buffer. I thought that maybe drivers at the time (I had an HD5770, a first-gen DX11 device) were not that well optimized for multiple constant buffers, and I reverted to multiple buffers.


I now have an HD7850, and after doing this little test again I'm seeing a performance boost of up to +50% for ~100 draw calls when using a single huge constant buffer per object. So in effect the difference has not shrunk, it has grown, signalling that there's something inherently wrong with having too many constant buffers bound. I'm now assuming this may be because my buffers are fairly small. The huge constant buffer is around 460 bytes (I only have 4 matrices, one light and a few other variables), so perhaps multiple buffer switches are more advantageous when you are doing something like fetching an entire vertex buffer (for real-time ambient occlusion based on vertices) or when you work with skinned meshes of 100 bones each.


My question is whether you have tried rendering a scene both with multiple buffers and with a single huge buffer, and how the performance compared?
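To make the trade-off between the two layouts concrete, here is a small Python bookkeeping sketch. It only counts buffer update calls and bytes uploaded per frame. It does not model bind cost or driver overhead, which the measurements above suggest may be what actually dominates. The names (`FrameCost`, `split_layout_cost`, `merged_layout_cost`) and all split buffer sizes are hypothetical; only the ~460-byte merged size and the ~100 draw calls come from the post. It also assumes draws are sorted by material so each material buffer is updated once per frame.

```python
from dataclasses import dataclass


@dataclass
class FrameCost:
    update_calls: int    # constant buffer updates issued per frame
    bytes_uploaded: int  # total bytes written to constant buffers per frame


def split_layout_cost(draws, materials, per_frame_bytes,
                      per_material_bytes, per_object_bytes):
    """PerFrame + PerMaterial + PerObject buffers, updated by frequency."""
    calls = 1 + materials + draws
    size = (per_frame_bytes
            + materials * per_material_bytes
            + draws * per_object_bytes)
    return FrameCost(calls, size)


def merged_layout_cost(draws, merged_bytes):
    """One buffer holding everything, rewritten in full for every draw."""
    return FrameCost(draws, draws * merged_bytes)


# Hypothetical sizes: two 4x4 float matrices per frame (128 bytes),
# 48 bytes per material, one 4x4 float matrix per object (64 bytes).
split = split_layout_cost(draws=100, materials=10, per_frame_bytes=128,
                          per_material_bytes=48, per_object_bytes=64)
merged = merged_layout_cost(draws=100, merged_bytes=460)
```

Under these assumed sizes the split layout uploads far fewer bytes (7008 versus 46000) but issues slightly more update calls (111 versus 100) and keeps three buffers bound per draw instead of one. That is at least consistent with the idea that, for buffers this small, upload size matters less than per-call overhead.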

Specular Mapping in the Lighting Equation

18 March 2008 - 11:53 AM

I was wondering what the correct lighting equation is for use with specular mapping? I'm using something like this, but the specular highlights don't look quite right:
 AmbientLight * DiffuseTexture
+DiffuseLight * DiffuseTexture
+SpecularLight* SpecularTexture.r;
Also, where is the best place to compute the half angle: the vertex shader or the pixel shader? [Edited by - cippyboy on March 19, 2008 8:04:28 AM]
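For comparison, here is a minimal Python sketch of one common formulation, Blinn-Phong with a specular map. It is offered as an assumption about what is usually meant, not as the single correct equation. Compared with the expression above, it scales the diffuse term by N·L and scales the specular term by (N·H) raised to a shininess exponent before applying the specular texture. The function and parameter names are hypothetical.

```python
import math


def normalize(v):
    """Return v scaled to unit length; a zero vector is returned unchanged."""
    length = math.sqrt(sum(c * c for c in v))
    if length == 0.0:
        return tuple(v)
    return tuple(c / length for c in v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def shade(normal, to_light, to_eye, diffuse_texture, specular_texture_r,
          ambient_light, diffuse_light, specular_light, shininess):
    """Blinn-Phong shading of one pixel; colors are (r, g, b) tuples."""
    n = normalize(normal)
    l = normalize(to_light)
    v = normalize(to_eye)
    # Half vector: the normalized sum of the light and view directions.
    h = normalize(tuple(a + b for a, b in zip(l, v)))

    n_dot_l = max(dot(n, l), 0.0)
    n_dot_h = max(dot(n, h), 0.0)
    # No highlight on surfaces facing away from the light.
    spec = n_dot_h ** shininess if n_dot_l > 0.0 else 0.0

    return tuple(
        amb * tex + dif * n_dot_l * tex + spc * spec * specular_texture_r
        for amb, dif, spc, tex in zip(
            ambient_light, diffuse_light, specular_light, diffuse_texture)
    )
```

On the half-angle question, the sketch builds H from already-normalized L and V, which corresponds to computing it per pixel. Interpolating a per-vertex half vector is cheaper, but the interpolated vector is generally no longer unit length and needs renormalizing in the pixel shader, and highlights can look off on coarse meshes.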