Ciprian Stanciu

Member Since 12 Jul 2002
Offline Last Active Feb 06 2015 10:31 AM

#5146183 GL_TEXTURE_SPARSE_ARB on Nvidia

Posted by Ciprian Stanciu on 11 April 2014 - 02:12 AM

After a couple of hours of swearing, I realized the code from the slides is just wrong!

Thanks to Christophe Riccio and his samples ( http://ogl-samples.g-truc.net/ ), I found that the code on the slides should have been:


glGetInternalformativ( GL_TEXTURE_2D, GL_RGBA8, GL_VIRTUAL_PAGE_SIZE_X_ARB, 1, &page_sizes_x[0]);


The difference is that GL_RGBA8 (the internal format) comes first and GL_VIRTUAL_PAGE_SIZE_X_ARB (the parameter being queried) comes second, not the other way around!
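
Once the query returns a page size, working out how many pages a texture covers is plain round-up arithmetic. Here is a minimal sketch, assuming a made-up page size of 256x128 texels (the real values come from the glGetInternalformativ call above and depend on the driver and the internal format). The helper names `pages_along` and `pages_total` are hypothetical, not part of any API:

```cpp
#include <cstdint>

// Number of pages needed to cover `extent` texels along one axis when a page
// spans `page_size` texels along that axis. A partially covered page counts.
constexpr std::uint32_t pages_along(std::uint32_t extent, std::uint32_t page_size) {
    return (extent + page_size - 1) / page_size;
}

// Total number of pages covering a width x height 2D texture level.
constexpr std::uint32_t pages_total(std::uint32_t width, std::uint32_t height,
                                    std::uint32_t page_x, std::uint32_t page_y) {
    return pages_along(width, page_x) * pages_along(height, page_y);
}

// ASSUMPTION: illustrative page size only, not queried from a real driver.
constexpr std::uint32_t kPageX = 256;
constexpr std::uint32_t kPageY = 128;

static_assert(pages_along(4096, kPageX) == 16, "4096 / 256 = 16 pages");
static_assert(pages_total(4096, 4096, kPageX, kPageY) == 512, "16 * 32 pages");
static_assert(pages_along(1000, kPageX) == 4, "partial last page rounds up");
```

With a real context you would replace `kPageX` and `kPageY` with the values written into `page_sizes_x[0]` and its Y counterpart.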

#5090366 Single vs Multiple Constant Buffers

Posted by Ciprian Stanciu on 30 August 2013 - 07:43 AM

There are around 107 objects, and for the most part it's just 1-3 objects per material, so I have around 107 materials too. However, with the 3 constant buffers I was only updating the per-frame buffer once per frame. Each material had its own per-material buffer and each object its own per-object buffer, and those never changed (I don't currently animate any objects or material properties), so only the view/projection matrices changed.


Also, why do you say 1 bone would be 2x my current per-object buffer? 1 bone would be just a float4x4, so that's 64 bytes. I'm planning to do some skinning in the near future and I have around 30 bones, so I'm curious how that will go.
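
The byte counts above can be checked with a few lines of C++. This is only a sketch: `Float4x4` is a stand-in for the HLSL `float4x4` type, and `SkinningBuffer` is a hypothetical layout with one matrix per bone, not a layout taken from the thread:

```cpp
#include <cstddef>

// Stand-in for an HLSL float4x4: 16 floats of 4 bytes each.
struct Float4x4 {
    float m[16];
};

// Bone count mentioned in the post ("around 30 bones").
constexpr std::size_t kBoneCount = 30;

// ASSUMPTION: hypothetical skinning buffer holding one matrix per bone.
struct SkinningBuffer {
    Float4x4 bones[kBoneCount];
};

static_assert(sizeof(Float4x4) == 64, "one bone matrix is 64 bytes");
static_assert(sizeof(SkinningBuffer) == 1920, "30 bones * 64 bytes");
```

So under these assumptions a 30-bone palette is 1920 bytes, compared with 64 bytes for a single world matrix.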

#5090321 GL_ARB_separate_shader_objects Performance ?

Posted by Ciprian Stanciu on 30 August 2013 - 01:34 AM


As soon as I learned about GL_ARB_separate_shader_objects, I imagined I would see huge performance benefits, because with it OpenGL would look and feel more like DirectX 9+ and my cross-API code could remain similar while maintaining high performance.

To my surprise, after implementing GL_ARB_separate_shader_objects I found that my performance was halved and my GPU usage dropped from ~95% to ~45%. So basically, a monolithic program is twice as fast as separate programs. This is on an AMD HD7850 under Windows 8 with OpenGL 4.2 Core.

I originally imagined that this extension was created to boost performance by separating constant buffers and shader stages, but it seems it might have been created for people wanting to port DirectX shaders more directly, regardless of any performance hit.

So my question is: if you have implemented this feature in a reasonable scene, what performance difference do you see compared to monolithic programs?

#5090314 Single vs Multiple Constant Buffers

Posted by Ciprian Stanciu on 30 August 2013 - 12:45 AM

So basically after upgrading from DX9 to DX10 I read a lot of docs from Microsoft about how it's better to organize constant buffers by update frequency, so I made 3 types of constant buffers:


PerFrame (view & projection matrix)

PerMaterial (MaterialColor, specular, shininess, etc.)

PerObject (world matrix)
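
On the CPU side, the three buffers listed above and the single merged buffer could be mirrored roughly as follows. This is a sketch under assumptions: the field types of `PerMaterial` (two float4 values plus a padded float) are guesses based on the names in the list, and `Float4`, `Float4x4`, and `Combined` are hypothetical names, not the author's actual code:

```cpp
#include <cstddef>

// Stand-ins for the HLSL float4 and float4x4 types.
struct Float4   { float v[4]; };
struct Float4x4 { float m[16]; };

// Updated once per frame.
struct PerFrame {
    Float4x4 view;
    Float4x4 projection;
};

// Updated when the material changes. ASSUMPTION: field types are guessed.
struct PerMaterial {
    Float4 materialColor;
    Float4 specular;
    float  shininess;
    float  pad[3];  // explicit padding to keep the size a multiple of 16 bytes
};

// Updated per draw call.
struct PerObject {
    Float4x4 world;
};

// The single-buffer alternative: everything uploaded together per object.
struct Combined {
    PerFrame    frame;
    PerMaterial material;
    PerObject   object;
};

static_assert(sizeof(PerFrame) == 128, "two 64-byte matrices");
static_assert(sizeof(PerMaterial) == 48, "16 + 16 + 4 + 12 bytes");
static_assert(sizeof(PerObject) == 64, "one 64-byte matrix");
static_assert(sizeof(Combined) == 240, "128 + 48 + 64 bytes");
```

With the split layout only `PerObject` (64 bytes here) has to be rewritten per draw, while the merged layout rewrites all 240 bytes each time, which is why the split is usually expected to be the cheaper option.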


I didn't really think about performance considerations at first, but one day it struck me: how about I just make 1 buffer to encapsulate all the data? After I did this, I noticed that performance actually increased by ~3-5%, even though I was updating an entire, slightly bigger buffer. I thought that maybe the drivers at the time (I had an HD5770, a first-gen DX11 device) were not that well optimized for multiple constant buffers, and I reverted to multiple buffers.


I now have an HD7850, and after repeating this little test I'm seeing a performance boost of up to 50% for ~100 draw calls when using a single huge constant buffer per object. So in effect the difference has not shrunk, it has grown, which suggests there's something inherently wrong with having too many constant buffers bound. I'm now assuming this may be because my buffers are fairly small. The huge constant buffer is around 460 bytes (I only have 4 matrices, one light and a few other variables), so perhaps multiple buffers become more advantageous when you are doing something like fetching an entire vertex buffer (for real-time ambient occlusion based on vertices) or when you work with skinned meshes of 100 bones each.
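
To put the "around 460 bytes" figure in context: as far as I know, Direct3D 11 requires a constant buffer's ByteWidth to be a multiple of 16 bytes, so a 460-byte payload would actually be allocated as 464 bytes. A small sketch of that arithmetic follows; `align_up_16` is a hypothetical helper name, and 460 is just the approximate figure quoted above:

```cpp
#include <cstddef>

// Round a byte count up to the next multiple of 16.
constexpr std::size_t align_up_16(std::size_t bytes) {
    return (bytes + 15) & ~static_cast<std::size_t>(15);
}

// Four 4x4 float matrices, as mentioned in the post.
constexpr std::size_t kMatrixBytes = 4 * 16 * sizeof(float);

// Approximate payload size quoted in the post.
constexpr std::size_t kApproxPayload = 460;

static_assert(kMatrixBytes == 256, "4 matrices * 64 bytes");
static_assert(align_up_16(kApproxPayload) == 464, "460 rounds up to 464");
static_assert(kApproxPayload - kMatrixBytes == 204, "left for light + misc");
```

So the matrices account for 256 of the roughly 460 bytes, leaving about 204 bytes for the light and the other variables.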


My question is: have you tried rendering a scene both with multiple buffers and with a single huge buffer, and how did the performance compare?