
OpenGL Dynamic lighting & sorting geometry to reduce switches


Hi,

I've asked here several times about reducing texture switches and similar things. In most cases I just shouldn't worry that much. But since my scenes are getting more complex, I still wonder if I'm doing it right. In the perfect situation I would have 1 VBO and a minimum number of texture switches (by the way, does switching other parameters, such as vectors, also count as relatively "heavy"?). The problem is that my polygons can differ quite a lot:

* other shader
* other texture(s) / parameters for that shader
* other cubemap for reflections
* other assignment of dynamic lights

First I split the world up into "material groups". For example, everything that has texture "wood01" is placed in group 1. Each group had a list of geometry, which could be a VBO as well. But later on I added cubeMaps for reflections. I can have wood everywhere, but each polygon could potentially have a different cubeMap for its reflections. So in each material group I made subgroups (sorted on cubeMap). The number of polygons under each subgroup is now so small (2 polygons on average, sometimes more) that a VBO won't be very efficient anymore, I suppose.

So far everything is running. Quite a lot of switching and no VBOs, but it runs. But now yet another difficulty comes: dynamic lighting. I never did this before, so correct me if I say stupid things. I suppose you should tell each polygon/face which (nearby) lights it is using. Some faces could use only 1 light, others maybe 4, and so on. And the lighting pass could differ as well. Some materials use parallax mapping, others don't. So I get different lighting shaders as well. As far as I know it's still not really possible to make a shader that loops through a variable number of lights. And how do I handle the different types of lighting (omni, spot lights, directional, ...)?

Now I have the following sorting/render flow:
	< for each "room" (I do portal culling) >
	*. First render opaque stuff
	*. Then render the transparent stuff
	1. Apply textures that are used for the entire room (lightMap for 
	2. Geometry is sorted on shader program (normal mapping yes/no, cubeMap
           reflections yes/no, and so on)
		- Now the shader program is activated for all upcoming subgroups
	3. Each shader-group has 1 or more "material groups". Each material is 
           a set of parameters for the shader program
	   (texture maps, vector parameters, ...)
	   	- Now the (texture)parameters are activated in the current   
                  shader for all coming "faces"
	4. Each material group is divided in "faces" (pairs of polygons, such 
           as a floor, wall or ceiling). Each face can have its own cubeMap 
           texture, and is assigned to a couple of dynamic lights.
	5. Each face will be rendered. Since the amount of polygons is low 
           (only 2 in many cases), no VBO's are used.
	   Just simple (OpenGL) operations to draw the geometry.
	   	- Polygons are using the current shader/material settings. This 
                  usually means doing lightMapping (with normals), diffuse 
                  texture mapping, maybe a cubeMap reflection, and 
	   	  possibly parallax mapping.
I don't know if this is the wisest way to do it. And the dynamic lighting has not been done yet. I could add a step 6. In most cases the lighting pass(es) will need the same parameters (normal/specular/parallax map). But then I have to deactivate the current shader for each face, apply the lighting shader, and then switch back to the previous one.

Another way would be to do the lighting afterwards. First I do the basic (static) rendering, and then I do the whole thing again, but sorted on the lighting parameters per face/group. Pfff... Hard to explain.

What I really want to know is: is this a proper way to do it? For example, how do games such as Doom 3 or Far Cry handle all the sorting? I think I'm switching too much between shaders/textures, and I also don't use faster methods such as VBOs or at least vertex arrays, because the sub-sub-sub groups only have a few polygons in most cases...

Greetings,
Rick
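For reference, the shader, then material, then cubeMap ordering from the flow above can also be expressed as one integer sort key per face. This is only a minimal sketch: the names `DrawItem`, `makeSortKey`, `sortByState` and `countShaderSwitches` are made up for illustration, and the 16-bit id widths are an assumption, not something from the post.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical draw item: one face plus the state it needs.
struct DrawItem {
    uint16_t shaderId;
    uint16_t materialId;
    uint16_t cubeMapId;
    int      faceIndex;
};

// The most expensive state change sits in the highest bits, so a plain
// integer comparison orders by shader first, then material, then cubeMap.
inline uint64_t makeSortKey(const DrawItem& d) {
    return (uint64_t(d.shaderId) << 32) | (uint64_t(d.materialId) << 16) | uint64_t(d.cubeMapId);
}

inline void sortByState(std::vector<DrawItem>& items) {
    std::sort(items.begin(), items.end(),
              [](const DrawItem& a, const DrawItem& b) { return makeSortKey(a) < makeSortKey(b); });
}

// Counts how often the shader changes while walking the list in order.
inline int countShaderSwitches(const std::vector<DrawItem>& items) {
    int switches = 0;
    for (std::size_t i = 1; i < items.size(); ++i)
        if (items[i].shaderId != items[i - 1].shaderId) ++switches;
    return switches;
}
```

With static geometry the sort only has to run once at load time. Dynamic light assignment could be added as extra low bits of the key, at the cost of re-sorting whenever the assignment changes.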

OK, here is what I do:

I sort the geometry by material (your material group), but not by texture. The reason is that materials are fairly unique, while textures are most likely diverse. Another problem with textures is that most geometry uses more than one texture (multitexturing).

I'm not a hardware expert, but I hope that reducing the number of texture switches (not texture bind commands) by grouping your geometry into classes of texture 'sets' is quite a good compromise. Example:

You have terrain, trees, characters, particles and water, each with its own set of textures. By rendering your terrain first, then trees, water, characters and particles, you only ever use a 'small' set of textures at the same time, which avoids frequent texture uploads and benefits the texture cache.

That is my way to reduce the number of texture switches.

A fast way of sorting is bucket/radix sort, perfectly suitable for sorting geometry by materials.
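A bucket sort on material ids can be sketched as a counting sort. The function name `orderByMaterial` and the assumption that material ids are small consecutive integers are mine, not from the post.

```cpp
#include <cassert>
#include <vector>

// Returns chunk indices ordered by material id using a counting (bucket) sort.
// materialOf[i] is the material id of chunk i; ids must lie in [0, numMaterials).
inline std::vector<int> orderByMaterial(const std::vector<int>& materialOf, int numMaterials) {
    std::vector<int> start(numMaterials + 1, 0);
    for (int m : materialOf) {
        assert(m >= 0 && m < numMaterials);
        ++start[m + 1];                       // histogram, shifted by one slot
    }
    for (int m = 0; m < numMaterials; ++m)
        start[m + 1] += start[m];             // prefix sum: first slot of each bucket

    std::vector<int> order(materialOf.size());
    for (int i = 0; i < int(materialOf.size()); ++i)
        order[start[materialOf[i]]++] = i;    // stable: original order is kept inside a bucket
    return order;
}
```

The cost is linear in the number of chunks plus the number of materials, which is why it suits per-frame use better than a comparison sort.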


Well, I basically do that already. I sort on shader first, then on the "parameter set". Most of the shaders do require multiple textures, but the combination is the same in 99% of the cases. That means a diffuse texture won't be used with 3 different normal map textures, or with really different number/vector parameters. The textures that are used for a lot of surfaces (a lightmap, for example) are activated before the whole thing, so that is not a real problem either.

It becomes nasty when adding dynamic lights. Luckily, most of the "dynamic" lights won't really move around, but I still need to tell each face/group/whatever which lights it will use.

I was hoping SM3.0 would support looping through x lights. Then I could just do the normal rendering pass (lightmapping à la Half-Life 2), and then do lighting in the same texture. But it seems that I need at least 2 passes, unless I want to make hundreds of different shaders... I could "auto generate" those, but even with everything in 1 shader, I still need to change parameters quite a lot. Each face has different lighting settings, and therefore possibly also another shader that needs to be activated/deactivated.


I suggest that you don't SORT your geometry by material but that you GROUP it by material. The difference is that sorting is at best an O(n * log n) operation, while grouping is an O(n) operation.

Here's how grouping can be done very efficiently:
Give each material an ID and increment it with each material you create, so something like

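class Material {
public:
    static int lastId;              // next free id; define once as: int Material::lastId = 0;
    int id;
    Material() : id(lastId++) {}    // every new material gets the next consecutive id
};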

Separate your geometry into chunks of the same material and then give each of these chunks a next-pointer as well as the material id:

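struct GeometryChunk {
    int materialId;         // id of the material this chunk is drawn with
    GeometryChunk* next;    // next visible chunk that uses the same material
};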

Also, once all material-objects have been created, create a new array like this:

GeometryChunk** chunksByMaterial = new GeometryChunk*[Material::lastId + 1];

Now when you render your geometry do something like this (heavily simplified)

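// 1. clear the per-material lists
for (int i = 0; i <= Material::lastId; i++)
    chunksByMaterial[i] = NULL;

// 2. push every visible chunk onto the list of its material
for (int i = 0; i < numChunks; i++) {
    if (IsVisible(allChunks[i])) {
        int matId = allChunks[i]->materialId;
        allChunks[i]->next = chunksByMaterial[matId];
        chunksByMaterial[matId] = allChunks[i];
    }
}

// 3. draw material by material
for (int i = 0; i <= Material::lastId; i++) {
    if (chunksByMaterial[i] == NULL)
        continue;   // nothing visible uses this material

    // (bind material i here)
    GeometryChunk* chunk = chunksByMaterial[i];
    while (chunk != NULL) {
        // (draw the chunk here)
        chunk = chunk->next;
    }
}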


This is quite fast and doesn't require you to sort anything in advance. Plus, it reduces shader/material switches just like sorting would but with far less overhead.
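A self-contained variant of the same grouping idea, for anyone who wants to try it outside an engine. It is a sketch under assumptions: `Chunk`, `groupByMaterial` and `drawAll` are illustrative names, visibility is a plain flag instead of a culling test, and "drawing" just records the order.

```cpp
#include <cassert>
#include <vector>

struct Chunk {
    int    materialId;
    bool   visible;
    Chunk* next;     // intrusive link, rebuilt every frame
};

// Links all visible chunks into one list per material and returns the list heads.
inline std::vector<Chunk*> groupByMaterial(std::vector<Chunk>& chunks, int numMaterials) {
    std::vector<Chunk*> heads(numMaterials, nullptr);
    for (Chunk& c : chunks) {
        if (!c.visible) continue;
        assert(c.materialId >= 0 && c.materialId < numMaterials);
        c.next = heads[c.materialId];   // push front: O(1) per chunk
        heads[c.materialId] = &c;
    }
    return heads;
}

// Walks the lists material by material; returns how many material binds were needed.
inline int drawAll(const std::vector<Chunk*>& heads, std::vector<const Chunk*>& drawn) {
    int binds = 0;
    for (Chunk* head : heads) {
        if (!head) continue;            // no visible chunk uses this material
        ++binds;                        // a real renderer would bind the material here
        for (const Chunk* c = head; c; c = c->next)
            drawn.push_back(c);         // ...and issue the draw call here
    }
    return binds;
}
```

Because chunks are pushed at the front, each list comes out in reverse insertion order, which does not matter for opaque geometry.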

Uhm, I think I do that :) Sorting, grouping... Everything is indeed grouped, once. First everything with material "wood" is rendered, then "metal", and so on. I don't sort at runtime, since nothing in the geometry changes. So far. So far it also worked. Maybe it's not the best grouping in all situations (sometimes you do everything with the same shader, in other cases really not), but it worked so far.

But now I want to add dynamic lighting as well, and that makes it difficult. Each polygon/face could have a different parameter set/shader/cubemap and now also different light(s). All these possible combinations make it hard to sort. And I think I need to render multi-pass now. So far I did everything in 1 pass (static lighting with a lightMap is predictable). Although my lights might not even really "move around" (except for a flashlight maybe), I get hundreds of different shaders: 1 light, 2 lights, with parallax, without parallax, omni/directional/spotlights, fog enabled/disabled, lightMap on/off, other specific effects ....

Thanks for the tips guys,

There doesn't seem to be much more that you can do to sort out fundamentally different shaders (parallax, normal mapping, etc.). I think the way you have that part set up is about as good as you can get. At least, I'm pretty sure you don't want to flood your shaders with if statements that select the material...

As far as lighting with multiple dynamic lights goes, it really depends on your target hardware. If you're targeting SM3 or SM4, then I personally think you would be better off making a "supershader" for each of your different materials. For example, you would compute all the lights that affect an object on the CPU, then send down an array of these lights as a shader constant and simply loop through them and accumulate the effect. For complex shaders especially this is a plus because, in the case of parallax mapping, you avoid the costly operation of performing the raytrace more than once.

The problem with the supershader idea is that 1) it won't work well on SM2 and SM1 hardware and 2) it becomes difficult when you start using shadowmaps. My solution in the case of shadowmaps is to not store each shadowmap in a separate texture, but to tile them in one big texture. Then you don't have the problem of needing to bind a variable number of textures; you just bind the big tiled one and read from wherever you need. In my engine, I limit the total number of lights done in one pass to 4, since the shadowmap texture can only be so large; however, if you're using DirectX 10, you could use texture arrays and have all your lights and shadows done in a single pass.
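The tiled shadowmap lookup boils down to a scale and an offset per light. A rough sketch, assuming a square atlas with equally sized tiles numbered row by row; `AtlasRect`, `shadowTile` and `toAtlas` are hypothetical names, and in practice the last step would live in the fragment shader.

```cpp
#include <cassert>

// Sub-rectangle of the atlas in normalised [0,1] texture coordinates.
struct AtlasRect { float offsetU, offsetV, scale; };

// Tile `index` of a square atlas with tilesPerSide x tilesPerSide equal tiles,
// numbered row by row.
inline AtlasRect shadowTile(int index, int tilesPerSide) {
    assert(tilesPerSide > 0 && index >= 0 && index < tilesPerSide * tilesPerSide);
    const float scale = 1.0f / float(tilesPerSide);
    return { float(index % tilesPerSide) * scale,
             float(index / tilesPerSide) * scale,
             scale };
}

// Maps a light-space coordinate (u, v in [0,1]) into the tile of that light.
inline void toAtlas(const AtlasRect& r, float u, float v, float& outU, float& outV) {
    outU = r.offsetU + u * r.scale;
    outV = r.offsetV + v * r.scale;
}
```

For 4 lights in one pass, a 2x2 atlas is enough: each light passes its `AtlasRect` as one extra shader constant instead of binding its own texture.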

If you like trying new things, you can try to create a huge texture atlas and do megatexturing. You would get rid of texture switches for good. There is a thread about it nearby (http://www.gamedev.net/community/forums/topic.asp?topic_id=460053).

There is one more thing I'm thinking about. What if you stored some material properties in a texture? I mean the specular term and similar. In most cases it wouldn't consume much memory, because if you have, for example, a mesh of wood, you need a single specular term value for the whole mesh and you could put a 1x1 texture on it (stored in the texture atlas, of course). Then you would only have to do shader switches, like normal mapping / parallax mapping and so on. If you are thinking about using more than four lights, I think deferred shading can solve the problem.

I wish I had 20 hands and 10 brains, then I could try all the possible solutions :)

I don't have that many dynamic lights. 0 to 3 or 4 (per face), I think. I also use a lightMap, so it's some sort of mixture (is that a smart idea? I don't know how else to get the nice radiosity effect). I don't aim for SM2 either. If I ever get this finished, I think SM23.0 will be out :). For now, it's just for me and my current card.

I like the atlas idea. I have no idea how megatexturing works, but does it mean I should be able to put "as many textures as I want" into 1 atlas (if there was no memory limit)? That certainly makes life easier, although I don't know the additional cost of accessing those textures in a shader. And how about tiling? Most of my textures are tiled. Some info is indeed combined in 1 texture. I use alpha channels for heights or specular terms in many cases.

Deferred shading is also something I like. Since it's a post effect, it doesn't affect the sorting or multipassing. But I never tried it... I heard you can get nasty problems with transparent surfaces...

About the looping... Is that possible then? It would indeed be really nice if I could use a "dynamic lighting loop". I could put it after all the basic shader programs, and do everything in 1 pass. Reading the texture once and doing the somewhat more complex stuff like parallax once indeed saves energy as well. I recently tried some things with the newest version of Cg, but it did not support real loops. I could check per light if it's enabled, but that's not a really fast approach, I suppose. What I want is just "for (i=0; i<lightCount...)" where lightCount is a variable parameter. Unsized arrays weren't supported either, although I could just say I can use at most 4 lights in a shader, for example (which still gives me fixed-size arrays). Am I doing something wrong here, is Cg not up-to-date with SM3.0, or is this just not (yet) possible?
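On the tiling question: a tiled texture inside an atlas cannot rely on the hardware repeat mode, so the usual workaround is to wrap the coordinate manually and then remap it into the sub-rectangle. A small sketch of that arithmetic, with `AtlasRegion`, `fractPart` and `tiledAtlasUV` as made-up names; in a real renderer this would be a few shader instructions.

```cpp
#include <cassert>
#include <cmath>

struct AtlasRegion { float u0, v0, width, height; };  // normalised atlas coordinates

// Same result as GLSL fract(): the fractional part, also for negative input.
inline float fractPart(float x) { return x - std::floor(x); }

// Wraps a tiled coordinate into [0,1) and maps it into the region of the atlas.
inline void tiledAtlasUV(const AtlasRegion& r, float u, float v, float& outU, float& outV) {
    outU = r.u0 + fractPart(u) * r.width;
    outV = r.v0 + fractPart(v) * r.height;
}
```

The extra cost is small, but note that bilinear filtering and mipmapping sample across the region border, so neighbouring sub-textures can bleed into each other unless the regions get some padding.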


I haven't much experience with the new Cg, but I know that when you're working with HLSL, you have to do a few hacks in the shader to get the loop to compile. Basically, instead of for(int i=0;i<lightCount;i++), you have to fix the bound at some high value like 8 and put an if inside the for loop, like this:
for (int i = 0; i < 8; i++) {
    if (lightEnabled[i]) {
        // do lighting for light i
    }
}

I think the reason that works and the regular loop doesn't is that the compiler unrolls the hacked for loop into 8 copies of the if statement and does dynamic branching on each of the 8 lights, instead of dynamic branching on the loop itself; I suppose shader model 3 cards don't like to dynamic branch inside a loop or something...
But I'm no expert here, hopefully someone will step in and give a better answer.

As far as I know though, the real unhacked loop works in SM4.
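On the application side, the fixed-bound loop means always uploading a full-size array plus one enable flag per slot. A minimal sketch of that packing step; `Light`, `LightUniforms`, `packLights` and the bound of 8 are assumptions for illustration, and the actual upload call is left out.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <vector>

constexpr int kMaxLights = 8;   // must match the fixed loop bound in the shader

struct Light { float dir[3]; float color[3]; };

// CPU-side mirror of the uniform arrays the shader loop would read.
struct LightUniforms {
    std::array<Light, kMaxLights> lights{};    // unused slots stay zeroed (black)
    std::array<bool,  kMaxLights> enabled{};   // all false by default
};

inline LightUniforms packLights(const std::vector<Light>& active) {
    LightUniforms u;
    const int n = std::min<int>(int(active.size()), kMaxLights);  // extra lights are dropped
    for (int i = 0; i < n; ++i) {
        u.lights[i]  = active[i];
        u.enabled[i] = true;
    }
    return u;
}
```

Since unused slots are also black, the shader could skip the flag and simply add a zero contribution; which of the two is cheaper depends on the hardware.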

"Unrolled loops", that was the word I was looking for. Yeah, the Shader Model really should support that. But so far, I need to use your solution, if I want to do everything in 1 shader. And maybe a little bit more, I also need to check the type (spotlight, omnin, etc.) per source. That would be 4 if's or something in my case... A waste probably, I think in most cases at least 50% is used. I could also disable lights just by using a black color (which means no contribution), but that probably costs even more energy than a couple if's.

Well, the if is not that much of a problem. But it's another "heavy" technique that piles on top of so many other heavy techniques (HDR, DoF, complex shaders, ...). On the other hand, only 1 pass. I think I will use either this technique or deferred shading (if there are some tricks for the transparency).

But I still wonder how games such as Doom 3 sort/group it. In those days, multipassing was necessary. So what was the "flow" of rendering a frame? I'd like to know that.

Oh, and by the way, when doing multiple lights in 1 shader, how do I pass all the data from the vertex shader to the fragment shader? I mean, I was always limited to 8 x 4 numbers (8 texcoords with 4 components each to put the data in). The 4 light directions will fit, but I need the space for other stuff as well. Especially when building that "uber shader". I haven't tried it lately, but I assume SM3.0 cards are still limited to 8 texture channels and 8 texcoords, right? I believe my GeForce 6600 was...

Thanks for the tips,

[Edited by - spek on October 2, 2007 3:06:57 AM]

Lighting isn't the problem, shadowing is :)

OK, you don't need much for a lot of lights. Here is some pseudo shader code:

>>> vertex_shader:

// calculate your position to render it normally
position =....

// calculate your position and normal in world space !
position_world_space = ...
normal_world_space = ...

>>> fragment_shader:

uniform vec3 light_color[MAX_NUMBER_OF_LIGHTS];
uniform vec3 light_direction[MAX_NUMBER_OF_LIGHTS]; // light direction in WORLD SPACE


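// for lighting
vec3 final_light_color = vec3(0.0);
for (int i = 0; i < MAX_NUMBER_OF_LIGHTS; i++) {
    // diffuse color; max() keeps lights behind the surface from subtracting light
    final_light_color += max(dot(normal_world_space, light_direction[i]), 0.0) * light_color[i];

    // specular ...
}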

That's all. You don't need to transform the light_direction for each face; just transform the face to world space and set your lights once per shader, in world space.

However, this is just lighting, not shadowing. Shadowing with many light sources is quite difficult.
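To check the numbers coming out of such a shader, the same world-space diffuse sum can be evaluated on the CPU. This is a reference sketch only: `Vec3`, `DirLight` and `diffuse` are illustrative names, and the clamp to zero is my assumption about the intended behaviour for lights behind the surface.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Vec3 { float x, y, z; };

inline float dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// toLight: unit vector in world space pointing from the surface towards the light.
struct DirLight { Vec3 toLight; Vec3 color; };

// Sums the Lambert (N dot L) contribution of every light for one surface normal.
inline Vec3 diffuse(const Vec3& normal, const std::vector<DirLight>& lights) {
    Vec3 sum{0.0f, 0.0f, 0.0f};
    for (const DirLight& l : lights) {
        const float nDotL = std::max(dot(normal, l.toLight), 0.0f);  // back-facing lights add nothing
        sum.x += nDotL * l.color.x;
        sum.y += nDotL * l.color.y;
        sum.z += nDotL * l.color.z;
    }
    return sum;
}
```

Everything stays in world space here, so the light data is independent of the face being drawn and only needs to be set once per shader.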


Of course :) Normally I transform to tangent space, but I already have my normal in world space inside the fragment shader, so why bother.

I have an "ubershader", or whatever it's called, now. Quite nice: just give it some compiler options and it makes a specific shader. Not long ago I used to work with ten billion variants of each shader; you can imagine it's a lot of work to maintain that. The only difficulty I expect to run into is adding a projected flashlight. I'm running out of texture channels. But that's another story.

Shadowing is indeed the problem. The dynamic lights won't cast any shadows at all. Therefore I also use a lightMap. The lightMap contains the "global" lights that cast shadows. I'm using a directional map (like "Radiosity mapping" from HL2) so that I can still use some normalMapping on the pre-calculated lights as well. The dynamic lights are nearby sources. Together with reflections, parallax mapping, and some other stuff like detail normal mapping, it can produce some really heavy shaders :) Luckily SM3.0 supports quite a lot of instructions. And maybe I'll still skip the "lighting loop" and do it after this stage with deferred lighting... if I can still do transparency with deferred lighting.

I'm not sure combining dynamic lights and a static lightMap is such a good idea. But I hate sharp stencil shadows. And only lightMapping looks a little bit dull too. I wonder how games like Bioshock or Far Cry combined the two exactly. That was basically my question in some other topic 1 or 2 weeks ago. Some people said lightMaps were outdated and that nowadays shadow maps are used more and more. But is that really true? I can imagine that for dynamic objects (characters, lockers, barrels, etc.), but for the static environment as well...?


Using shadow mapping for static environments works very well and looks good too. The way to do it is very simple: render the shadow map for your static light just like you normally would, but only do that once. Since it's static, you know nothing will change in the shadowmap, so you don't even have to render it again; you can just use the same old one every frame. Dynamic objects can even receive the shadows from the static light (they can't cast them though, unless you re-render the shadow map).
You can even do some very nice filtering on the shadowmap to create soft shadows with an accurate penumbra... but that takes a little more work to get looking nice.

If you want to be super efficient, you can keep that static shadowmap for the static environment, and allocate a separate, dynamic shadow map just for your dynamic objects; that way you avoid re-rendering the static environment (when dealing with static lights) altogether.

...you can just use the same old one every frame...

That's true of course, I forgot about that. However, shadow maps still won't simulate indirect lighting. Although you could combine them with a simple lightMap that only stores light bounced off other surfaces, to add some "ambient" and prevent pitch black areas.

But I see some other problems. First, I have multiple lights in most cases, so that means multiple shadowmaps as well. I need to do multipassing / adding everything together. Well, as discussed earlier in this topic, I could handle multiple shadow maps in 1 shader, but it's still limited of course. Another problem: how do I define which surfaces should use the shadow map cast from light X? In larger scenes, you don't want to use all shadow maps of course.

And last but not least, the detail. The further away from the lightSource, the fewer pixels an object's shadow gets. I read that there are tricks to improve that, but would that still be a fast solution (can I still use the same map every frame)?

I dunno... It sounds interesting though. Is this technique really used in modern games for the static environment? How do they deal with these problems?
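The loss of detail with distance is easy to put into numbers for a spot light. The sketch below assumes a simple perspective projection and a surface facing the light; `texelSize` is a hypothetical helper and the units are whatever the scene uses.

```cpp
#include <cassert>
#include <cmath>

// World-space width covered by one shadow-map texel at `distance` from a spot light,
// measured on a surface that faces the light.
inline double texelSize(double distance, double fovRadians, int resolution) {
    assert(resolution > 0 && fovRadians > 0.0 && fovRadians < 3.14159265358979);
    return 2.0 * distance * std::tan(fovRadians * 0.5) / double(resolution);
}
```

With a 90 degree cone and a 512x512 map, one texel covers roughly 0.008 units at distance 2 and roughly 0.08 units at distance 20, so the footprint grows linearly with distance. A static map can still be reused every frame; the resolution just has to be chosen for the farthest surface that matters.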


Hmmm... I've been thinking about shadowMaps. I think I can do it like this:

pass1, the diffuse/phong specular lighting pass
- Handle up to 4 shadowMaps
- The other 4 textures are for normals, albedo, maybe parallax, etc.
- This pass does the diffuse/specular lighting per source, including the shadowMap

pass2, ambient / emissive / reflections
- Fog
- Add ambient (via a lightMap or occlusion map) / emissive
- Add cubeMap / mirror reflections
- Transparency / reflections
- Other material specific effects

Pass1 and 2 are combined with additive blend

I'm limited to 4 lights then, but I think that should be enough. If not, I could make an extra pass between that does the same as pass1. Some operations are done twice in this case, but what the heck. I'll have to accept not everything is possible :)

But now the difficult part comes. How do I (dynamically) assign 4 shadowMaps / lights to a surface? Most lights aren't moving, but a dynamic solution would still be nice. I assume a shadowMap should be used for all geometry that is (partially) inside its "view frustum". However, because of the limit, I must assign the most "important" sources to a surface.
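One possible way to pick the most "important" sources is to score every candidate light and keep the best four. The sketch below uses intensity divided by squared distance as the score, which is only a heuristic I am assuming here; `PointLight`, `importance` and `pickLights` are illustrative names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct PointLight { float x, y, z; float intensity; int id; };

// Heuristic importance: intensity with inverse-square falloff from the surface centre.
inline float importance(const PointLight& l, float sx, float sy, float sz) {
    const float dx = l.x - sx, dy = l.y - sy, dz = l.z - sz;
    const float d2 = dx * dx + dy * dy + dz * dz;
    return l.intensity / std::max(d2, 1e-4f);   // clamp avoids division by zero
}

// Returns the ids of the (at most) maxLights most important lights, best first.
inline std::vector<int> pickLights(std::vector<PointLight> lights,
                                   float sx, float sy, float sz, int maxLights = 4) {
    assert(maxLights >= 0);
    const std::size_t n = std::min<std::size_t>(lights.size(), std::size_t(maxLights));
    std::partial_sort(lights.begin(), lights.begin() + n, lights.end(),
                      [&](const PointLight& a, const PointLight& b) {
                          return importance(a, sx, sy, sz) > importance(b, sx, sy, sz);
                      });
    std::vector<int> ids;
    for (std::size_t i = 0; i < n; ++i) ids.push_back(lights[i].id);
    return ids;
}
```

For static lights the result can be computed once per face and cached; only faces near a moving light (the flashlight, for instance) would need a new selection.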

The other difficult part remains the quality. In most cases the light won't be shining far, so a 512x512 map would give a good result. But for large lights far away...


Sounds like a good outline. I think that with all the features you plan on implementing, 2 passes is the minimum, though you could squeeze it into one if you pull some strings.

I'm guessing you have some type of scene organizational structure, like an octree. You can use that to determine which lights are affecting which surfaces.

Regarding shadows:
I personally would do away with any kind of precomputed lightmap texture and use shadow maps globally. You could still use lightmaps or something similar to get the global illumination effect, but if you want nice crisp (but not too crisp) shadows you're better off unifying it and using shadowmaps for everything. For spot lights, a single shadow map should be sufficient. For directional lights, you can use parallel split shadow maps (you can use this technique for spot lights also). For point lights, you can tile the 6 maps needed into a single texture.
If the PSSMs don't give you good enough quality, you can add some kind of perspective correction onto that. Further, if you want it to be softer, you can use PCF filtering, or if you want super soft but fast shadows, use variance shadow mapping.

Most commercial games don't even have very nice looking shadow mapping (in my opinion), so it's not a trivial process to get it working and looking good. I think variance shadow maps combined with some perspective correction and PSSM for directional lights is the way to go. Just a couple of minutes ago I noticed someone posted about some kind of new shadow mapping technique that makes it pixel perfect; there are all kinds of neat shadow mapping approaches out there, so there are tons of options and potential.

I think you convinced me. I should try it.

The lightMap is indeed only for indirect lighting now. Unless 4 lights is too little, in that case I could let the lightMap help a little bit. But yes, it's more an ambient thing now.

I was about to ask how to do point lights, but you answered that already. Just make 6 textures. I only need to figure out which of the 6 a face is receiving (or in the worst case it receives multiple...). That could become a little problem, especially when I have a limit of 3 or 4 maps per surface (light 4 is also used for my flashlight).

I should use my portal culling and add an octree for light management. Well, I should just start and try it. I did shadowmapping 5 years ago... I think I have to learn it again :). I think I can indeed get more quality with it. A 512x512 shadowMap contains quite a lot of pixels compared to a lightMap, which gave far fewer pixels to the average polygon (the complete lightMap used 1024x766 in my case). The only problem is surfaces lit from far away from their lightSource when using shadowmaps. I shall look into the available techniques.
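Which of the 6 maps applies can be decided per point rather than per face: take the vector from the light to the point and pick the axis with the largest absolute component. A minimal sketch of that rule; `CubeFace` and `selectCubeFace` are illustrative names, and only the face order is taken from the OpenGL cube map convention.

```cpp
#include <cassert>
#include <cmath>

// Face order follows the OpenGL cube map convention: +X, -X, +Y, -Y, +Z, -Z.
enum CubeFace { PosX = 0, NegX, PosY, NegY, PosZ, NegZ };

// (dx, dy, dz) is the vector from the light to the point being shaded.
// The face is the one whose axis has the largest absolute component.
inline CubeFace selectCubeFace(float dx, float dy, float dz) {
    const float ax = std::fabs(dx), ay = std::fabs(dy), az = std::fabs(dz);
    assert(ax + ay + az > 0.0f);            // the zero vector has no direction
    if (ax >= ay && ax >= az) return dx >= 0.0f ? PosX : NegX;
    if (ay >= az)             return dy >= 0.0f ? PosY : NegY;
    return dz >= 0.0f ? PosZ : NegZ;
}
```

If the 6 maps are tiled into one texture, a polygon that spans several faces is no special case: each fragment runs this selection itself and reads from its own tile, so the point light still counts as one map for the per-surface limit.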

Thanks for helping everybody,

      Lightweight abstractions: the API should be as close to the underlying native APIs as possible to allow an application leverage all available low-level functionality. In many cases this requirement is difficult to achieve because specific features exposed by different APIs may vary considerably. Low performance overhead: the abstraction layer needs to be efficient from performance point of view. If it introduces considerable amount of overhead, there is no point in using it. Convenience: the API needs to be convenient to use. It needs to assist developers in achieving their goals not limiting their control of the graphics hardware. Multithreading: ability to efficiently parallelize work is in the core of Direct3D12 and Vulkan and one of the main selling points of the new APIs. Support for multithreading in a cross-platform layer is a must. Extensibility: no matter how well the API is designed, it still introduces some level of abstraction. In some cases the most efficient way to implement certain functionality is to directly use native API. The abstraction layer needs to provide seamless interoperability with the underlying native APIs to provide a way for the app to add features that may be missing. Diligent Engine is designed to solve these problems. Its main goal is to take advantages of the next-generation APIs such as Direct3D12 and Vulkan, but at the same time provide support for older platforms via Direct3D11, OpenGL and OpenGLES. Diligent Engine exposes common C++ front-end for all supported platforms and provides interoperability with underlying native APIs. It also supports integration with Unity and is designed to be used as graphics subsystem in a standalone game engine, Unity native plugin or any other 3D application. Full source code is available for download at GitHub and is free to use.
      Diligent Engine API takes some features from Direct3D11 and Direct3D12 as well as introduces new concepts to hide certain platform-specific details and make the system easy to use. It contains the following main components:
      Render device (IRenderDevice  interface) is responsible for creating all other objects (textures, buffers, shaders, pipeline states, etc.).
      Device context (IDeviceContext interface) is the main interface for recording rendering commands. Similar to Direct3D11, there are immediate and deferred contexts (which in the Direct3D11 implementation map directly to the corresponding context types). The immediate context combines command queue and command list recording functionality. It records commands and submits the command list for execution when it contains a sufficient number of commands. Deferred contexts are designed to only record command lists that can be submitted for execution through the immediate context.
      An alternative way to design the API would be to expose command queue and command lists directly. This approach however does not map well to Direct3D11 and OpenGL. Besides, some functionality (such as dynamic descriptor allocation) can be much more efficiently implemented when it is known that a command list is recorded by a certain deferred context from some thread.
      The approach taken in the engine does not limit scalability as the application is expected to create one deferred context per thread, and internally every deferred context records a command list in lock-free fashion. At the same time this approach maps well to older APIs.
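The threading model described above can be sketched without any engine code. What follows is a minimal illustration under stated assumptions: `Command`, `CommandList`, `DeferredContext`, `ImmediateContext` and `RecordAndSubmit` are hypothetical stand-ins invented for this sketch, not Diligent Engine types. Each worker thread records into its own deferred context, so recording needs no locks, and only the immediate context executes the finished lists.

```cpp
#include <cassert>
#include <functional>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration only; not Diligent Engine types.
using Command     = std::function<void(std::vector<int>&)>;
using CommandList = std::vector<Command>;

// One deferred context per thread: it only records, so no locking is needed.
class DeferredContext {
public:
    void Record(Command cmd) { m_List.push_back(std::move(cmd)); }
    CommandList Finish() { return std::move(m_List); }
private:
    CommandList m_List;
};

// The immediate context is the only place where commands are executed.
class ImmediateContext {
public:
    void Execute(const CommandList& list) {
        for (const auto& cmd : list) cmd(m_Output);
    }
    const std::vector<int>& Output() const { return m_Output; }
private:
    std::vector<int> m_Output;
};

// Records `perThread` commands on each of `numThreads` threads, then submits
// the finished lists through the immediate context in thread-index order.
std::vector<int> RecordAndSubmit(int numThreads, int perThread) {
    std::vector<DeferredContext> contexts(numThreads);
    std::vector<std::thread> workers;
    for (int t = 0; t < numThreads; ++t) {
        workers.emplace_back([&contexts, t, perThread] {
            for (int i = 0; i < perThread; ++i) {
                const int id = t * perThread + i;
                contexts[t].Record([id](std::vector<int>& out) { out.push_back(id); });
            }
        });
    }
    for (auto& w : workers) w.join();

    ImmediateContext immediate;
    for (auto& ctx : contexts) immediate.Execute(ctx.Finish());
    return immediate.Output();
}
```

Because every thread touches only its own context, the only synchronization point in the sketch is the `join()` before submission.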
      In the current implementation, only one immediate context that uses the default graphics command queue is created. To support multiple GPUs or multiple command queue types (compute, copy, etc.), it is natural to have one immediate context per queue. Cross-context synchronization utilities will be necessary.
      Swap Chain (ISwapChain interface). Swap chain interface represents a chain of back buffers and is responsible for showing the final rendered image on the screen.
      Render device, device contexts and swap chain are created during the engine initialization.
      Resources (ITexture and IBuffer interfaces). There are two types of resources: textures and buffers. There are many different texture types (2D textures, 3D textures, texture arrays, cubemaps, etc.) that can all be represented by the ITexture interface.
      Resources Views (ITextureView and IBufferView interfaces). While textures and buffers are mere data containers, texture views and buffer views describe how the data should be interpreted. For instance, a 2D texture can be used as a render target for rendering commands or as a shader resource.
      Pipeline State (IPipelineState interface). The GPU pipeline contains many configurable stages (depth-stencil, rasterizer and blend states, different shader stages, etc.). Direct3D11 uses coarse-grain objects to set all stage parameters at once (for instance, a rasterizer object encompasses all rasterizer attributes), while OpenGL contains a myriad of functions for fine-grain control of every individual attribute of every stage. Neither method maps very well to modern graphics hardware, which combines all states into one monolithic state under the hood. Direct3D12 directly exposes a pipeline state object in the API, and Diligent Engine uses the same approach.
      Shader Resource Binding (IShaderResourceBinding interface). Shaders are programs that run on the GPU. Shaders may access various resources (textures and buffers), and setting the correspondence between shader variables and actual resources is called resource binding. Resource binding implementation varies considerably between different APIs. Diligent Engine introduces a new object called shader resource binding that encompasses all resources needed by all shaders in a certain pipeline state.
      API Basics
      Creating Resources
      Device resources are created by the render device. The two main resource types are buffers, which represent linear memory, and textures, which use memory layouts optimized for fast filtering. Graphics APIs usually have a native object that represents linear buffer. Diligent Engine uses IBuffer interface as an abstraction for a native buffer. To create a buffer, one needs to populate BufferDesc structure and call IRenderDevice::CreateBuffer() method as in the following example:
          BufferDesc BuffDesc;
          BuffDesc.Name = "Uniform buffer";
          BuffDesc.BindFlags = BIND_UNIFORM_BUFFER;
          BuffDesc.Usage = USAGE_DYNAMIC;
          BuffDesc.uiSizeInBytes = sizeof(ShaderConstants);
          BuffDesc.CPUAccessFlags = CPU_ACCESS_WRITE;
          m_pDevice->CreateBuffer( BuffDesc, BufferData(), &m_pConstantBuffer );
      While there is usually just one buffer object, different APIs use very different approaches to represent textures. For instance, in Direct3D11, there are ID3D11Texture1D, ID3D11Texture2D, and ID3D11Texture3D objects. In OpenGL, there is an individual object for every texture dimension (1D, 2D, 3D, Cube), which may be a texture array, which may also be multisampled (e.g. GL_TEXTURE_2D_MULTISAMPLE_ARRAY). As a result there are nine different GL texture types that Diligent Engine may create under the hood. In Direct3D12, there is only one resource interface. Diligent Engine hides all these details in the ITexture interface. There is only one IRenderDevice::CreateTexture() method, which is capable of creating all texture types. Dimension, format, array size and all other parameters are specified by the members of the TextureDesc structure:
          TextureDesc TexDesc;
          TexDesc.Name = "Sample 2D Texture";
          TexDesc.Type = TEXTURE_TYPE_2D;
          TexDesc.Width = 1024;
          TexDesc.Height = 1024;
          TexDesc.Format = TEX_FORMAT_RGBA8_UNORM;
          TexDesc.Usage = USAGE_DEFAULT;
          TexDesc.BindFlags = BIND_SHADER_RESOURCE | BIND_RENDER_TARGET | BIND_UNORDERED_ACCESS;
          m_pRenderDevice->CreateTexture( TexDesc, TextureData(), &m_pTestTex );
      If the native API supports multithreaded resource creation, textures and buffers can be created by multiple threads simultaneously.
      Interoperability with the native API provides access to the native buffer/texture objects and also allows creating Diligent Engine objects from native handles. It allows applications to seamlessly integrate native API-specific code with Diligent Engine.
      Next-generation APIs allow fine-level control over how resources are allocated. Diligent Engine does not currently expose this functionality, but it can be added by implementing an IResourceAllocator interface that encapsulates the specifics of resource allocation and providing this interface to the CreateBuffer() or CreateTexture() methods. If null is provided, the default allocator should be used.
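Since the engine does not expose this functionality yet, the following is only a sketch of what such an extension point could look like. The article names `IResourceAllocator` but does not define it, so every method here (`Allocate`, `Free`, `Name`), the `Buffer` struct and the simplified `CreateBuffer`/`DestroyBuffer` signatures are assumptions made for illustration. The one behavior taken from the text is that passing null selects a default allocator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>

// Hypothetical interface: the article only names IResourceAllocator; the
// methods below are assumptions made for this sketch.
struct IResourceAllocator {
    virtual ~IResourceAllocator() = default;
    virtual void* Allocate(std::size_t sizeInBytes) = 0;
    virtual void  Free(void* ptr) = 0;
    virtual std::string Name() const = 0;
};

// Fallback used when the caller passes nullptr.
struct DefaultAllocator final : IResourceAllocator {
    void* Allocate(std::size_t n) override { return std::malloc(n); }
    void  Free(void* p) override { std::free(p); }
    std::string Name() const override { return "default"; }
};

// Example of an application-provided allocator that counts live allocations.
struct CountingAllocator final : IResourceAllocator {
    int live = 0;
    void* Allocate(std::size_t n) override { ++live; return std::malloc(n); }
    void  Free(void* p) override { --live; std::free(p); }
    std::string Name() const override { return "counting"; }
};

// Simplified stand-in for a buffer; remembers which allocator owns its memory.
struct Buffer {
    void*               memory;
    std::size_t         size;
    IResourceAllocator* owner;
};

// Simplified stand-in for a CreateBuffer() overload: null selects the default.
Buffer CreateBuffer(std::size_t sizeInBytes, IResourceAllocator* allocator = nullptr) {
    static DefaultAllocator defaultAllocator;
    IResourceAllocator* chosen = allocator ? allocator : &defaultAllocator;
    return Buffer{chosen->Allocate(sizeInBytes), sizeInBytes, chosen};
}

// Returns the memory to whichever allocator created it and clears the buffer.
void DestroyBuffer(Buffer& buf) {
    if (buf.owner && buf.memory) buf.owner->Free(buf.memory);
    buf = Buffer{};
}
```

Storing the owning allocator in the resource keeps allocation and release symmetric, which matters once several allocators coexist.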
      Initializing the Pipeline State
      As mentioned earlier, Diligent Engine follows next-gen APIs to configure the graphics/compute pipeline. One big Pipeline State Object (PSO) encompasses all required states (all shader stages, input layout description, depth-stencil, rasterizer and blend state descriptions, etc.). This approach maps directly to Direct3D12/Vulkan, but is also beneficial for older APIs as it eliminates pipeline misconfiguration errors. With many individual calls tweaking various GPU pipeline settings, it is very easy to forget to set one of the states or to assume a stage is already properly configured when in fact it is not. Using a pipeline state object helps avoid these problems, as all stages are configured at once.
      Creating Shaders
      While in earlier APIs shaders were bound separately, in the next-generation APIs as well as in Diligent Engine shaders are part of the pipeline state object. The biggest challenge when authoring shaders is that Direct3D and OpenGL/Vulkan use different shader languages (while Apple uses yet another language in their Metal API). Maintaining two versions of every shader is not an option for real applications and Diligent Engine implements shader source code converter that allows shaders authored in HLSL to be translated to GLSL. To create a shader, one needs to populate ShaderCreationAttribs structure. SourceLanguage member of this structure tells the system which language the shader is authored in:
      SHADER_SOURCE_LANGUAGE_DEFAULT - The shader source language matches the underlying graphics API: HLSL for Direct3D11/Direct3D12 mode, and GLSL for OpenGL and OpenGLES modes.
      SHADER_SOURCE_LANGUAGE_HLSL - The shader source is in HLSL. For OpenGL and OpenGLES modes, the source code will be converted to GLSL.
      SHADER_SOURCE_LANGUAGE_GLSL - The shader source is in GLSL. There is currently no GLSL to HLSL converter, so this value should only be used for OpenGL and OpenGLES modes.
      There are two ways to provide the shader source code. The first way is to use the Source member. The second way is to provide a file path in the FilePath member. Since the engine is entirely decoupled from the platform and the host file system is platform-dependent, the structure exposes the pShaderSourceStreamFactory member, which is intended to give the engine access to the file system. If FilePath is provided, a shader source factory must also be provided. If the shader source contains any #include directives, the source stream factory will also be used to load these files. The engine provides a default implementation for every supported platform that should be sufficient in most cases. A custom implementation can be provided when needed.
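The rules for the three source-language values can be restated as a small decision function. This is a sketch only: the enums `SourceLanguage`, `Backend` and `CompileAction` and the function `Resolve` are hypothetical names that mirror the description above, not the engine's declarations.

```cpp
#include <cassert>

// Hypothetical enums mirroring the values described above.
enum class SourceLanguage { Default, HLSL, GLSL };
enum class Backend { Direct3D11, Direct3D12, OpenGL, OpenGLES };
enum class CompileAction { UseAsIs, ConvertHlslToGlsl, Unsupported };

bool IsDirect3D(Backend b) {
    return b == Backend::Direct3D11 || b == Backend::Direct3D12;
}

// Decides what has to happen to the shader source for a given backend.
CompileAction Resolve(SourceLanguage lang, Backend backend) {
    switch (lang) {
        case SourceLanguage::Default:
            // Default means "whatever the backend natively consumes".
            return CompileAction::UseAsIs;
        case SourceLanguage::HLSL:
            // HLSL is native on Direct3D and converted to GLSL elsewhere.
            return IsDirect3D(backend) ? CompileAction::UseAsIs
                                       : CompileAction::ConvertHlslToGlsl;
        case SourceLanguage::GLSL:
            // There is no GLSL to HLSL converter, so Direct3D cannot use GLSL.
            return IsDirect3D(backend) ? CompileAction::Unsupported
                                       : CompileAction::UseAsIs;
    }
    return CompileAction::Unsupported;
}
```

The asymmetry is the practical takeaway: authoring in HLSL keeps every backend reachable, while GLSL sources are limited to the OpenGL family.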
      When sampling a texture in a shader, the texture sampler was traditionally specified as separate object that was bound to the pipeline at run time or set as part of the texture object itself. However, in most cases it is known beforehand what kind of sampler will be used in the shader. Next-generation APIs expose new type of sampler called static sampler that can be initialized directly in the pipeline state. Diligent Engine exposes this functionality: when creating a shader, textures can be assigned static samplers. If static sampler is assigned, it will always be used instead of the one initialized in the texture shader resource view. To initialize static samplers, prepare an array of StaticSamplerDesc structures and initialize StaticSamplers and NumStaticSamplers members. Static samplers are more efficient and it is highly recommended to use them whenever possible. On older APIs, static samplers are emulated via generic sampler objects.
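The precedence rule for static samplers (a static sampler, if assigned, always wins over the sampler set in the texture view) can be illustrated as follows. `Filter`, `Sampler`, `StaticSamplerTable` and `ResolveSampler` are hypothetical, heavily simplified names chosen for this sketch; real sampler descriptions carry many more fields.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical, simplified types for illustration.
enum class Filter { Point, Linear };
struct Sampler { Filter filter; };

// Static samplers are keyed by the shader texture variable they apply to.
using StaticSamplerTable = std::map<std::string, Sampler>;

// Returns the sampler that would be used for `textureName`: the static
// sampler if one was assigned, otherwise the one from the texture view.
Sampler ResolveSampler(const StaticSamplerTable& staticSamplers,
                       const std::string& textureName,
                       const Sampler& viewSampler) {
    auto it = staticSamplers.find(textureName);
    return it != staticSamplers.end() ? it->second : viewSampler;
}
```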
      The following is an example of shader initialization:
          ShaderCreationAttribs Attrs;
          Attrs.Desc.Name = "MyPixelShader";
          Attrs.FilePath = "MyShaderFile.fx";
          Attrs.SearchDirectories = "shaders;shaders\\inc;";
          Attrs.EntryPoint = "MyPixelShader";
          Attrs.Desc.ShaderType = SHADER_TYPE_PIXEL;
          Attrs.SourceLanguage = SHADER_SOURCE_LANGUAGE_HLSL;
          BasicShaderSourceStreamFactory BasicSSSFactory(Attrs.SearchDirectories);
          Attrs.pShaderSourceStreamFactory = &BasicSSSFactory;

          ShaderVariableDesc ShaderVars[] =
          {
              {"g_StaticTexture", SHADER_VARIABLE_TYPE_STATIC},
              {"g_MutableTexture", SHADER_VARIABLE_TYPE_MUTABLE},
              {"g_DynamicTexture", SHADER_VARIABLE_TYPE_DYNAMIC}
          };
          Attrs.Desc.VariableDesc = ShaderVars;
          Attrs.Desc.NumVariables = _countof(ShaderVars);
          Attrs.Desc.DefaultVariableType = SHADER_VARIABLE_TYPE_STATIC;

          StaticSamplerDesc StaticSampler;
          StaticSampler.Desc.MinFilter = FILTER_TYPE_LINEAR;
          StaticSampler.Desc.MagFilter = FILTER_TYPE_LINEAR;
          StaticSampler.Desc.MipFilter = FILTER_TYPE_LINEAR;
          StaticSampler.TextureName = "g_MutableTexture";
          Attrs.Desc.NumStaticSamplers = 1;
          Attrs.Desc.StaticSamplers = &StaticSampler;

          ShaderMacroHelper Macros;
          Macros.AddShaderMacro("USE_SHADOWS", 1);
          Macros.AddShaderMacro("NUM_SHADOW_SAMPLES", 4);
          Macros.Finalize();
          Attrs.Macros = Macros;

          RefCntAutoPtr<IShader> pShader;
          m_pDevice->CreateShader( Attrs, &pShader );
      Creating the Pipeline State Object
      After all required shaders are created, the rest of the fields of the PipelineStateDesc structure provide depth-stencil, rasterizer, and blend state descriptions, the number and format of render targets, input layout format, etc. For instance, rasterizer state can be described as follows:
          PipelineStateDesc PSODesc;
          RasterizerStateDesc &RasterizerDesc = PSODesc.GraphicsPipeline.RasterizerDesc;
          RasterizerDesc.FillMode = FILL_MODE_SOLID;
          RasterizerDesc.CullMode = CULL_MODE_NONE;
          RasterizerDesc.FrontCounterClockwise = True;
          RasterizerDesc.ScissorEnable = True;
          RasterizerDesc.AntialiasedLineEnable = False;
      Depth-stencil and blend states are defined in a similar fashion.
      Another important thing that pipeline state object encompasses is the input layout description that defines how inputs to the vertex shader, which is the very first shader stage, should be read from the memory. Input layout may define several vertex streams that contain values of different formats and sizes:
          // Define input layout
          InputLayoutDesc &Layout = PSODesc.GraphicsPipeline.InputLayout;
          LayoutElement TextLayoutElems[] =
          {
              LayoutElement( 0, 0, 3, VT_FLOAT32, False ),
              LayoutElement( 1, 0, 4, VT_UINT8, True ),
              LayoutElement( 2, 0, 2, VT_FLOAT32, False ),
          };
          Layout.LayoutElements = TextLayoutElems;
          Layout.NumElements = _countof( TextLayoutElems );
      Finally, the pipeline state defines the primitive topology type. When all required members are initialized, a pipeline state object can be created by the IRenderDevice::CreatePipelineState() method:
          // Define shader and primitive topology
          PSODesc.GraphicsPipeline.PrimitiveTopologyType = PRIMITIVE_TOPOLOGY_TYPE_TRIANGLE;
          PSODesc.GraphicsPipeline.pVS = pVertexShader;
          PSODesc.GraphicsPipeline.pPS = pPixelShader;
          PSODesc.Name = "My pipeline state";
          m_pDev->CreatePipelineState(PSODesc, &m_pPSO);
      When a PSO object is bound to the pipeline, the engine invokes all API-specific commands to set all states specified by the object. In the case of Direct3D12, this maps directly to setting the D3D12 PSO object. In the case of Direct3D11, this involves setting individual state objects (such as rasterizer and blend states), shaders, input layout, etc. In the case of OpenGL, this requires a number of fine-grain state tweaking calls. Diligent Engine keeps track of currently bound states and only calls functions to update those states that have actually changed.
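The idea of tracking bound states and skipping redundant updates can be sketched in a few lines. This is not the engine's implementation: `PipelineState` is reduced to three hypothetical fields, and `StateCache` merely counts the native calls that would have been issued on a fine-grain API such as OpenGL.

```cpp
#include <cassert>

// Hypothetical, reduced pipeline description; real pipelines hold far more state.
enum class CullMode { None, Front, Back };
struct PipelineState {
    CullMode cull;
    bool     scissorEnable;
    int      shaderProgram;
};

// Tracks what is currently bound and counts the "native" calls actually issued.
class StateCache {
public:
    void Bind(const PipelineState& pso) {
        // Each differing field stands for one native state-setting call.
        if (!m_Valid || pso.cull != m_Current.cull) ++m_NativeCalls;
        if (!m_Valid || pso.scissorEnable != m_Current.scissorEnable) ++m_NativeCalls;
        if (!m_Valid || pso.shaderProgram != m_Current.shaderProgram) ++m_NativeCalls;
        m_Current = pso;
        m_Valid   = true;
    }
    int NativeCalls() const { return m_NativeCalls; }
private:
    PipelineState m_Current{};
    bool          m_Valid       = false;  // nothing is bound before the first Bind()
    int           m_NativeCalls = 0;
};
```

Binding the same state twice costs nothing after the first time, and switching to a state that differs in one field costs a single call.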
      Binding Shader Resources
      Direct3D11 and OpenGL utilize fine-grain resource binding models, where an application binds individual buffers and textures to certain shader or program resource binding slots. Direct3D12 uses a very different approach, where resource descriptors are grouped into tables, and an application can bind all resources in the table at once by setting the table in the command list. Resource binding model in Diligent Engine is designed to leverage this new method. It introduces a new object called shader resource binding that encapsulates all resource bindings required for all shaders in a certain pipeline state. It also introduces the classification of shader variables based on the frequency of expected change that helps the engine group them into tables under the hood:
      Static variables (SHADER_VARIABLE_TYPE_STATIC) are variables that are expected to be set only once. They may not be changed once a resource is bound to the variable. Such variables are intended to hold global constants such as camera attributes or global light attributes constant buffers.
      Mutable variables (SHADER_VARIABLE_TYPE_MUTABLE) define resources that are expected to change on a per-material frequency. Examples may include diffuse textures, normal maps, etc.
      Dynamic variables (SHADER_VARIABLE_TYPE_DYNAMIC) are expected to change frequently and randomly.
      Shader variable type must be specified during shader creation by populating an array of ShaderVariableDesc structures and initializing ShaderCreationAttribs::Desc::VariableDesc and ShaderCreationAttribs::Desc::NumVariables members (see example of shader creation above).
      Static variables cannot be changed once a resource is bound to the variable. They are bound directly to the shader object. For instance, a shadow map texture is not expected to change after it is created, so it can be bound directly to the shader:
          PixelShader->GetShaderVariable( "g_tex2DShadowMap" )->Set( pShadowMapSRV );
      Mutable and dynamic variables are bound via a new Shader Resource Binding object (SRB) that is created by the pipeline state (IPipelineState::CreateShaderResourceBinding()):
          m_pPSO->CreateShaderResourceBinding(&m_pSRB);
      Note that an SRB is only compatible with the pipeline state it was created from. The SRB object inherits all static bindings from shaders in the pipeline, but is not allowed to change them.
      Mutable resources can only be set once for every instance of a shader resource binding. Such resources are intended to define specific material properties. For instance, a diffuse texture for a specific material is not expected to change once the material is defined and can be set right after the SRB object has been created:
          m_pSRB->GetVariable(SHADER_TYPE_PIXEL, "tex2DDiffuse")->Set(pDiffuseTexSRV);
      In some cases it is necessary to bind a new resource to a variable every time a draw command is invoked. Such variables should be labeled as dynamic, which will allow setting them multiple times through the same SRB object:
          m_pSRB->GetVariable(SHADER_TYPE_VERTEX, "cbRandomAttribs")->Set(pRandomAttrsCB);
      Under the hood, the engine pre-allocates descriptor tables for static and mutable resources when an SRB object is created. Space for dynamic resources is dynamically allocated at run time. Static and mutable resources are thus more efficient and should be used whenever possible.
      As you can see, Diligent Engine does not expose low-level details of how resources are bound to shader variables. One reason for this is that these details are very different for various APIs. The other reason is that using low-level binding methods is extremely error-prone: it is very easy to forget to bind some resource, or to bind an incorrect resource (such as a buffer to a variable that is in fact a texture), especially during shader development when everything changes fast. Diligent Engine instead relies on a shader reflection system to automatically query the list of all shader variables. Grouping variables based on the three types mentioned above allows the engine to create an optimized layout and take on the heavy lifting of matching resources to the API-specific resource location, register or descriptor in the table.
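The binding rules for the three variable types can be modeled in a small, engine-independent sketch. `VarType` and `ResourceBinding` are hypothetical names, resources are represented by plain non-zero integers, and the model treats static and mutable variables alike (bind once), which ignores the fact that static variables are actually bound on the shader rather than on the SRB.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical model of the three variable types; not the engine's implementation.
enum class VarType { Static, Mutable, Dynamic };

class ResourceBinding {
    struct Slot { VarType type; int resource; };  // resource 0 means "nothing bound"
    std::map<std::string, Slot> m_Vars;

public:
    void Declare(const std::string& name, VarType type) { m_Vars[name] = Slot{type, 0}; }

    // Returns true if the bind is allowed under the rules described above.
    bool Set(const std::string& name, int resource) {
        auto it = m_Vars.find(name);
        if (it == m_Vars.end()) return false;  // unknown variable
        Slot& slot = it->second;
        const bool alreadyBound = slot.resource != 0;
        // Only dynamic variables may be rebound after the first Set().
        if (alreadyBound && slot.type != VarType::Dynamic) return false;
        slot.resource = resource;
        return true;
    }

    // Returns the bound resource, or 0 if nothing is bound or the name is unknown.
    int Get(const std::string& name) const {
        auto it = m_Vars.find(name);
        return it == m_Vars.end() ? 0 : it->second.resource;
    }
};
```

Rejecting a second `Set()` on a mutable variable is what lets an implementation size its descriptor tables once, at SRB creation time.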
      This post gives more details about the resource binding model in Diligent Engine.
      Setting the Pipeline State and Committing Shader Resources
      Before any draw or compute command can be invoked, the pipeline state needs to be bound to the context:
          m_pContext->SetPipelineState(m_pPSO);
      Under the hood, the engine sets the internal PSO object in the command list or calls all the required native API functions to properly configure all pipeline stages.
      The next step is to bind all required shader resources to the GPU pipeline, which is accomplished by IDeviceContext::CommitShaderResources() method:
          m_pContext->CommitShaderResources(m_pSRB, COMMIT_SHADER_RESOURCES_FLAG_TRANSITION_RESOURCES);
      The method takes a pointer to the shader resource binding object and makes all resources the object holds available for the shaders. In the case of D3D12, this only requires setting appropriate descriptor tables in the command list. For older APIs, this typically requires setting all resources individually.
      Next-generation APIs require the application to track the state of every resource and explicitly inform the system about all state transitions. For instance, if a texture was used as render target before, while the next draw command is going to use it as shader resource, a transition barrier needs to be executed. Diligent Engine does the heavy lifting of state tracking.  When CommitShaderResources() method is called with COMMIT_SHADER_RESOURCES_FLAG_TRANSITION_RESOURCES flag, the engine commits and transitions resources to correct states at the same time. Note that transitioning resources does introduce some overhead. The engine tracks state of every resource and it will not issue the barrier if the state is already correct. But checking resource state is an overhead that can sometimes be avoided. The engine provides IDeviceContext::TransitionShaderResources() method that only transitions resources:
          m_pContext->TransitionShaderResources(m_pPSO, m_pSRB);
      In some scenarios it is more efficient to transition resources once and then only commit them.
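Why checking states has a cost even when no barrier is issued can be seen in a toy model. `ResourceState`, `Texture` and `BarrierTracker` are hypothetical names for this sketch: the tracker issues a barrier only when a resource is in the wrong state, but it still has to inspect every resource on every call.

```cpp
#include <cassert>
#include <vector>

// Hypothetical resource states and tracker, for illustration only.
enum class ResourceState { Undefined, RenderTarget, ShaderResource };

struct Texture { ResourceState state; };

class BarrierTracker {
public:
    // Moves every resource to `required`, issuing a barrier only on change.
    void Transition(const std::vector<Texture*>& resources, ResourceState required) {
        for (Texture* tex : resources) {
            ++m_Checks;  // the state check itself is the unavoidable per-call cost
            if (tex->state != required) {
                tex->state = required;
                ++m_Barriers;
            }
        }
    }
    int Barriers() const { return m_Barriers; }
    int Checks() const { return m_Checks; }
private:
    int m_Barriers = 0;
    int m_Checks   = 0;
};
```

A second call with unchanged resources issues no barriers but doubles the number of checks, which is the overhead that transitioning once and then only committing avoids.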
      Invoking Draw Command
      The final step is to set states that are not part of the PSO, such as render targets, vertex and index buffers. Diligent Engine uses a Direct3D11-style API that is translated to other native API calls under the hood:
          ITextureView *pRTVs[] = {m_pRTV};
          m_pContext->SetRenderTargets(_countof( pRTVs ), pRTVs, m_pDSV);
          // Clear render target and depth buffer
          const float zero[4] = {0, 0, 0, 0};
          m_pContext->ClearRenderTarget(nullptr, zero);
          m_pContext->ClearDepthStencil(nullptr, CLEAR_DEPTH_FLAG, 1.f);
          // Set vertex and index buffers
          IBuffer *buffer[] = {m_pVertexBuffer};
          Uint32 offsets[] = {0};
          Uint32 strides[] = {sizeof(MyVertex)};
          m_pContext->SetVertexBuffers(0, 1, buffer, strides, offsets, SET_VERTEX_BUFFERS_FLAG_RESET);
          m_pContext->SetIndexBuffer(m_pIndexBuffer, 0);
      Different native APIs use various sets of functions to execute draw commands depending on command details (whether the command is indexed, instanced or both, what offsets in the source buffers are used, etc.). For instance, there are 5 draw commands in Direct3D11 and more than 9 commands in OpenGL, with something like glDrawElementsInstancedBaseVertexBaseInstance not uncommon. Diligent Engine hides all these details behind a single IDeviceContext::Draw() method that takes a DrawAttribs structure as an argument. The structure members define all attributes required to perform the command (primitive topology, number of vertices or indices, whether the draw call is indexed, instanced or indirect, etc.). For example:
          DrawAttribs attrs;
          attrs.IsIndexed = true;
          attrs.IndexType = VT_UINT16;
          attrs.NumIndices = 36;
          attrs.Topology = PRIMITIVE_TOPOLOGY_TRIANGLE_LIST;
          pContext->Draw(attrs);
      For compute commands, there is the IDeviceContext::DispatchCompute() method, which takes a DispatchComputeAttribs structure that defines the compute grid dimensions.
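To give a feel for what a single draw entry point has to hide, the sketch below maps a reduced set of draw parameters to the name of the matching OpenGL command. The OpenGL function names are real, but `DrawParams` and `SelectGLDrawCall` are hypothetical; the selection logic is one plausible mapping and is not taken from the engine's source.

```cpp
#include <cassert>
#include <string>

// Hypothetical, reduced version of a draw-attributes structure.
struct DrawParams {
    bool     isIndexed;
    bool     isIndirect;
    unsigned numInstances;   // 1 means non-instanced
    int      baseVertex;     // only meaningful for indexed draws
    unsigned firstInstance;
};

// Returns the name of the OpenGL entry point that matches the parameters.
std::string SelectGLDrawCall(const DrawParams& p) {
    if (p.isIndirect)
        return p.isIndexed ? "glDrawElementsIndirect" : "glDrawArraysIndirect";

    const bool instanced = p.numInstances > 1;
    if (!p.isIndexed) {
        if (p.firstInstance != 0) return "glDrawArraysInstancedBaseInstance";
        return instanced ? "glDrawArraysInstanced" : "glDrawArrays";
    }
    if (p.firstInstance != 0) {
        return p.baseVertex != 0 ? "glDrawElementsInstancedBaseVertexBaseInstance"
                                 : "glDrawElementsInstancedBaseInstance";
    }
    if (instanced) {
        return p.baseVertex != 0 ? "glDrawElementsInstancedBaseVertex"
                                 : "glDrawElementsInstanced";
    }
    return p.baseVertex != 0 ? "glDrawElementsBaseVertex" : "glDrawElements";
}
```

Ten different entry points fall out of five fields, which is the kind of branching an application no longer has to write when it fills in one attributes structure.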
      Source Code
      Full engine source code is available on GitHub and is free to use. The repository contains tutorials, sample applications, asteroids performance benchmark and an example Unity project that uses Diligent Engine in native plugin.
      Atmospheric scattering sample demonstrates how Diligent Engine can be used to implement various rendering tasks: loading textures from files, using complex shaders, rendering to multiple render targets, using compute shaders and unordered access views, etc.

      Asteroids performance benchmark is based on this demo developed by Intel. It renders 50,000 unique textured asteroids and allows comparing performance of Direct3D11 and Direct3D12 implementations. Every asteroid is a combination of one of 1000 unique meshes and one of 10 unique textures.

      Finally, there is an example project that shows how Diligent Engine can be integrated with Unity.

      Future Work
      The engine is under active development. It currently supports Windows desktop, Universal Windows, Linux, Android, MacOS, and iOS platforms. Direct3D11, Direct3D12, OpenGL/GLES backends are now feature complete. Vulkan backend is coming next, and Metal backend is in the plan.