Render Queue Design


Hi.

I'm trying to understand in detail how to program a render queue, in
particular the way L. Spiro has explained it in several posts.

This is what I think she has said:

Each camera, and each light that casts shadows, has a render queue set.

A render queue set is composed of two render queues, one for opaque
submeshes and one for alpha submeshes:
 
class Render_queue_set {
        Render_queue opaque_queue;
        Render_queue alpha_queue;
};
A render queue is a sequence (array, vector, etc.) of render items.
 
class Render_item {
        Model *model; // Not sure
        Mesh *mesh; // Not sure
        Submesh *submesh; // Not sure
        Shader_id shader_id;
        int num_textures;
        Texture_id textures[MAX_TEXTURE_UNITS];
        float depth; // Distance from model (or mesh?) to the camera
};
Render procedure for each viewpoint:
- The Scene Manager collects the models that are in view (frustum culling,
  occlusion culling, etc.).
- Tell each collected model to push items to the render queue set.
- Sort the render queues.
  - Sort only indices.
  - Take advantage of frame-to-frame temporal coherence.
- For each render item in the render queues (first opaque, then alpha),
  tell the mesh (or model?) to render the item.
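If it helps to see it in code, the flow above might look roughly like this (all names are illustrative, not from any particular engine; as described, the queues sort an index array rather than the items themselves):

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Illustrative sketch of the per-viewpoint flow: items are pushed into
// per-purpose queues, and sorting reorders an index array, not the items.
struct RenderItem {
    std::uint64_t sort_key; // e.g. shader/texture state packed for sorting
    float         depth;    // distance to the camera
};

struct RenderQueue {
    std::vector<RenderItem>  items;
    std::vector<std::size_t> order; // sorted indices into 'items'

    void sort_opaque() { // sort by state key to minimize state changes
        reset_order();
        std::sort(order.begin(), order.end(),
                  [this](std::size_t a, std::size_t b) {
                      return items[a].sort_key < items[b].sort_key;
                  });
    }
    void sort_alpha() { // back-to-front on depth for correct blending
        reset_order();
        std::sort(order.begin(), order.end(),
                  [this](std::size_t a, std::size_t b) {
                      return items[a].depth > items[b].depth;
                  });
    }

private:
    void reset_order() {
        order.resize(items.size());
        for (std::size_t i = 0; i < order.size(); ++i) order[i] = i;
    }
};

struct RenderQueueSet { // mirrors the Render_queue_set above
    RenderQueue opaque_queue;
    RenderQueue alpha_queue;
};
```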

I'm not sure if a render item in L. Spiro's design needs pointers to the model,
mesh and/or submesh. In my current engine, I would need all three pointers because:
- The model has the matrices I need to set uniforms.
- The mesh has the vertex buffer (I share the same vertex buffer among all of its
  submeshes), and currently a submesh doesn't have a pointer to its mesh.
- Each submesh has the index buffer and the material.

---
I think you're headed in the right direction.
To make this really beneficial, you can build a "key" (bitkey) for each renderable, so a single integer comparison lets you sort any way you like, for example on depth (far to near for non-opaque objects). In the end, I think the definition of an object in this case is a renderable/sub-object (a set of vertices/polys which share a transformation and material), because these are the properties you need for sorting. For any "higher level" with more transforms, materials, etc., the sorting would be less useful.

Here's a nice topic on the bitkeys and sorting:
http://www.gamedev.net/topic/659607-sorting-a-bucket/
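As a rough sketch of the bitkey idea (the field layout here is my own assumption for illustration, not the one from the linked topic): pack the sort criteria into one integer, most significant criterion in the highest bits, so one integer comparison sorts by shader first, then texture, then depth:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 64-bit sort key; the layout is an assumption:
//   bits 63..48: shader id, bits 47..24: texture id, bits 23..0: depth.
inline std::uint64_t make_sort_key(std::uint16_t shader_id,
                                   std::uint32_t texture_id, // low 24 bits used
                                   float depth, float max_depth) {
    // Quantize depth into 24 bits (0 = near, 0xFFFFFF = far).
    std::uint32_t d = static_cast<std::uint32_t>((depth / max_depth) * 0xFFFFFF);
    if (d > 0xFFFFFF) d = 0xFFFFFF;
    return (std::uint64_t(shader_id) << 48) |
           (std::uint64_t(texture_id & 0xFFFFFF) << 24) |
           std::uint64_t(d);
}
```

For the non-opaque queue, you would either invert the depth bits or sort the keys in descending order to get far-to-near.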

Reading your question, I'd say you've got the basics down; start off and see how it goes along the way.

---

The model has the matrices I need to set uniforms.

A model has only a single transform. That transform can’t be used to place wheels in the correct locations or perform skinning.
Each mesh is a modular unit and needs to have its own transform. A model is really nothing but a container for meshes and a single primary world position. From there, each mesh has a local transform and a world transform based on each parent actor. The model hierarchy takes advantage of the parent/child relationship (the scene graph) already present in the actor class.
There is nothing on a model that needs to be set in shaders.

The mesh has the vertex buffer (I share the same vertex buffer among all of its
  submeshes), and currently a submesh doesn't have a pointer to its mesh.

While there isn’t necessarily a correct answer here, I allow sub-meshes to have pointers to their mesh owners. This makes many things easier and allows you to reduce the size of your render-queue key. It helps in other areas too, such as when you want to pass just a sub-mesh to a function without needing to also pass the mesh that owns it.


The rest is similar to what I do. You differ in areas that are not necessarily right or wrong, but just for the sake of explanation, the meshes draw themselves in my implementation.
A mesh submits itself to the render-queue once per sub-mesh, as well as an index to the sub-mesh. Since there will likely never be over 65,535 sub-meshes on a mesh this can save you 16 (on 32-bit systems) or 48 (on 64-bit machines) bits over using a pointer.
For a mesh to draw itself it implements a pure virtual function which accepts the index to the sub-mesh to draw and any flags the mesh wanted to be passed back to it for the render (among other things at your discretion).

This would allow you to continue omitting the sub-mesh’s pointer to its owning mesh if you still prefer to do that.
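A minimal sketch of that scheme (the class and member names here are illustrative, not L. Spiro's actual code): the queue item stores the owning mesh plus a 16-bit sub-mesh index, and the mesh exposes a pure virtual draw function:

```cpp
#include <cassert>
#include <cstdint>

// Sketch of "the mesh draws itself": the queue stores the owning mesh
// plus a 16-bit sub-mesh index instead of a Submesh pointer.
class Mesh {
public:
    virtual ~Mesh() = default;
    // Pure virtual draw: the index selects the sub-mesh, and the flags
    // are whatever the mesh asked the queue to hand back at render time.
    virtual void Render(std::uint16_t submesh_index, std::uint32_t flags) = 0;
};

struct QueueItem {
    Mesh*         mesh;          // the mesh submits one item per sub-mesh
    std::uint16_t submesh_index; // 16 bits vs. a 32/64-bit Submesh pointer
    std::uint32_t flags;         // passed back to the mesh when drawing
};

// Trivial concrete mesh, just to show the callback shape.
class CountingMesh : public Mesh {
public:
    int draws = 0;
    void Render(std::uint16_t, std::uint32_t) override { ++draws; }
};
```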


I am currently right at the point in my new engine where I am just beginning to re-implement render-queues and plan to write an article on it as I go along.


L. Spiro

---
Thanks for your responses.

A model has only a single transform. That transform can’t be used to place wheels in the correct locations or perform skinning.
Each mesh is a modular unit and needs to have its own transform. A model is really nothing but a container for meshes and a single primary world position. From there, each mesh has a local transform and a world transform based on each parent actor. The model hierarchy takes advantage of the parent/child relationship (the scene graph) already present in the actor class.
There is nothing on a model that needs to be set in shaders.


Do you calculate the object-to-view-space matrix in the vertex shader, on the
CPU, or somewhere else?

A mesh submits itself to the render-queue once per sub-mesh, as well as an index to the sub-mesh. Since there will likely never be over 65,535 sub-meshes on a mesh this can save you 16 (on 32-bit systems) or 48 (on 64-bit machines) bits over using a pointer.
For a mesh to draw itself it implements a pure virtual function which accepts the index to the sub-mesh to draw and any flags the mesh wanted to be passed back to it for the render (among other things at your discretion).


In your implementation, after the render queues are sorted, do you process the render queues once per light source?

---

Do you calculate the object-to-view-space matrix in the vertex shader

Except in rare special cases, never concatenate matrices in a vertex shader. In all typical cases, all matrices should be sent to the shaders fully formed.

My model-view matrix is created on the CPU.


In your implementation, after the render queues are sorted, do you process the render queues once per light source?

There is no reason to make a single pass for just a single light. Always render as many lights as possible in a single pass.
This of course applies to forward rendering.


L. Spiro

---
A model has only a single transform. That transform can’t be used to place wheels in the correct locations or perform skinning.
Each mesh is a modular unit and needs to have its own transform. A model is really nothing but a container for meshes and a single primary world position. From there, each mesh has a local transform and a world transform based on each parent actor. The model hierarchy takes advantage of the parent/child relationship (the scene graph) already present in the actor class.
There is nothing on a model that needs to be set in shaders.

That's one point of view. In my engine, what I call a mesh is an array of nodes, where each node may or may not contain geometry, and each geometry contains subsets. Each geometry also contains an array of bones, where each bone is the index of a node of the mesh; any node can be a bone, with geometry or not.

 

In your implementation, after the render queues are sorted, do you process the render queues once per light source?

You should consider looking at clustered shading, which gives you efficient forward rendering (and can also be used with a deferred opaque pass).

You render each mesh and, if it's transparent, add each subset of its geometry to the array; then you do a quicksort + insertion sort and you have your back-to-front array ready to render.

The first pass (opaque) can be done using clustered deferred shading, and the second pass (transparents), after the back-to-front sort, can be done using clustered forward shading.

With Direct3D 12 and GLNext, since those are low-level architectures, good performance for order-independent transparency should be possible; the back-to-front pass then becomes unnecessary.

 

class Render_queue_set {

        Render_queue opaque_queue;
        Render_queue alpha_queue;
};

In case you don't know, additive blending doesn't need sorting, so you can keep three arrays (opaque, alpha, additive) to avoid sorting the additive items.

But since transparency is usually a small percentage of the rendering, in most cases you can just have one array with a constant size (a pool) to avoid allocation.
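The constant-size pool could be sketched like this (a hypothetical FramePool, not from any engine mentioned here): reserve the storage once and reuse it every frame:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical fixed-capacity pool: allocate once, reuse every frame by
// resetting the count instead of freeing the storage.
template <typename Item>
class FramePool {
public:
    explicit FramePool(std::size_t capacity) : capacity_(capacity) {
        items_.reserve(capacity);
    }

    void reset() { items_.clear(); } // keeps the allocation

    bool push(const Item& item) {
        if (items_.size() >= capacity_) return false; // pool exhausted
        items_.push_back(item);
        return true;
    }

    std::size_t size() const { return items_.size(); }

private:
    std::size_t       capacity_;
    std::vector<Item> items_;
};
```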

---

One last thing about transparency: back-to-front sorting is not accurate at all, which is why order-independent transparency is an active area of research.

If a piece of geometry is big and overlaps other geometry, the back-to-front result will be wrong on parts of the object.

Edited by Alundra

---

That's one point of view. In my engine, what I call a mesh is an array of nodes, where each node may or may not contain geometry, and each geometry contains subsets. Each geometry also contains an array of bones, where each bone is the index of a node of the mesh; any node can be a bone, with geometry or not.

I was keeping my description short for the sake of clarity.
A model is a container for actors. A mesh is an actor, a bone is an actor, a point light is an actor, even a “group” is an actor. You know how in Maya you can select a bunch of objects and hit Ctrl-G to group them together, and then you can scale that group or move it and all the objects in the group scale or move too? You need that information at run-time to run a fully dynamic animation system properly.

All of these things are actors, so they can be parented. A model acts as the “root node”. A model is also an actor, so if you have a sword model you can make a joint in a character as its parent, effectively putting the sword in the character’s hand.
Etc.

Although the hierarchy of parents and children is enough to find anything inside the model, the model obviously also keeps a linear dictionary of objects as well, so you can run over only the meshes or only the lights or only the groups, etc.


Of course there was a lot missing in my explanation. The whole system is quite complex, but we don’t need to worry about that when we are focusing on render-queues.


But since transparency is usually a small percentage of the rendering, in most cases you can just have one array with a constant size (a pool) to avoid allocation.

The render-queues never deallocate their memory (until a specific point in the game, such as when changing states or scenes), so this isn’t a concern.


L. Spiro

---

OK yeah, my engine works using an actor <-> actor-component model; from what you said, apparently you only use a hierarchy of actors.

Edited by Alundra

---
I'm close to completing the first iteration of the render queue implementation; I just need to sort the render queues.
 

Except in rare special cases, never concatenate matrices in a vertex shader. In all typical cases, all matrices should be sent to the shaders fully formed.

My model-view matrix is created on the CPU.


I already calculate matrices on the CPU in most cases. The reason for my question is: I need to pass the model-view matrix to most shaders (because I calculate lighting in view space). Currently, I calculate this matrix just before telling a mesh to render itself, but that means it can be calculated more than once, because a mesh can be in many render queue items.

I have thought of this solution: just after collecting the meshes in view (or while collecting), calculate the matrix once per mesh and store it in the mesh. That is, each mesh temporarily stores the model-view matrix for the current pass. It's easy to implement, but conceptually I'm not convinced.

Edited by Arbos
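One way to implement that per-pass caching (all names here are hypothetical): stamp the cached matrix with the current pass id, so it is computed at most once per pass no matter how many queue items reference the mesh:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-pass cache: the mesh stores the model-view matrix
// stamped with the pass id, so it is computed at most once per pass even
// if the mesh appears in many render-queue items.
struct Mat4 { float m[16]; };

inline Mat4 mul(const Mat4& a, const Mat4& b) { // column-major a * b
    Mat4 r{};
    for (int c = 0; c < 4; ++c)
        for (int row = 0; row < 4; ++row)
            for (int k = 0; k < 4; ++k)
                r.m[c * 4 + row] += a.m[k * 4 + row] * b.m[c * 4 + k];
    return r;
}

struct MeshTransformCache {
    Mat4          model_view{};
    std::uint32_t pass_stamp = 0; // pass ids start at 1, so 0 = "invalid"
    int           computes   = 0; // for illustration only

    const Mat4& get(std::uint32_t pass_id, const Mat4& view, const Mat4& model) {
        if (pass_stamp != pass_id) { // first request this pass: compute
            model_view = mul(view, model);
            pass_stamp = pass_id;
            ++computes;
        }
        return model_view; // later requests reuse the cached matrix
    }
};
```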

---
If a render-queue item is a "renderable" (aka submesh), it will/should have its own world matrix, which you need to pass to the shaders.

---

I'm close to completing the first iteration of the render queue implementation; I just need to sort the render queues.
 

Except in rare special cases, never concatenate matrices in a vertex shader. In all typical cases, all matrices should be sent to the shaders fully formed.

My model-view matrix is created on the CPU.


I already calculate matrices on the CPU in most cases. The reason for my question is: I need to pass the model-view matrix to most shaders (because I calculate lighting in view space). Currently, I calculate this matrix just before telling a mesh to render itself, but that means it can be calculated more than once, because a mesh can be in many render queue items.

I have thought of this solution: just after collecting the meshes in view (or while collecting), calculate the matrix once per mesh and store it in the mesh. That is, each mesh temporarily stores the model-view matrix for the current pass. It's easy to implement, but conceptually I'm not convinced.

 

 

Here's how I deal with it (and none of this is final, because it's still a work in progress): in our engine we fix that duplicated-computation problem with draw bundles, which basically contain lists of things that want to be rendered with a certain configuration (e.g. fully featured meshes with materials and everything for a full forward lighting pass, or position attribute only for something like shadow map passes, etc.).

 

More technically, the bundles store lists of DrawItems (inspired by Hodgman's post, but our DrawItems are expanded to also include things like transformation matrices, or the shaders that have to be bound for a draw, etc.). So a draw bundle might have one list of DrawItems that describes a bunch of meshes to be drawn depth-only (position attribute only, no materials to bind, etc.), and another for full-featured rendering (bind all kinds of materials before drawing, etc.). There can be more of these lists in a single draw bundle if you need them. The only important constraint is that a single draw bundle always draws the same logical objects; the only thing that differs between its DrawItem lists is renderer-specific information, like the full-featured vs. depth-only difference I mentioned. Finally, the draw bundle also contains a list of transforms that is shared between the DrawItem lists (the items store only offsets into the matrix list, not full matrices). The DrawItem lists can then be sorted for batching independently (and the resulting ordering may differ between them).

 

As a concrete example of how this avoids repeated matrix computations, take Forward+, which has (or rather can have) a depth-only prepass followed by a normal geometry pass. Each frame we do frustum culling for this render pass and fill the DrawBundle from scratch with two lists: a DrawItem list for full-featured rendering, and another for depth-only rendering that is created from the first one by nulling out the stuff we don't need. Alongside both there is a list of transforms that is created once, and the two DrawItem lists share indices into the transform list for each DrawItem. Then, before the draw calls happen, a single GPU buffer is filled with those matrices and used for both the depth-only pass and the geometry pass.
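A rough sketch of that sharing scheme (the types here are my own illustration, not agleed's actual code): the DrawItem lists store indices into one shared transform list, and the depth-only list is derived from the full list:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Illustration of the DrawBundle idea: two DrawItem lists (full rendering
// and depth-only) share one transform list through indices, so each matrix
// is computed and uploaded once and used by both passes.
struct Mat4 { float m[16]; };

struct DrawItem {
    std::uint32_t transform_index; // offset into DrawBundle::transforms
    std::uint32_t material_id;     // 0 = none (depth-only)
};

struct DrawBundle {
    std::vector<Mat4>     transforms;  // filled once per frame, shared
    std::vector<DrawItem> full_items;  // full-featured geometry pass
    std::vector<DrawItem> depth_items; // depth-only prepass

    // Derive the depth-only list by nulling out what that pass doesn't need.
    void build_depth_list() {
        depth_items.clear();
        for (const DrawItem& it : full_items)
            depth_items.push_back({it.transform_index, 0});
    }
};
```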

 

Also, if you have something like a shadow-map pass for a light that rarely moves, you can keep a separate DrawBundle for all the static geometry and never have to update either the transforms or the DrawItem lists until the light moves (or the static geometry changes for whatever reason).

 

Notice that this is all for a DX11-level renderer. After I designed all this, I read the DX12 docs and noticed that they also have something conceptually similar, literally called bundles, which makes me happy because the code will be easy to adapt to DX12 and Vulkan (which probably has the same thing).

Edited by agleed

---