General mesh submission and sorting for rendering

So I'm at a stage in my engine where I've got a model format and textures that are being loaded and rendered, but at the moment my way of rendering is just calling RenderDevice_DrawMesh on my RenderDevice object.

I've never really gotten much past where I am now, even though I intend this system's design to be more robust and hopefully longer-lasting than the other "engines" I've written. What I need (and have never done before) is a way to have all my meshes rendered each frame. How do I go about doing that? Would I simply have a large, sorted (by texture, maybe?) array (or list) in the RenderDevice of all the meshes? Would I just be submitting to that list each frame on the "Draw" call of each model, having the RenderDevice render later, and then clear the list after the render? I'm not sure what different kinds of rendering methods there are. I have heard a bit about deferred rendering, where you basically store the depth and lighting data in large frame-like buffers for each pixel, which enables you to have as many lights as you want (I guess there's a limit on lights normally?).

Also, I'm currently running on DX9, although I haven't gotten started with shaders yet, and I hope to move to Windows 7 on my main dev computer so I can forget about DX9 and go to DX11. I don't know how this would change any answers, or even if it would.

So I'm really in the dark about this, and any information you can give, or links to some comprehensive info, would be appreciated. But even before looking at deferred rendering, I still have to figure out how the models will work and render.
Mort, Duke of Sto Helit: NON TIMETIS MESSOR -- Don't Fear The Reaper
What I need (and have never done before) is a way to have all my meshes rendered each frame. How do I go about doing that? Would I simply have a large, sorted (by texture, maybe?) array (or list) in the RenderDevice of all the meshes?
I have a list of lists. Each "root" element is the mesh itself; the mesh is an abstract entity which never gets drawn on its own. Each sub-list is the list of "mesh instances", which includes, at the very bottom line, positioning, orientation and scaling. In principle everything could be put in a matrix, but I find it easier to keep the information separated; it's not much of a problem either way. I mark each instance as visible or not and then render them all by state bucket.
While state sorting comes naturally within each mesh (and its instances), sorting across different meshes is not going to be much of a win for me. Perhaps with your assets the situation could be different, but I'd suggest keeping it easy for a first iteration.
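For illustration, a minimal C++ sketch of this "list of lists" layout. All names here are illustrative, not from Krohm's engine; a real engine would use its own math types:

    #include <vector>

    // Hypothetical minimal math types.
    struct Vector3    { float x, y, z; };
    struct Quaternion { float x, y, z, w; };

    // One placed copy of a mesh. The transform is kept as separate
    // components rather than packed into a matrix, as described above.
    struct MeshInstance {
        Vector3    position;
        Quaternion orientation;
        Vector3    scale;
        bool       visible;   // set during culling, checked when rendering
    };

    // A "root" element: the shared (abstract) mesh plus its instances.
    struct Mesh {
        // ...handles to vertex/index buffers, material, etc. go here...
        std::vector<MeshInstance> instances;
    };

    // The "list of lists": rendering walks the meshes (one state bucket
    // each) and draws every instance marked visible.
    std::vector<Mesh> meshes;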

Would I just be submitting to that list each frame on the "Draw" call of each model, having the RenderDevice render later, and then clear the list after the render?
I'm not really sure what your idea is here, but I'd say there's no reason to drop that list in the trashcan each frame. Make sure you have a clear separation of
what your scene graph does VS
what your culling algos do VS
what your renderer does.
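To make that separation concrete, here is a hedged interface sketch; every name in it is illustrative, not an established API:

    #include <vector>

    struct Frustum {};      // placeholder for a view volume
    struct RenderItem {};   // placeholder; later posts discuss its contents

    // The scene graph: owns and organises renderables.
    struct IScene {
        virtual ~IScene() = default;
        virtual void Update(float dt) = 0;
    };

    // The culling step: produces the visible subset for one view.
    struct ICuller {
        virtual ~ICuller() = default;
        virtual void Cull(const IScene& scene, const Frustum& view,
                          std::vector<RenderItem>& outVisible) = 0;
    };

    // The renderer: consumes items and talks to the graphics API.
    struct IRenderer {
        virtual ~IRenderer() = default;
        virtual void Submit(const std::vector<RenderItem>& items) = 0;
    };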

I'm not sure what different kinds of rendering methods there are. I have heard a bit about deferred rendering, where you basically store the depth and lighting data in large frame-like buffers for each pixel, which enables you to have as many lights as you want (I guess there's a limit on lights normally?).
The choice of deferred rendering (as opposed to standard "forward" rendering) is pretty much orthogonal to your problem. In essence, it doesn't much affect the solution you have to build.
Yes, deferred rendering allows a greater number of lights to be applied (because they essentially become a post-processing effect), but I don't think that's its main advantage. IMHO, the main advantage is the separation between the lighting and "shading" stages, which helps a lot in avoiding a combinatorial shader explosion. It also introduces a few problems of its own.
For anything that aims at the complexity range of 2005-era titles or older, I think going deferred is not going to buy you much.

The most interesting feature you can use regarding your problem is "instancing". There are a few instancing techniques, each with its pros and cons. Think about it, but don't even try it without hard data on your performance.
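For reference, a minimal sketch of one such technique: D3D9 hardware instancing via SetStreamSourceFreq. It assumes shader model 3.0 hardware, a vertex declaration spanning both streams, and pre-filled buffers; meshVB, instanceVB, meshIB and the counts are hypothetical parameters:

    #include <d3d9.h>

    void DrawInstanced(IDirect3DDevice9* device,
                       IDirect3DVertexBuffer9* meshVB, UINT vertexStride,
                       IDirect3DVertexBuffer9* instanceVB, UINT instanceStride,
                       IDirect3DIndexBuffer9* meshIB,
                       UINT vertexCount, UINT triangleCount, UINT instanceCount)
    {
        // Stream 0: per-vertex mesh data, repeated for every instance.
        device->SetStreamSource(0, meshVB, 0, vertexStride);
        device->SetStreamSourceFreq(0, D3DSTREAMSOURCE_INDEXEDDATA | instanceCount);

        // Stream 1: per-instance data (e.g. a world transform),
        // advanced once per instance.
        device->SetStreamSource(1, instanceVB, 0, instanceStride);
        device->SetStreamSourceFreq(1, D3DSTREAMSOURCE_INSTANCEDATA | 1u);

        device->SetIndices(meshIB);

        // One draw call renders all the instances.
        device->DrawIndexedPrimitive(D3DPT_TRIANGLELIST, 0, 0,
                                     vertexCount, 0, triangleCount);

        // Restore the default (non-instanced) frequencies.
        device->SetStreamSourceFreq(0, 1);
        device->SetStreamSourceFreq(1, 1);
    }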


Also, I'm currently running on DX9, although I haven't gotten started with shaders yet, and I hope to move to Windows 7 on my main dev computer so I can forget about DX9 and go to DX11. I don't know how this would change any answers, or even if it would.
Good, supposing your card has the features to support the change! No, it doesn't much affect the problem you're trying to solve, because the problem you're asking about is architectural/algorithmic rather than implementation-oriented.

Previously "Krohm"

How do I go about doing that? Would I simply have a large, sorted (by texture, maybe?) array (or list) in the RenderDevice of all the meshes?
I have each "scene" (a high-level organisation of renderable items, outside the scope of the core engine) prepare a list of items that need to be drawn each frame. These lists are then submitted to the core renderer.

The data type stored in the list contains: draw-call information (i.e. the parameters for DrawIndexedPrimitive, etc.), a pointer to a block of state information (textures, shaders, streams, render states, etc.), and a 32-bit sort key.
Would I just be submitting to that list each frame on the "Draw" call of each model, having the RenderDevice render later, and then clear the list after the render?
That's what I do. The list is a reserved allocation; to "clear" it you just set the write pointer back to the start of the allocation.
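A hedged sketch of such a reserved, frame-scoped list (the capacity handling and names are illustrative):

    #include <cstddef>

    template <typename T, std::size_t Capacity>
    struct FrameList {
        T           items[Capacity];  // reserved up-front, reused each frame
        std::size_t count = 0;        // the "write pointer"

        T* push() { return count < Capacity ? &items[count++] : nullptr; }

        // "Clearing" the list is just resetting the write pointer.
        void clear() { count = 0; }
    };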

The renderer processes the lists by performing a radix-sort on the 32-bit keys, to reorder the draw-items into some arbitrary order (could be sorted by depth for a transparency pass, or sorted by material/texture/shader/etc...).
It then goes through each item and compares its requested states to the previously set states. If the states are different, it writes a state-change command into a command buffer. After processing all the states for an item, it writes a draw command into the command buffer.

i.e.
"Scene" --[Traversal]--> List of render items --[Processing]--> Command buffer.

"Scene" is a game-specific organisation or renderables (e.g. an oct-tree).
"Render items" contain a sort-key, a draw-call, and one or more pointers to "state blocks".
"State blocks" contain any kind of state change (bind texture, set shader, set shader parameter, set blend mode, depth test, render-target, etc...).
"Command buffer" can just be the D3D9 device. When you "write to the command buffer", you're just calling "pDevice->Set...."

I'm not really sure what your idea is here, but I'd say there's no reason to drop that list in the trashcan each frame.
I throw out the lists because there's no reason to keep them. The high-level scene structures have a complete list (or tree or whatever), but these are intermediate lists that only contain a subset of renderables (those that passed the culling tests).
Some things come to mind. Whether they are relevant depends on what abilities you pursue for the engine.

1. recursive rendering

When you support portals, mirrors, environment maps, other render-to-texture effects, ... then you also need to render from points of view other than the main camera. You may first do a frustum culling (or another visibility test) to decide whether another view needs to be rendered at all. In such a case you do a kind of recursive rendering; several pipelines may hence be needed.

2. different rendering phases

Using a forward / deferred / inferred / other-fancy rendering pipeline may mean having different phases, e.g. depth rendering, normal rendering, shading, or whatever else. You typically need to support different shader scripts for different phases.

3. translucent objects

Translucency can be seen as phases, too. I mention it here separately because it usually requires special handling. Translucency is done by blending with the framebuffer and hence cannot be controlled by shader scripts (as long as you're not rendering to a texture). Additive translucency (usual for light effects like fire, lightning, etc.) and subtractive translucency (usually used for shadowing effects like, err, anti-light) are linear processes and hence order-independent, but they still need to be done after rendering the opaque objects. Standard translucency (e.g. used for glass) is not order-independent unless you use one of the OIT methods (like depth peeling), which then requires several passes.
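For concreteness, a hedged sketch of the D3D9 render states behind these three blend modes (the exact combinations vary by engine; the helper names are made up):

    #include <d3d9.h>

    void SetAdditiveBlend(IDirect3DDevice9* dev)      // fire, lightning, ...
    {
        dev->SetRenderState(D3DRS_ALPHABLENDENABLE, TRUE);
        dev->SetRenderState(D3DRS_BLENDOP,   D3DBLENDOP_ADD);
        dev->SetRenderState(D3DRS_SRCBLEND,  D3DBLEND_ONE);
        dev->SetRenderState(D3DRS_DESTBLEND, D3DBLEND_ONE);
    }

    void SetSubtractiveBlend(IDirect3DDevice9* dev)   // "anti-light" effects
    {
        dev->SetRenderState(D3DRS_ALPHABLENDENABLE, TRUE);
        dev->SetRenderState(D3DRS_BLENDOP,   D3DBLENDOP_REVSUBTRACT);
        dev->SetRenderState(D3DRS_SRCBLEND,  D3DBLEND_ONE);
        dev->SetRenderState(D3DRS_DESTBLEND, D3DBLEND_ONE);
    }

    void SetStandardAlphaBlend(IDirect3DDevice9* dev) // glass; order-dependent
    {
        dev->SetRenderState(D3DRS_ALPHABLENDENABLE, TRUE);
        dev->SetRenderState(D3DRS_BLENDOP,   D3DBLENDOP_ADD);
        dev->SetRenderState(D3DRS_SRCBLEND,  D3DBLEND_SRCALPHA);
        dev->SetRenderState(D3DRS_DESTBLEND, D3DBLEND_INVSRCALPHA);
    }

The first two are order-independent and can be drawn in any order after the opaque pass; the third is why standard translucent objects need the depth sort discussed under point 4 below.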

4. sorting

As said, you may need to sort objects by depth, typically the standard translucent objects (back to front). You may want to sort opaque objects by depth too (front to back), to avoid massive overdraw (but maybe the scene doesn't suffer massive overdraw anyway).

After having considered the required order, a further sorting can be done to minimize state switching. AFAIK, shader program switches are nowadays the worst thing you can do performance-wise, followed by geometry switches and texture switches.

5. batching & instancing

Objects, especially static meshes, may be batched to reduce the number of draw calls.

6. passes

The hardware may not be sufficient to render an object in a single pass. The rendering method itself may require several passes (see depth peeling as an example).


All this leads IMHO "naturally" to a solution where objects that are not culled are sent as rendering jobs to the rendering pipeline. The job should refer to the rendering technique, material, geometry, placement, and perhaps other parameters. The job may be marked with a numeric tag encoding the criteria for sorting as described above (there are some articles, e.g. this one, that describe this method). The rendering pipeline then has the responsibility to put each job, depending on its parameterization, into the phases it belongs to, in the correct order.
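As an illustration only (the field layout below is made up, not taken from the linked article), such a numeric tag might pack the criteria like this:

    #include <cstdint>

    // 2 bits phase | 1 bit translucency | 13 bits material | 16 bits depth.
    std::uint32_t MakeSortTag(std::uint32_t phase,       // rendering phase
                              bool          translucent, // after opaque pass
                              std::uint32_t materialId,  // state-sort handle
                              std::uint32_t depth)       // quantized view depth
    {
        return ((phase & 0x3u) << 30)
             | (std::uint32_t(translucent) << 29)
             | ((materialId & 0x1FFFu) << 16)
             | (depth & 0xFFFFu);
    }

For the translucent bucket one would typically give depth higher priority than material, so that the back-to-front order wins.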
[color="#1C2837"]I'm not really sure of what's your idea here, but I'd say there's no reason to drop that list in the trashcan each frame.
I throw out the lists because there's no reason to keep them. The high level scene structures have a complete list (or tree or whatever), but these are intermediate lists that only contain a sub-set of renderables (those that passed culling tests).[/quote]Yes, this makes sense. I agree there's no point in keeping those lists.
While we're at it, do you mind elaborating on how you generate the 32-bit keys? Is it fully automatic?
Thank you very much.

Previously "Krohm"

While we're at it, do you mind elaborating on how you generate the 32-bit keys? Is it fully automatic?
The low-level renderer just treats them as arbitrary 32-bit numbers and sorts on them.
Each high-level scene manager can choose what data it wants to put into the key field. If a scene manager needs to do a translucent pass, then while it's traversing its scene and doing its culling, it can calculate the view-space depth of each object and put that data into the key field.

If it's doing an opaque pass, it can pass its "state-block" pointers to a common utility function that builds a key by hashing state values.
For example, let's say that changing vertex declarations is the most expensive change, and changing shaders is the 2nd most expensive: this utility function can take the pointer to the shader, XOR the top 16 bits with the lower 16 bits to generate a simple hash of that pointer, and then put that 16-bit hash in the lower 2 bytes of the key. It can then do the same with the pointer to the vertex declaration and put that hash in the top 2 bytes of the key. Now, when sorted, objects sharing the same vertex declaration will be grouped together, and within each group, the objects that share the same shader will also be grouped together.
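A hedged sketch of that utility function; the names are illustrative, and the extra fold through the upper 32 bits is my addition for 64-bit builds (the original description implies 32-bit pointers):

    #include <cstdint>

    static std::uint16_t HashPointer16(const void* p)
    {
        std::uint64_t v = reinterpret_cast<std::uintptr_t>(p);
        v ^= v >> 32;                        // fold high bits (64-bit builds)
        return std::uint16_t((v >> 16) ^ v); // XOR top 16 bits with lower 16
    }

    // Vertex declaration (most expensive change) hashes into the top two
    // bytes, shader (second most expensive) into the lower two bytes.
    std::uint32_t MakeOpaqueKey(const void* vertexDecl, const void* shader)
    {
        return (std::uint32_t(HashPointer16(vertexDecl)) << 16)
             |  std::uint32_t(HashPointer16(shader));
    }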

[color="#1C2837"]2. different rendering phases[/quote]Regarding this, when a high-level scene submits a list of render items, I also allow the scene to attach a state-block that applies to the entire list. This lets "phase"-specific states be set for the entire pass, without adding those states to each object.
[color="#1C2837"]
[color="#1C2837"][edit] haegarr - I was trying to find that article you linked below a few weeks ago, but I couldn't remember which blog I'd read it on. Thanks!
While we're at it, do you mind elaborating on how you generate the 32-bit keys? Is it fully automatic?

You may be interested in this article, too.
Unfortunately there's always a compromise. I need to render my objects this way: each frame, build a list of only the visible renderables using some spatial culling. Sort those renderables by view-space depth to avoid overdraw and to do occlusion culling, but also sort them by material to avoid state switches. And on top of this, render them using instancing, issuing no more draw calls than the individual mesh count (excluding instances). There are some conflicting "zones": sort the renderables by view-space depth, or by material? Cull them against the view frustum and draw them one by one using occlusion culling, or draw a bunch of them no matter what, using one draw call and instancing? And on top of that, I need to decide whether to use distance-based LOD for the selected meshes in the view frustum, or draw the hell out of them using full geometry while saving on draw calls via instancing. :) I know I can use LOD and culling with instancing, but it may be more complicated and will probably need some buffers to be locked/updated.
Thank you very much Hodgman, I've taken some time to read your post as well as the link, and I think I've figured out a key fact: I was thinking about two different problems at the same time. I'll definitely look at this system in the future, but I don't think I will go for it right now.
You may be interested in this article, too.
The two systems are variations of the same concept. While I had little trouble understanding the workings of the linked article, it was the exact meaning of the specific numbers that confused me. Ultimately this doesn't matter, as my concerns dealt with a really different problem, which admittedly interacts with this one, but my doubts were about more subtle issues I'm still considering.

Previously "Krohm"

Okay, so I've been making some notes and thinking a bit, and just wanted to make sure that I'm on the right track before sitting down and slamming out pages of useless code.

  1. Models get created and added to a World/Scene object that contains a list (or other data structure) of the models/meshes/geometry that need to be rendered (not necessarily all geometry to be rendered, since there may be other pieces for special effects, etc., but most of it).
  2. The World/Scene contains a scene graph that is an octree, into which the models/meshes are inserted to be spatially sorted. Is that right? It sounds wrong when I write it. Can a scene graph be an octree, or should they be separate?
  3. Models are culled via the octree and a camera frustum.
  4. Now we have a list of a subset of all models and meshes that want to be rendered and are actually visible.
  5. Sort meshes according to other criteria (translucency, depth, to minimize state change, etc).
  6. Render meshes.

Would the order of operations for rendering go something like that? Of course, that's just for a single RenderTarget/View; I'll probably have to repeat steps 3-6 once for each RenderTarget/View.
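If it helps to see those steps as code, here is a rough sketch of steps 3-6; every type and function name in it is hypothetical:

    #include <vector>

    struct RenderItem {};                 // sort key + draw data (see above)
    struct Frustum {};
    struct Camera   { Frustum GetFrustum() const { return {}; } };
    struct Octree   { void Cull(const Frustum&, std::vector<RenderItem>&) {} };
    struct Renderer { void Submit(const std::vector<RenderItem>&) {} };

    void RenderView(Octree& octree, const Camera& camera, Renderer& renderer)
    {
        // Steps 3-4: cull the octree against the camera frustum to get
        // the visible subset.
        std::vector<RenderItem> visible;
        octree.Cull(camera.GetFrustum(), visible);

        // Step 5: sort (opaque by state key, translucent back to front).
        // SortByKey(visible);            // hypothetical helper

        // Step 6: render. Call RenderView once per render target/view.
        renderer.Submit(visible);
    }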
Mort, Duke of Sto Helit: NON TIMETIS MESSOR -- Don't Fear The Reaper

