Is automatic state monitoring necessary in a renderer?

5 comments, last by mrheisenberg 10 years, 3 months ago

My renderer (based on D3D11) is currently built a lot like the one from the Hieroglyph3 engine. It's something like this:

class Renderer
{
    // Encapsulates the immediate context; inherits from DeferredContextManager
    static ImmediateContextManager Immediate;
    // Each element encapsulates one deferred context
    static std::vector<DeferredContextManager> Deferred;

    // ID3D11Device functionality: encapsulating methods such as CreateBuffer, CreateTexture, etc.
};

In the ContextManager, each time you set some state it checks whether that state has been set before, which prevents redundant API calls. However, that's quite a lot of if checks for each object I render. I thought about sorting objects by material type, but that means I first need to sort by shader, then by texture, then by vertex buffer and pretty much anything else that would require a state change in the pipeline. That's a huge amount of sorting each frame, and sometimes I still end up with cases where a redundant API call is made. Basically:

- for automatic state monitoring you get tons of branching each time you render an object
- for sorting you have to perform a huge amount of sorts to make sure everything turns out ok

The performance hit of both methods becomes noticeable with large numbers of objects in the scene. The sorting actually turns out way heavier when I do it (I use Quicksort3) than just using the state monitoring method. Maybe I should make some combination of the two?

You don't need to do multiple sorts in order to sort by multiple criteria. For each criterion, you just need to be able to map it to an integer with N bits. So let's say you were sorting by shader and by texture. If you come up with a system that assigns a unique 16-bit ID to each shader and a similar system that assigns a 16-bit ID to each texture, you could combine those two into a single 32-bit integer. When combining, you just want the higher-precedence criterion to go into the higher bits and the lower-precedence criterion into the lower bits. So in your case, if you want to sort first by shader and then by texture, you would do sortID = (shaderID << 16) | textureID. Then you just sort once using your combined sort ID.
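
A minimal sketch of that combined key, assuming 16-bit IDs are already assigned somewhere else. The DrawItem struct and the MakeSortID and SortByState names are made up for illustration, and std::sort simply stands in for whatever sort you end up using:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical draw record; the IDs are assumed to be assigned elsewhere.
struct DrawItem {
    std::uint16_t shaderID;
    std::uint16_t textureID;
};

// The shader occupies the high 16 bits, so it takes precedence over the texture.
// The cast widens before shifting; shifting a promoted 16-bit value could
// otherwise overflow a signed int.
inline std::uint32_t MakeSortID(std::uint16_t shaderID, std::uint16_t textureID) {
    return (static_cast<std::uint32_t>(shaderID) << 16) | textureID;
}

// A single sort over the combined key orders by shader first, then by texture.
inline void SortByState(std::vector<DrawItem>& items) {
    std::sort(items.begin(), items.end(),
              [](const DrawItem& a, const DrawItem& b) {
                  return MakeSortID(a.shaderID, a.textureID) <
                         MakeSortID(b.shaderID, b.textureID);
              });
}
```

In practice you would probably compute the key once per item and store it rather than rebuilding it inside the comparison; it is done inline here only to keep the sketch short.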

Expanding on what MJP said (I just typed “MJ” and hit Tab hoping it would auto-complete the name).

  • Do I need to check if a state is set before changing it?
    • It may not be necessary for things to work, but it is a major performance improvement. It is heavily recommended, especially for shaders, textures, vertex buffers, index buffers, and render targets.

  • I use Quicksort3

    • Use an insertion sort and take advantage of per-frame temporal coherence.
    • Sort indices of items, not actual items themselves. Copying items back and forth is transfer overhead.
    • You should always restrict yourself to stable sorts for this type of sorting. Flickering artifacts on translucent objects can otherwise occur.

  • for sorting you have to perform a huge amount of sorts to make sure everything turns out ok

    • For completeness I am just going to reiterate that you don’t have to sort by all forms of criteria. The main ones are shaders, textures, and depth.

  • for automatic state monitoring you get tons of branching each time you render an object

    • Branching is faster than calling functions, especially if those functions always go to ring 0 or cause pipeline flushes/stalls.

  • Maybe I should make some combination of the two?

    • They are meant to be used together. Sorting has no meaning if you are still calling Direct3D functions on each redundant state change, and preventing redundant state changes has no meaning if you are drawing in random order. Both are necessary.
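
The sorting advice above (insertion sort, sorted indices, stability) could look roughly like the following. This is only a sketch under assumptions: the Sortable layout and the DrawsBefore and SortIndices names are hypothetical, and the order vector is assumed to be kept alive by the caller from one frame to the next so that it starts out nearly sorted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sortable: a packed state key plus a depth value.
struct Sortable {
    std::uint64_t key;
    float depth;
};

// Strict "a draws before b" ordering: key first, then smaller depth first.
inline bool DrawsBefore(const Sortable& a, const Sortable& b) {
    if (a.key != b.key) return a.key < b.key;
    return a.depth < b.depth;
}

// Stable insertion sort over indices into `items`. `order` persists between
// frames, so with little frame-to-frame change the inner loop rarely runs.
inline void SortIndices(const std::vector<Sortable>& items,
                        std::vector<std::uint32_t>& order) {
    // Reset to the identity order only when the item count changes.
    if (order.size() != items.size()) {
        order.resize(items.size());
        for (std::size_t i = 0; i < order.size(); ++i)
            order[i] = static_cast<std::uint32_t>(i);
    }
    for (std::size_t i = 1; i < order.size(); ++i) {
        const std::uint32_t current = order[i];
        assert(current < items.size());
        std::size_t j = i;
        // Move past an element only if `current` strictly precedes it, so
        // items that compare equal keep their existing relative order.
        while (j > 0 && DrawsBefore(items[current], items[order[j - 1]])) {
            order[j] = order[j - 1];
            --j;
        }
        order[j] = current;
    }
}
```

Only the 32-bit indices move during the sort; the sortables themselves stay where they are.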

Most of this is covered here: 3D Performance Tips


L. Spiro

I restore Nintendo 64 video-game OST’s into HD! https://www.youtube.com/channel/UCCtX_wedtZ5BoyQBXEhnVZw/playlists?view=1&sort=lad&flow=grid

Similar to MJP and LS above, I create an array of structs that contain a 32-bit sorting key and a 32-bit draw item index, then sort that array using a radix sort.
After that, I iterate through all the draw items (using the sorted indices) and ignore any redundant states (hopefully there is now a lot of redundancy thanks to the sorting pass).

Ideally, you'd structure your code so that you can disable these features (maybe a #define) so that you can validate that it's actually an optimization.
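
For illustration, here is one way the key/index array, the radix sort, and the redundancy filter might fit together. It is a sketch, not the actual code described above: SortEntry, DrawItem, Submit, and the FILTER_REDUNDANT_STATE macro are hypothetical names, and the Submit function only counts state changes where a real renderer would call into the D3D11 context.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Define as 0 to submit every state change, so the two paths can be timed.
#ifndef FILTER_REDUNDANT_STATE
#define FILTER_REDUNDANT_STATE 1
#endif

struct SortEntry {
    std::uint32_t key;        // packed state key, e.g. (shaderID << 16) | textureID
    std::uint32_t drawIndex;  // index into the draw item array
};

// LSD radix sort, one byte per pass; each pass is a stable counting sort.
inline void RadixSort(std::vector<SortEntry>& entries) {
    std::vector<SortEntry> scratch(entries.size());
    for (int shift = 0; shift < 32; shift += 8) {
        std::array<std::size_t, 257> offsets{};
        for (const SortEntry& e : entries)
            ++offsets[((e.key >> shift) & 0xFFu) + 1];
        for (std::size_t i = 1; i < offsets.size(); ++i)
            offsets[i] += offsets[i - 1];  // offsets[b] = first slot for byte b
        for (const SortEntry& e : entries)
            scratch[offsets[(e.key >> shift) & 0xFFu]++] = e;
        entries.swap(scratch);
    }
}

// Hypothetical draw item; real ones would reference actual GPU resources.
struct DrawItem {
    std::uint16_t shaderID;
    std::uint16_t textureID;
};

struct StateChangeCounts {
    std::size_t shader = 0;
    std::size_t texture = 0;
};

// Walks the sorted entries and counts the state changes that would be issued.
inline StateChangeCounts Submit(const std::vector<DrawItem>& items,
                                const std::vector<SortEntry>& sorted) {
    constexpr bool kFilter = FILTER_REDUNDANT_STATE != 0;
    StateChangeCounts counts;
    bool first = true;
    std::uint16_t boundShader = 0;
    std::uint16_t boundTexture = 0;
    for (const SortEntry& e : sorted) {
        assert(e.drawIndex < items.size());
        const DrawItem& item = items[e.drawIndex];
        if (!kFilter || first || item.shaderID != boundShader) {
            boundShader = item.shaderID;   // the shader would be bound here
            ++counts.shader;
        }
        if (!kFilter || first || item.textureID != boundTexture) {
            boundTexture = item.textureID; // the texture would be bound here
            ++counts.texture;
        }
        first = false;
        // The draw call itself would be issued here.
    }
    return counts;
}
```

Building with FILTER_REDUNDANT_STATE set to 0 gives the unfiltered baseline to compare against.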

OK, just one question: why a 32-bit draw item index? Do you mean an integer? Why not just a direct pointer to the draw item? Isn't that the fastest possible way?

Also, a question about L. Spiro's link: he talks about depth sorting, but what if I plan to instance a lot of objects? I just add their transform matrices to the instance buffer and pass instanceCount to the draw call. Does the way they are depth sorted correspond to how the GPU draws them? Or does it just draw them in random order (when instancing)?


do you mean an integer?

Yes.

Though my sortables always use more than that (try to at least fit it within a 64-bit integer though) plus the depth component, which is why I sort indices rather than the sortables themselves.


Why not just a direct pointer to the draw item?

Because what is used to sort is only a subset of what a mesh (or mesh part) actually is; sending a pointer to a mesh/sub-mesh off to a render queue means the renderer knows far more than it needs to know to do its job. All it needs is a small structure or key to use for sorting. Fill it out and send it on. Don’t create unnecessary dependencies.


Does the way they are depth sorted correspond to how the GPU draws them?

They are drawn in the order specified by your instance buffer. Except when drawing translucent objects, instancing is preferred over anything else, as it implicitly means no shader swapping, texture swapping, vertex-buffer swapping, index-buffer swapping, etc. Any gain from drawing opaque objects front-to-back is completely trumped by this, so ignore depth.
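
As a rough sketch of what that means when filling the instance buffer: the Instance struct and BuildInstanceData function below are hypothetical, and the back-to-front order for translucent instances is an assumption based on ordinary alpha blending rather than something stated above.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical per-instance record; a real one would hold a full transform.
struct Instance {
    float viewDepth;  // distance from the camera along the view direction
    int id;           // stand-in for the transform or other payload
};

// Returns the instances in the order they would be written to the instance
// buffer, which is also the order in which they are drawn.
inline std::vector<Instance> BuildInstanceData(std::vector<Instance> instances,
                                               bool translucent) {
    if (translucent) {
        // Assumed back-to-front for blending; stable so that equal depths
        // keep their submission order from frame to frame.
        std::stable_sort(instances.begin(), instances.end(),
                         [](const Instance& a, const Instance& b) {
                             return a.viewDepth > b.viewDepth;
                         });
    }
    // Opaque instances are left in submission order: depth is ignored.
    return instances;
}
```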

L. Spiro

Do you use command lists and deferred contexts? I imagine they would complicate the sorting if you split the renderables between different deferred contexts on different threads, for example if you have a system that automatically distributes draw calls and it messes up the draw order.

Also, to add: I'm not using a second vertex buffer as an instance buffer; I'm using a buffer bound as an SRV, so I suppose my method is less efficient. The people from DICE said this technique is used in Battlefield 2, but they didn't mention anything about draw order.
