

Is automatic state monitoring necessary in a renderer?


Old topic!
Guest, the last post of this topic is over 60 days old and at this point you may not reply in this topic. If you wish to continue this conversation start a new topic.

6 replies to this topic

#1 mrheisenberg   Members   -  Reputation: 356


Posted 09 January 2014 - 04:12 PM

My renderer (based on D3D11) is currently built a lot like the one from the Hieroglyph3 engine. It's something like this:

class Renderer
{
    // Encapsulates the immediate context; inherits from DeferredContextManager
    static ImmediateContextManager Immediate;
    // Each element encapsulates one deferred context
    static std::vector<DeferredContextManager> Deferred;

    // ID3D11Device functionality: encapsulating methods such as CreateBuffer, CreateTexture, etc.
};

 

In the ContextManager, each time you set some state it checks whether that state has already been set, which prevents redundant API calls. However, that's quite a lot of if checks for each object I render. I thought about sorting objects by material type, but that means I first need to sort by shader, then by texture, then by vertex buffer, and by pretty much anything else that would require a state change in the pipeline. That's a huge amount of sorting each frame, and sometimes I still end up with cases where a redundant API call is made. Basically:

- for automatic state monitoring you get tons of branching each time you render an object
- for sorting you have to perform a huge amount of sorts to make sure everything turns out OK

The performance hits of both methods become noticeable with large numbers of objects in the scene. The sorting actually turns out way heavier when I do it (I use Quicksort3) than just using the state monitoring method. Maybe I should make some combination of the two?
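For readers unfamiliar with the state monitoring described in this post, here is a minimal sketch of the idea. It is not the Hieroglyph3 code or the poster's ContextManager: `FakeContext`, `StateCache`, and the integer IDs are hypothetical stand-ins, and the fake context only counts calls where a real one would forward to the D3D11 device context.

```cpp
#include <cstdint>

// Hypothetical stand-in for a device context. It only counts calls so the
// effect of the filter can be observed.
struct FakeContext {
    int shaderBinds = 0;
    int textureBinds = 0;
    void BindShader(std::uint32_t /*id*/) { ++shaderBinds; }
    void BindTexture(std::uint32_t /*id*/) { ++textureBinds; }
};

// Remembers the last value set for each piece of state and skips the
// underlying call when the requested value is already current.
class StateCache {
public:
    explicit StateCache(FakeContext& ctx) : ctx_(ctx) {}

    void SetShader(std::uint32_t id) {
        if (id == shader_) return;   // redundant: already bound
        shader_ = id;
        ctx_.BindShader(id);
    }

    void SetTexture(std::uint32_t id) {
        if (id == texture_) return;  // redundant: already bound
        texture_ = id;
        ctx_.BindTexture(id);
    }

private:
    // Assumed sentinel meaning "nothing bound yet"; real IDs must not use it.
    static constexpr std::uint32_t kUnset = 0xFFFFFFFFu;

    FakeContext& ctx_;
    std::uint32_t shader_ = kUnset;
    std::uint32_t texture_ = kUnset;
};
```

The cost per state is one compare and one branch, which is the "tons of branching" the post refers to.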




#2 MJP   Moderators   -  Reputation: 11567


Posted 09 January 2014 - 04:27 PM

You don't need to do multiple sorts in order to sort by multiple criteria. For each criterion, you just need to be able to map it to an integer with N bits. So let's say you were sorting by shader and by texture. If you come up with a system that assigns a unique 16-bit ID to each shader and a similar system that assigns a 16-bit ID to each texture, you could combine those two into a single 32-bit integer. When combining, you just want the higher-precedence criteria to go into the higher bits and the lower-precedence criteria into the lower bits. So in your case, if you want to sort first by shader and then by texture, you would do sortID = (shaderID << 16) | textureID. Then you just sort once using your combined sort ID.
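A small sketch of the combined key, assuming both IDs really do fit in 16 bits. `DrawItem`, `MakeSortKey`, and `SortDrawItems` are illustrative names, not part of any engine mentioned in this thread.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Packs the higher-precedence criterion (shader) into the high 16 bits and
// the lower-precedence criterion (texture) into the low 16 bits.
std::uint32_t MakeSortKey(std::uint32_t shaderId, std::uint32_t textureId) {
    assert(shaderId <= 0xFFFFu && textureId <= 0xFFFFu);
    return (shaderId << 16) | textureId;
}

// Hypothetical draw record holding only what the key needs.
struct DrawItem {
    std::uint32_t shaderId;
    std::uint32_t textureId;
};

// One sort over the combined key orders by shader first, then by texture.
void SortDrawItems(std::vector<DrawItem>& items) {
    std::sort(items.begin(), items.end(),
              [](const DrawItem& a, const DrawItem& b) {
                  return MakeSortKey(a.shaderId, a.textureId) <
                         MakeSortKey(b.shaderId, b.textureId);
              });
}
```

In practice the key would probably be computed once per item and stored, instead of being rebuilt inside the comparison as it is here for brevity.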



#3 L. Spiro   Crossbones+   -  Reputation: 13904


Posted 09 January 2014 - 05:44 PM

Expanding on what MJP said (I just typed “MJ” and hit Tab hoping it would auto-complete the name).

  • Do I need to check if a state is set before changing it?
    • It may not be necessary for things to work, but it is a major improvement on performance.  It is heavily recommended, especially for shaders, textures, vertex buffers, index buffers, and render targets.

  • I use Quicksort3

    • Use an insertion sort and take advantage of per-frame temporal coherence.
    • Sort indices of items, not actual items themselves.  Copying items back and forth is transfer overhead.
    • You should always restrict yourself to stable sorts for this type of sorting.  Flickering artifacts on translucent objects can otherwise occur.

  • for sorting you have to perform a huge amount of sorts to make sure everything turns out ok

    • For completeness I am just going to reiterate that you don’t have to sort by all forms of criteria.  The main ones are shaders, textures, and depth.

  • for automatic state monitoring you get tons of branching each time you render an object

    • Branching is faster than calling functions.  Especially if those functions always go to ring 0 or cause pipeline flushes/stalls.

  • Maybe I should make some combination of the two?

    • They are meant to be used together.  Sorting has no meaning if you are still calling Direct3D functions on each redundant state change, and preventing redundant state changes has no meaning if you are drawing in random order.  Both are necessary.
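The sorting advice in the list above (insertion sort, sort indices instead of items, keep the sort stable) could be sketched roughly as follows. This is an illustration under assumptions, not L. Spiro's engine code: `InsertionSortIndices` is a hypothetical helper, and the caller is assumed to keep the `order` array from the previous frame so that it is already nearly sorted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reorders `order` so that keys[order[i]] is ascending. Only the indices
// move; the keys (and whatever items they describe) stay in place.
// The strict '>' comparison never moves an element past an equal key,
// which keeps the sort stable.
void InsertionSortIndices(const std::vector<std::uint32_t>& keys,
                          std::vector<std::uint32_t>& order) {
    for (std::size_t i = 1; i < order.size(); ++i) {
        const std::uint32_t idx = order[i];
        assert(idx < keys.size());
        const std::uint32_t key = keys[idx];
        std::size_t j = i;
        while (j > 0 && keys[order[j - 1]] > key) {
            order[j] = order[j - 1];  // shift larger entries one slot right
            --j;
        }
        order[j] = idx;
    }
}
```

Insertion sort is quadratic in the worst case but close to linear when the input is almost sorted, which is why reusing last frame's order matters here.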

Most of this is covered here: 3D Performance Tips
 
 
L. Spiro


Edited by L. Spiro, 09 January 2014 - 06:03 PM.

It is amazing how often people try to be unique, and yet they are always trying to make others be like them. - L. Spiro 2011
I spent most of my life learning the courage it takes to go out and get what I want. Now that I have it, I am not sure exactly what it is that I want. - L. Spiro 2013
I went to my local Subway once to find some guy yelling at the staff. When someone finally came to take my order and asked, “May I help you?”, I replied, “Yeah, I’ll have one asshole to go.”
L. Spiro Engine: http://lspiroengine.com
L. Spiro Engine Forums: http://lspiroengine.com/forums

#4 Hodgman   Moderators   -  Reputation: 30850


Posted 09 January 2014 - 05:59 PM

Similar to MJP and LS above, I create an array of structs that contain a 32-bit sorting key and a 32-bit draw item index, then sort that array using a radix sort.
After that, I iterate through all the draw items (using the sorted indices) and ignore any redundant states (hopefully there is now a lot of redundancy thanks to the sorting pass).

Ideally, you'd structure your code so that you can disable these features (maybe a #define) so that you can validate that it's actually an optimization.
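A rough sketch of that pipeline, under stated assumptions: `SortItem`, `RadixSort`, `CountStateSets`, and the `FILTER_REDUNDANT_STATE` macro are hypothetical names, the radix sort is a plain least-significant-digit variant with 8-bit digits, and the whole key stands in for "the state" so the effect of the filter can be counted without a real device.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Compile-time switch so the filter can be turned off for measurement.
#ifndef FILTER_REDUNDANT_STATE
#define FILTER_REDUNDANT_STATE 1
#endif

struct SortItem {
    std::uint32_t key;        // packed sort criteria
    std::uint32_t drawIndex;  // index into a separate draw-item array
};

// Stable LSD radix sort on `key`: four counting passes over 8-bit digits.
void RadixSort(std::vector<SortItem>& items) {
    std::vector<SortItem> scratch(items.size());
    for (int shift = 0; shift < 32; shift += 8) {
        std::array<std::size_t, 257> offsets{};
        for (const SortItem& it : items)
            ++offsets[((it.key >> shift) & 0xFFu) + 1];
        for (std::size_t b = 1; b <= 256; ++b)
            offsets[b] += offsets[b - 1];  // offsets[d] = first slot for digit d
        for (const SortItem& it : items)
            scratch[offsets[(it.key >> shift) & 0xFFu]++] = it;
        items.swap(scratch);
    }
}

// Walks the sorted items and returns how many state sets would be issued.
std::size_t CountStateSets(const std::vector<SortItem>& sorted) {
    std::size_t sets = 0;
    bool hasLast = false;
    std::uint32_t lastKey = 0;
    for (const SortItem& it : sorted) {
#if FILTER_REDUNDANT_STATE
        if (hasLast && it.key == lastKey) continue;  // same state as before
#endif
        ++sets;
        lastKey = it.key;
        hasLast = true;
    }
    return sets;
}
```

Building once with `FILTER_REDUNDANT_STATE` set to 0 and once with 1 gives the kind of before/after comparison suggested above.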

#5 mrheisenberg   Members   -  Reputation: 356


Posted 09 January 2014 - 06:27 PM


OK, just one question: why a 32-bit draw item index? Do you mean an integer? Why not just a direct pointer to the draw item? Isn't that the fastest possible way?

Also, a question about L. Spiro's link: he talks about depth sorting, but what if I plan to instance a lot of objects? I just add their transform matrices to the instance buffer and pass instanceCount to the draw call. Does the way they are depth sorted correspond to how the GPU draws them? Or does it just draw them in random order (when instancing)?



#6 L. Spiro   Crossbones+   -  Reputation: 13904


Posted 09 January 2014 - 06:49 PM


do you mean an integer?

Yes.

Though my sortables always use more than that (try to at least fit it within a 64-bit integer though) plus the depth component, which is why I sort indices rather than the sortables themselves.

 


Why not just a direct pointer to the draw item?

Because what is used to sort is only a subset of what a mesh (or mesh part) actually is; sending a pointer to a mesh/sub-mesh off to a render queue means the renderer knows far more than it needs to know to do its job.  All it needs is a small structure or key to use for sorting.  Fill it out and send it on.  Don’t create unnecessary dependencies.

 


Does the way they are depth sorted correspond to how the GPU draws them?

They are drawn in the order specified by your instance buffer.  Except when drawing translucent objects, instancing is preferred over anything else as it implicitly means no shader swapping, texture swapping, vertex-buffer swapping, index-buffer swapping, etc.  Any gains from drawing front-to-back on opaque objects are completely trumped by this, so ignore depth.
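For the translucent case, and taking the statement above about buffer order as given, one way to control draw order is to decide the order of the instances before filling the instance buffer. The sketch below is an assumption-laden illustration: `BackToFrontOrder` is a hypothetical helper, and a larger depth value is assumed to mean farther from the camera.

```cpp
#include <algorithm>
#include <cstdint>
#include <numeric>
#include <vector>

// Returns instance indices ordered farthest-first. The caller would then
// copy per-instance data into the instance buffer in this order.
// Assumes depths.size() fits in 32 bits and larger depth means farther away.
std::vector<std::uint32_t> BackToFrontOrder(const std::vector<float>& depths) {
    std::vector<std::uint32_t> order(depths.size());
    std::iota(order.begin(), order.end(), 0u);  // 0, 1, 2, ...
    // Stable, so instances at equal depth keep their submission order.
    std::stable_sort(order.begin(), order.end(),
                     [&depths](std::uint32_t a, std::uint32_t b) {
                         return depths[a] > depths[b];
                     });
    return order;
}
```

For opaque instances the reply above suggests skipping this step entirely.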

 

 

L. Spiro



#7 mrheisenberg   Members   -  Reputation: 356


Posted 09 January 2014 - 06:58 PM

Do you use command lists and deferred contexts? I imagine they would complicate the sorting if you split the renderables between the different deferred contexts on different threads, for example if you have a system that automatically distributes draw calls and it messes up the draw order.

Also, to add: I'm not using a second vertex buffer as an instance buffer; I'm using a buffer bound as an SRV, so I suppose my method is less efficient. The people from DICE said this technique is used in Battlefield 2, but they didn't mention anything like draw order.


Edited by mrheisenberg, 09 January 2014 - 07:41 PM.







