Best way to sort draw calls?

Started by Chapa August 05, 2014 07:44 AM

10 comments, last by Chapa 9 years, 8 months ago

136

Author

August 05, 2014 07:44 AM

Hello. First of all, sorry for my English; it is not my native language and I am not very good at it, but here I go...

I'm new to DirectX and I have been thinking about the best way to manage my draw calls so that I get the best performance I can. I now understand that a good approach is to sort the meshes so that I make as few state changes as possible, for example sorting by shader, then by material, etc. I have been trying to do this while also thinking ahead to when I will need to control a particular mesh, test collisions, and so on. I have searched for this information, but no one talks about models that have more than one mesh, or about managing all those meshes together...

So these are the 2 ways I think I could do it, but I can't decide which one is best...

Let's say I have a material, a mesh and a model class, and each mesh has a link to the material it uses...

My first idea is to load all the materials and meshes into shared lists, and have the model class store only a list of pointers to them, so I can modify or check something when needed, for example changing the world matrix of all the meshes in the model. The benefit is that I can sort every mesh in the scene regardless of which model it belongs to and render them by material, saving material state changes even when two or more different models use the same material. The downside is that I will be updating the matrix buffer for each mesh, so that the meshes of one model move together no matter where they sit in the list.

My second idea is to load the materials and meshes inside the model class, so there is only one world matrix for the whole model. In this case I would render model by model and, inside each model, by material. The benefit is updating the matrix buffer only once per model instead of once per mesh. The downside is that if the same material is used by meshes in different models, I will need to set that material again for each of those models.

So these are my 2 ideas and I still can't decide which one to choose. I would really appreciate any advice. Thank you.

kauna

2,925

August 05, 2014 09:09 AM

I think the base problem to solve is relatively simple: you have some data which needs to be sorted. In order to sort things, you'll need a number for each of them. Instead of using pointers to meshes, materials or shaders, you can use a handle which is smaller than a pointer, let's say 16 bits. So, when you create a mesh or a material you also create a handle, which can be used to retrieve the object.

With the handles, you can build an index number, let's say 64 bits. With simple shifts and maybe some masking you can store all the handles inside the index number, for example:

index = ((uint64_t)MeshHandle << 48) | ((uint64_t)MaterialHandle << 32) | (...)

You can lay out the data to suit your needs... You'll also need an index which indicates where to find the data for rendering (such as transform matrices). Keep in mind that at this point in your renderer, you should already have all the data required for rendering available.

After you have stored the data in a std::vector, for example, you can sort it with a simple std::sort. Depending on the order in which you built the index, you'll get the data nicely laid out. For example, if the mesh handle is the first element and the material the second, this minimizes changes of the vertex / index buffers. If the material handle were the first element, you'd be changing the material less often, etc.
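A minimal sketch of the packed key described above, following the same layout as the formula (mesh handle in the top 16 bits, material handle next). The names `MakeKey`, `KeyMesh`, `KeyMaterial`, `KeyTransform` and `SortDrawList`, and the use of the low 32 bits for a transform index, are assumptions made for illustration. Note that each handle is widened to 64 bits before shifting; shifting a 16-bit value by 48 without the cast is not valid C++.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical 16-bit handle: an index into a mesh or material array.
using Handle = std::uint16_t;

// Builds a 64-bit sort key. The field in the highest bits is sorted first.
inline std::uint64_t MakeKey(Handle mesh, Handle material, std::uint32_t transformIndex) {
    return (static_cast<std::uint64_t>(mesh) << 48) |
           (static_cast<std::uint64_t>(material) << 32) |
            static_cast<std::uint64_t>(transformIndex);
}

// Unpacking helpers: shift the field down and mask off the rest.
inline Handle KeyMesh(std::uint64_t key)     { return static_cast<Handle>(key >> 48); }
inline Handle KeyMaterial(std::uint64_t key) { return static_cast<Handle>((key >> 32) & 0xFFFFu); }
inline std::uint32_t KeyTransform(std::uint64_t key) {
    return static_cast<std::uint32_t>(key & 0xFFFFFFFFu);
}

// Sorting plain integers groups equal meshes, then equal materials, together.
inline void SortDrawList(std::vector<std::uint64_t>& keys) {
    std::sort(keys.begin(), keys.end());
}
```

Swapping the shift amounts of the mesh and material handles would make the material the primary sort criterion instead.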

I consider a model to be a collection of meshes, so when doing the actual drawing, the model isn't really relevant, since we are only interested in the meshes. Each mesh needs one or more matrices (consider skinned meshes).

So, the actual model doesn't do any drawing calls. It just adds meshes to the renderer with the desired material and transform matrices.

Cheers!

PS: there are lots of examples of how to implement this on the net.

JohnnyCode

1,084

August 05, 2014 07:59 PM

you should order primarily by:

1- shaders (then)

2- textures (then)

3- vertex buffers

You can create a binding manager that is always asked to bind a resource mindlessly, but internally refuses to do so if the resource is already active on the GPU. With this technique, once the draw calls are sorted by the priorities above, the manager will simply minimize all the expensive GPU commands.

All of this should have lower priority than ordering by distance from the camera, and so on.
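A minimal sketch of such a binding manager, assuming resources are identified by plain integer ids and that id 0 means "nothing bound". The class name `BindingCache` and its methods are hypothetical; the comment marks where the real graphics API call would go.

```cpp
#include <cstdint>

// Hypothetical resource id; 0 is reserved here to mean "nothing bound".
using ResourceId = std::uint32_t;

class BindingCache {
public:
    // Each call returns true only when a real bind was needed.
    bool BindShader(ResourceId id)       { return Bind(shader_, id); }
    bool BindTexture(ResourceId id)      { return Bind(texture_, id); }
    bool BindVertexBuffer(ResourceId id) { return Bind(vertexBuffer_, id); }

    // Number of binds that actually reached the (imaginary) GPU API.
    std::uint32_t BindCount() const { return bindCount_; }

private:
    bool Bind(ResourceId& current, ResourceId id) {
        if (current == id) {
            return false;  // already active: skip the redundant call
        }
        current = id;
        ++bindCount_;      // the real API bind call would be issued here
        return true;
    }

    ResourceId shader_ = 0;
    ResourceId texture_ = 0;
    ResourceId vertexBuffer_ = 0;
    std::uint32_t bindCount_ = 0;
};
```

The caller can then bind before every draw without checking anything itself; on a sorted draw list most of those calls return early.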

Chapa

136

Author

August 05, 2014 09:34 PM

Thank you very much for your answers. I'm actually also new to programming; I'm still in college and had never seen programming before, and since we cover a lot of stuff at the same time we do not really learn anything very deeply...

But I like to do my own research and learn by myself, so I did. I just learned what a handle is, plus a little about bitwise operators, and I managed to write some example code that makes handles and sorts them, just as kauna told me. It is amazing. I guess the handles will be the index of each object in its own list, and I will just walk through the vector and change states when needed. It looks like some nested for or while loops with a few ifs, and that's it. Pretty cool and easy.

About JohnnyCode's advice: I was already planning to write a kind of resource manager that does all this, so thank you for confirming it. I'm also already doing frustum culling before any of this rendering...

But I still have some doubts about how to define models and meshes that I can't get straight in my head. I do not import any animations yet, so I don't handle skinning (I don't know how to yet). I'm importing static models from obj and mtl files, and I decided to always split the meshes by material, so each mesh has only 1 material assigned. With everything divided as kauna described, it all fits nicely with this method...

But I keep thinking: what if I make a model class with its own materials and meshes? That would still give me only one world matrix per model, and also one vertex buffer, so I wouldn't need to send as many world matrix changes to the shader, or change vertex buffers as often... For example, let's say that in my sorted vector of handles the last handle is the matrix transform. If I sort by material, the meshes that share the same transform will probably all end up far from one another, so I will need to update the transform on almost every iteration. Doing it my way, that would happen far less often, since all the meshes in a model would be rendered in order. I guess I could also get this by putting the transform handle at the beginning of my key, or by using a vertex buffer handle, but I don't know if that is the correct thing to do. I don't really know which is more expensive, sending material info or sending matrix info, since both are just constant buffers and the material adds some texture sends. This confuses me a lot, lol..

Also, doing it this way, I would probably only check whether the whole model is inside the frustum, and not each of its meshes, and the same when testing collisions. That has nothing to do with rendering, but I guess it would save some CPU cycles too...

I would really appreciate it if you guys could help me get this clear. Thank you.

JohnnyCode

1,084

August 06, 2014 12:42 AM

Thank you very much for your answers. I'm actually also new to programming; I'm still in college and had never seen programming before, and since we cover a lot of stuff at the same time we do not really learn anything very deeply...

You do not need to manage things extremely efficiently; a program can run without any of this in mind. But if you are wondering how to optimize and tune things, that is a matter of really extensive knowledge, not of anything general, but of the specific platform you are running on and what it is capable of.

L. Spiro

25,818

August 06, 2014 01:59 AM

I decided to always split the meshes by material, so each mesh has only 1 material assigned. With everything divided as kauna described, it all fits nicely with this method...

A mesh may have any number of materials applied to its individual triangles and thus may require more than 1 index buffer, shader, and draw call to render entirely.
These have no official term but could be called a “sub-mesh”.

But I keep thinking: what if I make a model class with its own materials and meshes? That would still give me only one world matrix per model, and also one vertex buffer, so I wouldn't need to send as many world matrix changes to the shader, or change vertex buffers as often...

Meshes should not be combined into a single megamesh that requires only 1 world transform.
Making everything use only 1 world transform isn’t a sensible goal. It saves you nothing and costs you everything. It makes things far too complex to manage for your level in programming, and at the same time doesn’t offer any gains that you would be able to use at your level (are you aiming for performance?).

Meshes and their sub-meshes need to be kept as they are in hierarchical form and drawn in as many calls with as many world transforms as needed.

sending material info or sending matrix info, since both are just constant buffers and the material adds some texture sends. This confuses me a lot

It’s “sending information”. Nothing more.
The amount matters, not its name.
If you send diffuse, specular, ambient, and emissive (not that I would recommend this, but let’s just say you want to model the fixed-function pipeline), that’s 4 vectors.
If you send a standard world matrix, that’s 4 vectors.
It’s exactly the same.
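To make the arithmetic concrete, here is a small sketch with two hypothetical constant layouts, `MaterialConstants` and `WorldConstants`, built from a four-float `Vec4`. It assumes a 4-byte `float`, which holds on every mainstream desktop and console platform.

```cpp
#include <cstddef>

// Four 32-bit floats: one "vector" in the sense used above.
struct Vec4 { float x, y, z, w; };

// Fixed-function style material: 4 vectors.
struct MaterialConstants {
    Vec4 diffuse;
    Vec4 specular;
    Vec4 ambient;
    Vec4 emissive;
};

// A standard 4x4 world matrix: also 4 vectors.
struct WorldConstants {
    Vec4 rows[4];
};

// Both uploads are the same number of bytes (64 with 4-byte floats).
static_assert(sizeof(MaterialConstants) == sizeof(WorldConstants),
              "material and world constants are the same size");
```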

Activating textures is a completely separate issue.
As was mentioned, you should always be redundancy-checking to make sure you don’t activate a texture that is already active.
And if you sort so that the same texture will be activated multiple times in a row then “activating” the texture the 2nd, 3rd, etc. time becomes free. Then it’s only a matter of sending material, transform, whatever constant data you have, which is virtually always going to be faster than a texture swap.

L. Spiro

I restore Nintendo 64 video-game OST’s into HD! https://www.youtube.com/channel/UCCtX_wedtZ5BoyQBXEhnVZw/playlists?view=1&sort=lad&flow=grid

JMab

237

August 06, 2014 02:40 AM

I can explain how this works in my engine currently.

Only considering relevance to model rendering, my resource manager manages vertex shaders, pixel shaders, models, materials and textures.

A model is a collection of 1-to-many meshes. Each mesh is a subset of the model, with 1 material only. Each mesh contains a vertex buffer, index buffer, material handle and render type.

The render type instructs the render queue how to render the mesh. The most common render type is opaque; opaque meshes end up on the standard forward or deferred rendering path. A mesh could also be alpha-clipped, semi-transparent or an overlay, which are sorted to be rendered last, always via forward rendering.

My scene is a collection of entities. Entities may contain other entities. Each entity is a collection of 1-to-many components. One type of component is a model component. When rendering the scene, the model component handles the OnRender event by constructing a render command for each mesh within the model. The render command contains the render type, material handle, depth (of the model from the camera), model handle, mesh index and a pointer to the owning component (and therefore indirectly entity). This render command is submitted to the render queue.

Once the entire visibility-culled scene has had a chance to add render commands to the render queue, the render queue is instructed to quick sort the queue. It does this by having all relevant render command information tightly packed (1 byte packing) into 3 uint32_t's.

The structure looks like this:

#pragma pack(1)
union
{
    struct
    {
        uint32_t SortValueA;
        uint32_t SortValueB;
        uint32_t SortValueC;
    };
    struct
    {
        // Packed into SortValueA in reverse priority order.
        uint16_t MaterialID;
        uint16_t RenderType;
        // Packed into SortValueB in reverse priority order.
        uint16_t MeshIndex;
        uint16_t ModelID;
        // Packed into SortValueC in reverse priority order.
        float Depth;
    };
};
#pragma pack()

So the queue sorts by the priority of: Render Type - MaterialID - Model ID - MeshIndex - Depth.
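As a hedged illustration of that priority order, the sketch below builds the three sort values with explicit shifts instead of a union, so it does not depend on byte order or on anonymous structs. This is a swapped-in variant, not the engine code above; `RenderKey`, `MakeRenderKey` and `SortQueue` are hypothetical names. It also assumes depth is never negative, because only non-negative IEEE-754 floats keep their ordering when their bit patterns are compared as unsigned integers.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstring>
#include <tuple>
#include <vector>

struct RenderKey {
    std::uint32_t a;  // RenderType (high 16 bits) | MaterialID (low 16 bits)
    std::uint32_t b;  // ModelID (high 16 bits)    | MeshIndex (low 16 bits)
    std::uint32_t c;  // bit pattern of a non-negative float depth
};

inline RenderKey MakeRenderKey(std::uint16_t renderType, std::uint16_t materialId,
                               std::uint16_t modelId, std::uint16_t meshIndex,
                               float depth) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "expects a 32-bit float");
    RenderKey key;
    key.a = (static_cast<std::uint32_t>(renderType) << 16) | materialId;
    key.b = (static_cast<std::uint32_t>(modelId) << 16) | meshIndex;
    std::memcpy(&key.c, &depth, sizeof key.c);  // copy the bits without type punning
    return key;
}

// Lexicographic compare: a first, then b, then c.
inline bool operator<(const RenderKey& lhs, const RenderKey& rhs) {
    return std::tie(lhs.a, lhs.b, lhs.c) < std::tie(rhs.a, rhs.b, rhs.c);
}

inline void SortQueue(std::vector<RenderKey>& queue) {
    std::sort(queue.begin(), queue.end());
}
```

With this ordering, depth only breaks ties between commands that already share render type, material, model and mesh.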

When rendering the queue:

A change in render type instructs the render queue to activate the appropriate vertex and pixel shaders.

A change in material instructs the mesh to set the material parameters in the per-material constant buffer and set the material textures.

A change in model instructs the mesh to set the model parameters in the per-model constant buffer (e.g. transformation matrices).

A change in mesh index instructs the mesh to activate the mesh vertex and index buffers.

The mesh is then ready to render.
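The walk over the sorted queue can be sketched as a loop that compares each command with the previous one and only reacts to the fields that changed. The struct and function names here are hypothetical, and the sketch only counts state changes where a real renderer would issue API calls.

```cpp
#include <cstdint>
#include <vector>

struct RenderCommand {
    std::uint16_t renderType;
    std::uint16_t materialId;
    std::uint16_t modelId;
    std::uint16_t meshIndex;
};

struct StateChangeCounts {
    int shaders = 0;    // render type changes (shader switches)
    int materials = 0;  // material constant/texture updates
    int models = 0;     // per-model constant updates
    int meshes = 0;     // vertex/index buffer switches
    int draws = 0;
};

// Expects commands already sorted by type, material, model, mesh index.
inline StateChangeCounts WalkQueue(const std::vector<RenderCommand>& sorted) {
    StateChangeCounts n;
    const RenderCommand* prev = nullptr;
    for (const RenderCommand& cmd : sorted) {
        const bool first = (prev == nullptr);
        const bool modelChanged = first || prev->modelId != cmd.modelId;
        if (first || prev->renderType != cmd.renderType) ++n.shaders;
        if (first || prev->materialId != cmd.materialId) ++n.materials;
        if (modelChanged) ++n.models;
        // A mesh index is only meaningful within its model, so a model
        // change always forces the buffers to be set again.
        if (modelChanged || prev->meshIndex != cmd.meshIndex) ++n.meshes;
        ++n.draws;  // the draw call itself would be issued here
        prev = &cmd;
    }
    return n;
}
```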

Chapa

136

Author

August 06, 2014 08:27 PM

Thank you L. Spiro and JMab, that was very useful.

So let me see if I understood how it should be, L. Spiro:

- Model will have only pointers to its meshes; I guess this will help in the future to know which meshes are related.

- Mesh will have the physics helpers (BB or BS, etc.) and a list of sub-meshes. I guess this is what I would call a "group", like a head for example, which would be part of a model.

- Sub-Mesh will be the one holding all the info such as vertex/index buffers, shader and material IDs, transform matrices, etc. It would represent, for example, the eyes or the mouth.

If this is correct, I think it is pretty clear now. With this in place, I think I will first loop through all the meshes to check which ones are inside the frustum. But since the sub-meshes will be sorted by material, I would need to set a bool or something like that, then go through all the sub-meshes in the scene and render the ones inside the frustum by reading that bool.

Am I close now? This has been very helpful, thank you everyone for the answers.

JMab

237

August 07, 2014 01:15 AM

I should add on the Model/Mesh/SubMesh debate, I don't think my current structure is future proof, as it doesn't consider Level of Detail (LOD). This structure makes sense to me, but each to their own:

Model - A collection of Meshes. Linked to by the actor/gameObject/entity.

Mesh - A collection of SubMeshes. Each Mesh represents a different LOD level of the Model.

SubMesh - A shaders/material/vertex buffer/index buffer set for a particular LOD level.

L. Spiro

25,818

August 07, 2014 03:57 AM

Model will have only pointers to its meshes

Or just an array. Keep cache as nice as you can.

Mesh will have the physics helpers

The model will also have a bounding box. If the model is not in view there is no reason to test any meshes.
Meshes also need to maintain a hierarchy. Every mesh should have a parent except root meshes, and any number of children.
Transforms cascade into children.
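A minimal sketch of transforms cascading through a mesh hierarchy. It assumes row-major matrices with the row-vector convention common in Direct3D code (so world = local * parent world), and that the node array is stored with every parent before its children. `Mat4`, `MeshNode` and the helper functions are hypothetical names.

```cpp
#include <array>
#include <cstddef>
#include <vector>

using Mat4 = std::array<float, 16>;  // row-major 4x4 matrix

inline Mat4 Multiply(const Mat4& a, const Mat4& b) {
    Mat4 r{};  // zero-initialized accumulator
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k)
                r[i * 4 + j] += a[i * 4 + k] * b[k * 4 + j];
    return r;
}

// Translation matrix with the offset in the last row (row-vector convention).
inline Mat4 Translation(float x, float y, float z) {
    Mat4 m{};
    m[0] = m[5] = m[10] = m[15] = 1.0f;
    m[12] = x; m[13] = y; m[14] = z;
    return m;
}

struct MeshNode {
    int parent = -1;  // index of the parent node, -1 for a root mesh
    Mat4 local{};     // transform relative to the parent
    Mat4 world{};     // computed by UpdateWorldTransforms
};

// Parents come before children, so one forward pass is enough.
inline void UpdateWorldTransforms(std::vector<MeshNode>& nodes) {
    for (MeshNode& node : nodes) {
        if (node.parent < 0) {
            node.world = node.local;
        } else {
            const MeshNode& parent = nodes[static_cast<std::size_t>(node.parent)];
            node.world = Multiply(node.local, parent.world);
        }
    }
}
```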

Sub-Mesh will be the one holding all the info such as vertex/index buffers, shader and material IDs, transform matrices, etc. It would represent, for example, the eyes or the mouth.

Only the mesh needs a vertex buffer. Sub-meshes can do with an index buffer and an offset into the shared vertex buffer. Otherwise you end up with more resources, more duplicate vertices, etc. On the other hand, using multiple vertex buffers is simpler to implement.
Matrices are part of the mesh. Sub-meshes cannot have a different world matrix from the mesh’s.
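The shared-vertex-buffer layout can be sketched on the CPU side as follows. The struct names and the `ResolveVertex` helper are hypothetical; in Direct3D 11 the `baseVertex` field would correspond to the `BaseVertexLocation` argument of `DrawIndexed`.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct Vertex { float x, y, z; };

struct SubMesh {
    std::vector<std::uint32_t> indices;  // indices relative to baseVertex
    std::uint32_t baseVertex;            // offset into the owning mesh's vertices
    std::uint16_t materialId;            // hypothetical material handle
};

struct Mesh {
    std::vector<Vertex> vertices;    // one vertex buffer shared by all sub-meshes
    std::vector<SubMesh> subMeshes;  // all drawn with the mesh's single world matrix
};

// Resolves the n-th index of a sub-mesh to a vertex in the shared buffer.
inline const Vertex& ResolveVertex(const Mesh& mesh, const SubMesh& sub, std::size_t n) {
    return mesh.vertices[sub.baseVertex + sub.indices[n]];
}
```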

But since the sub-meshes will be sorted by material, I would need to set a bool or something like that, then go through all the sub-meshes in the scene and render the ones inside the frustum by reading that bool.

I don’t see the point in the boolean. Sorting via a render queue is something that happens every frame within an entirely separate domain. Trying to gain anything by sorting sub-meshes is useless. If the render-queue itself has the ability to take advantage of frame-to-frame temporal coherence and if the visible objects are added to the render-queue each frame in the same order (which is a natural by-product of any deterministic culling process), then you are already maximizing your performance and anything else, such as caring about the order of the sub-meshes, is simply superfluous.
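One possible way a render queue could exploit frame-to-frame coherence, offered only as an assumption about what such a queue might do: keep last frame's sorted index order and run an insertion sort over it, which is cheap when the order is already nearly correct. `SortIndicesCoherent` is a hypothetical name, and the sketch relies on items being submitted in the same order each frame.

```cpp
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// keys[i] is this frame's sort key for the i-th submitted item.
// order holds item indices; it is kept between frames so that the
// previous frame's result is the starting point for this frame.
inline void SortIndicesCoherent(const std::vector<std::uint64_t>& keys,
                                std::vector<std::uint32_t>& order) {
    if (order.size() != keys.size()) {
        // Item count changed: fall back to the submission order.
        order.resize(keys.size());
        std::iota(order.begin(), order.end(), 0u);
    }
    // Insertion sort: close to linear time on nearly sorted input.
    for (std::size_t i = 1; i < order.size(); ++i) {
        const std::uint32_t current = order[i];
        std::size_t j = i;
        while (j > 0 && keys[order[j - 1]] > keys[current]) {
            order[j] = order[j - 1];
            --j;
        }
        order[j] = current;
    }
}
```

When the scene changes a lot between frames, insertion sort degrades to quadratic time, so a general-purpose sort may be the safer default.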

Sorting doesn’t happen on the model/mesh/sub-mesh level and is an entirely different beast.

This structure makes sense to me, but each to their own:

You would be better off sticking to the standard terminology described above and using separate models for LOD purposes. An LOD could be a “sub-model” inside a model (so it becomes associated as an LOD rather than as an entirely new model) but it would basically just be a lower-poly repeat of the model->mesh->sub-mesh structure.

L. Spiro

