Best way to sort draw calls?
I think the base problem to solve is relatively simple: you have data that needs to be sorted. In order to sort things, you need a number for each of them - instead of using pointers to meshes, materials, or shaders, you can use a handle, which is smaller than a pointer - let's say 16 bits. So when you create a mesh or a material, you also create a handle, which can later be used to retrieve the object.
With the handles, you can create an index number, let's say 64 bits. With simple shifts and maybe some masking you can store all the handles inside the index number, like so:
index = (MeshHandle << 48) + (MaterialHandle << 32) + (...)
You can arrange the data to fit your needs... you'll also need an index which indicates where to find the data for rendering (such as transform matrices). Keep in mind that at this point in your renderer, all the data required for rendering should already be available.
After you have stored the data inside, for example, a std::vector, you can sort it with a simple std::sort. Now, depending on the order in which you built the index, you'll get the data nicely laid out. For example, if the mesh handle is the first element and the material handle second, this minimizes changes of the vertex/index buffers. If the material handle were the first element, you'd be changing the material less often, etc.
I consider a model to be a collection of meshes, so when doing the actual drawing, the model isn't really relevant, since we are only interested in the meshes. Each mesh needs one or more matrices (consider skinned meshes).
So, the actual model doesn't do any drawing calls. It just adds meshes to the renderer with the desired material and transform matrices.
Cheers!
PS: there are lots of examples of how to implement this on the net.
You should order primarily by:
1. shaders, then
2. textures, then
3. vertex buffers
You can create a binding manager that always tries to bind a resource mindlessly, but internally refuses to do so if the resource is already active on the GPU. With this technique - once sorting by the priorities mentioned above is done - the manager will simply minimize all expensive GPU commands.
All of this should take lower priority than ordering by distance from the camera, and so on.
Thank you very much for your answers. I'm actually new to programming - I'm still in college and had never seen programming before, and since we cover a lot of material at the same time, we don't really learn everything very deeply...
But I like to do my own research and learn by myself, so I did. I just learned what a handle is, and a bit about bitwise operators, and I managed to write some example code creating handles and sorting them, just as kauna told me. It's amazing. I guess the handles will be the index of each object in its own list, and I will just walk through the vector and change states when needed - it looks like some nested fors or whiles with some ifs, and that's it. Pretty cool and easy.
About JohnnyCode's advice: I was already planning to write a kind of resource manager that does all this kind of stuff, so thank you for confirming it's a good idea. I'm also already doing frustum culling before all this rendering...
But I still have some doubts about defining models and meshes that I can't get straight in my head. I don't import any animations yet, so I don't handle skinning - I actually don't know how to yet. I'm currently importing static models from OBJ and MTL files, and I decided to always divide meshes by material, so each mesh has only one material assigned. Having everything divided as kauna described, everything should fit nicely with this method...
But I keep thinking: what if I make a model class with its own materials and meshes? That would give me only one world matrix per model and also one vertex buffer, so I wouldn't need to send as many world matrix changes to the shader, or as many vertex buffer changes... For example, say the last handle in my sort key is the transform; if I sort by material, the meshes that share the same transform will probably end up far apart from each other, so I'd need to update the transform almost every iteration. Done the way I'm thinking, that would happen far less often, since all the meshes in the model would be rendered in order. I guess I could also achieve this by putting the transform handle at the beginning of my key, or by using a vertex buffer handle, but I don't know if that's the correct thing to do. I don't really know which is more expensive - sending material info or sending matrix info - since both are just constant buffers, and the material adds some texture binds. This confuses me a lot, lol...
Also, doing this I would probably only check whether the whole model is inside the frustum, and not each of the meshes - and the same when testing collisions. That has nothing to do with rendering, but I guess it would save some CPU cycles too...
I would really appreciate it if you guys could help me get this clear. Thank you.
> Thank you very much for your answers. I'm actually new to programming - I'm still in college and had never seen programming before, and since we cover a lot of material at the same time, we don't really learn everything very deeply...
You do not need to manage things to be extremely effective; a program can run fine without any of this in mind. But if you want to optimize and fine-tune things, it becomes a matter of really extensive knowledge - not about anything in general, but about the specific driver/API you are running on and its particulars.
> I decided to always divide meshes by material, so each mesh has only one material assigned. Having everything divided as kauna described, everything should fit nicely with this method...

A mesh may have any number of materials applied to its individual triangles and thus may require more than 1 index buffer, shader, and draw call to render entirely.
These have no official term but could be called a “sub-mesh”.
> But I keep thinking: what if I make a model class with its own materials and meshes? That would give me only one world matrix per model and also one vertex buffer, so I wouldn't need to send as many world matrix changes to the shader, or as many vertex buffer changes...

Meshes should not be combined into a single megamesh that requires only 1 world transform.
Making everything use only 1 world transform isn’t a sensible goal. It saves you nothing and costs you everything. It makes things far too complex to manage for your level in programming, and at the same time doesn’t offer any gains that you would be able to use at your level (are you aiming for performance?).
Meshes and their sub-meshes need to be kept as they are in hierarchical form and drawn in as many calls with as many world transforms as needed.
> sending material info or sending matrix info, since both are just constant buffers, and the material adds some texture binds. This confuses me a lot

It’s “sending information”. Nothing more.
The amount matters, not its name.
If you send diffuse, specular, ambient, and emissive (not that I would recommend this, but let’s just say you want to model the fixed-function pipeline), that’s 4 vectors.
If you send a standard world matrix, that’s 4 vectors.
It’s exactly the same.
Activating textures is a completely separate issue.
As was mentioned, you should always be redundancy-checking to make sure you don’t activate a texture that is already active.
And if you sort so that the same texture will be activated multiple times in a row then “activating” the texture the 2nd, 3rd, etc. time becomes free. Then it’s only a matter of sending material, transform, whatever constant data you have, which is virtually always going to be faster than a texture swap.
L. Spiro
I can explain how this works in my engine currently.
Only considering relevance to model rendering, my resource manager manages vertex shaders, pixel shaders, models, materials and textures.
A model is a collection of 1-to-many meshes. Each mesh is a subset of the model, with 1 material only. Each mesh contains a vertex buffer, index buffer, material handle and render type.
The render type instructs the render queue how to render the mesh. The most common render type is opaque - opaque meshes end up on the standard forward or deferred rendering path. A mesh could also be alpha-clipped, semi-transparent, or an overlay; those types are sorted to be rendered last, always via forward rendering.
My scene is a collection of entities. Entities may contain other entities. Each entity is a collection of 1-to-many components. One type of component is a model component. When rendering the scene, the model component handles the OnRender event by constructing a render command for each mesh within the model. The render command contains the render type, material handle, depth (of the model from the camera), model handle, mesh index and a pointer to the owning component (and therefore indirectly entity). This render command is submitted to the render queue.
Once the entire visibility-culled scene has had a chance to add render commands to the render queue, the render queue is instructed to quick sort the queue. It does this by having all relevant render command information tightly packed (1 byte packing) into 3 uint32_t's.
The structure looks like this:
#pragma pack(1)
union
{
    struct
    {
        uint32_t SortValueA;
        uint32_t SortValueB;
        uint32_t SortValueC;
    };
    struct
    {
        // Packed into SortValueA in reverse priority order.
        uint16_t MaterialID;
        uint16_t RenderType;
        // Packed into SortValueB in reverse priority order.
        uint16_t MeshIndex;
        uint16_t ModelID;
        // Packed into SortValueC in reverse priority order.
        float Depth;
    };
};
#pragma pack()
So the queue sorts by the priority of: Render Type - MaterialID - Model ID - MeshIndex - Depth.
When rendering the queue:
A change in render type instructs the render queue to activate the appropriate vertex and pixel shaders.
A change in material instructs the mesh to set the material parameters in the per-material constant buffer and set the material textures.
A change in model instructs the mesh to set the model parameters in the per-model constant buffer (e.g. transformation matrices).
A change in mesh index instructs the mesh to activate the mesh vertex and index buffers.
The mesh is then ready to render.
Thank you L. Spiro and JMab, that was very useful,
So let me see if I understood how it should be, L. Spiro:
- The Model will have only pointers to its meshes; I guess this will help in the future to know which meshes are related.
- The Mesh will have the physics helpers (bounding box or sphere, etc.) and a list of sub-meshes; I guess this is what I would call a "group", like a head, for example, which would be part of a model.
- The Sub-Mesh will be the one holding all the info like vertex/index buffers, shaders and material IDs, transform matrices, etc., and would represent, for example, the eyes or the mouth.
If this is correct, I think it's pretty clear now. After setting this up, I think I will first loop through all meshes to check which ones are inside the frustum, but since the sub-meshes will be sorted by material, I would need to set a bool or something like that, then go through all the sub-meshes in the scene and render the ones that are inside the frustum by reading the bool I set.
Am I close now? This has been very helpful - thank you everyone for the answers.
I should add, on the Model/Mesh/SubMesh debate, that I don't think my current structure is future-proof, as it doesn't consider Level of Detail (LOD). This structure makes sense to me, but to each their own:
Model - A collection of Meshes. Linked to by the actor/gameObject/entity.
Mesh - A collection of SubMeshes. Each Mesh represents a different LOD level of the Model.
SubMesh - A shaders/material/vertex buffer/index buffer set for a particular LOD level.
> The Model will have only pointers to its meshes

Or just an array. Keep the cache as nice as you can.
> The Mesh will have the physics helpers

The model will also have a bounding box. If the model is not in view there is no reason to test any meshes.
Meshes also need to maintain a hierarchy. Every mesh should have a parent except root meshes, and any number of children.
Transforms cascade into children.
> The Sub-Mesh will be the one holding all the info like vertex/index buffers, shaders and material IDs, transform matrices, etc., and would represent, for example, the eyes or the mouth.

Only the mesh needs a vertex buffer. Sub-meshes can make do with an index buffer and an offset into the shared vertex buffer. Otherwise you end up with more resources, more duplicate vertices, etc. On the other hand, using multiple vertex buffers is simpler to implement.
Matrices are part of the mesh. Sub-meshes cannot have a different world matrix from the mesh’s.
> since the sub-meshes will be sorted by material, I would need to set a bool or something like that, then go through all the sub-meshes in the scene and render the ones that are inside the frustum by reading the bool I set.

I don’t see the point in the boolean. Sorting via a render queue is something that happens every frame within an entirely separate domain. Trying to gain anything by sorting sub-meshes is useless. If the render queue itself has the ability to take advantage of frame-to-frame temporal coherence, and if the visible objects are added to the render queue each frame in the same order (which is a natural by-product of any deterministic culling process), then you are already maximizing your performance, and anything else, such as caring about the order of the sub-meshes, is simply superfluous.
Sorting doesn’t happen on the model/mesh/sub-mesh level and is an entirely different beast.
> This structure makes sense to me, but to each their own:

You would be better off sticking to the standard terminology described above and using separate models for LOD purposes. An LOD could be a “sub-model” inside a model (so it becomes associated as an LOD rather than as an entirely new model), but it would basically just be a lower-poly repeat of the model->mesh->sub-mesh structure.
L. Spiro