"Sorting out" render order

Old topic!
Guest, the last post of this topic is over 60 days old and at this point you may not reply in this topic. If you wish to continue this conversation start a new topic.

32 replies to this topic

#1 cozzie   Members   -  Reputation: 2699


Posted 05 January 2013 - 01:57 PM


I'm at the point of implementing alpha blending (shader) so I can use transparent/translucent textures, grass etc.

So far I haven't used any object/mesh sorting because I didn't see the advantage yet.


But when using alpha blending, I've learned that I have to render objects (or better, triangles?) from back to front to get the expected result.


A few questions I'd like to hear your opinion on:

- is it worthwhile to render meshes/objects based on world position Z (initial worldpos vector * actual world matrix)?

(by the CPU, finding an efficient way of looping through objects, materials, effects, entities)

- what would you suggest to sort objects on when it comes to alpha blending?


The direction I'm following now is:

- update worldmatrix of each mesh each frame

- multiply by initial worldpos, resulting in a vector with 'Z' value

- use Z value to render 'regular' objects, front to back (increase performance on rasterization/ pixel shader)

- use Z value to render 'blended' objects, back to front


Any hints or pointers are really appreciated.

It will also be quite a challenge to find the best way to combine the 'loops' (materials, effects, subobjects/entities).


#2 Nyssa   Members   -  Reputation: 426


Posted 06 January 2013 - 06:35 AM

I think the way you're currently going about it is fine and fairly common: you add all visible transparent objects to a list, then sort that list by world distance from the camera. One thing to consider, however: the distance from the camera to the object's world position may not be the best metric. Consider two objects close together, with the one at the back much larger than the one in front. In that case you should probably still render the object at the back first, so maybe calculate the distance from the camera to the object's bounding box instead.
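A minimal sketch of that back-to-front sort, assuming each object carries a bounding sphere. The `Vec3` and `Drawable` types and the function names are hypothetical, and using the far side of the sphere (center distance plus radius) is just one possible metric; the nearest point (distance minus radius) is an equally plausible reading of the suggestion above.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Hypothetical minimal types, not taken from anyone's code in this thread.
struct Vec3 { float x, y, z; };

struct Drawable {
    int   id;      // illustrative identifier
    Vec3  center;  // world-space center of the bounding sphere
    float radius;  // bounding sphere radius
};

// Distance from the camera to the farthest point of the bounding sphere.
inline float farDistance(const Drawable& d, const Vec3& cam) {
    const float dx = d.center.x - cam.x;
    const float dy = d.center.y - cam.y;
    const float dz = d.center.z - cam.z;
    return std::sqrt(dx * dx + dy * dy + dz * dz) + d.radius;
}

// Back to front: the object that reaches farthest from the camera is drawn first.
inline void sortBackToFront(std::vector<Drawable>& list, const Vec3& cam) {
    std::sort(list.begin(), list.end(),
              [&cam](const Drawable& a, const Drawable& b) {
                  return farDistance(a, cam) > farDistance(b, cam);
              });
}
```

With this metric a large object whose center is closer than a small neighbour can still sort behind it, which is the case described above.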

#3 cozzie   Members   -  Reputation: 2699


Posted 06 January 2013 - 10:04 AM

Thanks Nyssa, I can easily do that by using my radius value for each mesh, in combination with the world position (based on current camera position).

Good to hear that I'm on the right way.

Sorting 'regular' meshes (CPU usage) for rendering isn't that useful (yet/for now).


In the end I might need to merge vertex and index buffers to save render states etc.

I now loop through all effects, then materials, then meshes, then entities. This is potentially not ideal, because I switch meshes (index/vertex buffer) more than once for each entity. I find this a difficult choice: it's more setting of vars for the materials (in the effect/shader) versus setting more matrices and index/vertex buffers. I don't know what's worse.


I might go for an index table to avoid the loops (for static objects that is, now 90% of the scene), but then I would still need to switch on the meshes or materials. Thinking about it now, switching materials is probably quicker than switching meshes:


- material select:

=> 2 SetFloatArray calls (amb and diff)

=> 1 TexTure call (or more when multitexturing or so)


- mesh select:

=> SetMatrix (World)

=> SetMatrix (WorldInvTransp, for normals/ lighting)

=> SetStreamSource

=> SetIndices


But then again, going through all meshes and then materials means selecting the materials again for other meshes.

Dilemmas, dilemmas...


Any advice on this?

#4 cozzie   Members   -  Reputation: 2699


Posted 06 January 2013 - 10:08 AM

In addition, I do have a table (unsorted) with, for each entity/sub-object:

- entity nr of mesh

- mesh ID

- material ID

- effect/ shader ID

#5 cozzie   Members   -  Reputation: 2699


Posted 07 January 2013 - 02:56 PM

@Nyssa (and others :))


I've been studying my renderframe and found some flaws (I believe). To illustrate.


Sample scene:

15 meshes

5 entities for each mesh

7 unique materials


Assume that I do the following one time for Opaque meshes with Opaque technique and another time for blended meshes with Blending technique (already have 2 mesh index int lists in use):


loop through effects/shaders


    begin effect

    set technique (blended or opaque)

    set viewprojection matrix of camera

    loop through passes of technique


        loop through materials


            select material (setfloatarray 2x, settexture)

            loop through meshes; if mesh.mEffect = current effect


                if mesh in frustum


                    setstream vertices, setindices

                    set worldmatrix


                    loop through entities; if entity.material = current material


                        if in frustum











This means I do:

- 7 material switches (2 float arrays, texture)

- 105 mesh switches (setstreamsource, indices and world matrix)

- 105 commitchanges

- 105x check effectindex for mesh

- 525x check material index for entity


I figure if I turn it around and go through meshes and then materials:

- 15 mesh switches

- 105 material switches

- 105 commitchanges

- 15x check effectindex for mesh

- 525x check material index for entity


Room for improvement?

- what costs more performance: switching a mesh or switching a material?

- reducing the if checks for effect and material ID, which I can do with index lists (int arrays)

- what other improvements should I look for?

(I'm not yet ready to go for merged meshes into one vertex or index buffer, which might bring other possibilities)


Any advice is really appreciated.

#6 phantom   Moderators   -  Reputation: 8493


Posted 07 January 2013 - 05:06 PM

Switching material implies switching shaders, and switching shaders is basically the most costly thing to do on the GPU. As such, you pretty much always want to reduce that to the smallest amount you can by sorting by material and, if you can, using instancing.


The other problem you have, however, is your overall algorithm: while it gets the job done, it simply does too much work, and too much redundant work at that.


The way many engines do it is to generate a list of objects which need to be rendered and then sort them into material order using a sort key, after which they simply walk through the list from top to bottom, rendering each object as it comes. There will be some logic to reduce redundant shader changes (which can be as simple as keeping track of the current material and its shaders and only calling the 'set' functions if the new object requires something different), but overall, once you have a sorted list it is simple to do.


http://realtimecollisiondetection.net/blog/?p=86 covers some more details.
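A rough sketch of the sort-key idea, assuming three 16-bit IDs packed into one 64-bit integer. The field layout, `DrawItem`, `makeKey` and `sortByState` are made up for illustration; the linked article uses its own layout.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical draw list entry: a packed key plus a handle to the thing to draw.
struct DrawItem {
    std::uint64_t key;     // packed sort key
    int           entity;  // illustrative handle
};

// The most expensive state goes into the most significant bits, so a plain
// integer comparison orders by shader first, then material, then mesh.
inline std::uint64_t makeKey(std::uint16_t shader,
                             std::uint16_t material,
                             std::uint16_t mesh) {
    return (static_cast<std::uint64_t>(shader) << 32) |
           (static_cast<std::uint64_t>(material) << 16) |
            static_cast<std::uint64_t>(mesh);
}

// After sorting, items sharing shader and material sit next to each other.
inline void sortByState(std::vector<DrawItem>& items) {
    std::sort(items.begin(), items.end(),
              [](const DrawItem& a, const DrawItem& b) { return a.key < b.key; });
}
```

Rendering then becomes a single walk over the sorted list, comparing each key with the previous one to decide which states actually need to be set.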

#7 cozzie   Members   -  Reputation: 2699


Posted 07 January 2013 - 05:17 PM

Hi Phantom,

Thanks, this clears things up. The article covers recognizable stuff, that's the good thing :)


I thought about making index lists of entities (with corresponding mesh ID) that share the same material.

This way I could batch up and reduce the material changes to the number of materials in the scene.


Thinking about it, with what you're saying, this saves a lot of state changes.

I'm assuming that I don't need to call CommitChanges when setting another stream source and index buffer. Is this correct?


If so, this will be my first step.

Maybe later on bundle the streams/indexbuffer to improve further.

#8 cozzie   Members   -  Reputation: 2699


Posted 08 January 2013 - 02:37 AM

@Phantom; just remembered that this probably won't help/work, because when I change the stream source/vertex buffer (mesh) I also need to set the mesh's world and worldinvtranspose matrices, which probably means I need to call CommitChanges anyway...

#9 Nyssa   Members   -  Reputation: 426


Posted 08 January 2013 - 02:52 AM

Yeah, Phantom is spot on, you're doing a lot of redundant searching there.


The way I currently do it is via render "buckets". As you loop over your scene objects, after you have determined that an object is visible, you determine whether it's solid or transparent (you might have a boolean in your material for this), then add that object to the solid or transparent list. Each object also has a key based on its render state (this can include its rasteriser state, blend state, texture IDs, etc.) and each list is sorted using that key. Objects with similar states will then be next to each other, so state changes are kept to a minimum. Then you just render each object in those lists, setting world matrix values as you go.


Keep in mind that all this sorting needs to be done efficiently, or it ends up being faster to simply do all the state changes anyway! You could also extend the above method to alter these lists only when an object moves in or out of the viewing frustum. That way they don't need to be rebuilt each frame!
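The bucket approach could look roughly like this. `SceneObject`, `RenderBuckets` and `fillBuckets` are hypothetical names, and sorting the transparent list by depth instead of by state key is an assumption taken from the back-to-front requirement earlier in the thread, not from the description above.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical per-object data; a real engine would hold far more than this.
struct SceneObject {
    int           id;
    bool          visible;      // result of the frustum test
    bool          transparent;  // e.g. a flag stored in the material
    std::uint32_t stateKey;     // key derived from the render state
    float         viewDepth;    // distance from the camera
};

struct RenderBuckets {
    std::vector<SceneObject> solid;
    std::vector<SceneObject> transparent;
};

inline RenderBuckets fillBuckets(const std::vector<SceneObject>& scene) {
    RenderBuckets buckets;
    for (const SceneObject& o : scene) {
        if (!o.visible) continue;  // culled objects never reach a bucket
        (o.transparent ? buckets.transparent : buckets.solid).push_back(o);
    }
    // Solid objects: group similar render states next to each other.
    std::sort(buckets.solid.begin(), buckets.solid.end(),
              [](const SceneObject& a, const SceneObject& b) {
                  return a.stateKey < b.stateKey;
              });
    // Transparent objects: farthest first, so blending composes correctly.
    std::sort(buckets.transparent.begin(), buckets.transparent.end(),
              [](const SceneObject& a, const SceneObject& b) {
                  return a.viewDepth > b.viewDepth;
              });
    return buckets;
}
```

This version rebuilds both lists on every call; updating them only when objects enter or leave the frustum, as suggested above, would need extra bookkeeping that is left out here.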


I should add... I'm currently working in C++, but the same theory should translate to C# :)

Edited by Nyssa, 08 January 2013 - 02:58 AM.

#10 kubera   Members   -  Reputation: 1094


Posted 08 January 2013 - 03:01 AM

Maybe rendering alpha-blended meshes after opaque meshes would be enough
(two phases, without analysing mesh positions).

#11 cozzie   Members   -  Reputation: 2699


Posted 08 January 2013 - 11:54 AM

Thanks all. I understand the principle of the index lists, and I can definitely apply it to reduce material state changes. Within such a list, though, I would have multiple meshes with their own vertex/index buffers and world matrices, meaning I need to set these too. These are also state changes, right (unless I combine meshes into big vertex/index buffers)? If this is true, I can only reduce state changes (and CommitChanges calls) to, at best, the number of materials multiplied by the number of meshes. Or am I overlooking something?

#12 Krohm   Crossbones+   -  Reputation: 3605


Posted 09 January 2013 - 01:55 AM

Maybe later on bundle the streams/indexbuffer to improve further.

I'm not sure I understand what you plan to do with data layout... to improve what?

To re-order mesh drawing, just reorder the drawcalls.

#13 cozzie   Members   -  Reputation: 2699


Posted 09 January 2013 - 11:39 AM

If I reorder the entities based on materials, I would still need to switch between the meshes (set world matrix, stream source and index buffers), meaning I still need all those state changes. Is this correct? If so, I might be able to merge the entities/meshes into one or two big vertex and index buffers with a common/shared world matrix. Or am I making this too difficult, and is there an easier way to bring the number of state changes below meshes multiplied by materials?

#14 Niello   Members   -  Reputation: 130


Posted 14 January 2013 - 05:04 AM

1. A material should store its shader along with the constant shader params (those that don't change from object to object made of this material).

Other params are defined in the object itself as personal.

2. Sort objects by shader technique (or vertex + pixel shader), then by material, then by geometry.

3. When you render:

* set the first tech, process all objects of this tech, set the second tech, etc.

* inside the tech, apply the constant material params once and process all objects of this material

* as they are sorted by geometry, you can render instanced if you write all differences (ideally only the World matrix) to the vertex buffer

If two objects have different personal shader parameters, you can't instance them.

Note: for instanced rendering you will switch tech, but it likely won't be redundant.
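The ordering in steps 2 and 3 can be sketched as follows, assuming plain integer IDs for tech, material and geometry. `Object`, `Batch` and `buildBatches` are hypothetical names; the sketch only groups objects into candidate instanced batches and makes no API calls.

```cpp
#include <algorithm>
#include <cstddef>
#include <tuple>
#include <vector>

// Hypothetical IDs; per-object ("personal") parameters are left out.
struct Object { int tech; int material; int geometry; };

// A run of objects that share tech, material and geometry.
struct Batch {
    int tech;
    int material;
    int geometry;
    std::size_t instanceCount;
};

inline std::vector<Batch> buildBatches(std::vector<Object> objects) {
    // Sort by tech, then material, then geometry.
    std::sort(objects.begin(), objects.end(),
              [](const Object& a, const Object& b) {
                  return std::tie(a.tech, a.material, a.geometry) <
                         std::tie(b.tech, b.material, b.geometry);
              });
    std::vector<Batch> batches;
    for (const Object& o : objects) {
        if (!batches.empty()) {
            Batch& last = batches.back();
            if (last.tech == o.tech && last.material == o.material &&
                last.geometry == o.geometry) {
                ++last.instanceCount;  // same states: extend the current batch
                continue;
            }
        }
        batches.push_back(Batch{o.tech, o.material, o.geometry, 1});
    }
    return batches;
}
```

Each batch with an instance count above one is a candidate for a single instanced draw, provided the objects in it differ only in their World matrix.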



If you want some code, I have it:



This is my endless work-in-progress :) Feel free to read, use and abuse. If you have questions, I'll try to answer.

#15 Niello   Members   -  Reputation: 130


Posted 14 January 2013 - 05:06 AM

Oh, forgot one thing. Think of an object's World (or WorldViewProjection) matrix as just another personal shader parameter. It simplifies things.

#16 cozzie   Members   -  Reputation: 2699


Posted 14 January 2013 - 04:27 PM

Thanks Niello. I just went through your code and I think I partially understand it :)

I might need a little more help, if you have time to look at it.


As far as I can tell from your comments, I think I'm on the right track. Honestly, I don't know what I can do as a next step of improvement

(other than combining meshes into one vertex/index buffer with a combined world matrix).

Do you see any other ways to reduce the number of render state changes, given the low number of techniques I have up till now?


I added a comment everywhere I set a renderstate or set a parameter of an effect.

The 2 parts where I can make a better index will improve on if statements/for loops (CPU), but not on render states, I think.


Main render function:


bool CD3d::RenderFrame(CD3dscene *pD3dscene, CD3dcam *pCam)
{
	if(!CheckDevice()) { mDeviceLost = true; return true; }
	mEntitiesRendered = 0;

	mD3ddev->Clear(0, NULL, D3DCLEAR_TARGET | D3DCLEAR_ZBUFFER, D3DCOLOR_XRGB(0, 0, 0), 1.0f, 0);

	// SHADER rendering
	if(!RenderScene(pD3dscene, pCam, "OpaqueShader", pD3dscene->mMeshIndexOpaque, pD3dscene->mNrD3dMeshesOpaque)) return false;
	if(!pD3dscene->SortBlendedMeshes(pCam->mPosition)) return false;
	if(!RenderScene(pD3dscene, pCam, "BlendingShader", pD3dscene->mMeshIndexBlended, pD3dscene->mNrD3dMeshesBlended)) return false;
	if(pD3dscene->mSkyBoxInScene) if(!pD3dscene->mSkyBox.Render(pCam->mPosition, pCam, mD3ddev)) return false;

	// FFP rendering
	if(!SetDefaultRenderStates()) return false;			// 6x dev->SetRenderState();
	PrintSceneInfo(pCam, pD3dscene->mNrMaterials);		// draw 2d text with a D3DXFONT

	mD3ddev->Present(NULL, NULL, NULL, NULL);
	return true;
}

The function to render the scene with a specific technique (for all effects in the scene, for now not more techniques)


bool CD3d::RenderScene(CD3dscene *pD3dscene, CD3dcam *pCam, char *pTechnique, int *pMeshIndex, int pNrMeshes)
{
	int ec, oc, mc;		// loop counters (declarations are not shown in the post)
	for(ec=0;ec<pD3dscene->mNrEffects;++ec)		// most of time 1 per scene, today
	{
		if(!SetVertexShader(pD3dscene, ec, pTechnique, pCam)) return false;		// 1x SetTechnique, 1x SetMatrix viewproj
		pD3dscene->mEffect[ec]->Begin(&pD3dscene->mEffectNumPasses[ec], D3DXFX_DONOTSAVESTATE);		// no SetRenderStates
		for(unsigned int i=0;i<pD3dscene->mEffectNumPasses[ec];++i)
			for(oc=0;oc<pNrMeshes;++oc)		// loop header missing from the post; bounds assumed
				if(pD3dscene->mD3dMeshes[pMeshIndex[oc]].mEffectIndex == ec)
				{
					if(pD3dscene->mD3dMeshes[pMeshIndex[oc]].mDynamic) pD3dscene->mD3dMeshes[pMeshIndex[oc]].UpdateWorldMatrix();
					if(!pD3dscene->PreSelectMesh(pMeshIndex[oc], mD3ddev)) return false;
					// 2x SetMatrix, world/worldinvtransp, 1x SetStreamSource, 1x SetIndices

					for(mc=0;mc<pD3dscene->mNrMaterials;++mc)	// loop header missing from the post; bounds assumed
					{
						if(!pD3dscene->PreSelectMaterial(mc, ec)) return false;	// 2x SetFloatArray, 1x SetTexture
						for(DWORD att=0;att<pD3dscene->mD3dMeshes[pMeshIndex[oc]].mAttrSize;++att) // index needed
							if(pD3dscene->mD3dMeshes[pMeshIndex[oc]].mMatIdPerAttr[att] == mc) // index needed
								pD3dscene->mD3dMeshes[pMeshIndex[oc]].RenderAttr(mD3ddev, att, LIST); // the draw call
					}
				}
	}
	return true;
}

#17 Niello   Members   -  Reputation: 130


Posted 15 January 2013 - 12:05 AM

Glad to read that my code is useful for someone besides me :)
So, I'll give you a few pieces of advice, but first the most important one: don't spend time on writing The Fastest Possible Code if you don't have a performance bottleneck or if it isn't your aim. While the performance is acceptable (say, 30-60 FPS), develop new functionality without micro-optimization.

Ok, now let's switch from boring lectures to what you want to read:

if(!SetVertexShader(pD3dscene, ec, pTechnique, pCam)) return false;        // 1x SetTechnique, 1x SetMatrix viewproj

You can use shared shader constants (google "HLSL shared") and an effect pool, and set cross-tech variables like ViewProjection once per frame.

shared float4x4 ViewProj;

In my code it works this way.
Here you save (NumberOfTechniques - 1) * SetMatrix calls.
Also note that you can pre-multiply World * ViewProj on the CPU if your shaders don't require a separate World matrix.


Each pass sets the render states you described for it: VertexShader, PixelShader, ZEnable, ZFunc and others. The shader constants are filled here as well. Use PIX from the DX SDK to dig deeper into the D3DXEffect calls. Here you can reduce state changes by writing passes effectively, especially when using D3DXFX_DONOTSAVESTATE. There is a good article: http://aras-p.info/texts/d3dx_fx_states.html

Instead of iterating through all meshes for all techs, you can (and probably should) sort your models. Using qsort or std::sort it is a trivial task and takes about 5 minutes.
Also, for big scenes you may want to use spatial partitioning for visibility checks and avoid testing the visibility of every object. The renderer then receives only visible objects, which improves performance (especially for sorting, whose cost depends on the number of objects being sorted).

if(!pD3dscene->PreSelectMesh(pMeshIndex[oc], mD3ddev)) return false;// 2x SetMatrix, world/worldinvtransp, 1x SetStreamSource, 1x SetIndices

If you sort your models by geometry, you can do 1x SetStreamSource, 1x SetIndices once per all objects of this geometry (inside the same shader, but often objects of the same geometry DO use the same shader).

Again, the shader is tightly coupled with the material. A material is just a shader tech + the shader variable values for this tech. So set as many shader params as you can after setting the technique, and don't reset them for each mesh. Say all golden objects have the same DiffuseColor: use material "Gold" with shader "metal" and a yellow DiffuseColor, set it once and render all golden objects. Sorting by material will help you a lot. Right now you have to reset the material for each mesh, even if it is the same for half of them.

Check for redundant sets. In my code you can see

RenderSrv->SetVertexBuffer(0, pMesh->GetVertexBuffer());
RenderSrv->SetIndexBuffer(pMesh->GetIndexBuffer());

called for each object, but inside these methods you will find:

if (CurrVB[Index].get_unsafe() == pVB && CurrVBOffset[Index] == OffsetVertex) return;

Early exits may save you a couple of sets the renderer didn't take care of.

Hope this helps.

Edited by Niello, 15 January 2013 - 12:08 AM.

#18 cozzie   Members   -  Reputation: 2699


Posted 15 January 2013 - 04:23 PM

Wow, thanks for both the lecture and the pointers.

In the meantime I read some articles, did some thinking and went through your suggestions one by one.

To begin with, I agree with your remark on micro-optimizations; I honestly don't know if I need them. I'm a bit anxious because of the specs of my own system and because I haven't done reference tests on older CPUs/GPUs yet (I have I5 2320, 660GTX 2GB, 8GB RAM, Win7).


Here's what I'm gonna do/ and a few questions. If you have another minute.... really appreciated :)



1 - I will give meshes an ID to be able to render meshes with the same vertex/index buffer contents and material together (will definitely save some state changes)

(although in memory they still have individual buffers.. hm)

2 - Shared parameters; I believe in my situation the 'ViewProj' matrix is the only one that's shared. I will implement that (quick win)

3 - I will dig into render state setting/changes with PIX; I'm not sure what's going on. I use D3DXDONOTSAVECHANGES and after shader rendering I set my default render states (six of them). Commenting out this function, though, gives the same end result (?). I'll look into the article you linked

4 - save lots of "if statements"/CPU load by making indexes with meshes/entities per material (I already have this; it only needs to be sorted and moved into arrays with more columns)

5 - I just 'fixed' metrics/scaling and now have a scene of 70x70 meters (small desert village). I'll add 8 sand hill instances around it (with some trees), so I have 9 'subscenes'/partitions, or whatever you'd call them.

6 - I prefer looping through materials first and then through meshes. This will save setting parameters for materials, but increase the number of mesh sets (world matrix, stream source etc.), since one mesh might have entities with different materials. Is it correct to assume that setting a material in an effect costs less performance than setting a mesh with its parameters?



1 - what's the advantage of multiplying the world matrix of each mesh by the view-projection matrix and then passing only the end result to the shader

(compared to doing the multiplication in the shader)? Does this take 'CPU' time and free 'GPU' time?

I now do this and could change it accordingly (depending on the gain):

* float4 worldPosition = mul(input.Pos, World);
* Out.Pos = mul(worldPosition, ViewProj);

2 - spatial division.

I see a few options/ideas:

* build up the 'subscenes'/areas while loading a scene, for example 100x100m is a scene

* check camera position against areas/spaces and cull on this VERSUS cull the areas based on camera lookat vector and frustum

* render only the active area versus this one + the next one facing the camera

(the 1st option requires that I 'block' the views to the next areas when modelling).


3 - sorting models by geometry.

The way you explain it, I could set the stream source and indices just once for multiple meshes (sharing parameters like effect, technique and texture/material).

Most meshes have their own world matrix, so I don't see how to do this, because I need to set the world matrix anyhow (unless I combine the meshes' vertex buffers and indices into one buffer, with one 'general' world matrix for this set of meshes? That sounds way too complex to me, given these are possibly unnecessary micro-optimizations :))


4 - checking for redundant vertex buffer (/index buffer) setting; this sounds unnecessary when the meshes are sorted correctly.

Is this correct or are there other reasons to do this?


5 - batching; I'm gonna check how many triangles I render per draw call, just out of curiosity. I read that draw calls should be reduced as much as possible, with more triangles per draw call (because a draw call takes roughly the same time with more triangles, thus increasing performance). Might this also be a reason to combine meshes into combined vertex/index buffers with a shared world matrix?


Looking forward to your answers and ideas.

I'm also curious what hardware/specs you have, maybe to run a reference test after my optimizations.

#19 Niello   Members   -  Reputation: 130


Posted 15 January 2013 - 06:04 PM

Not at all. My profit is that I systematize and refine my knowledge by writing this. Also, maybe someone else will point out where I'm wrong.

If you write for DX9, remember that it has existed since the beginning of the past decade, nearly 10 years. All modern (and much obsolete) hardware supports DX9. It is never too late to optimize if you discover that your scenes are too big. Moreover, at that point you will know why your scenes render slowly, and can choose optimizations accordingly. For now we are discussing techniques that are good in general.


1) Don't associate a mesh with a model instance. You may use the same mesh for many objects in a scene and store the vertex & index data once. You can even render the same mesh with different materials and different World matrices.

3) Do you mean D3DXFX_DONOTSAVESTATE? The docs say that it prevents saving state in Begin() and restoring it in End(). BeginPass() sets states anyway. Can't say more without seeing what's going on in your PIX capture.

6) The World matrix will be set the same number of times anyway, because it is per-object and is set for each object regardless of sorting. AFAIK changing the shader tech is the most costly operation. Setting shader constants is less costly. The cost of setting textures and VBs/IBs depends on the memory pool and the total amount of GPU memory. This is not exact; you should profile. PIX has some profiling functionality.


1) You perform the operation World * ViewProj. If you do this in a vertex shader, you have one GPU mul (4 x dp4) per VERTEX. If you do this on the CPU, you have 1 matrix multiply (some CPU cycles or, better, a fast inlined SSE function) per OBJECT. Given that your object has 3 to 15000 vertices...
But if you want to implement per-pixel lighting in the shader, you must supply the World matrix to it and perform at least 2 matrix multiplications anyway. Here a shared ViewProj helps. Send the World matrix to the shader, get the world position, use it, then multiply it by ViewProj to get the projected position.
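To illustrate why the pre-multiplication is safe, here is a small sketch with plain arrays, assuming the row-vector convention used in the HLSL lines quoted above (`mul(input.Pos, World)`). `Mat4`, `Vec4` and the helper functions are hypothetical stand-ins for a real math library such as D3DX.

```cpp
#include <array>

// Hypothetical minimal math types; a real project would use its math library.
using Vec4 = std::array<float, 4>;
using Mat4 = std::array<std::array<float, 4>, 4>;

inline Mat4 identity() {
    Mat4 m{};  // value-initialized: all elements start at zero
    for (int i = 0; i < 4; ++i) m[i][i] = 1.0f;
    return m;
}

// Done once per object on the CPU: combined = a * b.
inline Mat4 multiply(const Mat4& a, const Mat4& b) {
    Mat4 r{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k)
                r[i][j] += a[i][k] * b[k][j];
    return r;
}

// What the vertex shader does per vertex: row vector times matrix.
inline Vec4 transform(const Vec4& v, const Mat4& m) {
    Vec4 r{};
    for (int j = 0; j < 4; ++j)
        for (int k = 0; k < 4; ++k)
            r[j] += v[k] * m[k][j];
    return r;
}
```

Because matrix multiplication is associative, `transform(transform(v, world), viewProj)` and `transform(v, multiply(world, viewProj))` give the same position (up to floating-point rounding), so the second form moves one multiplication from every vertex to once per object.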

2) Spatial partitioning is a mature concept with many methods developed and plenty of information available. Spend some time reading and googling. As for me, I preferred a "loose octree" as the spatial partitioning structure, but now use a simple "quadtree", because there are other interesting things to implement and I have no free time to spread over secondary tasks (not sure there is such an idiom in English, hm...).

In a couple of words, spatial partitioning is based on: "If I don't see a half of the level, I don't see any half of that half, etc., and I don't see any object there. But if I completely see a half of the level, I definitely see all the things there".

Some code:
http://code.google.com/p/deusexmachina/source/browse/branches/Dev/DEM/Src/L1/Scene/Scene.cpp (line 173, SPSCollectVisibleObjects)
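Independent of the linked code, the principle can be sketched with a tiny quadtree-style structure. Everything here is a simplifying assumption: the visible region is an axis-aligned rectangle instead of a real frustum, every item is assumed to lie inside the bounds of the node that stores it, and `Rect`, `Item`, `Node`, `classify` and `collectVisible` are hypothetical names. The recursive `std::vector<Node>` member relies on C++17.

```cpp
#include <vector>

// Hypothetical types for illustration; none of them come from the linked code.
struct Rect { float minX, minZ, maxX, maxZ; };

struct Item { int id; Rect bounds; };

struct Node {
    Rect bounds;                 // area covered by this node
    std::vector<Item> items;     // objects stored directly in this node
    std::vector<Node> children;  // empty for a leaf, otherwise the sub-areas
};

enum class Overlap { Outside, Partial, Inside };

// Classifies a box against the visible region.
inline Overlap classify(const Rect& box, const Rect& view) {
    if (box.maxX < view.minX || box.minX > view.maxX ||
        box.maxZ < view.minZ || box.minZ > view.maxZ)
        return Overlap::Outside;
    if (box.minX >= view.minX && box.maxX <= view.maxX &&
        box.minZ >= view.minZ && box.maxZ <= view.maxZ)
        return Overlap::Inside;
    return Overlap::Partial;
}

// Invisible nodes are skipped with everything below them; fully visible nodes
// contribute all their items without any further per-object tests.
inline void collectVisible(const Node& node, const Rect& view,
                           bool parentInside, std::vector<int>& out) {
    const Overlap o = parentInside ? Overlap::Inside
                                   : classify(node.bounds, view);
    if (o == Overlap::Outside) return;
    const bool inside = (o == Overlap::Inside);
    for (const Item& it : node.items)
        if (inside || classify(it.bounds, view) != Overlap::Outside)
            out.push_back(it.id);
    for (const Node& child : node.children)
        collectVisible(child, view, inside, out);
}
```

Only partially visible nodes pay for per-object tests, which is where the saving over testing every object in the scene comes from.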

3) As I already wrote, the World matrix is just a shader parameter. You can do this:

for all models that are represented by this mesh

Moreover, you can use instancing here and render many objects that differ only in their World matrix in one DIP.

One World matrix will set the position and orientation of all your meshes to the same values, so they will all be rendered at one point, looking like a junkyard after a nuclear explosion. You can pre-multiply each mesh by its world matrix and then save the result to the vertex buffer. It may be worth it if you have a static scene of different geometries rendered with one texture and material, but in general it is work in vain, and completely unacceptable for dynamic (moving or skinned) objects. Don't spend your time on it; setting the world matrix for each object is cheap enough. Also read this:

This was also the answer to 5)

4) Check for all redundant sets (except, maybe, shader constants), not only for IB, VB. It is very easy to implement.
If we have objects sorted by material, then by geometry:

M1 G1
M1 G2
M2 G2

for each material
for each geometry

Without redundancy checks we have:


And with it:


It only helps occasionally, but since it comes almost for free, use it.
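The M1/G1, M1/G2, M2/G2 example can be played through with a small state cache. This is a sketch with made-up names (`StateCache`, `submit`, `DrawRecord`); it counts set calls instead of making them and, unlike the nested loops above, issues a material and a geometry set for every object, so both kinds of redundancy show up.

```cpp
#include <vector>

// Hypothetical draw list entry using plain IDs for material and geometry.
struct DrawRecord { int material; int geometry; };

struct SetCounts {
    int materialSets = 0;
    int geometrySets = 0;
};

// Remembers what is currently bound and skips sets that change nothing.
class StateCache {
public:
    explicit StateCache(bool filterRedundant) : filter_(filterRedundant) {}

    void setMaterial(int id) {
        if (filter_ && id == currentMaterial_) return;  // early exit
        currentMaterial_ = id;
        ++counts_.materialSets;  // a real renderer would call the API here
    }

    void setGeometry(int id) {
        if (filter_ && id == currentGeometry_) return;  // early exit
        currentGeometry_ = id;
        ++counts_.geometrySets;  // e.g. the vertex and index buffer sets
    }

    const SetCounts& counts() const { return counts_; }

private:
    bool filter_;
    int currentMaterial_ = -1;  // -1 means nothing bound yet
    int currentGeometry_ = -1;
    SetCounts counts_;
};

// Walks an already sorted draw list and reports how many sets were issued.
inline SetCounts submit(const std::vector<DrawRecord>& sorted,
                        bool filterRedundant) {
    StateCache cache(filterRedundant);
    for (const DrawRecord& d : sorted) {
        cache.setMaterial(d.material);
        cache.setGeometry(d.geometry);
        // the draw call itself would go here
    }
    return cache.counts();
}
```

For the three objects above, the unfiltered walk issues three material and three geometry sets, while the filtered walk issues two of each, because G2 stays bound across the switch from M1 to M2.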

My HW is a notebook with Core i7 2630QM + Radeon HD6770M. There is also integrated Intel Mobile HD3000(?) graphics chip.

#20 cozzie   Members   -  Reputation: 2699


Posted 16 January 2013 - 02:49 PM

Thanks, I'll go work on it and keep you posted in a few days after making quite some changes.


One last thing I read in an MSDN article (Accurately Profiling Direct3D API Calls):

- SetVertexShaderConstant = avg. 1000 - 2700 cycles (I assume float4 arrays, matrices etc.)

- SetTexture = avg. 2500 - 3100 cycles

- SetStreamsource = avg. 3700 - 5800

- SetIndices = avg. 900 - 5600


When I take, say, the minimum averages and compare switching a mesh with switching a material:

Mesh change => 6,600 (3700 + 900 + 1000 (world matrix) + 1000 (inv trans world matrix))

Material change => 4,500 (2500 + 1000 mat amb + 1000 mat diff)


Material switching comes out lower, or am I overlooking something?

(or one of the two does things under the hood, which I could find out with PIX)
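For reference, the two sums written out as code, using the minimum averages quoted above. The constant and function names are made up, and treating each matrix or float array as a single constant set at the minimum cost is the same simplification as in the post.

```cpp
// Minimum average cycle counts as quoted in this post.
constexpr int kSetVertexShaderConstant = 1000;
constexpr int kSetTexture              = 2500;
constexpr int kSetStreamSource         = 3700;
constexpr int kSetIndices              = 900;

// Mesh change: stream source + indices + world + inverse-transpose world.
constexpr int meshChangeCycles() {
    return kSetStreamSource + kSetIndices + 2 * kSetVertexShaderConstant;
}

// Material change: texture + ambient + diffuse.
constexpr int materialChangeCycles() {
    return kSetTexture + 2 * kSetVertexShaderConstant;
}

static_assert(meshChangeCycles() == 6600, "3700 + 900 + 1000 + 1000");
static_assert(materialChangeCycles() == 4500, "2500 + 1000 + 1000");
```

These are rough CPU-side estimates only; they say nothing about what the driver or GPU does afterwards, which is what profiling with PIX would have to show.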
