"Sorting out" render order

31 comments, last by cozzie 11 years, 2 months ago
Thanks all. I understand the principle of the index lists, and I can definitely apply it to reduce material state changes. But within such a list I would have multiple meshes, each with its own vertex/index buffers and world matrix, so I need to set these too. These are also state changes, right? (Unless I combine meshes into big vertex/index buffers.) If this is true, I can only reduce state changes (and CommitChanges calls) to at least the number of meshes multiplied by the number of materials, or am I overlooking something?

Crealysm game & engine development: http://www.crealysm.com

Looking for a passionate, disciplined and structured producer? PM me


[quote name='cozzie' timestamp='1357600643' post='5018799']
Maybe later on bundle the streams/indexbuffer to improve further.
[/quote]
I'm not sure I understand what you plan to do with the data layout... to improve what?

To re-order mesh drawing, just reorder the drawcalls.

Previously "Krohm"

If I reorder the entities based on materials, I would still need to switch between the meshes (set world matrix, stream source and index buffers), meaning I still need all those state changes. Is this correct? If so, I might be able to merge the entities/meshes into one or two big vertex and index buffers, with a common/shared world matrix. Or am I thinking too difficult, and is there an easier way to reduce the number of state changes to something other than meshes multiplied by materials?


1. A material should store the shader along with the constant shader params (those that don't change from object to object made of this material).

Other params are defined in the object itself, as personal ones.

2. Sort objects by shader technique (or vertex + pixel shader), then by material, then by geometry.

3. When you render:

* set the first tech, process all objects of this tech, set the second tech, etc.

* inside the tech, apply the constant material params once and process all objects of this material

* as they are sorted by geometry, you can render instanced, if you write all differences (ideally only the World matrix) to the vertex buffer

If two objects have different personal shader parameters, you can't instance them.

Note: for instanced rendering you will switch tech, but it likely won't be redundant.

If you want some code, I have it:

http://code.google.com/p/deusexmachina/source/browse/branches/Dev/DEM/Src/L1/Render/Renderers/ModelRendererNoLight.cpp

This is my endless work-in-progress :) Feel free to read, use and abuse. If there are questions, I'll try to answer.

Oh, forgot one thing. Think of an object's World (or WorldViewProjection) matrix as just another personal shader parameter. It simplifies things.
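
To make the sort-then-render idea from the list above concrete, here is a minimal sketch. It is not taken from the linked renderer: `RenderItem`, `StateChangeCount`, `SortForRendering` and `Render` are hypothetical names, and the counters only stand in for real device calls such as SetTechnique or SetStreamSource.

```cpp
#include <algorithm>
#include <cassert>
#include <tuple>
#include <vector>

// Hypothetical per-object record; field names are illustrative only.
struct RenderItem {
    int tech;      // shader technique id
    int material;  // material id (constant params shared by all users)
    int geometry;  // mesh id (vertex + index buffer)
};

// Counters standing in for real device calls.
struct StateChangeCount {
    int techSets = 0;
    int materialSets = 0;
    int geometrySets = 0;
    int draws = 0;
};

// Sort by technique, then by material, then by geometry.
inline void SortForRendering(std::vector<RenderItem>& items) {
    std::sort(items.begin(), items.end(),
              [](const RenderItem& a, const RenderItem& b) {
                  return std::tie(a.tech, a.material, a.geometry) <
                         std::tie(b.tech, b.material, b.geometry);
              });
}

// Walk the list in order and "set" a state only when it differs
// from what the previous item left bound.
inline StateChangeCount Render(const std::vector<RenderItem>& items) {
    StateChangeCount c;
    int curTech = -1, curMat = -1, curGeom = -1;
    for (const RenderItem& it : items) {
        assert(it.tech >= 0 && it.material >= 0 && it.geometry >= 0);
        if (it.tech != curTech) {
            curTech = it.tech;
            ++c.techSets;
            curMat = -1;  // material params must be re-applied for a new tech
        }
        if (it.material != curMat) { curMat = it.material; ++c.materialSets; }
        if (it.geometry != curGeom) { curGeom = it.geometry; ++c.geometrySets; }
        ++c.draws;  // per-object params (World matrix) would be set here
    }
    return c;
}
```

Calling `Render` on the same list before and after `SortForRendering` shows how many sets the ordering alone saves; the draw count stays the same either way.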

Thanks Niello. I just went through your code and I think I partially understand it :)

I might need a little more help, if you have time to look at it.

As far as I can tell from your comments, I think I'm on the right track. Honestly, I don't know what I can do as a next step of improvement

(other than combining meshes into one vertex/index buffer with a combined world matrix).


Do you see any other ways to reduce render state changes, given the low number of techniques I have so far?

I added a comment everywhere I set a render state or set a parameter of an effect.

The two places where I can build a better index will cut down on if statements and for loops (CPU), but not on render states, I think.

Main render function:


bool CD3d::RenderFrame(CD3dscene *pD3dscene, CD3dcam *pCam)
{
	if(!CheckDevice()) { mDeviceLost = true; return true; }
	mEntitiesRendered = 0;
	pCam->Update();

	mD3ddev->Clear(0, NULL, D3DCLEAR_TARGET | D3DCLEAR_ZBUFFER, D3DCOLOR_XRGB(0, 0, 0), 1.0f, 0);
	mD3ddev->BeginScene();

	// SHADER rendering
	if(!RenderScene(pD3dscene, pCam, "OpaqueShader", pD3dscene->mMeshIndexOpaque, pD3dscene->mNrD3dMeshesOpaque)) return false;
	
	if(!pD3dscene->SortBlendedMeshes(pCam->mPosition)) return false;
	if(!RenderScene(pD3dscene, pCam, "BlendingShader", pD3dscene->mMeshIndexBlended, pD3dscene->mNrD3dMeshesBlended)) return false;
	
	if(pD3dscene->mSkyBoxInScene) if(!pD3dscene->mSkyBox.Render(pCam->mPosition, pCam, mD3ddev)) return false;

	// FFP rendering
	if(!SetDefaultRenderStates()) return false;			// 6x dev->SetRenderState();
	PrintSceneInfo(pCam, pD3dscene->mNrMaterials);		// draw 2d text with a D3DXFONT

	mD3ddev->EndScene();
	mD3ddev->Present(NULL, NULL, NULL, NULL); 
	return true;
}

The function that renders the scene with a specific technique (for all effects in the scene; for now there are no further techniques):


bool CD3d::RenderScene(CD3dscene *pD3dscene, CD3dcam *pCam, char *pTechnique, int *pMeshIndex, int pNrMeshes)
{
	for(ec=0;ec<pD3dscene->mNrEffects;++ec)		// most of time 1 per scene, today
	{
		if(!SetVertexShader(pD3dscene, ec, pTechnique, pCam)) return false;		// 1x SetTechnique, 1x SetMatrix viewproj
		pD3dscene->mEffect[ec]->Begin(&pD3dscene->mEffectNumPasses[ec], D3DXFX_DONOTSAVESTATE);		// no SetRenderStates
		for(unsigned int i=0;i<pD3dscene->mEffectNumPasses[ec];++i)
		{
			pD3dscene->mEffect[ec]->BeginPass(i);
			for(oc=0;oc<pNrMeshes;++oc)
			{
				if(pD3dscene->mD3dMeshes[pMeshIndex[oc]].mEffectIndex == ec)
				{
					if(pCam->SphereInFrustum(&pD3dscene->mD3dMeshes[pMeshIndex[oc]].mWorldPos, 
											 pD3dscene->mD3dMeshes[pMeshIndex[oc]].mBoundingRadius))
					{
						if(pD3dscene->mD3dMeshes[pMeshIndex[oc]].mDynamic) pD3dscene->mD3dMeshes[pMeshIndex[oc]].UpdateWorldMatrix();
						if(!pD3dscene->PreSelectMesh(pMeshIndex[oc], mD3ddev)) return false;	
						// 2x SetMatrix, world/worldinvtransp, 1x SetStreamSource, 1x SetIndices

						for(mc=0;mc<pD3dscene->mNrMaterials;++mc)
						{
							if(!pD3dscene->PreSelectMaterial(mc, ec)) return false;	// 2x SetFloatArray, 1x SetTexture													   
							pD3dscene->mEffect[ec]->CommitChanges();
							{
								for(DWORD att=0;att<pD3dscene->mD3dMeshes[pMeshIndex[oc]].mAttrSize;++att) // index needed
								{
									if(pD3dscene->mD3dMeshes[pMeshIndex[oc]].mMatIdPerAttr[att] == mc) // index needed
									{
										if(pCam->SphereInFrustum(&pD3dscene->mD3dMeshes[pMeshIndex[oc]].mAttrWorldPos[att], 
											                     pD3dscene->mD3dMeshes[pMeshIndex[oc]].mAttrBoundingRadius[att]))
										{						
											pD3dscene->mD3dMeshes[pMeshIndex[oc]].RenderAttr(mD3ddev, att, LIST); // the draw call
											mEntitiesRendered++;
										}
									}
								}
							}
						}
					}
				}
			}
			pD3dscene->mEffect[ec]->EndPass();
		}
		pD3dscene->mEffect[ec]->End();
	}
	return true;
}


Glad to read that my code is useful to someone besides me :)
So, I'll give you a few pieces of advice, but the most important one comes first. Don't spend time on writing The Fastest Possible Code if you don't have a performance bottleneck or if that isn't your aim. While performance is acceptable (say, 30-60 FPS), develop new functionality without micro-optimization.

Ok, now let's switch from boring lectures to what you want to read:


if(!SetVertexShader(pD3dscene, ec, pTechnique, pCam)) return false;        // 1x SetTechnique, 1x SetMatrix viewproj

You can use shared shader constants (google "HLSL shared") together with an effect pool, and set cross-tech variables like ViewProjection once per frame.


shared float4x4 ViewProj;

In my code it works this way.
Here you save (NumberOfTechniques - 1) * SetMatrix calls.
Also note that you can pre-multiply World * ViewProj on the CPU if your shaders don't require a separate World matrix.


pD3dscene->mEffect[ec]->BeginPass(i);

Each pass sets the render states you declared for it: VertexShader, PixelShader, ZEnable, ZFunc and others. Shader constants are also filled here. Use PIX from the DX SDK to dig deeper into the D3DXEffect calls. Here you can reduce state changes by writing passes effectively, especially when using D3DXFX_DONOTSAVESTATE. There is a good article: http://aras-p.info/texts/d3dx_fx_states.html

Instead of iterating through all meshes for all techs, you can (and probably should) sort your models. With qsort or std::sort it is a trivial task and takes about 5 minutes.
Also, for big scenes you may want to use spatial partitioning for visibility checks, to avoid testing the visibility of every object. The renderer then receives only visible objects, which improves performance (especially for sorting, whose cost depends on the number of objects being sorted).


if(!pD3dscene->PreSelectMesh(pMeshIndex[oc], mD3ddev)) return false;// 2x SetMatrix, world/worldinvtransp, 1x SetStreamSource, 1x SetIndices

If you sort your models by geometry, you can do 1x SetStreamSource and 1x SetIndices once for all objects of this geometry (inside the same shader, but objects of the same geometry often DO use the same shader).

Again, the shader is tightly coupled with the material. A material is just a shader tech + shader variable values for this tech. So set as many shader params as you can right after setting the technique, and don't reset them for each mesh. Say all golden objects have the same DiffuseColor. Use a material "Gold" with shader "metal" and a yellow DiffuseColor, set it once and render all golden objects. Sorting by material will help you a lot. Right now you reset the material for each mesh, even if it is the same for half of them.

Check for redundant sets. In my code you can see


RenderSrv->SetVertexBuffer(0, pMesh->GetVertexBuffer());
RenderSrv->SetIndexBuffer(pMesh->GetIndexBuffer());

called for each object, but inside these methods you will find:


if (CurrVB[Index].get_unsafe() == pVB && CurrVBOffset[Index] == OffsetVertex) return;

Early exits may save you a couple of sets the renderer didn't take care of.
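
A minimal sketch of such an early exit, assuming a hypothetical `CachedRenderer` wrapper and a `FakeDevice` that only counts the calls that would reach the driver (neither is from the linked code, and the real method does more than this):

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for a real device; it only counts calls that get through.
struct FakeDevice {
    int streamSourceCalls = 0;
    void SetStreamSource(int /*stream*/, const void* /*vb*/, std::size_t /*offset*/) {
        ++streamSourceCalls;
    }
};

// Hypothetical wrapper that remembers what is bound on each stream.
class CachedRenderer {
public:
    static constexpr int kMaxStreams = 4;

    explicit CachedRenderer(FakeDevice& dev) : mDev(dev) {
        for (int i = 0; i < kMaxStreams; ++i) {
            mCurrVB[i] = nullptr;
            mCurrOffset[i] = 0;
        }
    }

    void SetVertexBuffer(int stream, const void* vb, std::size_t offset = 0) {
        assert(stream >= 0 && stream < kMaxStreams);
        // Early exit: the same buffer and offset are already bound.
        if (mCurrVB[stream] == vb && mCurrOffset[stream] == offset) return;
        mCurrVB[stream] = vb;
        mCurrOffset[stream] = offset;
        mDev.SetStreamSource(stream, vb, offset);
    }

private:
    FakeDevice& mDev;
    const void* mCurrVB[kMaxStreams];
    std::size_t mCurrOffset[kMaxStreams];
};
```

The caller can then set the buffer for every object without thinking about it; only genuine changes reach the device.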

Hope this helps.

Wow, thanks for both the lecture and the pointers.

In the meantime I read some articles, did some thinking, and went through your suggestions one by one.

To begin with, I agree with your remark on micro-optimizations; I honestly don't know if I need them. I'm a bit anxious because of the specs of my own system, and I haven't yet run reference tests on older CPUs/GPUs (I have I5 2320, 660GTX 2GB, 8GB ram, Win7).

Here's what I'm gonna do/ and a few questions. If you have another minute.... really appreciated :)

Actions;

1 - I will give meshes an ID to be able to render meshes with the same vertex/ index buffer contents and material (will save some state changes definitely)

(although in memory they still have individual buffers.. hm)

2 - Shared parameters; I believe in my situation the 'ViewProj' matrix is the only one that's shared, will implement that (quick win)

3 - Will dig into renderstates setting/ changes with PIX, not sure what's going on. I use D3DXDONOTSAVECHANGES and after shader rendering set my default renderstates (six of them). Although commenting this function/ not doing this, gives the same end result (?). I'll look into the article link you posted

4 - save lots of "if statements"/ CPU load by making indexes with meshes/entities per material (already have it, only needs to be sorted and moved into arrays with more columns)

5 - I just 'fixed' metrics/scaling and now have a scene of 70x70 meters (small desert village), I'll add 8 sand hill instances around it (with some trees), so I have 9 'subscenes'/partitions or how you'd call it.

6 - prefer looping through materials first and through meshes afterwards. This will save setting parameters for materials, but increase the number of mesh sets (world matrix, stream source etc.), since one mesh might have entities with different materials. Is it correct to assume that setting a material in an effect costs less performance than setting a mesh with its parameters?

Questions:

1 - what's the advantage of multiplying world matrix for each mesh, with viewprojection and then pass in only the endresult to the shader?

(compared to doing the multiplication in the shader), does this take 'CPU' time and free 'GPU' time?

I now do this and could change it accordingly (depending on the gain);

* float4 worldPosition = mul(input.Pos, World);
* Out.Pos = mul(worldPosition, ViewProj);

2 - spatial division.

I see a few options/ ideas I have:

* build up the 'subscenes'/areas while loading a scene, for example 100x100m is a scene

* check camera position against areas/ spaces and cull on this VERSUS cull the areas based on camera lookat vector and frustum

* render only the active area versus this one + the next one facing the camera

(The first option requires that the modelling 'blocks' the views into the next areas.)

3 - sorting models by geometry.

How you explain it, I could set streamsource and indices just once for multiple meshes (sharing parameters like effect, technique and texture/ material).

Most meshes have their own world matrix, so I don't see how to do this, because I need to set the world matrix anyhow (unless I combine mesh vertex buffers and indices into one buffer, with one 'general' world matrix for this set of meshes?). That sounds way too complex for me, considering these are possibly unnecessary micro-optimizations :)

4 - checking for redundant vertex buffer (/index buffer) sets; this sounds unnecessary when the meshes are sorted correctly.

Is this correct or are there other reasons to do this?

5 - batching; I'm going to check how many triangles I render per draw call, just out of curiosity. I read that draw calls should be reduced as much as possible, with more triangles per draw call (because a draw call takes roughly the same time with more triangles, thus increasing performance). Might this also be a reason to combine meshes into combined vertex/index buffers with a shared world matrix?

Looking forward to your answers and ideas.

I'm also curious what hardware/ specs you have, maybe to do a reference tests after my optimizations.


Not at all. My profit is that I systematize and refine my knowledge by writing this. Also, maybe someone else will point out where I'm wrong.

If you write for DX9, remember that it has existed since the beginning of the past decade, nearly 10 years. All modern (and much obsolete) hardware supports DX9. It is never too late to optimize if you discover that your scenes are too big. Moreover, at that point you will know why your scenes render slowly, and can choose optimizations accordingly. For now we are discussing techniques that are good in general.

Actions:

1) Don't associate a mesh with a model instance. You may use the same mesh for many objects in a scene and store the vertex & index data once. You can even render the same mesh with different materials and different World matrices.

3) Do you mean D3DXFX_DONOTSAVESTATE? The docs claim that it prevents saving state in Begin() and restoring it in End(). BeginPass() sets states anyway. I can't say more without seeing what's going on in your PIX capture.

6) The World matrix will be set the same number of times anyway, because it is per-object and set for each object regardless of sorting. AFAIK changing the shader tech is the most costly operation. Setting shader constants is less costly. The cost of setting textures and VBs/IBs depends on the memory pool and the total amount of GPU memory. This is not exact; you should profile. PIX has some profiling functionality.

Questions:

1) You perform the operation World * ViewProj. If you do this in a vertex shader, you have one GPU mul (4 x dp4) per VERTEX. If you do this on the CPU, you have 1 matrix multiply (some CPU cycles or, better, a fast inlined SSE function) per OBJECT. Given that your object has 3 to 15000 vertices...
But if you want to implement per-pixel lighting in the shader, you must supply the World matrix to it, and perform at least 2 matrix multiplications anyway. Here the shared ViewProj helps. Send the World matrix to the shader, get the world position, use it, multiply it by ViewProj and get the projected position.
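
The pre-multiplication relies on matrix multiplication being associative: (pos * World) * ViewProj gives the same result as pos * (World * ViewProj). A small self-contained check, assuming the row-vector convention of the `mul(input.Pos, World)` snippet quoted earlier; `Vec4`, `Mat4`, `Translation` and `Scale` are hypothetical helpers, and the scale matrix is only a stand-in for a real ViewProj:

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec4 = std::array<float, 4>;
using Mat4 = std::array<std::array<float, 4>, 4>;

// Row vector times matrix, as in HLSL mul(pos, M).
inline Vec4 Mul(const Vec4& v, const Mat4& m) {
    Vec4 r{};
    for (int c = 0; c < 4; ++c)
        for (int k = 0; k < 4; ++k)
            r[c] += v[k] * m[k][c];
    return r;
}

// Matrix times matrix; done once per object on the CPU.
inline Mat4 Mul(const Mat4& a, const Mat4& b) {
    Mat4 r{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k)
                r[i][j] += a[i][k] * b[k][j];
    return r;
}

inline Mat4 Identity() {
    Mat4 m{};
    for (int i = 0; i < 4; ++i) m[i][i] = 1.0f;
    return m;
}

// Translation lives in the last row under the row-vector convention.
inline Mat4 Translation(float x, float y, float z) {
    Mat4 m = Identity();
    m[3][0] = x; m[3][1] = y; m[3][2] = z;
    return m;
}

// Uniform scale; used here only as a placeholder for ViewProj.
inline Mat4 Scale(float s) {
    Mat4 m = Identity();
    m[0][0] = s; m[1][1] = s; m[2][2] = s;
    return m;
}

inline bool NearlyEqual(const Vec4& a, const Vec4& b, float eps = 1e-5f) {
    for (int i = 0; i < 4; ++i)
        if (std::fabs(a[i] - b[i]) > eps) return false;
    return true;
}
```

With the combined matrix computed once per object, each vertex needs one vector-matrix multiply instead of two. That only pays off when the shader has no other use for the world-space position.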

2) Spatial partitioning is a mature concept with many methods developed and plenty of information available. Spend some time reading and googling. As for me, I used to prefer a "loose octree" as the spatial partitioning structure, but now I use a simple "quadtree", because there are other interesting things to implement and I have no free time to spread over secondary tasks (not sure there is such an idiom in English, hm...).

In a couple of words, spatial partitioning is based on: "If I don't see half of the level, I don't see any half of that half, and so on, and I don't see any object in there. But if I completely see half of the level, I definitely see all the things in it".
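
That rule can be sketched as a recursive walk over a quadtree. This is a simplified illustration, not the code behind the links below: the view is an axis-aligned `Rect` standing in for a real frustum, all names are hypothetical, and every object is assumed to lie entirely inside the bounds of the node that stores it.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

struct Rect { float minX, minY, maxX, maxY; };

enum class Overlap { Outside, Partial, Inside };

// Classify 'box' against the visible region 'view'.
inline Overlap Classify(const Rect& box, const Rect& view) {
    if (box.maxX < view.minX || box.minX > view.maxX ||
        box.maxY < view.minY || box.minY > view.maxY) return Overlap::Outside;
    if (box.minX >= view.minX && box.maxX <= view.maxX &&
        box.minY >= view.minY && box.maxY <= view.maxY) return Overlap::Inside;
    return Overlap::Partial;
}

struct Object { int id; Rect bounds; };

struct Node {
    Rect bounds;
    std::vector<Object> objects;        // objects stored at this node
    std::unique_ptr<Node> children[4];  // all null for a leaf
};

inline std::unique_ptr<Node> MakeNode(const Rect& bounds,
                                      std::vector<Object> objects = {}) {
    std::unique_ptr<Node> n(new Node());
    n->bounds = bounds;
    n->objects = std::move(objects);
    return n;
}

// Node fully visible: take everything below it without further tests.
inline void CollectAll(const Node& node, std::vector<int>& out) {
    for (const Object& o : node.objects) out.push_back(o.id);
    for (const auto& c : node.children)
        if (c) CollectAll(*c, out);
}

// 'tests' counts how many bounds checks were needed.
inline void CollectVisible(const Node& node, const Rect& view,
                           std::vector<int>& out, int& tests) {
    ++tests;
    switch (Classify(node.bounds, view)) {
    case Overlap::Outside: return;                        // nothing below is visible
    case Overlap::Inside:  CollectAll(node, out); return; // everything below is visible
    case Overlap::Partial: break;                         // must look closer
    }
    for (const Object& o : node.objects) {
        ++tests;
        if (Classify(o.bounds, view) != Overlap::Outside) out.push_back(o.id);
    }
    for (const auto& c : node.children)
        if (c) CollectVisible(*c, view, out, tests);
}
```

The saving comes from the two early returns: a node that is fully outside or fully inside settles the question for its whole subtree with a single test.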

Some code:
http://code.google.com/p/deusexmachina/source/browse/branches/Dev/DEM/Src/L1/Data/QuadTree.h
http://code.google.com/p/deusexmachina/source/browse/branches/Dev/DEM/Src/L1/Scene/SPS.h
http://code.google.com/p/deusexmachina/source/browse/branches/Dev/DEM/Src/L1/Scene/Scene.cpp (line 173, SPSCollectVisibleObjects)

3) As I already wrote, the World matrix is just a shader parameter. You can do this:


SetStreamSource
SetIndices
for all models that are represented by this mesh
	SetMatrix(World)
	DrawIndexedPrimitive



Moreover, you can use instancing here and render many objects with only World matrix different in one DIP.
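
The loop above, written out as a sketch without instancing. `Mesh`, `ModelInstance` and `CallLog` are hypothetical types, and the counters replace the real SetStreamSource / SetIndices / SetMatrix / DrawIndexedPrimitive calls; the point is only that geometry is stored and bound once per mesh while the World matrix stays per object.

```cpp
#include <cassert>
#include <map>
#include <vector>

struct Mesh { int vbId; int ibId; };  // geometry, stored once
struct Matrix { float m[16]; };

// An object in the scene refers to a mesh; it does not own one.
struct ModelInstance {
    const Mesh* mesh;
    Matrix world;
};

// Counters standing in for the device calls named in the pseudocode.
struct CallLog {
    int setStreamSource = 0;
    int setIndices = 0;
    int setWorld = 0;
    int draws = 0;
};

inline CallLog DrawGroupedByMesh(const std::vector<ModelInstance>& instances) {
    // Group the instances by the mesh they share.
    std::map<const Mesh*, std::vector<const ModelInstance*>> groups;
    for (const ModelInstance& inst : instances) {
        assert(inst.mesh != nullptr);
        groups[inst.mesh].push_back(&inst);
    }

    CallLog log;
    for (const auto& group : groups) {
        ++log.setStreamSource;  // once per mesh
        ++log.setIndices;       // once per mesh
        for (const ModelInstance* inst : group.second) {
            (void)inst;         // inst->world would be uploaded here
            ++log.setWorld;     // once per object
            ++log.draws;        // once per object
        }
    }
    return log;
}
```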

A single World matrix would give all your meshes the same position and orientation, so they would all be rendered at one point, looking like a junkyard after a nuclear explosion. You can pre-multiply each mesh's vertices by its world matrix and save the result to the vertex buffer. It may be worth it if you have a static scene of different geometries rendered with one texture and material, but in general it is work in vain, and completely unacceptable for dynamic (moving or skinned) objects. Don't spend your time on it; setting the world matrix for each object is cheap enough. Also read this:
http://msdn.microsoft.com/en-us/library/windows/desktop/bb173349(v=vs.85).aspx

It also was an answer to 5)

4) Check for all redundant sets (except, maybe, shader constants), not only for IB, VB. It is very easy to implement.
If we have objects sorted by material, then by geometry:

M1 G1
M1 G2
M2 G2

for each material
	SetShader
	for each geometry
		SetVB
		Render

Without redundancy checks we have:

SetShader(M1)
SetVB(G1)
Render
SetVB(G2)
Render
SetShader(M2)
SetVB(G2)
Render

And with it:

SetShader(M1)
SetVB(G1)
Render
SetVB(G2)
Render
SetShader(M2)
[WE DON'T RESET G2 AS IT IS ALREADY SET]
Render

It only helps occasionally, but since it comes almost for free, use it.

My HW is a notebook with Core i7 2630QM + Radeon HD6770M. There is also integrated Intel Mobile HD3000(?) graphics chip.

Thanks, I'll go work on it and keep you posted in a few days after making quite some changes.

One last thing I read in an MSDN article (Accurately profiling Direct3D API calls), is that:

- SetVertexShaderConstant = avg. 1000 - 2700 cycles (I assume float4 arrays, matrices etc.)

- SetTexture = avg. 2500 - 3100 cycles

- SetStreamsource = avg. 3700 - 5800

- SetIndices = avg. 900 - 5600

When I take, say, the minimum averages and compare switching a mesh with switching a material:

Mesh change => 6.600 (3700 + 900 + 1000 (world matrix) + 1000 (inv trans world matrix))

Material change => 4.500 (2.500 + 1000 mat amb + 1000 mat diff)

Material switching seems somewhat cheaper, or am I overlooking something?

(or one of the two does things "under the hood" which I could find out with PIX)
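
A quick way to check these sums, using only the minimum averages quoted above. The constant and function names are hypothetical, and the figures are rough per-call estimates rather than measurements of this renderer:

```cpp
#include <cassert>

// Minimum average cycle counts as quoted in the post above.
constexpr int kSetVertexShaderConstant = 1000;
constexpr int kSetTexture = 2500;
constexpr int kSetStreamSource = 3700;
constexpr int kSetIndices = 900;

// Mesh change: stream source + indices + world + inverse-transpose world.
constexpr int MeshChangeCycles() {
    return kSetStreamSource + kSetIndices + 2 * kSetVertexShaderConstant;
}

// Material change: texture + ambient + diffuse.
constexpr int MaterialChangeCycles() {
    return kSetTexture + 2 * kSetVertexShaderConstant;
}

// Rough per-frame estimate for a given number of each kind of change.
constexpr long long FrameCycles(int meshChanges, int materialChanges) {
    return static_cast<long long>(meshChanges) * MeshChangeCycles() +
           static_cast<long long>(materialChanges) * MaterialChangeCycles();
}
```

With these inputs a mesh change comes to 6600 cycles and a material change to 4500, so by this estimate the loop order that produces fewer mesh changes is the cheaper one.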


