"Sorting out" render order

31 comments, last by cozzie 11 years, 2 months ago

Hi again!

First, you didn't take into account ID3DXEffect calls like BeginPass, which is where SetVertexShader and SetPixelShader are called. That may not matter while you use one technique for everything, but that isn't practical: any real game will use more, and if yours doesn't, there is no need to think much about the renderer at all.

Second: since SetIndices is listed at 900-5600, you can't just plug in 900 and draw conclusions from it. Why not, say, 4200? Or even 5600? That changes things a lot, doesn't it? :) The answer is simple: profile it yourself. Hardware changes, many other circumstances change, and reasonably accurate profiling results can only be gathered on your target platform.

But my most important advice remains the same: write new features, improve your scene's quality and complexity, and start optimizing only when it becomes necessary. Profiling has no real meaning in a synthetic environment. You should profile the things your users will actually get, or special test scenes that reproduce specific bottlenecks (like a scene with lots of different particles to optimize particle systems).
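In the spirit of "profile it yourself", here is a minimal scoped timer sketch using std::chrono. ScopedTimer is a hypothetical helper, not part of either poster's engine. Keep in mind that CPU timing around a single D3D9 call mostly measures runtime/driver overhead, since GPU work happens later, so averaging over many frames on the target machine gives more meaningful numbers than one sample.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical helper: adds the wall-clock time of its own lifetime
// (in nanoseconds) to a caller-supplied accumulator.
class ScopedTimer {
public:
    explicit ScopedTimer(std::int64_t& accumulatorNs)
        : mAccumulator(accumulatorNs), mStart(std::chrono::steady_clock::now()) {}

    ~ScopedTimer() {
        auto end = std::chrono::steady_clock::now();
        mAccumulator += std::chrono::duration_cast<std::chrono::nanoseconds>(end - mStart).count();
    }

    ScopedTimer(const ScopedTimer&) = delete;
    ScopedTimer& operator=(const ScopedTimer&) = delete;

private:
    std::int64_t& mAccumulator;
    std::chrono::steady_clock::time_point mStart;
};
```

Usage would look like `{ ScopedTimer t(setIndicesNs); device->SetIndices(ib); }`, where `setIndicesNs`, `device` and `ib` are placeholders; divide the accumulated total by the number of frames to get an average.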

Hi Niello.
Just started working on all the changes.

Using a shared parameter and an ID3DXEffectPool for my view-projection matrix is working (leaving the world matrix as a separate parameter for now, with per-pixel lighting in mind for the future). What's maybe strange is that it works both with "float4x4 viewProj;" and with "shared float4x4 viewProj;".

I simply created an LPD3DXEFFECTPOOL and set the viewProj matrix only once per frame; the result is fine (with and without 'shared' in the FX file/shader).
For now I'll keep it in, although I don't understand why it works without it.

Short version of the code:


// changed part of shader/effect creation function at startup

		D3DXCreateEffectPool(&mEffectPool);
			if(D3D_OK != D3DXCreateEffectFromFileA(pD3ddev, pScene->mEffectFilenames[ec].c_str(), NULL, NULL, 0, mEffectPool, &mEffect[ec], &errorBuffer))

// new function that now only sets technique, instead of also viewProj matrix

bool CD3d::SetShaderTechnique(CD3dscene *pD3dscene, int pEffectIndex, char *pTechnique)
{
	if(D3DERR_INVALIDCALL == pD3dscene->mEffect[pEffectIndex]->SetTechnique(pTechnique)) return false;
	return true;
}

// new part of render function

	// SHADER rendering
	// Set shared parameters first
	if(D3DERR_INVALIDCALL == pD3dscene->mEffect[0]->SetMatrix("ViewProj", &pCam->mMatViewProjection)) return false;	// SHARED PARAMETER IN POOL
	
	if(!RenderScene(pD3dscene, pCam, "OpaqueShader", pD3dscene->mMeshIndexOpaque, pD3dscene->mNrD3dMeshesOpaque)) return false;
	
	if(!pD3dscene->SortBlendedMeshes(pCam->mPosition)) return false;
	if(!RenderScene(pD3dscene, pCam, "BlendingShader", pD3dscene->mMeshIndexBlended, pD3dscene->mNrD3dMeshesBlended)) return false;
	
	if(pD3dscene->mSkyBoxInScene) if(!pD3dscene->mSkyBox.Render(pCam->mPosition, pCam, mD3ddev)) return false;


Next I'll split my mesh class into a 'real' mesh class and a new mesh instance class (including all the changes that requires).

Crealysm game & engine development: http://www.crealysm.com

Looking for a passionate, disciplined and structured producer? PM me

Just a short update:
Going from meshes to mesh and mesh instance is quite a job, but definitely a big improvement.
When I think about it, I had about 20 tree meshes eating memory and buffers while they were all the same.
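The mesh/mesh instance split can be sketched like this: the heavy data (vertices, indices, and in D3D9 the buffers) exists once per unique model, and each placed object only stores an index to it plus its own transform. Mesh, MeshInstance, Scene and GetOrAddMesh are hypothetical names, and a plain position stands in for the world matrix.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Shared, heavyweight data: loaded once per unique model (e.g. one tree).
struct Mesh {
    std::string name;
    std::vector<float> vertices;          // placeholder for vertex buffer contents
    std::vector<unsigned int> indices;    // placeholder for index buffer contents
};

// Lightweight per-object data: only what differs between placed copies.
struct MeshInstance {
    std::size_t meshId;      // index into Scene::meshes
    float worldPos[3];       // stand-in for the full world matrix
    bool blended;
};

struct Scene {
    std::vector<Mesh> meshes;
    std::vector<MeshInstance> instances;

    // Returns the id of an existing mesh with this name, creating it only once.
    std::size_t GetOrAddMesh(const std::string& name) {
        for (std::size_t i = 0; i < meshes.size(); ++i)
            if (meshes[i].name == name) return i;
        meshes.push_back(Mesh{name, {}, {}});
        return meshes.size() - 1;
    }

    void AddInstance(const std::string& meshName, float x, float y, float z, bool blended) {
        std::size_t id = GetOrAddMesh(meshName);
        instances.push_back(MeshInstance{id, {x, y, z}, blended});
    }
};
```

With 20 trees, this stores one tree mesh and 20 small instance records instead of 20 full copies.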

Short update:
The rough implementation is done; a nice side effect is that loading is a couple of thousand percent faster :)
Next step: clean indices.

Will keep you posted.

@Niello, still there?

I'm in the middle of the next steps for using mesh instances instead of a full mesh for every object.
I'm now taking the following approach:

- create an index table in the mesh class containing a list with the IDs of that mesh's instances
(or maybe do this in my scene class as a two-dimensional array; not sure if this works memory-allocation-wise?)
Do this twice: once for blended and once for opaque instances
- create an index table in the scene class with an array per material, containing the IDs of the meshes using that material
(created at startup for all static objects; no solution yet for dynamic objects)
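The two index tables described above could look like this, assuming each mesh knows which material ids its submeshes use. A vector of vectors is one way to get the "two-dimensional array" without fixing its size up front. All names here (InstanceDesc, IndexTables, BuildIndexTables) are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-instance input: which mesh it uses and whether it is blended.
struct InstanceDesc {
    std::size_t meshId;
    bool blended;
};

struct IndexTables {
    std::vector<std::vector<std::size_t>> materialMeshes;    // material -> meshes using it
    std::vector<std::vector<std::size_t>> opaqueInstances;   // mesh -> opaque instance ids
    std::vector<std::vector<std::size_t>> blendedInstances;  // mesh -> blended instance ids
};

// meshMaterials[mesh] lists the material ids used by that mesh's submeshes.
inline IndexTables BuildIndexTables(std::size_t numMaterials,
                                    const std::vector<std::vector<std::size_t>>& meshMaterials,
                                    const std::vector<InstanceDesc>& instances) {
    IndexTables t;
    t.materialMeshes.resize(numMaterials);
    t.opaqueInstances.resize(meshMaterials.size());
    t.blendedInstances.resize(meshMaterials.size());

    for (std::size_t mesh = 0; mesh < meshMaterials.size(); ++mesh)
        for (std::size_t mat : meshMaterials[mesh])
            t.materialMeshes[mat].push_back(mesh);

    // Split instances per mesh into the opaque and blended lists.
    for (std::size_t i = 0; i < instances.size(); ++i) {
        auto& bucket = instances[i].blended ? t.blendedInstances : t.opaqueInstances;
        bucket[instances[i].meshId].push_back(i);
    }
    return t;
}
```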

- at render time I split rendering into a few main steps:

1a. Culling: loop through all mesh instances and check them against the frustum; mark each one visible true/false
(in the future I could add binary space partitioning, trees, portals or whatever in this step)
1b. Sort the blended mesh instances

2. Main rendering loop:

a* loop through all materials
b* select material (state changes)
c* loop through the mesh index to find which meshes use the active material
d* select mesh (state changes, set buffers)
e* for each mesh, loop through its mesh instance index
f* check whether the mesh instance is visible
g* if it is in the frustum, select the mesh instance (state changes)
h* for each submesh of the mesh instance, do a 'live' bounding sphere check against the frustum
i* do the draw call
... until the end of the scene
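A sketch of the main rendering loop for one pass, with the device calls replaced by counters so the number of state changes can be checked. RenderData, FrameStats and RenderPass are hypothetical names; the per-submesh bounding sphere check is left out, and submesh counts are simplified to one number per mesh rather than per mesh/material pair. Materials with no meshes are skipped entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct FrameStats {
    int materialSets = 0;   // stands in for material parameter sets
    int bufferSets = 0;     // stands in for SetStreamSource/SetIndices
    int instanceSets = 0;   // stands in for the world matrix sets
    int drawCalls = 0;      // stands in for DrawIndexedPrimitive
};

struct RenderData {
    std::vector<std::vector<std::size_t>> materialMeshes;  // material -> meshes using it
    std::vector<std::vector<std::size_t>> meshInstances;   // mesh -> its instances
    std::vector<bool> instanceVisible;                      // result of the culling step
    std::vector<std::size_t> meshSubmeshCount;             // submeshes per mesh (simplified)
};

inline FrameStats RenderPass(const RenderData& d) {
    FrameStats s;
    for (std::size_t mat = 0; mat < d.materialMeshes.size(); ++mat) {
        if (d.materialMeshes[mat].empty()) continue;          // nothing uses it: no state change
        ++s.materialSets;                                      // select material
        for (std::size_t mesh : d.materialMeshes[mat]) {
            ++s.bufferSets;                                    // select mesh buffers
            for (std::size_t inst : d.meshInstances[mesh]) {
                if (!d.instanceVisible[inst]) continue;        // culled this frame
                ++s.instanceSets;                              // select instance (world matrix)
                s.drawCalls += static_cast<int>(d.meshSubmeshCount[mesh]);  // one draw per submesh
            }
        }
    }
    return s;
}
```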

All the steps above are done twice: once for opaque and once for blended.
With these state changes I will definitely save quite a few SetStreamSource/SetIndices calls.

I'll also profile the number of batches/draw calls I do per frame and how many triangles they include.

What's your advice on this? Am I shooting myself in the foot for future expansions (e.g. combining buffers, binary space partitioning etc.)?
I'm also curious what you think about the 'shared float' thingie above.

Update 21-1:
Still working on it and making nice progress. I just decided I want a render queue class to handle all this, so I have a flexible 'render bucket'. The class will hold all the indices for meshes, materials and submeshes, saved depths, sorting functions etc.
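For the render queue's sorting functions, a back-to-front sort of the blended instances might look like this. The depth is computed once per instance per frame and cached before sorting, rather than recomputed inside the comparator. Vec3, DistSq and SortBackToFront are hypothetical names, and the instance's centre is used as its sort position, which is an approximation for large objects.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

// Squared distance is enough for ordering and avoids a sqrt per instance.
inline float DistSq(const Vec3& a, const Vec3& b) {
    float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}

// Sorts blended instance ids back to front (farthest first) relative to the camera.
inline void SortBackToFront(std::vector<std::size_t>& blendedIds,
                            const std::vector<Vec3>& instancePos,
                            const Vec3& camPos) {
    // Save the depth per instance once per frame.
    std::vector<float> depth(instancePos.size(), 0.0f);
    for (std::size_t id : blendedIds) depth[id] = DistSq(instancePos[id], camPos);

    std::sort(blendedIds.begin(), blendedIds.end(),
              [&depth](std::size_t a, std::size_t b) { return depth[a] > depth[b]; });
}
```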

I'm still curious about your thoughts/questions on the last updates, though.

Hi, here I am again. Btw, happy birthday to both you and me :)

Shared parameters in effects are shared between different effects. While you use one effect you won't see any difference, but when there are several ID3DXEffect objects created with the same pool, setting a shared variable on one of them sets it on all of them.
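To see why the `shared` keyword made no visible difference with a single effect, here is a small conceptual model in plain C++. It is not the D3DX API: ParamPool, Effect, DeclareShared, Set and Get are hypothetical names, and a float stands in for the float4x4 matrix. With one effect, shared and non-shared parameters behave identically; the difference only appears once a second effect uses the same pool.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <set>
#include <string>
#include <utility>

// Conceptual model only: one storage slot per shared parameter name in the pool.
struct ParamPool {
    std::map<std::string, float> shared;
};

class Effect {
public:
    explicit Effect(std::shared_ptr<ParamPool> pool) : mPool(std::move(pool)) {}

    // Corresponds to writing 'shared float4x4 ViewProj;' in this effect's source.
    void DeclareShared(const std::string& name) {
        mSharedNames.insert(name);
        mPool->shared.emplace(name, 0.0f);   // no-op if another effect already declared it
    }

    void Set(const std::string& name, float value) {
        if (mSharedNames.count(name)) mPool->shared[name] = value;  // seen by all effects in the pool
        else mLocal[name] = value;                                  // seen by this effect only
    }

    float Get(const std::string& name) const {
        if (mSharedNames.count(name)) return mPool->shared.at(name);
        return mLocal.at(name);
    }

private:
    std::shared_ptr<ParamPool> mPool;
    std::set<std::string> mSharedNames;
    std::map<std::string, float> mLocal;
};
```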

Your mesh refactoring is good news. Also, if you use .x mesh files in ASCII format, moving to binary files will give you another big loading-time win. A third one could be using precompiled .fx shaders.

As for your indexing system, I prefer sorting each frame. My advice on it all: download a couple of popular 3D engines and study them. They contain various advanced techniques that have proven their efficiency. My teacher, for example, is The Nebula Device, versions 2 and 3, but I don't recommend copy-pasting from them; gather ideas from them instead. After all, I ended up having to reimplement the whole Nebula scene graph and renderer. Irrlicht or Ogre are also good starting points; I'm not sure about their architecture, but their rendering techniques definitely are.
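One common way to "sort each frame" is to build a flat list of draw items every frame, each with an integer sort key that packs the state you want to change least often into the highest bits, and then sort that list. This sketch shows the general technique, not how Nebula implements it; the bit layout and the DrawItem, MakeSortKey and SortDrawItems names are assumptions. Blended items would usually put a quantized, inverted depth right below the blended bit instead, so they come out back to front.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// One entry per visible submesh draw, rebuilt every frame.
struct DrawItem {
    std::uint64_t key;
    std::uint32_t instanceId;
    std::uint32_t submeshId;
};

// Hypothetical key layout, most significant first:
// [1 bit blended | 15 bits effect | 16 bits material | 16 bits mesh | 16 bits unused]
inline std::uint64_t MakeSortKey(bool blended, std::uint32_t effect,
                                 std::uint32_t material, std::uint32_t mesh) {
    return (std::uint64_t(blended ? 1 : 0) << 63) |
           (std::uint64_t(effect & 0x7FFF) << 48) |
           (std::uint64_t(material & 0xFFFF) << 32) |
           (std::uint64_t(mesh & 0xFFFF) << 16);
}

// After sorting, opaque draws come first, grouped by effect, then material, then mesh.
inline void SortDrawItems(std::vector<DrawItem>& items) {
    std::sort(items.begin(), items.end(),
              [](const DrawItem& a, const DrawItem& b) { return a.key < b.key; });
}
```

Walking the sorted list, the renderer only needs to change effect, material or buffers when the corresponding bits differ from the previous item.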

Hi, good to hear from you.

Unfortunately I made a mistake on my profile; my birthday is a month away, on the 19th of February ;)

Happy birthday to you though! :)

Thanks for your remarks. I'm learning more than one thing from setting up this mesh/render queue foundation: one is that I can really use feedback like yours, the other is that I should just do and try things instead of asking everything beforehand.

I'll keep you posted over the coming days after I finish the next steps, and will show you the result (and get your comments :))

I will be sorting each frame, depending on which index we're talking about. Static things like the mesh/material index won't change, so I won't sort those. What might be worth a try is sorting the mesh instance index after culling, based on visible yes/no; that would have to be done each frame then. Is this what you mean?

(I'm not sure whether sorting only the visible instances is worth it compared to checking visibility in the render loop.)
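Rather than a full sort, the per-frame step can simply filter: after culling, build a compact list of the visible instances per mesh. This is a sketch with a hypothetical function name (BuildVisibleLists); it runs in linear time, and because the output vectors are cleared rather than recreated, it stops allocating once they have grown to their working size.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Builds, once per frame after culling, a compact list of visible instance ids per mesh.
inline void BuildVisibleLists(const std::vector<std::vector<std::size_t>>& meshInstances,
                              const std::vector<bool>& visible,
                              std::vector<std::vector<std::size_t>>& visibleOut) {
    visibleOut.resize(meshInstances.size());
    for (std::size_t mesh = 0; mesh < meshInstances.size(); ++mesh) {
        visibleOut[mesh].clear();                  // keeps capacity from earlier frames
        for (std::size_t inst : meshInstances[mesh])
            if (visible[inst]) visibleOut[mesh].push_back(inst);
    }
}
```

The bigger win compared to testing the flag inside the render loop is that a mesh whose visible list is empty can skip SetStreamSource/SetIndices altogether.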

Another (last) thing: I now use (unsigned) int arrays for the indices.

Using (multi)maps instead of separate int arrays might bring a little, but personally I think that would be a micro-optimization (not necessary).

Another optimization that could bring something is checking for redundant state setting, like you mentioned earlier, and maybe doing some profiling with PIX.

After that, back to introducing new and nice goodies into the engine, which will then have a nice, structured foundation for future expansions/changes.

Good news: the basics are set up and working nicely.

I created a render queue class that takes care of things, and split updating the scene from rendering (might be useful if I need multithreading in the future).

What I don't get yet is how to implement a check for redundant setting of a vertex buffer or indices.
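A common way to catch redundant buffer sets is to shadow the last values set on the device and compare before calling it. The sketch below uses empty stand-in types instead of IDirect3DVertexBuffer9/IDirect3DIndexBuffer9 so it compiles without the DirectX SDK; BufferStateCache and its methods are hypothetical. It only works if every buffer set goes through the cache, so anything that touches the device directly (the skybox, for example, or a device reset) should be followed by Invalidate().

```cpp
#include <cassert>

// Stand-ins for the D3D9 buffer interfaces: only pointer identity matters here.
struct VertexBuffer {};
struct IndexBuffer {};

class BufferStateCache {
public:
    // Returns true if the device call is actually needed, and records the new state.
    // Assumes stream 0 and a zero offset, as a simplification.
    bool SetStreamSource(const VertexBuffer* vb, unsigned stride) {
        if (vb == mVB && stride == mStride) { ++mSkipped; return false; }
        mVB = vb;
        mStride = stride;
        return true;   // real code would call device->SetStreamSource(0, vb, 0, stride)
    }

    bool SetIndices(const IndexBuffer* ib) {
        if (ib == mIB) { ++mSkipped; return false; }
        mIB = ib;
        return true;   // real code would call device->SetIndices(ib)
    }

    // Forget the shadowed state, e.g. after a device reset or direct device calls.
    void Invalidate() { mVB = nullptr; mIB = nullptr; mStride = 0; }

    int Skipped() const { return mSkipped; }

private:
    const VertexBuffer* mVB = nullptr;
    const IndexBuffer* mIB = nullptr;
    unsigned mStride = 0;
    int mSkipped = 0;
};
```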

Next steps:

- add an index to sort per shader

- add a form of culling 'areas' (quadtree or something; I'd rather think of something myself :))

- after that, no more optimizing, just add lots of new goodies

Here are the results (code). Please shoot :) I'd really like to hear your suggestions.


// RenderFrame function

bool CD3d::RenderFrame(CD3dscene *pD3dscene, CD3dcam *pCam)
{
	if(!CheckDevice()) { mDeviceLost = true; return true; }
	mDrawCallsPerFrame = 0;			mDrawTriPerFrame = 0;

	pCam->Update();

	/** CULLING AND SORTING	**/
	
	if(!UpdateScene(pD3dscene, pCam)) return false;

	mD3ddev->Clear(0, NULL, D3DCLEAR_TARGET | D3DCLEAR_ZBUFFER, D3DCOLOR_XRGB(0, 0, 0), 1.0f, 0);
	mD3ddev->BeginScene();

	/** SET SHARED FX/ SHADER PARAMETERS **/
	
	if(D3DERR_INVALIDCALL == pD3dscene->mEffect[0]->SetMatrix("ViewProj", &pCam->mMatViewProjection)) return false;	// SHARED PARAMETER IN POOL
	
	/** RENDER SCENE USING FX/ SHADER WITH SPECIFIC TECHNIQUE **/
	
	if(!RenderScene(pD3dscene, pCam, "OpaqueShader", _OPAQUE)) return false;
	if(!RenderScene(pD3dscene, pCam, "BlendingShader", _BLENDED)) return false;

	if(pD3dscene->mSkyBoxInScene) if(!pD3dscene->mSkyBox.Render(pCam->mPosition, pCam, mD3ddev)) return false;

	/** FFP RENDERING, I.E. SCENE STATISTICS **/
	if(!SetDefaultRenderStates()) return false;
	PrintSceneInfo(pCam, pD3dscene->mNrMaterials);		

	/** PRESENT THE FINAL RENDERED SCENE FROM BACKBUFFER **/
	mD3ddev->EndScene();
	HRESULT hr = mD3ddev->Present(NULL, NULL, NULL, NULL); 
	return true;
}

// Render a scene with specific technique

bool CD3d::RenderScene(CD3dscene *pD3dscene, CD3dcam *pCam, char *pTechnique, int mattype)
{
	for(fx=0;fx<mRenderQueue.mNrEffects;++fx)		
	{
		if(!SetShaderTechnique(pD3dscene, fx, pTechnique)) return false;							// 1x SetTechnique, 1x SetPixelShader/ SetVertexShader?	
		pD3dscene->mEffect[fx]->Begin(&pD3dscene->mEffectNumPasses[fx], D3DXFX_DONOTSAVESTATE);		// 'x' RenderStates, based on FX/shader content

		for(_i=0;_i<pD3dscene->mEffectNumPasses[fx];++_i)
		{
			pD3dscene->mEffect[fx]->BeginPass(_i);
			for(mat=0;mat<mRenderQueue.mNrMaterials;++mat)
			{
				if(!pD3dscene->PreSelectMaterial(mat, fx)) return false;							// 2x SetFloatArray, 1x SetTexture									   							
				for(m=0;m<mRenderQueue.mMaterialData[mat].nrMeshes;++m)		
				{
					mesh = mRenderQueue.mMaterialData[mat].meshIds[m];
					if(!pD3dscene->mMeshes[mesh].SetBuffers(mD3ddev)) return false;					// SetStreamSource, SetIndices
					
					for(mi=0;mi<mRenderQueue.GetNrInstances(mesh, mattype);++mi) 
					{
						instance = mRenderQueue.GetInstance(mesh, mi, mattype);
						if(mRenderQueue.mMeshInstData[instance].effectId == fx)						// INDEX NEEDED TOO?
						{
							if(mRenderQueue.mMeshInstData[instance].visible)						// (MICRO-OPT) optimization? Sort index per frame
							{
								if(!pD3dscene->PreSelectMeshInst(instance, mD3ddev)) return false;	// 2x SetMatrix (World/WorldInvTransp)	
								pD3dscene->mEffect[fx]->CommitChanges();
						
								for(subm=0;subm<mRenderQueue.mMaterialData[mat].meshSubMeshes[m].nrSubMeshes;++subm) 
								{
									submesh = mRenderQueue.mMaterialData[mat].meshSubMeshes[m].subMeshes[subm];
									pD3dscene->mMeshes[mesh].RenderSubMesh(mD3ddev, submesh, LIST); 
								}
							}
						}
					}
				}
			}
			pD3dscene->mEffect[fx]->EndPass();
		}
		pD3dscene->mEffect[fx]->End();
	}
	return true;
}

// Update scene function

bool CD3d::UpdateScene(CD3dscene *pD3dscene, CD3dcam *pCam)
{
	// TODO here; introduce tree - spatial culling

	/** UPDATE DISTANCE TO CAM FOR BLENDED MESH INSTANCES				**/
	for(m=0;m<mRenderQueue.mNrMeshes;++m)
		for(mi=0;mi<mRenderQueue.mMeshData[m].nrInstancesBlended;++mi)
			pD3dscene->mMeshInstances[mRenderQueue.mMeshData[m].instancesBlended[mi]].UpdateDistToCam(pCam->mPosition);

	/** SORT BLENDED MESH INSTANCES, BACK TO FRONT						**/
	if(!mRenderQueue.SortBlendedMeshes(pD3dscene)) return false;

	/** UPDATE WORLD MATRIX, FOR DYNAMIC MESH INSTANCES ONLY			**/
	for(mi=0;mi<mRenderQueue.mNrMeshInstDynamic;++mi)
		pD3dscene->mMeshInstances[mRenderQueue.mDynamicMeshInstIndex[mi]].UpdateWorldMatrix();

	/** CULL MESH INSTANCES AGAINST FRUSTUM, VISIBLE YES/NO				**/
	for(mi=0;mi<mRenderQueue.mNrMeshInst;++mi)
	{
		if(pCam->SphereInFrustum(&pD3dscene->mMeshInstances[mi].mWorldPos, pD3dscene->mMeshInstances[mi].mBoundingRadius))
			mRenderQueue.mMeshInstData[mi].visible = true;
		else mRenderQueue.mMeshInstData[mi].visible = false;
	}
	return true;
}

// the small functions which do the actual parameter changes

bool CD3d::SetShaderTechnique(CD3dscene *pD3dscene, int pEffectIndex, char *pTechnique)
{
	if(D3DERR_INVALIDCALL == pD3dscene->mEffect[pEffectIndex]->SetTechnique(pTechnique)) return false;
	return true;
}

bool CD3dscene::PreSelectMeshInst(int pMeshInstId, LPDIRECT3DDEVICE9 pD3ddev)
{
	if(D3DERR_INVALIDCALL == mEffect[mMeshInstances[pMeshInstId].mEffectIndex]->SetMatrix("World", &mMeshInstances[pMeshInstId].mMatWorld)) return false;
	if(D3DERR_INVALIDCALL == mEffect[mMeshInstances[pMeshInstId].mEffectIndex]->SetMatrix("WorldInvTransp", &mMeshInstances[pMeshInstId].mMatWorldInvTransp)) return false; 
//	OR normalize in Shader for lighting

	return true;
}

bool CD3dscene::PreSelectMaterial(DWORD pMatId, int pEffectIndex)
{
	if(D3DERR_INVALIDCALL == mEffect[pEffectIndex]->SetFloatArray("MatAmb", mMaterials[pMatId].Ambient, 4)) return false;
	if(D3DERR_INVALIDCALL == mEffect[pEffectIndex]->SetFloatArray("MatDiff", mMaterials[pMatId].Diffuse, 4)) return false;
	if(mTextures[pMatId] != NULL) 
		if(D3DERR_INVALIDCALL == mEffect[pEffectIndex]->SetTexture("Tex0", mTextures[pMatId])) return false;
	return true;
}
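The thread doesn't show CD3dcam::SphereInFrustum, so here is a generic version of the usual sphere-versus-six-planes test as a free function with a hypothetical name (SphereIntersectsFrustum). It assumes the planes are normalized and point into the frustum; they can be extracted from the view-projection matrix (for example with the Gribb/Hartmann method). The test is conservative: spheres near a frustum corner can pass even though they are actually outside.

```cpp
#include <cassert>

struct Plane { float a, b, c, d; };   // a*x + b*y + c*z + d = 0, normal pointing inward
struct Vec3f { float x, y, z; };

// Returns false as soon as the sphere lies entirely on the outside of any plane.
// Assumes normalized plane normals, so the signed distance is in world units.
inline bool SphereIntersectsFrustum(const Plane planes[6], const Vec3f& center, float radius) {
    for (int i = 0; i < 6; ++i) {
        const Plane& p = planes[i];
        float dist = p.a * center.x + p.b * center.y + p.c * center.z + p.d;
        if (dist < -radius) return false;   // completely outside this plane
    }
    return true;   // inside or intersecting
}
```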

Addition:

I also measured the number of draw calls/triangles per frame, just to know how many batches I use and how big they are.

A few numbers (three frames):

Frame 1: 399 draw calls, 194616 triangles, average 487 triangles per call
Frame 2: 399 draw calls, 194616 triangles, average 487 triangles per call
Frame 3: 381 draw calls, 184848 triangles, average 485 triangles per call


Hi Niello,
This post has become a blog/book on render queues... nevertheless :)

Profiling with PIX works great: I just 'froze' a frame and compared the results with what I expected from my render function.
I see that I've gained quite a lot with my render queue by not looping through unnecessary stuff (materials, meshes etc.).

I also see that adding an index per effect would probably be more than a micro-optimization.
In my current test scene I have the following 'unneeded sets' in one frame because there is no FX/shader index per mesh/material:

- 20x SetFloatArray
- 10x SetTexture
- 12x SetStreamSource
- 12x SetIndices

I also noticed in PIX that my FX files/shaders don't set any render states at all.
Per frame I measured the following render state sets:

- None while going through the effects/shaders (the sampler and render states from the HLSL/FX files don't appear in the PIX output)
- For skybox rendering (after effects/shaders):

* ZWRITEENABLE, false
* CULLMODE, D3DCULL_CW

<render skybox>

* ZWRITEENABLE, true
* CULLMODE, D3DCULL_CCW
* ZENABLE, true
* ZWRITEENABLE, true (redundant!!)
* CULLMODE, D3DCULL_CCW (redundant!!)
* LIGHTING, false
* STENCILENABLE, false
* FILLMODE, solid

I see that after skybox rendering I change cullmode and zwriteenable back, which I also do in a set of default render states at the end of the frame.
So that isn't necessary either. I think I'll have to decide what to do with this: use state blocks or do it all in shaders (something for later).

I could definitely use an index for effects/shaders to reduce the unneeded 'sets', which will become more and more important as I enlarge my scene and increase the number of different shaders/FX files.

For now I did a quick 80/20 implementation like this:
- when looping through both meshes and materials (once per frame), I check a generated bool table that tells whether the material/mesh combination uses the effect. This way I can reject early based on material and save a lot of sets.
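The bool table can be sketched as a flattened effect-by-material table, a two-dimensional simplification of the mesh/material/effect combination described above. It is filled once for static content and queried before any material parameters are set. EffectMaterialTable and CountMaterialSets are hypothetical names; the counter stands in for the real PreSelectMaterial call.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// usesEffect(fx, mat) is true if at least one instance drawn with effect 'fx' uses material 'mat'.
class EffectMaterialTable {
public:
    EffectMaterialTable(std::size_t numEffects, std::size_t numMaterials)
        : mNumMaterials(numMaterials), mTable(numEffects * numMaterials, false) {}

    void Mark(std::size_t fx, std::size_t mat) { mTable[fx * mNumMaterials + mat] = true; }
    bool Uses(std::size_t fx, std::size_t mat) const { return mTable[fx * mNumMaterials + mat]; }

private:
    std::size_t mNumMaterials;
    std::vector<bool> mTable;   // flattened 2D table, one row per effect
};

// Render-loop shape: reject a material early, before any SetFloatArray/SetTexture.
inline int CountMaterialSets(const EffectMaterialTable& t,
                             std::size_t numEffects, std::size_t numMaterials) {
    int sets = 0;
    for (std::size_t fx = 0; fx < numEffects; ++fx)
        for (std::size_t mat = 0; mat < numMaterials; ++mat)
            if (t.Uses(fx, mat)) ++sets;   // real code: PreSelectMaterial(mat, fx) and so on
    return sets;
}
```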

Any ideas/hints on all this? :)

Hi.

I was working hard this week, so I had no time to post.

Now you are at the point where I can't see obvious problems in your code. Yes, it isn't perfect and may cause problems in the future; moreover, I would write (and actually did write) the whole scene graph + renderer differently. You're welcome to dig into my code (I posted links earlier) if you want to see what I prefer :) I see no point in copying the same renderer into every project around the world, and it's good that you're trying to architect your own.

And, definitely, implement spatial culling!
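For spatial culling, a uniform grid is a simpler alternative to the quadtree mentioned earlier in the thread (a different technique, named here plainly). Static instances are bucketed once by their centre; each frame, only the cells overlapping a query rectangle, such as the XZ bounds of the camera frustum, are visited, and the per-instance frustum test then runs only on those. UniformGrid is a hypothetical class; objects larger than a cell would need the query rectangle widened by the largest bounding radius.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Uniform 2D grid over the XZ plane; positions outside the grid are clamped to border cells.
class UniformGrid {
public:
    UniformGrid(float minX, float minZ, float cellSize, int cellsX, int cellsZ)
        : mMinX(minX), mMinZ(minZ), mCellSize(cellSize), mCellsX(cellsX), mCellsZ(cellsZ),
          mCells(static_cast<std::size_t>(cellsX) * static_cast<std::size_t>(cellsZ)) {
        assert(cellSize > 0.0f && cellsX > 0 && cellsZ > 0);
    }

    void Insert(std::size_t instanceId, float x, float z) {
        mCells[Index(CellX(x), CellZ(z))].push_back(instanceId);
    }

    // Appends the ids of instances in all cells touched by [x0,x1] x [z0,z1].
    void Query(float x0, float z0, float x1, float z1, std::vector<std::size_t>& out) const {
        for (int cz = CellZ(z0); cz <= CellZ(z1); ++cz)
            for (int cx = CellX(x0); cx <= CellX(x1); ++cx)
                for (std::size_t id : mCells[Index(cx, cz)]) out.push_back(id);
    }

private:
    static int Clamp(int v, int hi) { return v < 0 ? 0 : (v > hi ? hi : v); }
    int CellX(float x) const { return Clamp(static_cast<int>(std::floor((x - mMinX) / mCellSize)), mCellsX - 1); }
    int CellZ(float z) const { return Clamp(static_cast<int>(std::floor((z - mMinZ) / mCellSize)), mCellsZ - 1); }
    std::size_t Index(int cx, int cz) const {
        return static_cast<std::size_t>(cz) * static_cast<std::size_t>(mCellsX) + static_cast<std::size_t>(cx);
    }

    float mMinX, mMinZ, mCellSize;
    int mCellsX, mCellsZ;
    std::vector<std::vector<std::size_t>> mCells;
};
```

A quadtree adapts better to uneven object density, but a grid like this is often enough for a first version and is easy to verify.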

Hope to hear from you when you start implementing new features. That always makes you rethink and improve the rendering codebase.
