cozzie

"Sorting out" render order


cozzie    5029

Hi,

I'm at the point of implementing alpha blending (shader) to be able to use transparent/translucent textures, grass etc.

For now I didn't use any object/ mesh sorting because I didn't see the advantage yet.

 

But.. when using alpha blending, I've learned that I have to render objects (or better triangles?) from back to front for the expected result.

 

A few questions I'd like to hear your opinion on;

- is it worthwhile to render meshes/ objects based on world position Z (initial worldpos vector * actual world matrix)?

(by the CPU, finding an efficient way of looping through objects, materials, effects, entities)

- what would you suggest to sort objects on when it comes to alpha blending?

 

The direction I'm following now is:

- update worldmatrix of each mesh each frame

- multiply by initial worldpos, resulting in a vector with 'Z' value

- use Z value to render 'regular' objects, front to back (increase performance on rasterization/ pixel shader)

- use Z value to render 'blended' objects, back to front

 

Any hints or pointers are really appreciated.

It will also be quite a challenge to find the best way to combine the 'loops' (materials, effects, subobjects/entities).

Helo7777    693

I think the way you're currently going about it is fine and fairly common. So you're just adding all visible transparent objects to a list, then sorting that list based on world distance from the camera. One thing to consider, however: the distance from the camera to an object's world position may not be the best metric. Consider two objects close together, with the one at the back being much larger than the one in front. In this case you should probably render the object at the back first. So instead maybe calculate the distance from the camera to the object's bounding box.
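A minimal sketch of that idea, assuming each blended object carries a world-space axis-aligned bounding box. The names `Aabb`, `BlendedItem`, `DistSqToAabb` and `SortBackToFront` are made up for illustration, and measuring to the nearest point of the box is only one possible reading of "distance to the bounding box" (the farthest point or the box centre are alternatives):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Vec3 { float x, y, z; };

// Axis-aligned bounding box in world space (hypothetical layout).
struct Aabb { Vec3 min, max; };

struct BlendedItem {
    int  id;      // hypothetical handle to the thing being drawn
    Aabb bounds;  // world-space bounds, updated when the object moves
};

// Clamps v into [lo, hi].
static float Clamp(float v, float lo, float hi) {
    return std::max(lo, std::min(v, hi));
}

// Squared distance from point p to the closest point on the box.
// Returns 0 when p lies inside the box.
float DistSqToAabb(const Vec3& p, const Aabb& b) {
    assert(b.min.x <= b.max.x && b.min.y <= b.max.y && b.min.z <= b.max.z);
    const float dx = p.x - Clamp(p.x, b.min.x, b.max.x);
    const float dy = p.y - Clamp(p.y, b.min.y, b.max.y);
    const float dz = p.z - Clamp(p.z, b.min.z, b.max.z);
    return dx * dx + dy * dy + dz * dz;
}

// Orders items farthest-first so that blending composites back to front.
void SortBackToFront(std::vector<BlendedItem>& items, const Vec3& camera) {
    std::sort(items.begin(), items.end(),
              [&camera](const BlendedItem& a, const BlendedItem& b) {
                  return DistSqToAabb(camera, a.bounds) > DistSqToAabb(camera, b.bounds);
              });
}
```

Squared distances are compared directly, since the square root does not change the ordering.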

cozzie    5029

Thanks Nyssa, I can easily do that by using my radius value for each mesh, in combination with the world position (based on current camera position).

Good to hear that I'm on the right way.


Sorting 'regular' meshes (cpu usage) for rendering isn't that useful (yet/ for now).

 

In the end I might need to merge vertex and indexbuffers to save renderstates etc..

I now loop through all effects, then materials, then meshes, then entities. This is potentially not perfect, because I switch meshes (index/ vertexbuffer) more than once for each entity. I find this a difficult choice: it's more setting of vars for the materials (in the effect/shader) versus setting more matrices and index/vertexbuffers. Don't know what's worse.

 

I might go for an index table to prevent loops (for static objects that is, now 90% of the scene), but even then I would need to do switches on the meshes or materials. When I think of it now, switching materials is probably quicker than switching meshes:

 

- material select:

=> 2 SetFloatArray calls (amb and diff)

=> 1 SetTexture call (or more when multitexturing or so)

 

- mesh select:

=> SetMatrix (World)

=> SetMatrix (WorldInvTransp, for normals/ lighting)

=> SetStreamSource

=> SetIndices

 

But then again going through all meshes and then materials, means selecting the materials again for other meshes.

Dilemmas, dilemmas...

 

Any advice on this?

cozzie    5029

@Nyssa (and others :))

 

I've been studying my renderframe and found some flaws (I believe). To illustrate.

 

Sample scene:

15 meshes

5 entities for each mesh

7 unique materials

 

Assume that I do the following one time for Opaque meshes with Opaque technique and another time for blended meshes with Blending technique (already have 2 mesh index int lists in use):

 

loop through effects/shaders
{
    begin effect
    set technique (blended or opaque)
    set viewprojection matrix of camera
    loop through passes of technique
    {
        loop through materials
        {
            select material (setfloatarray 2x, settexture)
            loop through meshes; if mesh.mEffect = current effect
            {
                if mesh in frustum
                {
                    setstream vertices, setindices
                    set worldmatrix
                    commitchanges
                    loop through entities; if entity.material = current material
                    {
                        if in frustum
                        {
                            draw
                        }
                    }
                }
            }
        }
    }
}

 

This means I do:

- 7 material switches (2 float arrays, texture)

- 105 mesh switches (setstreamsource, indices and world matrix)

- 105 commitchanges

- 105x check effectindex for mesh

- 525x check material index for entity

 

I figure if I turn it around and go through meshes and then materials:

- 15 mesh switches

- 105 material switches

- 105 commitchanges

- 15x check effectindex for mesh

- 525x check material index for entity
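The two sets of numbers above follow from simple multiplication. As a rough sketch (assuming one effect, one pass, no frustum culling and every mesh using the current effect, so these are worst-case counts; `LoopCounts` and `CountNestedLoop` are names invented for this example):

```cpp
#include <cassert>

// Worst-case work for a two-level nested loop over materials and meshes.
struct LoopCounts {
    int outerSwitches;  // state switches for whatever the outer loop selects
    int innerSwitches;  // state switches (and CommitChanges) for the inner loop
    int entityChecks;   // "does this entity use the current material?" tests
};

LoopCounts CountNestedLoop(int outerCount, int innerCount, int entitiesPerMesh) {
    assert(outerCount > 0 && innerCount > 0 && entitiesPerMesh > 0);
    LoopCounts c;
    c.outerSwitches = outerCount;
    c.innerSwitches = outerCount * innerCount;
    c.entityChecks  = outerCount * innerCount * entitiesPerMesh;
    return c;
}
```

With 7 materials outside and 15 meshes inside, `CountNestedLoop(7, 15, 5)` gives 7 / 105 / 525. Swapping the loops, `CountNestedLoop(15, 7, 5)` gives 15 / 105 / 525. The inner count and the entity checks do not change; only which kind of switch happens 105 times does.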

 

Room for improvement?

- what would take more performance, switching a mesh versus switching a material?

- reducing the if checks for effect and material IDs, which I can do with index lists (int arrays)

- what other improvements should I look for?

(I'm not yet ready to go for merged meshes into one vertex or index buffer, which might bring other possibilities)

 

Any advice is really appreciated.

_the_phantom_    11250

Switching material implies switching shaders and switching shaders is basically the most costly thing to do on the GPU, as such you pretty much always want to reduce that to the smallest amount you can by sorting by material and, if you can, using instancing.

 

The other problem you have, however, is your overall algorithm: while it gets the job done, it simply does too much work, and too much redundant work at that.

 

The way many engines do it is to generate a list of objects which need to be rendered and then sort them using a sort key into material order, after which they simply walk through the list from top to bottom, rendering each object as it comes. There will be some logic to reduce redundant shader changes (which can be as simple as keeping track of the current material and the shaders in it, and only calling the 'set' functions if the new object requires something different), but overall, once you have a sorted list, it is simple to do.

 

http://realtimecollisiondetection.net/blog/?p=86 covers some more details.
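A small sketch of the sort-key approach, under some assumptions of my own: the engine hands out small integer IDs for shaders, materials and meshes, each fits in 16 bits, and a shader change forces the material parameters to be uploaded again. `DrawItem`, `MakeSortKey` and `SubmitSorted` are illustrative names, and the function only counts the state changes a real renderer would issue:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

struct DrawItem {
    unsigned shaderId;    // hypothetical IDs handed out by the engine
    unsigned materialId;
    unsigned meshId;
};

// The most expensive state goes in the highest bits, so sorting by the key
// groups by shader first, then by material, then by mesh.
std::uint64_t MakeSortKey(const DrawItem& d) {
    assert(d.shaderId < 65536u && d.materialId < 65536u && d.meshId < 65536u);
    return (std::uint64_t(d.shaderId) << 32) |
           (std::uint64_t(d.materialId) << 16) |
            std::uint64_t(d.meshId);
}

struct StateChanges { int shader = 0; int material = 0; int mesh = 0; };

// Sorts the list, then walks it once and counts the 'set' calls that are
// really needed, skipping a set when the state is already current.
StateChanges SubmitSorted(std::vector<DrawItem> items) {
    std::sort(items.begin(), items.end(),
              [](const DrawItem& a, const DrawItem& b) {
                  return MakeSortKey(a) < MakeSortKey(b);
              });
    StateChanges n;
    long curShader = -1, curMaterial = -1, curMesh = -1;  // -1: nothing set yet
    for (const DrawItem& d : items) {
        if (curShader != long(d.shaderId)) {
            curShader = long(d.shaderId);
            curMaterial = -1;  // assumption: new shader needs its params again
            ++n.shader;
        }
        if (curMaterial != long(d.materialId)) {
            curMaterial = long(d.materialId);
            ++n.material;
        }
        if (curMesh != long(d.meshId)) {
            curMesh = long(d.meshId);
            ++n.mesh;
        }
        // the draw call for d would be issued here
    }
    return n;
}
```

For blended geometry the key would normally lead with depth instead, so this layout only suits the opaque list.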

cozzie    5029

Hi Phantom,

Thanks, this clears things up. The article covers recognizable stuff, that's the good thing :)

 

I thought about making indices of entities (with corresponding mesh ID) who share the same material.

This way I could batch up and reduce the material changes to the number of materials in the scene.

 

Thinking about it, with what you're saying, this saves a lot of state changes.

Assuming that I don't need to 'CommitChanges' when setting another streamsource and indexbuffer, is this correct?

 

If so, this will be my first step.

Maybe later on bundle the streams/indexbuffer to improve further.

cozzie    5029

@Phantom; just remembered that this probably won't help/ work, because when I change the streamsource/vertexbuffer (mesh) I also need to set the mesh's world and worldinvtranspose matrix. Which probably means I need to commitchanges anyway...

Helo7777    693

Yeah, Phantom is spot on, you're doing a lot of redundant searching there.

 

The way I currently do it is via render "buckets". As you loop over your scene objects, after you have determined an object is visible, you determine if it's solid or transparent (you might have a boolean in your material for this), then add that object to the solid or transparent list. Each object also has a key based on its render state (this can include its rasteriser state, blend state, texture ids, etc.) and each list is sorted using that key. So objects with similar states will be next to each other, and state changes should be kept to a minimum. Then you just render each object in those lists, setting world matrix values as you go.

 

Keep in mind that all this sorting needs to be done in an efficient way else it ends up being faster to simply do all the state changes anyway! You could also add to the above method a way of altering these lists only when an object moves in/out of the viewing frustum. That way they don't need to be rebuilt each frame!
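The bucket idea might look roughly like this. It is a sketch with invented names (`RenderItem`, `RenderBuckets`, `BuildBuckets`), and it makes one assumption beyond what is described above: the transparent bucket is ordered by depth, farthest first, rather than by state key, because blending needs back-to-front order:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

struct RenderItem {
    std::uint32_t stateKey;     // hypothetical packed render-state key
    float         viewDepth;    // distance from the camera
    bool          transparent;  // e.g. a flag stored in the material
};

struct RenderBuckets {
    std::vector<RenderItem> solid;
    std::vector<RenderItem> transparent;
};

// Splits the already-culled objects into two lists and sorts each one.
RenderBuckets BuildBuckets(const std::vector<RenderItem>& visible) {
    RenderBuckets buckets;
    for (const RenderItem& item : visible) {
        assert(item.viewDepth >= 0.0f);  // a distance is never negative
        (item.transparent ? buckets.transparent : buckets.solid).push_back(item);
    }
    // Solid objects: neighbours share state, so fewer state changes.
    std::sort(buckets.solid.begin(), buckets.solid.end(),
              [](const RenderItem& a, const RenderItem& b) {
                  return a.stateKey < b.stateKey;
              });
    // Transparent objects: farthest first for correct blending.
    std::sort(buckets.transparent.begin(), buckets.transparent.end(),
              [](const RenderItem& a, const RenderItem& b) {
                  return a.viewDepth > b.viewDepth;
              });
    return buckets;
}
```

Rendering is then two straight walks: the solid list first, the transparent list second.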

 

I should add... I'm currently working in C++, but the same theory should translate into C#.

cozzie    5029
Thanks all. I understand the principle of the index lists. I can definitely apply this to reduce material state changes.

Although within such a list I would have multiple meshes with their own vtx/indexbuffers and world matrices, meaning I need to set these too. These are also state changes, right? (unless I combine meshes into big vtx/indexbuffers)

If this is true, I can only reduce state changes (and commitchanges) to at least the number of meshes multiplied by the number of materials, or am I overlooking something?

Krohm    5030

[quote name='cozzie' timestamp='1357600643' post='5018799']
Maybe later on bundle the streams/indexbuffer to improve further.
[/quote]
I'm not sure I understand what you plan to do with data layout... to improve what?

To re-order mesh drawing, just reorder the drawcalls.

cozzie    5029
If I reorder the entities based on materials, I would still need to switch between the meshes (set world matrix, streamsource and indexbuffers), meaning I still need all those state changes. Is this correct? If so, I might be able to merge the entities/meshes in one or two big vtx and indexbuffers, with a common/ shared world matrix. Or am I thinking too difficult and is there an easier way to reduce the number of state changes, other than meshes multiplied by materials?

Niello    130

1. Material should store the shader along with constant shader params (those that don't change from object to object made of this material)

Other params will be defined in the object itself as personal.

2. Sort objects by shader technique (or vertex + pixel shader), then by material, then by geometry

3. When you render:

* set the first tech, process all objects of this tech, set the second tech etc.

* inside the tech, apply constant material params once and process all objects of this material

* as they are sorted by geometry, you can render instanced, if you write all differences (ideally only the World matrix) to the vertex buffer

If two objects have different personal shader parameters, you can't instance them.

Note: for instanced rendering you will switch tech, but it likely won't be redundant
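To make steps 2 and 3 concrete, here is a hedged sketch that sorts by (tech, material, geometry) and then collapses consecutive objects with the same triple into one batch, which is the unit that could be drawn instanced. `SceneObject`, `InstanceBatch` and `BuildInstanceBatches` are names made up for this example, and it assumes the World matrix is the only per-object difference:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <tuple>
#include <vector>

struct SceneObject {
    int techId;
    int materialId;
    int meshId;
    // per-object ("personal") data such as the World matrix would live here
};

// A run of consecutive objects that share tech, material and geometry.
struct InstanceBatch {
    int         techId;
    int         materialId;
    int         meshId;
    std::size_t first;  // index of the first object in the sorted list
    std::size_t count;  // number of instances in this batch
};

// Sorts objects in place and returns one batch per (tech, material, mesh) run.
std::vector<InstanceBatch> BuildInstanceBatches(std::vector<SceneObject>& objects) {
    std::sort(objects.begin(), objects.end(),
              [](const SceneObject& a, const SceneObject& b) {
                  return std::tie(a.techId, a.materialId, a.meshId) <
                         std::tie(b.techId, b.materialId, b.meshId);
              });
    std::vector<InstanceBatch> batches;
    for (std::size_t i = 0; i < objects.size(); ++i) {
        const SceneObject& o = objects[i];
        const bool startNew =
            batches.empty() ||
            batches.back().techId != o.techId ||
            batches.back().materialId != o.materialId ||
            batches.back().meshId != o.meshId;
        if (startNew) {
            batches.push_back(InstanceBatch{o.techId, o.materialId, o.meshId, i, 1});
        } else {
            ++batches.back().count;
        }
    }
    std::size_t total = 0;
    for (const InstanceBatch& b : batches) total += b.count;
    assert(total == objects.size());  // every object landed in exactly one batch
    return batches;
}
```

Each batch then needs one tech/material/geometry setup, followed by either one instanced draw or `count` plain draws with a World matrix each.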

 

 

If you want some code, I have it:

http://code.google.com/p/deusexmachina/source/browse/branches/Dev/DEM/Src/L1/Render/Renderers/ModelRendererNoLight.cpp

 

This is my endless work-in-progress :) Feel free to read, use and abuse. If there will be questions, I'll try to answer.

Niello    130

Oh, forgot one thing. Think of object's World (or of WorldViewProjection) matrix as of just another personal shader parameter. It simplifies things.

cozzie    5029

Thanks Niello. I just went through your code and I think I partially understand it :)

I might need a little more help, if you have time to look at it.

 

As far as I read your comments, I think I'm on the right way in getting there. Honestly, I don't know what I can do as a next/ other step of improvement

(other than combining meshes into one vertex/indexbuffer with a combined world matrix).


Do you see any other ways to decrease the number of render state changes, with the current low number of techniques I have up till now?

 

I added a comment everywhere I set a renderstate or set a parameter of an effect.

The 2 parts where I can make a better index will improve on if statements/ for loops (CPU), but not on renderstates I think.

 

Main render function:

 

bool CD3d::RenderFrame(CD3dscene *pD3dscene, CD3dcam *pCam)
{
	if(!CheckDevice()) { mDeviceLost = true; return true; }
	mEntitiesRendered = 0;
	pCam->Update();

	mD3ddev->Clear(0, NULL, D3DCLEAR_TARGET | D3DCLEAR_ZBUFFER, D3DCOLOR_XRGB(0, 0, 0), 1.0f, 0);
	mD3ddev->BeginScene();

	// SHADER rendering
	if(!RenderScene(pD3dscene, pCam, "OpaqueShader", pD3dscene->mMeshIndexOpaque, pD3dscene->mNrD3dMeshesOpaque)) return false;
	
	if(!pD3dscene->SortBlendedMeshes(pCam->mPosition)) return false;
	if(!RenderScene(pD3dscene, pCam, "BlendingShader", pD3dscene->mMeshIndexBlended, pD3dscene->mNrD3dMeshesBlended)) return false;
	
	if(pD3dscene->mSkyBoxInScene) if(!pD3dscene->mSkyBox.Render(pCam->mPosition, pCam, mD3ddev)) return false;

	// FFP rendering
	if(!SetDefaultRenderStates()) return false;			// 6x dev->SetRenderState();
	PrintSceneInfo(pCam, pD3dscene->mNrMaterials);		// draw 2d text with a D3DXFONT

	mD3ddev->EndScene();
	mD3ddev->Present(NULL, NULL, NULL, NULL); 
	return true;
}

The function to render the scene with a specific technique (for all effects in the scene, for now not more techniques)

 

bool CD3d::RenderScene(CD3dscene *pD3dscene, CD3dcam *pCam, char *pTechnique, int *pMeshIndex, int pNrMeshes)
{
	for(ec=0;ec<pD3dscene->mNrEffects;++ec)		// most of time 1 per scene, today
	{
		if(!SetVertexShader(pD3dscene, ec, pTechnique, pCam)) return false;		// 1x SetTechnique, 1x SetMatrix viewproj
		pD3dscene->mEffect[ec]->Begin(&pD3dscene->mEffectNumPasses[ec], D3DXFX_DONOTSAVESTATE);		// no SetRenderStates
		for(unsigned int i=0;i<pD3dscene->mEffectNumPasses[ec];++i)
		{
			pD3dscene->mEffect[ec]->BeginPass(i);
			for(oc=0;oc<pNrMeshes;++oc)
			{
				if(pD3dscene->mD3dMeshes[pMeshIndex[oc]].mEffectIndex == ec)
				{
					if(pCam->SphereInFrustum(&pD3dscene->mD3dMeshes[pMeshIndex[oc]].mWorldPos, 
											 pD3dscene->mD3dMeshes[pMeshIndex[oc]].mBoundingRadius))
					{
						if(pD3dscene->mD3dMeshes[pMeshIndex[oc]].mDynamic) pD3dscene->mD3dMeshes[pMeshIndex[oc]].UpdateWorldMatrix();
						if(!pD3dscene->PreSelectMesh(pMeshIndex[oc], mD3ddev)) return false;	
						// 2x SetMatrix, world/worldinvtransp, 1x SetStreamSource, 1x SetIndices

						for(mc=0;mc<pD3dscene->mNrMaterials;++mc)
						{
							if(!pD3dscene->PreSelectMaterial(mc, ec)) return false;	// 2x SetFloatArray, 1x SetTexture													   
							pD3dscene->mEffect[ec]->CommitChanges();
							{
								for(DWORD att=0;att<pD3dscene->mD3dMeshes[pMeshIndex[oc]].mAttrSize;++att) // index needed
								{
									if(pD3dscene->mD3dMeshes[pMeshIndex[oc]].mMatIdPerAttr[att] == mc) // index needed
									{
										if(pCam->SphereInFrustum(&pD3dscene->mD3dMeshes[pMeshIndex[oc]].mAttrWorldPos[att], 
											                     pD3dscene->mD3dMeshes[pMeshIndex[oc]].mAttrBoundingRadius[att]))
										{						
											pD3dscene->mD3dMeshes[pMeshIndex[oc]].RenderAttr(mD3ddev, att, LIST); // the draw call
											mEntitiesRendered++;
										}
									}
								}
							}
						}
					}
				}
			}
			pD3dscene->mEffect[ec]->EndPass();
		}
		pD3dscene->mEffect[ec]->End();
	}
	return true;
}

Niello    130

Glad to read that my code is useful to someone other than me.
So, I'll give you a couple of pieces of advice, but first the most important one. Don't spend time on writing The Fastest Possible Code if you don't have a performance bottleneck or if it isn't your aim. While the performance is acceptable (say, 30-60 FPS), develop new functionality without micro-optimization.

Ok, now let's switch from boring lectures to what you want to read:

if(!SetVertexShader(pD3dscene, ec, pTechnique, pCam)) return false;        // 1x SetTechnique, 1x SetMatrix viewproj

You can use shared shader constants (google "HLSL shared") and effect pool and set cross-tech variables like ViewProjection once per frame.

shared float4x4 ViewProj;

In my code it works this way.
Here you save (NumberOfTechniques - 1) * SetMatrix
Also note, that you can pre-multiply World * ViewProj on CPU, if your shaders don't require separate World matrix.

pD3dscene->mEffect[ec]->BeginPass(i);

Each pass sets render states you described for it. VertexShader, PixelShader, ZEnable, ZFunc and others. Also here shader constants are filled. Use PIX from DX SDK to dig deeper into the D3DXEffect calls. Here you can reduce state changes by writing passes effectively, especially when using D3DXFX_DONOTSAVESTATE. There is a good article: http://aras-p.info/texts/d3dx_fx_states.html

Instead of iterating through all meshes for all techs, you can (and probably should) sort your models. Using qsort or std::sort it is a trivial task and takes about 5 minutes.
Also for big scenes you may want to use spatial partitioning for visibility checks and avoid testing visibility of every object. Renderer will receive only visible objects, which leads to performance improvement (especially sorting, which depends on number of objects being sorted).

if(!pD3dscene->PreSelectMesh(pMeshIndex[oc], mD3ddev)) return false;// 2x SetMatrix, world/worldinvtransp, 1x SetStreamSource, 1x SetIndices

If you sort your models by geometry, you can do 1x SetStreamSource, 1x SetIndices once per all objects of this geometry (inside the same shader, but often objects of the same geometry DO use the same shader).

Again, the shader is tightly coupled with the material. A material is just a shader tech + shader variable values for this tech. So, set as many shader params as you can after setting the technique, and don't reset them for each mesh. Say, all golden objects have the same DiffuseColor. Use material "Gold" of shader "metal" with a yellow DiffuseColor, set it once and render all golden objects. Sorting by material will help you a lot. Right now you have to reset the material for each mesh, even if it is the same for half of them.

Check for redundant sets. In my code you can see

RenderSrv->SetVertexBuffer(0, pMesh->GetVertexBuffer());
RenderSrv->SetIndexBuffer(pMesh->GetIndexBuffer());

called for each object, but inside these methods you will find:

if (CurrVB[Index].get_unsafe() == pVB && CurrVBOffset[Index] == OffsetVertex) return;

Early exits may save you a couple of sets the renderer didn't take care of.

Hope this helps.

cozzie    5029

Wow, thanks for both the lecture and the pointers.

In the mean time I read some articles, did some thinking and went through your suggestions one by one.

To begin with, I agree with your remark on micro optimizations, I honestly don't know if I need them. I'm a bit anxious because of the specs of my own system and because I don't yet have reference tests on older CPUs/GPUs (I have I5 2320, 660GTX 2GB, 8GB ram, Win7).

 

Here's what I'm gonna do/ and a few questions. If you have another minute.... really appreciated :)

 

Actions;

1 - I will give meshes an ID to be able to render meshes with the same vertex/ index buffer contents and material (will save some state changes definitely)

(although in memory they still have individual buffers.. hm)

2 - Shared parameters; I believe in my situation the 'ViewProj' matrix is the only one that's shared, will implement that (quick win)

3 - Will dig into renderstates setting/ changes with PIX, not sure what's going on. I use D3DXDONOTSAVECHANGES and after shader rendering set my default renderstates (six of them). Although commenting this function/ not doing this, gives the same end result (?). I'll look into the article link you posted

4 - save lots of "if statements"/ CPU load by making indexes with meshes/entities per material (already have it, only needs to be sorted and moved into arrays with more columns)

5 - I just 'fixed' metrics/scaling and now have a scene of 70x70 meters (small desert village), I'll add 8 sand hill instances around it (with some trees), so I have 9 'subscenes'/partitions or how you'd call it.

6 - prefer looping through materials first and afterwards through meshes. This will save setting parameters for materials, but increase setting the meshes (world matrix, streamsource etc.), since one mesh might have entities with different materials. Is it correct to assume that setting a material in an effect costs less performance than setting a mesh with its parameters?

 

Questions:

1 - what's the advantage of multiplying world matrix for each mesh, with viewprojection and then pass in only the endresult to the shader?

(compared to doing the multiplication in the shader), does this take 'CPU' time and free 'GPU' time?

I now do this and could change it accordingly (depending on the gain);

* float4 worldPosition = mul(input.Pos, World);
* Out.Pos = mul(worldPosition, ViewProj);
 

2 - spatial division.

I see a few options/ ideas I have:

* build up the 'subscenes'/areas while loading a scene, for example 100x100m is a scene

* check camera position against areas/ spaces and cull on this VERSUS cull the areas based on camera lookat vector and frustum

 * render only the active area versus this one + the next one facing the camera

(the 1st option requires from the modelling that I 'block' the views to the next areas).

 

3 - sorting models by geometry.

How you explain it, I could set streamsource and indices just once for multiple meshes (sharing parameters like effect, technique and texture/ material).

Most meshes have their own world matrix, I therefore don't see how to do this. Because I need to set the world matrix anyhow (unless I combine mesh vertexbuffers and indices and use one 'general' world matrix for this set of meshes in one buffer?). That sounds way too complex for me, looking at the possibly unnecessary micro optimizations :)

 

4 - checking for redundant vertexbuffer (/indexbuffer) setting; this sounds unnecessary when the meshes are sorted correctly.

Is this correct or are there other reasons to do this?

 

5 - batching; I'm gonna check how many triangles I render per draw call, just out of curiosity. I read that draw calls should be reduced as much as possible, with more triangles per draw call (because a draw call takes relatively the same time with more triangles, thus increasing performance). Might this also be a reason to combine meshes into combined vertex/indexbuffers with a shared world matrix?

 

Looking forward to your answers and ideas.

I'm also curious what hardware/ specs you have, maybe to do a reference tests after my optimizations.

Niello    130

Not at all. My profit is that I systematize and refine my knowledge by writing this. Also maybe someone else will point out if I'm wrong.

If you write for DX9, remember that it has existed since the beginning of the past decade, nearly 10 years. All modern (and much obsolete) hardware supports DX9. It is never too late to optimize if you discover that your scenes are too big. Moreover, at that point you will know why your scenes render slowly, and can choose optimizations accordingly. For now we are discussing techniques that are good in general.

Actions:

1) Don't associate mesh with model instance. You may use the same mesh for many objects in a scene and store vertex & index data once. You even can render the same mesh under different materials and with different World matrix.

3) Do you mean D3DXFX_DONOTSAVESTATE? Docs claim that it prevents saving state in Begin() and restoring in End(). BeginPass() sets states anyway. Can't say more without seeing what's going on in your PIX.

6) The World matrix will be set the same number of times anyway, because it is per-object and set for each object regardless of sorting. AFAIK changing shader tech is the most costly operation. Setting shader constants is less costly. The cost of setting textures and VBs/IBs depends on the memory pool and the total amount of GPU memory. This is not exact, you should profile. PIX has some profiling functionality.

Questions:

1) You perform operation World * ViewProj. If you do this in a vertex shader, you have one GPU mul (4 x dp4) per VERTEX. If you do this on CPU, you have 1 matrix multiply (some CPU cycles or, better, fast inlined SSE function) per OBJECT. Given your object has 3 to 15000 vertices...
But if you want to implement per-pixel lighting in shader, you must supply World matrix to it, and perform at least 2 matrix multiplications anyway. Here shared ViewProj helps. Send World matrix to shader, get world position, use it, multiply it by ViewProj and get projected position.
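To illustrate the trade-off, here is a minimal sketch using D3D's row-vector convention (position * World * ViewProj). `Mat4`, `Mul`, `Transform`, `Translation` and `Scale` are small helpers written for this example, not D3DX functions, and World is assumed to be affine so that w stays 1:

```cpp
#include <array>
#include <cassert>

using Vec4 = std::array<float, 4>;
using Mat4 = std::array<std::array<float, 4>, 4>;  // row-major, used as v * M

// Matrix product a * b: done once per OBJECT when pre-multiplying on the CPU.
Mat4 Mul(const Mat4& a, const Mat4& b) {
    Mat4 r{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k)
                r[i][j] += a[i][k] * b[k][j];
    return r;
}

// Vector-matrix product v * m: this is the work done per VERTEX.
Vec4 Transform(const Vec4& v, const Mat4& m) {
    assert(v[3] == 1.0f);  // positions are expected to carry w = 1
    Vec4 r{};
    for (int j = 0; j < 4; ++j)
        for (int k = 0; k < 4; ++k)
            r[j] += v[k] * m[k][j];
    return r;
}

// Translation matrix with the offset in the last row (row-vector convention).
Mat4 Translation(float x, float y, float z) {
    Mat4 m{};
    m[0][0] = m[1][1] = m[2][2] = m[3][3] = 1.0f;
    m[3][0] = x; m[3][1] = y; m[3][2] = z;
    return m;
}

// Uniform scale matrix; stands in for a real ViewProj in this sketch.
Mat4 Scale(float s) {
    Mat4 m{};
    m[0][0] = m[1][1] = m[2][2] = s;
    m[3][3] = 1.0f;
    return m;
}
```

`Transform(Transform(v, world), viewProj)` costs two vector-matrix products per vertex, while `Transform(v, Mul(world, viewProj))` costs one per vertex plus one matrix product per object. Both give the same position, up to floating-point rounding.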

2) Spatial partitioning is a mature concept with many methods developed and plenty of information available. Spend some time reading and googling. As for me, I preferred a "loose octree" as a spatial partitioning structure, but now use a simple "quadtree", because there are other interesting things to implement and I have no free time to spread over secondary tasks (not sure there is such an idiom in English, hm...).

In a couple of words, spatial partitioning is based on "If I don't see half of level, I don't see any half of that half, etc etc, and I don't see any object there. But if I completely see the half of level, I definitely see all things there".

Some code:
http://code.google.com/p/deusexmachina/source/browse/branches/Dev/DEM/Src/L1/Data/QuadTree.h
http://code.google.com/p/deusexmachina/source/browse/branches/Dev/DEM/Src/L1/Scene/SPS.h
http://code.google.com/p/deusexmachina/source/browse/branches/Dev/DEM/Src/L1/Scene/Scene.cpp (line 173, SPSCollectVisibleObjects)
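Independent of the linked code, the "if I don't see a half, I don't see anything in it" rule can be sketched with a tiny quadtree. This is only an illustration under simplifying assumptions: objects are points on the XZ plane, the visible region is an axis-aligned rectangle standing in for the real frustum, and all names (`Rect`, `QuadNode`, `Build`, `Insert`, `CollectVisible`) are made up:

```cpp
#include <array>
#include <cassert>
#include <memory>
#include <vector>

struct Rect { float minX, minZ, maxX, maxZ; };
struct Object { int id; float x, z; };  // point objects, for brevity

enum class Overlap { Outside, Partial, Inside };

// Classifies a node's bounds against the visible region.
Overlap Classify(const Rect& box, const Rect& view) {
    if (box.maxX < view.minX || box.minX > view.maxX ||
        box.maxZ < view.minZ || box.minZ > view.maxZ) return Overlap::Outside;
    if (box.minX >= view.minX && box.maxX <= view.maxX &&
        box.minZ >= view.minZ && box.maxZ <= view.maxZ) return Overlap::Inside;
    return Overlap::Partial;
}

struct QuadNode {
    Rect bounds;
    std::vector<Object> objects;                     // filled in leaves only
    std::array<std::unique_ptr<QuadNode>, 4> child;  // all null in a leaf
    bool IsLeaf() const { return !child[0]; }
};

// Builds a full tree of the given depth (depth 0 is a single leaf).
std::unique_ptr<QuadNode> Build(const Rect& b, int depth) {
    assert(depth >= 0 && b.minX < b.maxX && b.minZ < b.maxZ);
    std::unique_ptr<QuadNode> node(new QuadNode());
    node->bounds = b;
    if (depth == 0) return node;
    const float midX = 0.5f * (b.minX + b.maxX);
    const float midZ = 0.5f * (b.minZ + b.maxZ);
    const Rect quads[4] = {{b.minX, b.minZ, midX, midZ}, {midX, b.minZ, b.maxX, midZ},
                           {b.minX, midZ, midX, b.maxZ}, {midX, midZ, b.maxX, b.maxZ}};
    for (int i = 0; i < 4; ++i) node->child[i] = Build(quads[i], depth - 1);
    return node;
}

// Pushes the object down to the leaf that contains it.
void Insert(QuadNode& node, const Object& o) {
    if (node.IsLeaf()) { node.objects.push_back(o); return; }
    const float midX = 0.5f * (node.bounds.minX + node.bounds.maxX);
    const float midZ = 0.5f * (node.bounds.minZ + node.bounds.maxZ);
    const int index = (o.x >= midX ? 1 : 0) + (o.z >= midZ ? 2 : 0);
    Insert(*node.child[index], o);
}

// Outside nodes are skipped whole; Inside nodes are accepted whole, with no
// further tests; only Partial nodes fall back to per-object tests.
void CollectVisible(const QuadNode& node, const Rect& view, bool fullyInside,
                    std::vector<int>& out) {
    if (!fullyInside) {
        const Overlap o = Classify(node.bounds, view);
        if (o == Overlap::Outside) return;
        fullyInside = (o == Overlap::Inside);
    }
    if (node.IsLeaf()) {
        for (const Object& obj : node.objects) {
            const bool visible = fullyInside ||
                (obj.x >= view.minX && obj.x <= view.maxX &&
                 obj.z >= view.minZ && obj.z <= view.maxZ);
            if (visible) out.push_back(obj.id);
        }
        return;
    }
    for (const auto& c : node.child) CollectVisible(*c, view, fullyInside, out);
}
```

A real renderer would test bounding volumes against the six frustum planes instead of a rectangle, but the early-out structure stays the same.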

3) As I already wrote, the World matrix is just a shader parameter. You can do this:
 

SetStreamSource
SetIndices
for all models that are represented by this mesh
	SetMatrix(World)
	DrawIndexedPrimitive



Moreover, you can use instancing here and render many objects with only World matrix different in one DIP.

One shared World matrix would give all your meshes the same position and orientation, so they would all be rendered at one point, looking like a junkyard after a nuclear explosion. You can pre-multiply each mesh by its world matrix and then save the result to the vertex buffer. It may be worth it if you have a static scene of different geometries rendered with one texture and material, but in general it is work in vain, and completely unacceptable for dynamic (moving or skinned) objects. Don't spend your time on it, setting the world matrix for each object is cheap enough. Also read this:
http://msdn.microsoft.com/en-us/library/windows/desktop/bb173349(v=vs.85).aspx

It also was an answer to 5)

4) Check for all redundant sets (except, maybe, shader constants), not only for IB, VB. It is very easy to implement.
If we have objects sorted by material, then by geometry:

M1 G1
M1 G2
M2 G2

for each material
    SetShader
    for each geometry
        SetVB
        Render

Without redundancy checks we have:

SetShader(M1)
SetVB(G1)
Render
SetVB(G2)
Render
SetShader(M2)
SetVB(G2)
Render

And with it:

SetShader(M1)
SetVB(G1)
Render
SetVB(G2)
Render
SetShader(M2)
[WE DON'T RESET G2 AS IT IS ALREADY SET]
Render

It only helps occasionally, but since it comes almost for free, use it.
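The M1/G1/G2 example can be replayed in a few lines. This sketch only counts calls instead of talking to a device, and `Draw`, `SetCounts` and `Submit` are hypothetical names. With `skipRedundant` off it mirrors the plain nested loop; with it on, a set is skipped whenever the requested shader or vertex buffer is already bound:

```cpp
#include <cassert>
#include <vector>

struct Draw { int material; int geometry; };

struct SetCounts { int shader = 0; int vb = 0; };

// Walks draws that are already sorted by material, then by geometry.
SetCounts Submit(const std::vector<Draw>& sorted, bool skipRedundant) {
    SetCounts n;
    int boundShader = -1, boundVb = -1;           // what the device currently has
    int groupMaterial = -1, groupGeometry = -1;   // where the nested loops are
    for (const Draw& d : sorted) {
        assert(d.material >= 0 && d.geometry >= 0);
        const bool newMaterialGroup = d.material != groupMaterial;
        const bool newGeometryGroup = newMaterialGroup || d.geometry != groupGeometry;
        if (newMaterialGroup && (!skipRedundant || boundShader != d.material)) {
            boundShader = d.material;  // SetShader would be called here
            ++n.shader;
        }
        if (newGeometryGroup && (!skipRedundant || boundVb != d.geometry)) {
            boundVb = d.geometry;      // SetVB would be called here
            ++n.vb;
        }
        groupMaterial = d.material;
        groupGeometry = d.geometry;
        // Render would be called here
    }
    return n;
}
```

For the three draws above this gives 3 `SetVB` calls without the check and 2 with it, matching the listings.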

My HW is a notebook with Core i7 2630QM + Radeon HD6770M. There is also integrated Intel Mobile HD3000(?) graphics chip.

cozzie    5029

Thanks, I'll go work on it and keep you posted in a few days after making quite some changes.

 

One last thing I read in an MSDN article (Accurately profiling Direct3D API calls), is that:

- SetVertexShaderConstant = avg. 1000 - 2700 cycles (I assume float4 arrays, matrices etc.)

- SetTexture = avg. 2500 - 3100 cycles

- SetStreamSource = avg. 3700 - 5800 cycles

- SetIndices = avg. 900 - 5600 cycles

 

When I use, say, the minimum averages and compare switching a mesh with switching a material:

Mesh change => 6.600 (3700 + 900 + 1000 (world matrix) + 1000 (inv trans world matrix))

Material change => 4.500 (2.500 + 1000 mat amb + 1000 mat diff)

 

Material switching seems somewhat cheaper, or am I overlooking something?

(or one of both does things "under the hood" which I could find out with PIX)

Niello    130

Hi again!

 

First, you didn't take into account ID3DXEffect calls like BeginPass, where SetVertexShader & SetPixelShader are called. It may not matter while you use one tech for everything, but that isn't practical: any game will use more, and if yours doesn't, don't think too much about the renderer at all.

 

Second, since SetIndices is 900-5600, you can't just substitute 900 and make any assumptions. Why not, say, 4200? Or even 5600? It changes things greatly, doesn't it? :) The answer is easy: profile it yourself. Hardware changes, many other circumstances change, and more or less accurate profiling results can be gathered only on your target platform.
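To see how wide the uncertainty is, the quoted ranges can be summed at both ends rather than only at the minimum. This is just arithmetic on the numbers cited earlier in the thread, not a measurement; `CycleRange`, `Sum`, `MeshSwitch` and `MaterialSwitch` are names invented for the sketch, and a mesh switch is assumed to be SetStreamSource + SetIndices + two matrix constants, a material switch SetTexture + two float-array constants:

```cpp
#include <cassert>
#include <initializer_list>

struct CycleRange { int min; int max; };

// Adds several ranges end by end.
CycleRange Sum(std::initializer_list<CycleRange> parts) {
    CycleRange total{0, 0};
    for (const CycleRange& p : parts) {
        assert(p.min <= p.max);
        total.min += p.min;
        total.max += p.max;
    }
    return total;
}

// Average cycle ranges as quoted in the thread.
constexpr CycleRange kSetConstant{1000, 2700};
constexpr CycleRange kSetTexture{2500, 3100};
constexpr CycleRange kSetStreamSource{3700, 5800};
constexpr CycleRange kSetIndices{900, 5600};

// Buffers plus world and inverse-transpose world matrices.
CycleRange MeshSwitch() {
    return Sum({kSetStreamSource, kSetIndices, kSetConstant, kSetConstant});
}

// Texture plus ambient and diffuse colours.
CycleRange MaterialSwitch() {
    return Sum({kSetTexture, kSetConstant, kSetConstant});
}
```

This gives 6600 to 16800 cycles for a mesh switch and 4500 to 8500 for a material switch. The two ranges overlap, so the table alone cannot say which one is cheaper on a given machine.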

 

But my most significant advice remains the same: write new features, expand your scene's quality and complexity, and start optimizing only when it becomes necessary. Profiling has no real meaning in a synthetic environment. You should profile the things your users will receive, or special test scenes where some bottlenecks are reproduced (like a scene with lots of different particles to optimize particle systems).

cozzie    5029

Hi Niello.
Just starting working on all the changes.

Using a shared parameter and an ID3DXEffectPool for my 'view projection matrix' is working (leaving the world matrix calculation as a parameter alone for now, looking at future plans with per pixel lighting). What is maybe strange is that it works both with "float4x4 viewProj;" and with "shared float4x4 viewProj;".

I simply created an LPD3DXEFFECTPOOL and set the viewProj matrix only once per frame; the result is fine (with and without 'shared' in the FX file/shader).
For now I'll keep it in, although I don't understand why it works without.

Short version of the code:

// changed part of shader/effect creation function at startup

		D3DXCreateEffectPool(&mEffectPool);
			if(D3D_OK != D3DXCreateEffectFromFileA(pD3ddev, pScene->mEffectFilenames[ec].c_str(), NULL, NULL, 0, mEffectPool, &mEffect[ec], &errorBuffer))

// new function that now only sets technique, instead of also viewProj matrix

bool CD3d::SetShaderTechnique(CD3dscene *pD3dscene, int pEffectIndex, char *pTechnique)
{
	if(D3DERR_INVALIDCALL == pD3dscene->mEffect[pEffectIndex]->SetTechnique(pTechnique)) return false;
	return true;
}

// new part of render function

	// SHADER rendering
	// Set shared parameters first
	if(D3DERR_INVALIDCALL == pD3dscene->mEffect[0]->SetMatrix("ViewProj", &pCam->mMatViewProjection)) return false;	// SHARED PARAMETER IN POOL
	
	if(!RenderScene(pD3dscene, pCam, "OpaqueShader", pD3dscene->mMeshIndexOpaque, pD3dscene->mNrD3dMeshesOpaque)) return false;
	
	if(!pD3dscene->SortBlendedMeshes(pCam->mPosition)) return false;
	if(!RenderScene(pD3dscene, pCam, "BlendingShader", pD3dscene->mMeshIndexBlended, pD3dscene->mNrD3dMeshesBlended)) return false;
	
	if(pD3dscene->mSkyBoxInScene) if(!pD3dscene->mSkyBox.Render(pCam->mPosition, pCam, mD3ddev)) return false;


 

Will go into splitting my mesh class into a 'real' mesh class and new meshinstance class (including all changes necessary with this).

cozzie    5029
Just a short update;
Going from meshes to mesh and meshinstance is quite a job, but a big improvement for sure.
When I think about it, I had about 20 or so tree meshes eating memory and buffers, while being all the same.

Short update;
Rough implementation done, nice side effect is that loading time has decreased by a couple of thousand % :)
Next step is clean indices..

Will keep you posted

cozzie    5029
@Niello; still there?

In the middle of next steps for using mesh instances instead of full mesh for every object right now.
I'm now starting to take the following approach:

- create index table in mesh class containing a list with ID's of the instances of that mesh
(or maybe do this in my scene class, like 2 dimensional array, not sure if this works memory allocation wise?)
Do this trick 2 times, one for blended and one for opaque instances
- create index table in scene class with array per material, containing the mesh ID's of meshes using this material
(create at startup for all static objects, no solution yet for dynamic objects)

- at rendertime I split rendering into a few main steps:

1a. culling; loop through all mesh instances and check against frustum. Mark with bool visible true/false
(in the future in this step I could add binary space checking, tree's, portals or whatever)
1b. sort blended meshinstances

2. main rendering loop:

a* loop through all materials
b* select material (state changes)
c* loop through mesh index that contains which meshes contain active material
d* select mesh (state changes, set buffers)
e* for each mesh, loop through the meshinstances index
f* if meshinstance visible true/false
h* if in frustum select meshinstance (state changes)
i* for each submesh of meshinstance do 'live' check boundingsphere in frustum
j* do draw call
... till end of scene

All steps above 2x, one for opaque and one for blended.
On state changes I will definitely save quite few setstreamsources/ setindices.

I'll also do some profiling on the number of batches/ draw calls I do per frame and how many triangles they include.

What's your advice on this, am I shooting myself in the foot for expansions in the future? (i.e. combining buffers, binary space positioning etc.).
Also curious what you think about the 'shared float' thingie above.

update 21-1;
still working on it and making nice steps, just decided I want a renderqueue class to handle all this, so that I have a flexible 'render bucket'. In the class I'll have all indices for meshes, materials, submeshes, saved depths, sorting functions etc.

Still curious though about your thoughts/questions on the last updates

Niello    130

Hi. Here I am again. Btw, happy birthday to both you and me)

 

Shared params in effects are shared between different effects. While you use 1 effect you won't see any difference, but when there are different ID3DXEffect objects, that are created with the same pool, setting shared variable to one of them sets it in them all.

 

Your mesh refactoring is good news. Also, if you use .x mesh files in ASCII format, moving to binary files will result in another big loading time win. And the third could be using precompiled .fx shaders.

 

As for your indexing system, I prefer sorting each frame. My advice on it all: download a couple of popular 3D engines and explore them. There are different advanced techniques that have proven their efficiency. My teacher, for example, is The Nebula Device, versions 2 and 3, but I don't recommend copy-pasting them; instead, gather ideas from them. In the end I faced the need to reimplement the whole Nebula scene graph and renderer. Irrlicht or Ogre are also a good starting point; not sure about the architecture, but for render techs, definitely.
