Member Since 14 Nov 2007
Offline Last Active Feb 01 2013 06:02 PM

Posts I've Made

In Topic: "Sorting out" render order

01 February 2013 - 05:59 PM


I was working hard this week, so there was no time to post.


Now you are at the point where I can't see any obvious problems in your code. Yes, it isn't perfect and may cause problems in the future, and, moreover, I would write (and actually did write) the whole scene graph + renderer differently. You are encouraged to dig into my code (there were links) if you want to know what I prefer :) I see no point in copying the same renderer into every project around the world, and it is good that you are trying to design your own.


And, definitely, implement spatial culling!


Hope to hear from you when you begin to implement new features. That always makes you rethink and improve the rendering codebase.

In Topic: "Sorting out" render order

22 January 2013 - 12:16 AM

Hi. Here I am again. Btw, happy birthday to both you and me :)


Shared params are shared between different effects. While you use only one effect you won't see any difference, but when there are several ID3DXEffect objects created with the same pool, setting a shared variable on one of them sets it on all of them.


Your mesh refactoring is good news. Also, if you use .x mesh files in ASCII format, moving to binary files will give you another big win in loading time. A third one could be using precompiled .fx shaders.


As for your indexing system, I prefer sorting each frame. My advice on all of this: download a couple of popular 3D engines and explore them. There are various advanced techniques that have proven their efficiency. My own teacher, for example, was The Nebula Device, versions 2 and 3, but I don't recommend copy-pasting it; gather ideas from it instead. In the end I had to reimplement the whole Nebula scene graph and renderer myself. Irrlicht and Ogre are also good starting points; I'm not sure about their architecture, but for rendering techniques, definitely.

In Topic: "Sorting out" render order

16 January 2013 - 03:25 PM

Hi again!


First, you didn't take into account ID3DXEffect calls like BeginPass, where SetVertexShader & SetPixelShader are called. This may not matter while you use one tech for everything, but that isn't practical: any game will use more, and if yours won't, don't think too much about the renderer at all.


Second. Since SetIndices is 900-5600, you can't just substitute 900 and make any assumptions. Why not, say, 4200? Or even 5600? It changes things greatly, doesn't it? :) The answer is easy: profile it yourself. Hardware changes, many other circumstances change, and more or less accurate profiling results can be gathered only on your target platform.


But my most important advice remains the same: write new features, expand your scene's quality and complexity, and start optimizing only when it becomes necessary. Profiling has no real meaning in a synthetic environment. You should profile what your users will actually get, or special test scenes where particular bottlenecks are reproduced (like a scene with lots of different particles, to optimize particle systems).

In Topic: "Sorting out" render order

15 January 2013 - 06:04 PM

Not at all. What I get out of it is that I systematize and refine my own knowledge by writing this. Also, maybe someone else will point out where I'm wrong.

If you write for DX9, remember that it has been around since the beginning of the last decade, about 10 years. All modern (and much obsolete) hardware supports DX9. It is never too late to optimize if you discover that your scenes are too big. Moreover, at that point you will know why your scenes render slowly, and can choose optimizations accordingly. For now we are discussing techniques that are good in general.


1) Don't tie a mesh to a model instance. You may use the same mesh for many objects in a scene and store its vertex & index data once. You can even render the same mesh with different materials and with different World matrices.

3) Do you mean D3DXFX_DONOTSAVESTATE? Docs claim that it prevents saving state in Begin() and restoring in End(). BeginPass() sets states anyway. Can't say more without seeing what's going on in your PIX.

6) The World matrix will be set the same number of times anyway, because it is per-object and is set for each object regardless of sorting. AFAIK changing the shader tech is the most costly operation. Setting shader constants is less costly. The cost of setting textures and VBs/IBs depends on the memory pool and the total amount of GPU memory. This is not exact, you should profile. PIX has some profiling functionality.


1) You perform the operation World * ViewProj. If you do this in a vertex shader, you have one GPU mul (4 x dp4) per VERTEX. If you do this on the CPU, you have 1 matrix multiply (some CPU cycles or, better, a fast inlined SSE function) per OBJECT. Given your object has 3 to 15000 vertices...
But if you want to implement per-pixel lighting in the shader, you must supply the World matrix to it and perform at least 2 matrix multiplications anyway. This is where a shared ViewProj helps. Send the World matrix to the shader, get the world position, use it, then multiply it by ViewProj to get the projected position.
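
To make the CPU-side variant concrete, here is a minimal sketch. Mat4, Translation, Multiply and MakeWorldViewProj are names made up for this example (in a real D3D9 project you would use D3DXMATRIX and its multiply instead), and I assume the row-vector convention, so the combined matrix is World * ViewProj:

```cpp
#include <array>

// Hypothetical minimal 4x4 matrix, row-major, row-vector convention
// (v' = v * M). A stand-in for D3DXMATRIX, for illustration only.
struct Mat4 {
    std::array<float, 16> m{};  // zero-initialized
    float& at(int r, int c) { return m[r * 4 + c]; }
    float at(int r, int c) const { return m[r * 4 + c]; }
};

Mat4 Identity() {
    Mat4 r;
    for (int i = 0; i < 4; ++i) r.at(i, i) = 1.0f;
    return r;
}

// Translation lives in the fourth row under the row-vector convention.
Mat4 Translation(float x, float y, float z) {
    Mat4 r = Identity();
    r.at(3, 0) = x;
    r.at(3, 1) = y;
    r.at(3, 2) = z;
    return r;
}

Mat4 Multiply(const Mat4& a, const Mat4& b) {
    Mat4 r;
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k)
                r.at(i, j) += a.at(i, k) * b.at(k, j);
    return r;
}

// Called once per OBJECT on the CPU; the vertex shader then needs only
// one multiply by the combined matrix per vertex.
Mat4 MakeWorldViewProj(const Mat4& world, const Mat4& viewProj) {
    return Multiply(world, viewProj);
}
```

You would compute this once per object per frame and upload the result as a single shader constant. If the shader also needs the world position for lighting, this shortcut doesn't apply and you send World separately.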

2) Spatial partitioning is a mature concept with many methods developed and lots of information available. Spend some time reading and googling. As for me, I used to prefer a "loose octree" as the spatial partitioning structure, but now I use a simple "quadtree", because there are other interesting things to implement and I have no spare time to spread over secondary tasks (not sure there is such an idiom in English, hm...).

In a couple of words, spatial partitioning is based on this: "If I don't see one half of the level, I don't see any half of that half, and so on, and I don't see any object in there. But if I see a half of the level completely, I definitely see everything in it".

Some code:
http://code.google.com/p/deusexmachina/source/browse/branches/Dev/DEM/Src/L1/Scene/Scene.cpp (line 173, SPSCollectVisibleObjects)
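
If you don't want to read the whole file, the idea can be sketched like this. This is NOT the code behind the link, just a minimal illustration: the names (Rect, QuadNode, CollectVisible and so on) are made up here, and an axis-aligned 2D rectangle stands in for the real view frustum test:

```cpp
#include <memory>
#include <vector>

// Axis-aligned rectangle in the XZ plane. Assumption: used here both for
// node/object bounds and as a simplified stand-in for the view frustum.
struct Rect { float minX, minZ, maxX, maxZ; };

enum class Clip { Outside, Inside, Partial };

// Classify a box against the visible region.
Clip Classify(const Rect& box, const Rect& view) {
    if (box.maxX < view.minX || box.minX > view.maxX ||
        box.maxZ < view.minZ || box.minZ > view.maxZ)
        return Clip::Outside;
    if (box.minX >= view.minX && box.maxX <= view.maxX &&
        box.minZ >= view.minZ && box.maxZ <= view.maxZ)
        return Clip::Inside;
    return Clip::Partial;
}

struct SceneObject { int id; Rect bounds; };

struct QuadNode {
    Rect bounds;
    std::vector<SceneObject> objects;
    std::vector<std::unique_ptr<QuadNode>> children;  // empty for a leaf
};

// Node is completely visible: take everything below it without any tests.
void CollectAll(const QuadNode& node, std::vector<int>& out) {
    for (const SceneObject& o : node.objects) out.push_back(o.id);
    for (const auto& child : node.children) CollectAll(*child, out);
}

void CollectVisible(const QuadNode& node, const Rect& view, std::vector<int>& out) {
    switch (Classify(node.bounds, view)) {
    case Clip::Outside:
        return;  // nothing in this subtree can be visible
    case Clip::Inside:
        CollectAll(node, out);
        return;
    case Clip::Partial:
        // Only here are individual objects tested and children visited.
        for (const SceneObject& o : node.objects)
            if (Classify(o.bounds, view) != Clip::Outside) out.push_back(o.id);
        for (const auto& child : node.children) CollectVisible(*child, view, out);
        return;
    }
}
```

The win comes from the Outside and Inside branches: whole subtrees are rejected or accepted with a single test, so per-object checks happen only near the border of the visible region.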

3) As I already wrote, the World matrix is just a shader parameter. You can do this:

for all models that are represented by this mesh

Moreover, you can use instancing here and render many objects that differ only in their World matrix in one DIP (DrawIndexedPrimitive call).

One World matrix would give all your meshes the same position and orientation, so they would all be rendered at one point, looking like a junkyard after a nuclear explosion. You can pre-multiply each mesh by its world matrix and then save the result to the vertex buffer. It may be worth it if you have a static scene of different geometries rendered with one texture and material, but in general it is wasted effort, and completely unacceptable for dynamic (moving or skinned) objects. Don't spend your time on it, setting the world matrix for each object is cheap enough. Also read this:
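
A rough sketch of the non-instanced loop for one mesh shared by many models. MockRenderer is a made-up stand-in that only counts calls; the method names mirror the D3D9 ones, but this is not the real device or effect API:

```cpp
#include <vector>

struct Matrix { float m[16]; };

// Hypothetical stand-in for the device + effect. It only counts calls,
// so you can see how often each kind of state is touched.
struct MockRenderer {
    int streamSets = 0;
    int indexSets = 0;
    int worldSets = 0;
    int drawCalls = 0;

    void SetStreamSource(int /*meshId*/) { ++streamSets; }
    void SetIndices(int /*meshId*/) { ++indexSets; }
    void SetWorldMatrix(const Matrix& /*world*/) { ++worldSets; }
    void DrawIndexedPrimitive() { ++drawCalls; }
};

struct Mesh { int id; };
struct ModelInstance { Matrix world; };

// Bind the geometry once, then change only the World matrix per model.
void RenderMeshInstances(MockRenderer& r, const Mesh& mesh,
                         const std::vector<ModelInstance>& instances) {
    r.SetStreamSource(mesh.id);
    r.SetIndices(mesh.id);
    for (const ModelInstance& inst : instances) {
        r.SetWorldMatrix(inst.world);
        r.DrawIndexedPrimitive();
    }
}
```

With N models of the same mesh this costs 1 SetStreamSource + 1 SetIndices + N world matrix sets + N draws. Hardware instancing would go further and collapse the N draws into one.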

That was also the answer to 5)

4) Check for all redundant sets (except, maybe, shader constants), not only for IB, VB. It is very easy to implement.
If we have objects sorted by material, then by geometry:

M1 G1
M1 G2
M2 G2

for each material
for each geometry

Without redundancy checks we have:


And with it:


It only helps occasionally, but since it comes almost for free, use it.
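
Here is a small sketch of such a check, run on the M1 G1 / M1 G2 / M2 G2 list from above. StateCache and RenderSorted are names invented for this example, and the "sets" are just counters instead of real device calls:

```cpp
#include <vector>

struct RenderItem { int material; int geometry; };

// Hypothetical state cache: remembers what is currently bound and,
// when checking is enabled, skips sets that would change nothing.
struct StateCache {
    bool checkRedundant;
    int currMaterial = -1;
    int currGeometry = -1;
    int materialSets = 0;  // how many sets actually reached the "device"
    int geometrySets = 0;

    explicit StateCache(bool check) : checkRedundant(check) {}

    void SetMaterial(int material) {
        if (checkRedundant && material == currMaterial) return;  // early exit
        currMaterial = material;
        ++materialSets;
    }

    void SetGeometry(int geometry) {
        if (checkRedundant && geometry == currGeometry) return;  // early exit
        currGeometry = geometry;
        ++geometrySets;
    }
};

// Walk an already sorted list, setting material and geometry per object.
StateCache RenderSorted(const std::vector<RenderItem>& items, bool checkRedundant) {
    StateCache cache(checkRedundant);
    for (const RenderItem& item : items) {
        cache.SetMaterial(item.material);
        cache.SetGeometry(item.geometry);
        // the draw call would go here
    }
    return cache;
}
```

For those three objects the version without checks performs 3 material sets and 3 geometry sets, and the version with checks performs 2 and 2: the second M1 and the second G2 are skipped.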

My HW is a notebook with Core i7 2630QM + Radeon HD6770M. There is also integrated Intel Mobile HD3000(?) graphics chip.

In Topic: "Sorting out" render order

15 January 2013 - 12:05 AM

Glad to read that my code is useful to someone besides me :)
So, I'll give you a few pieces of advice, but first the most important one. Don't spend time on writing The Fastest Possible Code if you don't have a performance bottleneck, or unless that is your actual goal. While the performance is acceptable (say, 30-60 FPS), develop new functionality without micro-optimization.

Ok, now let's switch from boring lectures to what you want to read:

if(!SetVertexShader(pD3dscene, ec, pTechnique, pCam)) return false;        // 1x SetTechnique, 1x SetMatrix viewproj

You can use shared shader constants (google "HLSL shared") and effect pool and set cross-tech variables like ViewProjection once per frame.

shared float4x4 ViewProj;

That is how it works in my code.
Here you save (NumberOfTechniques - 1) * SetMatrix
Also note that you can pre-multiply World * ViewProj on the CPU if your shaders don't require a separate World matrix.


Each pass sets the render states you described for it: VertexShader, PixelShader, ZEnable, ZFunc and others. Shader constants are also filled here. Use PIX from the DX SDK to dig deeper into the D3DXEffect calls. You can reduce state changes here by writing passes carefully, especially when using D3DXFX_DONOTSAVESTATE. There is a good article: http://aras-p.info/texts/d3dx_fx_states.html

Instead of iterating through all meshes for all techs, you can (and probably should) sort your models. With qsort or std::sort it is a trivial task and takes about 5 minutes.
Also, for big scenes you may want to use spatial partitioning for visibility checks, to avoid testing the visibility of every object. The renderer will then receive only visible objects, which improves performance (especially for sorting, whose cost depends on the number of objects being sorted).
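
The sorting part really is that short. A sketch, assuming each visible object carries integer IDs for its technique, material and geometry (RenderItem and SortForRendering are made-up names, and the key order is only my preference, most expensive change first):

```cpp
#include <algorithm>
#include <tuple>
#include <vector>

// Hypothetical per-object sort key; the IDs could be indices or handles.
struct RenderItem {
    int technique;
    int material;
    int geometry;
};

// Sort so that objects sharing a technique are adjacent, then objects
// sharing a material, then objects sharing geometry.
void SortForRendering(std::vector<RenderItem>& items) {
    std::sort(items.begin(), items.end(),
              [](const RenderItem& a, const RenderItem& b) {
                  return std::tie(a.technique, a.material, a.geometry) <
                         std::tie(b.technique, b.material, b.geometry);
              });
}
```

After this, a single pass over the list changes the technique, material and buffers only when the corresponding ID differs from the previous object's. Which key order is best depends on your hardware and scene, so profile it.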

if(!pD3dscene->PreSelectMesh(pMeshIndex[oc], mD3ddev)) return false;// 2x SetMatrix, world/worldinvtransp, 1x SetStreamSource, 1x SetIndices

If you sort your models by geometry, you can do 1x SetStreamSource, 1x SetIndices once for all objects of that geometry (within the same shader, but objects of the same geometry often DO use the same shader).

Again, the shader is tightly coupled with the material. A material is just a shader tech + shader variable values for this tech. So, set as many shader params as you can right after setting the technique, and don't reset them for each mesh. Say, all golden objects have the same DiffuseColor. Use a material "Gold" with shader "metal" and a yellow DiffuseColor, set it once and render all golden objects. Sorting by material will help you a lot. Right now you reset the material for each mesh, even if it is the same for half of them.

Check for redundant sets. In my code you can see

RenderSrv->SetVertexBuffer(0, pMesh->GetVertexBuffer());
RenderSrv->SetIndexBuffer(pMesh->GetIndexBuffer());

called for each object, but inside these methods you will find:

if (CurrVB[Index].get_unsafe() == pVB && CurrVBOffset[Index] == OffsetVertex) return;

Early exits may save you a couple of sets the renderer didn't take care of.

Hope this helps.