About Niello
  1. "Sorting out" render order

    Hi. I was working hard this week, so there was no time to post. You're now at the point where I can't see obvious problems in your code. It isn't perfect and may cause problems in the future, and I would have written (in fact, I did write) the whole scene graph + renderer differently. You're encouraged to dig into my code (there were links) if you want to know what I prefer :) I see no point in every project in the world copying the same renderer, and it's good that you're trying to architect your own. And, definitely, implement spatial culling! Hope to hear from you when you begin implementing new features. That always forces you to rethink and improve the rendering codebase.
  2. "Sorting out" render order

    Hi. Here I am again. By the way, happy birthday to both you and me :) Shared params in effects are shared between different effects. While you use one effect you won't see any difference, but when there are several ID3DXEffect objects created with the same pool, setting a shared variable on one of them sets it in all of them. Your mesh refactoring is good news. Also, if you use .x mesh files in ASCII format, moving to binary files will give you another big loading-time win. A third option is using precompiled .fx shaders. As for your indexing system, I prefer sorting each frame. My advice on all of this: download a couple of popular 3D engines and explore them. They contain advanced techniques that have proved their efficiency. My teacher, for example, is The Nebula Device, versions 2 and 3, but I don't recommend copy-pasting from them; gather ideas instead. In the end I faced the need to reimplement the whole Nebula scene graph and renderer. Irrlicht or Ogre are also good starting points; I'm not sure about their architecture, but their rendering techniques definitely are.
  3. "Sorting out" render order

    Hi again! First, you didn't take into account ID3DXEffect calls like BeginPass, where SetVertexShader and SetPixelShader are called. That may not matter while you use one technique for everything, but that isn't practical: in any real game you will use more, and if not, don't think too much about the renderer at all. Second: since SetIndices costs 900-5600, you can't just substitute 900 and draw conclusions. Why not, say, 4200? Or even 5600? It greatly changes things, doesn't it? :) The answer is simple: profile it yourself. Hardware changes, many other circumstances change, and more or less accurate profiling results can be gathered only on your target platform. But my most important advice remains the same: write new features, expand your scene's quality and complexity, and start optimizing only when it becomes necessary. Profiling has no real meaning in a synthetic environment. You should profile what your user will actually receive, or special test scenes where a particular bottleneck is reproduced (like a scene with lots of different particles to optimize particle systems).
  4. "Sorting out" render order

    Not at all. My profit is that I systematize and refine my knowledge by writing this. Also, maybe someone else will point out where I'm wrong. If you write for DX9, remember that it has existed since the beginning of the past decade, nearly 10 years. All modern (and much obsolete) hardware supports DX9. It is never too late to optimize once you discover that your scenes are too big. Moreover, at that point you will know why your scenes render slowly, and can choose optimizations accordingly. For now we're discussing techniques that are good in general.
Actions:
1) Don't associate a mesh with a model instance. You may use the same mesh for many objects in a scene and store the vertex and index data once. You can even render the same mesh with different materials and different World matrices.
3) Do you mean D3DXFX_DONOTSAVESTATE? The docs claim it prevents saving state in Begin() and restoring it in End(). BeginPass() sets states anyway. I can't say more without seeing what's going on in your PIX capture.
6) The World matrix will be set the same number of times anyway, because it is per-object and set for each object regardless of sorting. AFAIK changing the shader technique is the most costly operation. Setting shader constants is less costly. The cost of setting textures and VBs/IBs depends on the memory pool and the total amount of GPU memory. This is not exact; you should profile. PIX has some profiling functionality.
Questions:
1) You perform the operation World * ViewProj. If you do this in a vertex shader, you have one GPU multiply (4 x dp4) per VERTEX. If you do it on the CPU, you have one matrix multiply (some CPU cycles or, better, a fast inlined SSE function) per OBJECT. Given that your objects have 3 to 15000 vertices... But if you want to implement per-pixel lighting in the shader, you must supply the World matrix to it and perform at least 2 matrix multiplications anyway. Here the shared ViewProj helps: send the World matrix to the shader, compute the world position, use it, multiply it by ViewProj and get the projected position.
2) Spatial partitioning is a mature concept with many developed methods and plenty of information available. Spend some time reading and googling. As for me, I preferred a "loose octree" as the spatial partitioning structure, but now I use a simple "quadtree", because there are other interesting things to implement and I have no free time to spread over secondary tasks. In a couple of words, spatial partitioning is based on: "If I don't see half of the level, I don't see any half of that half, etc., and I don't see any object there. But if I completely see half of the level, I definitely see everything there." Some code: (line 173, SPSCollectVisibleObjects)
3) As I already wrote, the World matrix is just a shader parameter. You can do this: SetStreamSource, SetIndices, then for all models that share this mesh: SetMatrix(World), DrawIndexedPrimitive. Moreover, you can use instancing here and render many objects, differing only in World matrix, with one DIP. Using one World matrix would give all your meshes the same position and orientation, so they would all be rendered at one point, looking like a junkyard after a nuclear explosion. You can pre-multiply each mesh by its world matrix and then save it to the vertex buffer. That may be worth it if you have a static scene of different geometries rendered with one texture and material, but in general it is work in vain, and completely unacceptable for dynamic (moving or skinned) objects. Don't spend your time on it; setting the world matrix per object is cheap enough. Also read this: It was also an answer to 5).
4) Check for all redundant sets (except, maybe, shader constants), not only the IB and VB. It is very easy to implement. If we have objects sorted by material, then by geometry (M1 G1, M1 G2, M2 G2):
for each material
    SetShader
    for each geometry
        SetVB
        Render
Without redundancy checks we have: SetShader(M1), SetVB(G1), Render, SetVB(G2), Render, SetShader(M2), SetVB(G2), Render. And with them: SetShader(M1), SetVB(G1), Render, SetVB(G2), Render, SetShader(M2), [we don't reset G2 as it is already set], Render. The effect is occasional, but since it comes almost for free, use it.
My HW is a notebook with a Core i7 2630QM + Radeon HD6770M. There is also an integrated Intel Mobile HD 3000 graphics chip.
  5. "Sorting out" render order

    Glad to read that my code is useful for someone but me :) So, I'll give you a couple of pieces of advice, but first the most important one: don't spend time writing The Fastest Possible Code if you don't have a performance bottleneck or if that isn't your goal. While the performance is acceptable (say, 30-60 FPS), develop new functionality without micro-optimization. OK, now let's switch from boring lectures to what you want to read:
if(!SetVertexShader(pD3dscene, ec, pTechnique, pCam)) return false; // 1x SetTechnique, 1x SetMatrix viewproj
You can use shared shader constants (google "HLSL shared") and an effect pool, and set cross-technique variables like ViewProjection once per frame:
shared float4x4 ViewProj;
In my code it works this way. Here you save (NumberOfTechniques - 1) * SetMatrix. Also note that you can pre-multiply World * ViewProj on the CPU if your shaders don't require a separate World matrix.
pD3dscene->mEffect[ec]->BeginPass(i);
Each pass sets the render states you described for it: VertexShader, PixelShader, ZEnable, ZFunc and others. Shader constants are also filled here. Use PIX from the DX SDK to dig deeper into the ID3DXEffect calls. Here you can reduce state changes by writing passes effectively, especially when using D3DXFX_DONOTSAVESTATE. There is a good article: Instead of iterating through all meshes for all techniques, you can (and probably should) sort your models. Using qsort or std::sort it is a trivial task and takes about 5 minutes. Also, for big scenes you may want to use spatial partitioning for visibility checks and avoid testing the visibility of every object. The renderer will then receive only visible objects, which improves performance (especially sorting, whose cost depends on the number of objects being sorted).
if(!pD3dscene->PreSelectMesh(pMeshIndex[oc], mD3ddev)) return false; // 2x SetMatrix, world/worldinvtransp, 1x SetStreamSource, 1x SetIndices
If you sort your models by geometry, you can do 1x SetStreamSource, 1x SetIndices once for all objects of that geometry (within the same shader, but objects of the same geometry often DO use the same shader). Again, the shader is tightly coupled with the material. A material is just a shader technique plus the shader variable values for that technique. So set as many shader params as you can right after setting the technique, and don't reset them for each mesh. Say all golden objects have the same DiffuseColor: use a material "Gold" with shader "metal" and a yellow DiffuseColor, set it once, and render all golden objects. Sorting by material will help you a lot. Right now you reset the material for each mesh, even if it is the same for half of them.
Check for redundant sets. In my code you can see
RenderSrv->SetVertexBuffer(0, pMesh->GetVertexBuffer());
RenderSrv->SetIndexBuffer(pMesh->GetIndexBuffer());
called for each object, but inside these methods you will find:
if (CurrVB[Index].get_unsafe() == pVB && CurrVBOffset[Index] == OffsetVertex) return;
Early exits may save you a couple of sets the renderer didn't take care of. Hope this helps.
  6. "Sorting out" render order

    Oh, I forgot one thing. Think of an object's World (or WorldViewProjection) matrix as just another per-object shader parameter. It simplifies things.
  7. "Sorting out" render order

    1. A material should store the shader along with constant shader params (those that don't change from object to object made of this material). Other params are defined in the object itself as personal ones.
2. Sort objects by shader technique (or vertex + pixel shader), then by material, then by geometry.
3. When you render:
* set the first technique, process all objects of this technique, set the second technique, etc.
* inside a technique, apply the constant material params once and process all objects of this material
* since they are sorted by geometry, you can render them instanced if you write all per-object differences (ideally only the World matrix) into the vertex buffer; if two objects have different personal shader parameters, you can't instance them. Note: for instanced rendering you will switch the technique, but it likely won't be redundant.
If you want some code, I have it:
This is my endless work-in-progress :) Feel free to read, use and abuse it. If there are questions, I'll try to answer.
  8. Sorry for necroposting, but this topic is still on the first page of Google results if you search "ode ray heightfield". The issue described here is still present in ODE. I wrote my own ray-heightfield collider to fix it. I don't know whether the ODE developers will add it to their codebase, but you can always get it here: [url=""][/url] To the moderator: if placing this URL here is unacceptable, please suggest another good way to share the code. Thanks. These files are based on revision 1846 (the most recent as of today). The reason for the strange behaviour of the default collider is that it checks all triangles overlapping the second geometry's AABB, or something like that. A long ray can have a very large AABB, which sometimes overlaps more than 100k heightfield triangles. The algorithm allocates too many and too big temporary buffers and runs out of memory. That's why you all get crashes, freezes and allocation failures. Hope this helps someone. If you have any questions or notice bugs, feel free to email me through my profile.
  9. Follow target behaviour tree

    I try to avoid code duplication if possible. Since I already have FindPath and FollowPath actions I want to reuse them.
  10. Hi. I'm now trying to add behaviour trees to my project. Today I finished implementing simple movement and started the "FollowEntity" behaviour. Its algorithm is:
1. The actor receives a "Follow" order and a target entity.
2. The actor's DesiredDestination is set from the position of the target (for simple movement, DesiredDestination is set from input).
3. The "MoveTo" behaviour is executed, which is a Sequence of 2 Actions: "FindPath" and "FollowPath".
4. If the target entity starts moving, we should update DesiredDestination at some frequency and re-run step 3 from the beginning.
5. If we have reached the target (distance < threshold), the actor should stop and optionally face it.
6. If the target is lost, the behaviour fails and the decision maker chooses another one to execute.
Can anyone explain to me how to construct a tree that implements this algorithm? Steps 1-3 are easy, but I'm not yet familiar enough with the technique to understand how to make the tree do steps 4-6. I also have a few points to discuss about BT data storage and Action-to-Action communication through actor variables. Should I start a new thread, or is it OK to ask here? Thanks in advance.
  11. Behavior Trees

    Hi. Can anyone explain how decision trees and behaviour trees differ? After reading Dave's post I'm a bit disoriented.
  12. I need to convert a very big texture in some standard format (24-bit RGB, 32-bit ARGB, 8-bit grayscale) into an array of small DXT-compressed clusters stored in ONE file, automatically. In fact I only need the converted pixel data for _parts_ of my large image. Then I'll simply save it with fwrite() or something similar. The cluster size must be tunable. I think there are no utilities that do exactly what I need. I can use any library for (de)compression from/to DXT, but if D3DX can do it, I'd prefer to avoid additional dependencies. If its usage is really so difficult for my purposes, can you tell me which library is better?
  13. Hello all. I'm trying to cut a very big texture (16384*16384, D3DXFMT_P8, a grayscale BMP for memory-usage reduction while debugging; later I will use common formats such as A8R8G8B8) into small rectangular regions and then save them converted to DXT1 or DXT5. My idea is to load the whole texture with:
//Source texture
D3DXCreateTextureFromFileEx(pDevice, InputFile, 0, 0, 1, 0, D3DFMT_UNKNOWN, D3DPOOL_SCRATCH, D3DX_DEFAULT, D3DX_DEFAULT, 0, &TextureInfo, NULL, &pTexture);
pTexture->LockRect(0, &Src, NULL, D3DLOCK_READONLY);
if (!Src.Pitch || !Src.pBits) FAIL;
//Destination texture
pDevice->CreateTexture(clusterSize.X, clusterSize.Y, 1, 0, DestFormat, D3DPOOL_SCRATCH, &pCompressedCluster, NULL);
pCompressedCluster->GetSurfaceLevel(0, &pClusterSurf);
then, in a loop, set the rect for the current cluster and perform the conversion using D3DX, like here:
CreateRect(X, Y, ClusterW, ClusterY, ClusterRect);
D3DXLoadSurfaceFromMemory(pClusterSurf, NULL, NULL, Src.pBits, TextureInfo.Format, Src.Pitch, NULL, &ClusterRect, D3DX_FILTER_NONE, 0);
I checked that the source texture is loaded correctly. Then I convert its rect with D3DXLoadSurfaceFromMemory, where DestFormat is D3DFMT_DXT1 or the source format (for testing). Then I lock the destination texture and manually check the pixels; they look right, at least when compression is not performed and the rect is simply copied. After that I save the cluster in both DDS and BMP formats (also for testing). But only a white rectangle appears in the file, and when I load the texture just saved, all pixels load as 0xffffffff. Did I do anything wrong? Can anyone tell me a better, or simply the RIGHT, way to solve this task? Thanks.
  14. Octree for dynamic objects

    Quote: Original post by Aressera
the beauty of this system is that there is only one copy on an object in the tree at any one time, which makes for fast removal, and that object re-appropriation is very fast.
Thanks a lot. I understood your algorithm and will try to implement it. One question: do you use a standard or a loose octree?
  15. Octree for dynamic objects

    Quote: It's quite possible to have a suboptimal octree <skip> eg every branch of the tree has a depth of 4 and those depth 4 elements are the only place that dynamic objects are placed
So I must allocate all the 4th-level nodes, including empty ones, at octree creation time, yes? And please tell me how to determine the nodes that contain a specified object (brute force, a list of node pointers in the object class, or some better way?).
Quote: If you have more specific info about the nature of the application, then maybe you can get some more specific advice.
Yes, I can say several more things about my application. It's a non-professional engine being developed for an outdoor RPG. So:
Speed of objects and size of the scene: hmmm... say the scene is 1000*1000*200 (x*y*z) units and object sizes are around 2*2*2 to 10*10*10. Movement speed will vary from really small values up to 5-15 units/sec, but teleportation is possible too (in fact it's not movement, which confuses me a bit).
Number of dynamic objects in the scene: I think up to 200, but I can't say exactly yet. 200 seems more than enough for now (excluding static animated geometry).
Density of objects in the scene: up to 25-30 objects in a group (dynamic, placed near each other), but groups of 1 to 7-8 will be much more common. These groups are at least 10 units from each other. If you need static object density too, then add up to 25 static objects (e.g. trees in a forest) to the values above.
I hope you can imagine what I'm trying to describe. If that's not enough, ask me more and I'll try to answer. Thanks.