Optimizing Meshes


Hi guys,

Without BLOWING my mind, can someone gently introduce me to how you go about optimizing a mesh for rendering? I've heard of octrees and quadtrees and all kinds of trees bar oak trees, and I was wondering if these can be applied to a model of around 5000 faces, as that's how big my main character is. When I say main character I mean the one all humanoids will derive from, so there might be a few on screen. Damn, with only 5 of them I'd be hitting 25,000+ faces per frame, yikes!

Clearly something else must happen. It can't be a simple case of chucking out a 5000-face model all the time, and I'm not too comfortable modelling LODs, although I might just have to learn.

Is there a practical mathematical or algorithmic approach that can do this? Thanks ;o)

(Please try to keep it simple and always remember I am grateful for every reply).

Well, other than the line:

Mesh->OptimizeInplace( D3DXMESHOPT_ATTRSORT | D3DXMESHOPT_COMPACT | D3DXMESHOPT_VERTEXCACHE, (DWORD*)MeshBuffer->GetBufferPointer(), 0, 0, 0);

My knowledge of how to optimize models is hazy. Having one mesh with lots of animation tracks (one per instance) is a decent means of reducing memory consumption, but it does not reduce the CPU-to-GPU traffic involved.

In DX10 I imagine you can pass the model in once, then pass in coordinates and an animation value per instance, and have the geometry stage "build" each instance on the GPU for you, cutting down the traffic between the two devices a lot. Although of course I have never done that, or I would be more specific and say you definitely can.
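For what it's worth, the API side of that idea would look roughly like this. A sketch only, untested, with hypothetical buffer and function names, and it assumes you have already built an input layout that marks the second stream as per-instance data:

#include <d3d10.h>

//Sketch: 'device' is a live ID3D10Device*, meshVB/meshIB hold the one shared
//humanoid mesh, and instanceVB holds a small per-character blob (position,
//animation value, etc). All names here are made up for illustration.
void DrawHumanoidsInstanced(ID3D10Device* device,
                            ID3D10Buffer* meshVB, ID3D10Buffer* meshIB,
                            ID3D10Buffer* instanceVB,
                            UINT indexCount, UINT instanceCount,
                            UINT vertexStride, UINT instanceStride)
{
    ID3D10Buffer* buffers[2] = { meshVB, instanceVB };
    UINT strides[2] = { vertexStride, instanceStride };
    UINT offsets[2] = { 0, 0 };

    //Stream 0: the mesh, uploaded once. Stream 1: tiny per-instance data.
    device->IASetVertexBuffers(0, 2, buffers, strides, offsets);
    device->IASetIndexBuffer(meshIB, DXGI_FORMAT_R16_UINT, 0);
    device->IASetPrimitiveTopology(D3D10_PRIMITIVE_TOPOLOGY_TRIANGLELIST);

    //One call draws every character; the vertex shader positions/animates each
    //copy from the instance data, so the mesh never crosses the bus again.
    device->DrawIndexedInstanced(indexCount, instanceCount, 0, 0, 0);
}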

I am aware from the DX SDK that the vertex cache is a good way to increase rendering speed, although I do not know how to apply it to all mesh allocation, as the SDK tutorials haunt my nightmares with their convoluted code. All I can do to help is recommend you give OptimizeInplace a read over on MSDN. I will watch this thread to see if anyone gives better advice I can follow too :p

As for the management trees, I think they come down to optimizing LOD calculations for terrain and shadow/lighting calculations. The only other use you might be thinking of is a management system for excluding offscreen models and objects from all drawing calculations.

Nice reply, thanks! It triggered off some ideas, new and old. I'm using my own format, so sadly no access to the built-in DX mesh functions and stuff. I wonder if I can run something from those built-in functions on a raw vertex buffer? Probably unlikely.

I was thinking of uploading one set of vertices to the GPU and then transforming them for each model that uses those vertices. So, for example, I upload the full set for the humanoid class. Then, when animating, I do the animation (vertex position changes) on the GPU. But I *somehow* keep an unaltered copy of the vertices on the GPU, so that each model applies its own transform to this 'reference' vertex list (am I talking carp again?), and only one set ever has to be copied across.

Am I making enough sense for anyone to understand what I'm talking about?

Thanks for your reply sir/madam.

I am a sir. You are talking sense, but the idea of using the mesh from one model's code and drawing multiple instances of it would probably require a geometry shader (DX10+). I am not sure what the OpenGL equivalent is.

OK, so I have never done this before, but I know that if you have data in a vertex buffer of any description, you can basically transfer it around so long as you're confident of your vertex definition (I don't recommend FVF). What I am trying to do, to clarify, is take a vertex buffer, dump its data into a mesh, optimise it, and then extract the vertex buffer again. This is a first attempt at brainstorming a solution, so don't hold me to the code.



#include <d3dx9.h>
#include <vector>

//This is by no account the definitive way to do it, but if I were trying, I would do something like this (error checking omitted):
ID3DXMesh* meshGeneric = NULL;
D3DXCreateMesh(numFaces, numVerts, D3DXMESH_MANAGED, yourVertexDecl, device, &meshGeneric); //Create a standardised mesh around your own D3DVERTEXELEMENT9 declaration (no FVF needed).

//Lock the mesh's buffers and copy your own vertex and index data into them.
void* pData = NULL;
meshGeneric->LockVertexBuffer(0, &pData);
memcpy(pData, yourVertexBufferData, numVerts * vertexStride);
meshGeneric->UnlockVertexBuffer();

meshGeneric->LockIndexBuffer(0, &pData);
memcpy(pData, yourIndexBufferData, numFaces * 3 * sizeof(WORD)); //16-bit indices, the D3DXMESH default
meshGeneric->UnlockIndexBuffer();

//Build adjacency data, then run the optimisation against it.
std::vector<DWORD> adjacency(numFaces * 3);
meshGeneric->GenerateAdjacency(0.0f, &adjacency[0]);
meshGeneric->OptimizeInplace(D3DXMESHOPT_ATTRSORT | D3DXMESHOPT_COMPACT | D3DXMESHOPT_VERTEXCACHE, &adjacency[0], 0, 0, 0);

//Lock the buffers one more time and copy the optimised data back out for your own use. You won't be likely to out-optimise this process by hand.

Not sure that's exactly how you do it, but I am confident you could do it somehow. The lock and unlock calls are where I would expect the fiddly bits to be.

I'm sure I won't be the only one who will warn you about premature optimization, but mind the warning signs: if you don't know how much an optimization will help, then it's probably not worth it at the moment.

5000 faces is not much at all for remotely modern hardware... I've had 100 models (single texture, though for sure I'm not pixel-shader bound yet), each rendering 650 verts in 400 faces through 4 DrawIndexedPrimitive (DIP) calls, rendered from 2 cameras... and even with the 800 DIP calls per frame (no instancing) I had a very happy 60 fps with no optimizations at all, and my rig is from the pre-Vista days...


According to the research I've seen (which is by no means guaranteed to be correct), a SetTexture call (2500-3500 CPU cycles) or, worse, a SetStreamSource call (3500-6000 cycles) is likely to stall you faster than the 1200-1400 cycles of a DrawIndexedPrimitive call. So having your tris ordered according to texture/shader will do wonders to help speed things up.
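To make that concrete, here's a minimal sketch of sorting draws by texture before submitting them; the RenderItem type and its fields are made up for illustration, not from any particular engine:

#include <d3d9.h>
#include <algorithm>
#include <vector>

//Hypothetical per-draw record; a real renderer would also key on shader, material, etc.
struct RenderItem
{
    IDirect3DTexture9* texture;
    //... vertex buffer, transform, draw ranges ...
};

bool ByTexture(const RenderItem& a, const RenderItem& b)
{
    return a.texture < b.texture; //any consistent order that groups equal textures will do
}

void DrawSorted(IDirect3DDevice9* device, std::vector<RenderItem>& items)
{
    std::sort(items.begin(), items.end(), ByTexture);

    IDirect3DTexture9* current = NULL;
    for (size_t i = 0; i < items.size(); ++i)
    {
        //Pay the expensive SetTexture only when the texture actually changes,
        //not once per draw call.
        if (items[i].texture != current)
        {
            device->SetTexture(0, items[i].texture);
            current = items[i].texture;
        }
        //... DrawIndexedPrimitive for items[i] here ...
    }
}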

With that being said, the other goal is to keep data in the cache as long as possible. This means drawing nearby faces together: if the GPU has to pull 3 verts for a tri in your nose, then 3 more for a tri in your toes, and so on, it won't be long before it says enough and kills your framerate.

Looking forward to seeing your work... I'm ready to tackle my own animation, but decided to stick around in GUI hell for a while...

I can show 16 seconds of boredom in the video below: a repeated arm wave using my own mesh format:

http://www.youtube.com/watch?v=TK4m32vqhyo

It's just loaded from a binary file, with arrays corresponding to vertices/indices/normals/textures and the like laid out in a pre-defined way.

Every time I ask a question on here a whole new universe opens up. It will take me a while to digest what you have both said; I'll reply properly shortly.

**Update**

EnlightenedOne:

Hi, is that only for .x files, or can I use it on any vertex buffer? Maybe a dumb question, sorry. Very good reply! Lots of research needed there, thanks a lot for a great reply ;o)

Quote:
Original post by Burnt_Fyr
I'm sure I won't be the only one who will warn you about premature optimization, but mind the warning signs: if you don't know how much an optimization will help, then it's probably not worth it at the moment.


Ok thanks mate I'll keep that in mind!

Quote:
Original post by Burnt_Fyr
5000 faces is not much at all for remotely modern hardware... I've had 100 models (single texture, though for sure I'm not pixel-shader bound yet), each rendering 650 verts in 400 faces through 4 DrawIndexedPrimitive (DIP) calls, rendered from 2 cameras... and even with the 800 DIP calls per frame (no instancing) I had a very happy 60 fps with no optimizations at all, and my rig is from the pre-Vista days...


Sounds good!

Quote:
Original post by Burnt_Fyr
According to the research I've seen (which is by no means guaranteed to be correct), a SetTexture call (2500-3500 CPU cycles) or, worse, a SetStreamSource call (3500-6000 cycles) is likely to stall you faster than the 1200-1400 cycles of a DrawIndexedPrimitive call. So having your tris ordered according to texture/shader will do wonders to help speed things up.


So this means that I need to keep SetStreamSource calls down, right? So can I use one SetStreamSource call and draw all my models from it, yet still animate them independently?
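In code, what I'm imagining is something like this (all names hypothetical, completely untested):

#include <d3d9.h>
#include <vector>

//Hypothetical per-model record: where its geometry lives in the shared buffers.
struct ModelRange
{
    D3DMATRIX world;
    UINT baseVertex, numVerts, startIndex, numTris;
};

void DrawAllFromOneStream(IDirect3DDevice9* device,
                          IDirect3DVertexBuffer9* sharedVB,
                          IDirect3DIndexBuffer9* sharedIB,
                          UINT vertexStride,
                          const std::vector<ModelRange>& models)
{
    device->SetStreamSource(0, sharedVB, 0, vertexStride); //once, not per model
    device->SetIndices(sharedIB);

    for (size_t i = 0; i < models.size(); ++i)
    {
        device->SetTransform(D3DTS_WORLD, &models[i].world); //per-model placement/animation
        device->DrawIndexedPrimitive(D3DPT_TRIANGLELIST,
                                     models[i].baseVertex,  //where this model's verts begin
                                     0, models[i].numVerts,
                                     models[i].startIndex,  //where its indices begin
                                     models[i].numTris);
    }
}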

Quote:
Original post by Burnt_Fyr
With that being said, the other goal is to keep data in the cache as long as possible. This means drawing nearby faces together: if the GPU has to pull 3 verts for a tri in your nose, then 3 more for a tri in your toes, and so on, it won't be long before it says enough and kills your framerate.


I didn't understand that bit. N00b time - what is the cache? Why does drawing verts in different places wreck the GPU's output?

Thanks ;o)

The cache, as far as I understand it, is your RAM, although he might be referring to the tiny cache of RAM on the CPU. D3DX calls might look like they are specifically for one purpose, the .x mesh format, but you use D3DXMATRIX and all sorts of D3DX helpers whenever you're using DX. That container, I believe, is just a generic form of storage for vertices which can hold any elaborate combination of vertex data you tell it to (to my knowledge). If that doesn't serve you true, MSDN it.

HOLD IT!

Your mesh looks nice, but when I realised that was wireframe I was a bit worried: why on earth do you need such a high-poly mesh? If you reduce the poly count to 2000 you could probably get a near-identical looking character, provided you used a shader to do per-pixel lighting calculations, so the light gets distributed as it should rather than shading being done per vertex. It looks like you're using the fixed-function pipeline to draw that, and you have only upped the poly count to smooth the light/shadow appearance - am I wrong?

If not, you need to stop worrying about optimisation and go straight to here and assimilate some critical code. My capabilities would be dead in the water without these invaluable examples.

Go here and click on HLSL and start learning the beauty of what the GPU can do for you.

When drawing a triangle, the relevant vertices are fetched from the vertex-buffer, and are processed with the vertex shader. The outputs from the vertex shader are stored in the vertex-cache (but it has limited space, so new results will overwrite old results).

If another triangle is drawn soon after, and shares some vertices with the previous triangles, then it won't have to run the vertex shader 3 times - instead it can fetch some of the results from the cache.
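For example, two triangles sharing an edge need only four unique vertices between them:

//Two triangles sharing an edge: 4 unique verts instead of 6. If {0,1,2} has
//just been drawn, {2,1,3} can fetch verts 1 and 2 from the post-transform
//cache and only needs to run the vertex shader for vert 3.
WORD indices[6] = { 0, 1, 2,   2, 1, 3 };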

Quote:
Original post by EnlightenedOne
HOLD IT!

Your mesh looks nice, but when I realised that was wireframe I was a bit worried: why on earth do you need such a high-poly mesh? If you reduce the poly count to 2000 you could probably get a near-identical looking character, provided you used a shader to do per-pixel lighting calculations, so the light gets distributed as it should rather than shading being done per vertex. It looks like you're using the fixed-function pipeline to draw that, and you have only upped the poly count to smooth the light/shadow appearance - am I wrong?


Spot on. I am using the fixed-function pipeline: no shaders, no HLSL. The model was mainly just to demo my animation explorations. I've found a much more suitable mesh of only 2000 polys for the body on TurboSquid; it will be integrated when I have some more cash.

I have used HLSL before but never got too deep into it, although doing animation work on the GPU is something I consider a must before any release.

Thanks for a great reply - time to check out that link. This kind of stuff is just what I wanted to know, so I can get any nasty stuff out of my head, and my code, early on ;o)

Thanks for your reply too Hodgman ;o)

**Edit**

If I seem a bit daft it's because I'm quite a n00b about all this, to be truthful. I've bookmarked that page, EnlightenedOne, thanks a lot.

Quote:
Original post by EnlightenedOne
Your mesh looks nice, but when I realised that was wireframe I was a bit worried: why on earth do you need such a high-poly mesh? If you reduce the poly count to 2000 you could probably get a near-identical looking character, provided you used a shader to do per-pixel lighting calculations, so the light gets distributed as it should rather than shading being done per vertex.


Whoa! Wait a minute, does that mean I wouldn't need 3 unique vertices per face? At the moment, when my models come out of Blender, I duplicate things so there are 3 vertices per face. Blender's duplication of vertices in the index array makes per-vertex lighting impossible.

Can I do *ALL* my normals and lighting on the GPU and get it to look smooth and nice, and drop the 3-verts-per-face approach? Help! Sounds uber, but am I understanding this correctly, or am I barking up the wrong tree again? Sheesh, that would make life easier. It would mean a complete re-write somewhere down the line, but boy, it would be something to do!

If you smooth the mesh in Blender it should share vertices and export a much smaller mesh :) I had the same problem this weekend until I realised where it could be changed in Blender (it was my first time using it - I think it's in the Object menu somewhere).

Rendering with per-pixel lighting should then result in a smooth mesh with fewer verts but the same number of triangles.

Thanks for the info ;o)

**Edit**

I'm still not sure, to be honest, how I can change the way I do it without knowing that command you mention. Anyway, provided I shove as much of the responsibility as possible onto the GPU, I should be a lot better off.

I've got a lot of learning to do with that link TELO provided ;o)

Thanks to all who helped in this thread - I got the info I needed for now, cheers.

[Edited by - adder_noir on August 31, 2010 6:09:42 PM]

OK, now I remember why I paid for a GD+ account. I knew I would have to be an annoying a$$ somewhere, and paying for the site's upkeep makes me feel a bit better about it. The thread isn't quite done yet *sigh* :o) I have one major question, because I'm getting confused here:

1) I smoothed my normals in Blender and exported per-vertex normals, thus requiring 3 verts per face - no vert sharing allowed between faces, or the model looks blocky as hell. So is there some way I can use fewer vertices in the model and have the GPU take care of ALL the lighting, so the model lights correctly AND looks smooth?

That's my question. Sorry ;o)

P.S. If there's some crucial point about models, normals and faces that I'm missing here, can someone please point it out? Thanks. Do I really need 3 unique verts per face?

It is impossible (well, technically not quite) to represent a triangle with fewer than 3 vertices. Also, verts are shared across faces... this is what the indices represent.

Smoothing normals takes the normal of each face that a vertex is part of and averages them (read: sums them and divides by the number of faces) to produce the smoothed vertex normal. This works well on organic shapes, as a vertex can be reused by more than one face. It doesn't hold up, however, when we want hard edges in our model (like a cube): as long as every face a vertex is shared by has the same face normal, we don't need the vertex duplicated, but where the normals differ we do, so a cube will require 24 verts. Now, if we use DrawPrimitive we have to push 3 verts for each face, which for a cube means 36 vertices moving in and out of the GPU. Indices help us by allowing 2 of the vertices of each quad's first triangle to be reused. That is why drawing indexed primitives is faster than non-indexed.
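In code, the smoothing comes down to something like this - a sketch, assuming your positions are already de-duplicated and your indices form a triangle list (the names are mine, not from any library):

#include <d3dx9.h>
#include <vector>

//Sketch: average the face normals around each shared vertex.
void SmoothNormals(const std::vector<D3DXVECTOR3>& positions,
                   const std::vector<WORD>& indices,
                   std::vector<D3DXVECTOR3>& normals)
{
    normals.assign(positions.size(), D3DXVECTOR3(0, 0, 0));

    for (size_t i = 0; i < indices.size(); i += 3)
    {
        const D3DXVECTOR3& a = positions[indices[i]];
        const D3DXVECTOR3& b = positions[indices[i + 1]];
        const D3DXVECTOR3& c = positions[indices[i + 2]];

        //Face normal from the cross product of two edges.
        D3DXVECTOR3 e1 = b - a, e2 = c - a, faceNormal;
        D3DXVec3Cross(&faceNormal, &e1, &e2);

        //Add it to every vertex the face touches.
        normals[indices[i]]     += faceNormal;
        normals[indices[i + 1]] += faceNormal;
        normals[indices[i + 2]] += faceNormal;
    }

    //Normalising the sum gives the average direction.
    for (size_t v = 0; v < normals.size(); ++v)
        D3DXVec3Normalize(&normals[v], &normals[v]);
}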

Now, in a large mesh you may have many triangles sharing one vertex. With indexed primitives, this shared vertex should only need to be calculated once, in theory, with the results saved to a cache. The next time the vert is needed, rather than running the shader again, the results stored in the cache are used.

Back to my previous post:
If the indices are just randomly ordered, this may mean missing out on the previously mentioned optimization. If VertA is used, its result is stored, but if enough other verts are processed afterwards, that result may be overwritten, which means another face using VertA will need to run the vertex shader again. This can be reduced by rendering neighboring triangles together, so that the GPU can use cached results as much as possible.
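D3DX can do this reordering for you even if your data never lives in an ID3DXMesh. A sketch using D3DXOptimizeFaces follows; I'm going from memory on the direction of the remap, so check MSDN before trusting it:

#include <d3dx9.h>
#include <vector>

//Sketch: vertex-cache-order a raw 16-bit triangle list in place.
void CacheOrder(std::vector<WORD>& indices, UINT numVertices)
{
    UINT numFaces = (UINT)indices.size() / 3;
    std::vector<DWORD> faceRemap(numFaces);
    D3DXOptimizeFaces(&indices[0], numFaces, numVertices, FALSE, &faceRemap[0]);

    //Rebuild the index list in the optimised face order: as I understand it,
    //face i of the new list is original face faceRemap[i].
    std::vector<WORD> reordered(indices.size());
    for (UINT i = 0; i < numFaces; ++i)
        for (UINT j = 0; j < 3; ++j)
            reordered[i * 3 + j] = indices[faceRemap[i] * 3 + j];

    indices.swap(reordered);
}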

One last piece of advice: make the move to shaders... I held out with the FFP for a long time, but now I would never trade the functionality of shaders for the security of the FFP.

Thanks, great post, I rated you up again. I'm perhaps making a fundamental error here by being far too lazy about upgrading to HLSL. It seems I've got more work to do again. Thanks so much for a great reply. I wish I could return the favour - you can have my source when I'm done if you want?

