Hardware Instancing - Optimization Query
#1 Members - Reputation: 115
Posted 15 November 2012 - 07:08 PM
I am writing a structural CAD/modelling application that uses hardware instancing extensively for rendering 3D models.
Typically there are thousands of beams/girders to draw each frame. These beams are generally made up of standard structural sections, but for the purpose of this post, let's say that all my beam sections are I-sections (for the non-structural-savvy folks, just picture a steel beam shaped like the capital letter 'I').
At first glance, hardware instancing seems a relatively straightforward choice when you are dealing with thousands of meshes that are geometrically very similar, so that's the approach I adopted. My application performs fairly well for the most part, but when I'm dealing with large models that have hundreds of different sections, I run into performance issues.
That's because there are lots of different types of I-section. Each section differs in flange width and thickness, as well as web depth and thickness. I am having to create a new mesh for each different type of I-section, each frame. The reason I am doing it each frame is that I am concerned about the memory cost of storing hundreds of meshes, not to mention having to recreate them every time the graphics device needs to be reset. Having said that, I have a feeling that's the way I'm going to have to go, unless someone more knowledgeable than me can help me out with an alternative solution. Which brings me to my question...
Can you locally scale different components of a mesh? When I create my mesh, I retrieve the cross-section geometry data from a database and build the mesh from that. The mesh has a standard length of 1 metre. When it comes to rendering the meshes, I use a world transform to 'stretch' the mesh to the right length. If I could do something similar on a local scale, I could adjust things like flange thickness, width etc. without having to create a new mesh for each type of I-section. According to PIX, all my performance issues stem from the constant locking and unlocking of buffers when I'm creating my meshes each frame, which is very understandable!
Can anyone suggest a more efficient way to do what I want?
Thanks in advance.
Aaron.
#2 GDNet+ - Reputation: 2398
Posted 15 November 2012 - 08:11 PM
In this case you could apply the scale in one of two ways - either as a constant buffer parameter (which allows you to batch all similar meshes together) or as an instance-level attribute (which would let you batch all meshes that can use the base mesh as their representation).
In the latter case you will have to lock, update and unlock the instance buffer, while in the former case you will just have to update the constant buffer for each batch.
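To make the difference concrete, here is a rough CPU-side sketch of the two batching options. It is written in Python purely for illustration; in practice the data would live in D3D constant and instance buffers. The `Beam` record, the section names "A" and "B", and all the dimensions are made up for the example.

```python
from collections import defaultdict
from typing import NamedTuple

class Beam(NamedTuple):
    section: str   # hypothetical section name
    length: float  # metres; applied through the world transform

# Hypothetical section table:
# (flange width, web thickness, depth, flange thickness), all in metres.
SECTIONS = {
    "A": (0.20, 0.010, 0.40, 0.020),
    "B": (0.15, 0.008, 0.30, 0.015),
}

def batches_by_section(beams):
    """Option 1: one instanced draw per section type.
    The section dimensions would be set once per batch (constant buffer)."""
    batches = defaultdict(list)
    for beam in beams:
        batches[beam.section].append(beam.length)
    return dict(batches)

def single_instance_stream(beams):
    """Option 2: one draw for every beam that shares the base mesh.
    The section dimensions travel with each instance (instance buffer)."""
    return [(beam.length, *SECTIONS[beam.section]) for beam in beams]

beams = [Beam("A", 6.0), Beam("B", 4.5), Beam("A", 3.0)]
print(batches_by_section(beams))            # two batches, keyed by section
print(len(single_instance_stream(beams)))   # three instance records, one batch
```

With hundreds of section types, option 1 means hundreds of small draws and constant updates, while option 2 keeps one draw at the cost of a wider per-instance record.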
I hope that helps!
Check out our (now available) D3D11 book: Practical Rendering and Computation with Direct3D 11
Check out my Direct3D 11 engine on CodePlex: Hieroglyph 3
Check out our free online D3D10 book: Programming Vertex, Geometry, and Pixel Shaders
Lunar Rift :: Dual-Paraboloid Mapping Article :: Parallax Occlusion Mapping Article :: Fast Silhouettes Article
#3 Members - Reputation: 115
Posted 15 November 2012 - 08:28 PM
I think I understand what you're saying, although it may be hard to implement in my case. Basically every vertex has to be scaled. Every type of I-section has a different depth, width, flange thickness and web thickness. Scaling the height, width and length of the mesh can be done easily with a scaling matrix; it's the flange and web thicknesses that I am struggling with.
So really I would have to do something like this:
1. Scale mesh with a scaling matrix to give correct height, width and length.
2. Flag all vertices save for a few to not be affected by further scaling.
3. Apply additional vertical scale to adjust flange thickness.
4. Flag all vertices save for a few to not be affected by further scaling.
5. Apply additional horizontal scaling matrix that will adjust web thickness.
Is this possible with your suggestion?
#4 GDNet+ - Reputation: 2398
Posted 15 November 2012 - 08:50 PM
Can you post a pic of the various scaling operations, or is it sensitive info?
#5 Members - Reputation: 115
Posted 15 November 2012 - 09:47 PM
Is that essentially what you're saying?
I've attached a pic below showing the scaling operations. I'm a bit reluctant to post code, but I don't think the code would tell you anything anyway. I'm just passing a transformation matrix to my shader at the moment, on a per-instance basis.
#6 Members - Reputation: 1140
Posted 16 November 2012 - 08:42 AM
I am having to create a new mesh for each different type of I section, each frame. The reason I am doing it each frame is that I am concerned about the memory cost of storing hundreds of meshes, not to mention having to recreate them every time the graphics device needs to be reset.
Have you considered that this is the part where you lose your performance (i.e. recreating the data every frame)? Your beam section has 12 vertices (if the corners aren't rounded), so one beam element has 24 vertices, which is next to nothing. If your vertex size is, say, 32 bytes and you have 1000 different beam meshes, the total is still less than 1 MB!
So, why not try to use static meshes, which you don't recreate every frame?
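The back-of-the-envelope numbers above work out as follows. This only counts vertex data, using the 24 vertices and the assumed 32-byte vertex from this post; index buffers would add a little on top.

```python
VERTICES_PER_BEAM = 24   # 12 outline corners at each end of the beam
BYTES_PER_VERTEX = 32    # assumed vertex size
BEAM_MESHES = 1000       # number of distinct section meshes

total_bytes = VERTICES_PER_BEAM * BYTES_PER_VERTEX * BEAM_MESHES
print(total_bytes)                   # 768000
print(total_bytes / (1024 * 1024))   # roughly 0.73 MiB
```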
Otherwise, beams are typically defined by a set of parameters (4 in your case?), so it is entirely possible to define how each vertex is affected by those parameters. You'd probably need to give each vertex a set of weights that tell how each parameter affects its position.
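A minimal sketch of the weights idea, done on the CPU in Python just to show the maths (in a shader this would be a couple of multiply-adds per vertex). It assumes the four parameters are flange width `b`, web thickness `tw`, depth `d` and flange thickness `tf`, and that the section is centred on the origin; the weight table and the function name are mine, not from any library.

```python
# One row per outline corner, counter-clockwise from the bottom-left.
# x = wx_b * b + wx_tw * tw      y = wy_d * d + wy_tf * tf
WEIGHTS = [
    # wx_b, wx_tw, wy_d, wy_tf
    (-0.5,  0.0, -0.5,  0.0),   # bottom flange, outer left
    ( 0.5,  0.0, -0.5,  0.0),   # bottom flange, outer right
    ( 0.5,  0.0, -0.5,  1.0),   # bottom flange, inner right
    ( 0.0,  0.5, -0.5,  1.0),   # web, bottom right
    ( 0.0,  0.5,  0.5, -1.0),   # web, top right
    ( 0.5,  0.0,  0.5, -1.0),   # top flange, inner right
    ( 0.5,  0.0,  0.5,  0.0),   # top flange, outer right
    (-0.5,  0.0,  0.5,  0.0),   # top flange, outer left
    (-0.5,  0.0,  0.5, -1.0),   # top flange, inner left
    ( 0.0, -0.5,  0.5, -1.0),   # web, top left
    ( 0.0, -0.5, -0.5,  1.0),   # web, bottom left
    (-0.5,  0.0, -0.5,  1.0),   # bottom flange, inner left
]

def section_outline(b, tw, d, tf):
    """Return the 12 (x, y) corners of an I-section from its 4 parameters."""
    return [(wb * b + wtw * tw, wd * d + wtf * tf)
            for wb, wtw, wd, wtf in WEIGHTS]

# Example dimensions in metres (made up for illustration).
for x, y in section_outline(b=0.20, tw=0.010, d=0.40, tf=0.020):
    print(f"{x:+.3f} {y:+.3f}")
```

The weights are the same for every I-section, so one static mesh carrying them as vertex attributes could serve all section types, with the four parameters supplied per instance or per batch.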
Cheers!
#7 Members - Reputation: 1126
Posted 16 November 2012 - 01:00 PM
Edited by eppo, 16 November 2012 - 01:04 PM.
#8 Members - Reputation: 3831
Posted 16 November 2012 - 07:04 PM
You may be able to construct a girder in its entirety in a vertex shader, without using vertex buffers. If you pass the web, flange, etc. data in as per-instance data (e.g. read from a texture buffer), you can procedurally build a girder based on the SV_VertexID system-value semantic. Either that, or tag parts of a mesh using a vertex weight map and scale and extrude based on that.
Yup, that's the way I'd do it. Strikes me that if the position component is based on a formula anyway, then moving that formula to GPU-side is the way to go.
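For what it's worth, here is roughly what "position from a vertex ID" could look like, emulated in Python rather than HLSL. The function name, the vertex ordering (4 quadrants of 3 corners, then the far end) and the parameter names are all assumptions for the sketch; a real shader would take the ID from SV_VertexID and fetch the parameters per instance.

```python
def girder_vertex(vertex_id, b, tw, d, tf, length):
    """Build one of the 24 girder corners from its index alone.

    b = flange width, tw = web thickness, d = depth, tf = flange thickness.
    IDs 0-11 are the near end (z = 0), IDs 12-23 the far end (z = length).
    """
    end, corner = divmod(vertex_id, 12)
    quadrant, k = divmod(corner, 3)

    # The three corners in the (+x, +y) quadrant of the cross-section.
    qx = (tw / 2, b / 2, b / 2)[k]
    qy = (d / 2 - tf, d / 2 - tf, d / 2)[k]

    # Mirror into the other quadrants: 0 = (+,+), 1 = (-,+), 2 = (-,-), 3 = (+,-).
    sx = 1 if quadrant in (0, 3) else -1
    sy = 1 if quadrant in (0, 1) else -1
    return (sx * qx, sy * qy, end * length)

# Example dimensions in metres (made up for illustration).
for vid in range(24):
    print(vid, girder_vertex(vid, b=0.20, tw=0.010, d=0.40, tf=0.020, length=6.0))
```

The corners come out grouped by quadrant rather than in outline order, so a fixed index buffer (shared by every girder) would be needed to stitch them into triangles.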
It appears that the gentleman thought C++ was extremely difficult and he was overjoyed that the machine was absorbing it; he understood that good C++ is difficult but the best C++ is well-nigh unintelligible.
#9 Members - Reputation: 115
Posted 16 November 2012 - 10:04 PM
@kuana - yes, I realize that's where the performance problems are stemming from, as I mentioned in my earlier post. Thanks for the response.
#10 GDNet+ - Reputation: 472
Posted 18 November 2012 - 06:22 AM
I am having to create a new mesh for each different type of I section, each frame. The reason I am doing it each frame is that I am concerned about the memory cost of storing hundreds of meshes, not to mention having to recreate them every time the graphics device needs to be reset.
You shouldn't have to recreate your mesh data every frame anyway; hundreds of meshes might not take up that much memory if they aren't complex. I don't get why you are worried about that, nor why you are worried about device resets. If you need to rebuild your mesh data on a device reset then so be it. That is a rare event, whereas you say you would rather rebuild your mesh data every frame to avoid having to do it occasionally. Your reasoning seems backwards to me.






