Best Direct3D optimization techniques for 2002 and beyond?

I know this issue has been discussed many times before, but I'd like to re-present it with an eye towards the next-generation 3D hardware scenarios that continue to emerge. I raise a lot of issues below hoping to come to a better understanding of how best to focus time and energy on DX8+ optimizations going forward. Formally stated: what is the best approach to optimizing Direct3D applications written with DX8 and up, targeted at NVidia GF3/ATI Radeon 8500 and forward-looking hardware?

What I'm wondering is, is it worth it anymore to focus much effort on optimizing the rasterization/rendering phase? One of the main approaches in the past has always been to sort by texture and then render state. This used to be relatively consistent to achieve because games rarely used much multitexturing or highfalutin effects like bump mapping and the like. But with the power available now, one can render a mesh with several textures and complicated render states set up to achieve a very particular effect. It's just a bit too complicated to try to sort all that out, because one mesh can be using the same texture as another, but two additional textures that are different. Because these cards do single-pass multitexturing, I don't see how you can really sort by texture except on the more simplistically textured/rendered meshes. Follow my drift?

I remember hearing that render state changes are becoming less and less expensive as DX versions progress. Is this true, and what about texture changes? And now with pixel and vertex shaders in the mix, things get even more complicated. Should one sort based on which shaders are used? What's the cost associated with this? Shaders can obviate the need for many of the render states, so would moving to a predominantly shader-based engine approach minimize the need for optimized sorting?

On a related note, has anyone worked with the D3DXEffect technology in DX8? From what I've gathered from the brief coverage in the docs, it seems to be a new approach to the old render state blocks, but on steroids. Incidentally, it's very similar to what I was about to sit down and implement myself in my engine. Are people using this in commercial code, and would rolling my own version offer any speed/performance advantage, or is this feature pretty well optimized? I would like to be able to just fall back on this and sort only based on the Effects used by a mesh, rendering all the meshes which use a particular Effect in order, and then moving on to the meshes using the next Effect.

Well, I hope I have given you some food for thought. What I'm hoping the general reality is, is that we should not worry so much about optimizing the render stage, and focus more on culling and occlusion algorithms.

(Just one more note added) - As I see it, hardware will get more and more powerful, enabling increasingly varied and impressive effects. Given this, objects in a scene will share less and less in common with each other, and there will be very little common ground to sort on. Am I making sense here?

Edited by - Hewson on December 30, 2001 6:38:48 PM
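To make the "sort by Effect" idea concrete, here is a minimal sketch of the kind of submission loop being described. The Mesh struct and its id fields are hypothetical engine bookkeeping, not D3D types, and the actual device calls are only indicated in comments:

// Minimal sketch of sorting draw submissions by effect first, then texture,
// so the most expensive state changes happen at the fewest group boundaries.
#include <algorithm>
#include <cstddef>
#include <vector>

struct Mesh {
    unsigned effectId;   // which shader set / D3DXEffect the mesh uses (hypothetical id)
    unsigned textureId;  // stage-0 texture (hypothetical id)
    // ... vertex/index buffers, world transform, etc.
};

// Pack the most expensive state into the high bits so it changes least often.
unsigned long long SortKey(const Mesh* m)
{
    return ((unsigned long long)m->effectId << 32) | m->textureId;
}

bool ByKey(const Mesh* a, const Mesh* b) { return SortKey(a) < SortKey(b); }

void SubmitSorted(std::vector<Mesh*>& visible)
{
    std::sort(visible.begin(), visible.end(), ByKey);

    unsigned lastEffect = ~0u, lastTexture = ~0u;
    for (std::size_t i = 0; i < visible.size(); ++i) {
        const Mesh* m = visible[i];
        if (m->effectId != lastEffect) {
            // apply vertex/pixel shaders and render states for this effect here
            lastEffect = m->effectId;
        }
        if (m->textureId != lastTexture) {
            // device->SetTexture(0, ...) would go here
            lastTexture = m->textureId;
        }
        // DrawIndexedPrimitive(...) for this mesh here
    }
}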
Those who choose to optimize will always open themselves to the speed and functionality they are looking for. Those who choose not to optimize are thrown into 'the' group: 'everyone else'. It depends on what you're trying to accomplish. If you don't need to optimize anything anymore, you probably never optimized to begin with, or never understood its usefulness or implications. Why not just use Java or some other crappy 'game API'?

If I can squeeze in one more feature and keep my framerate because I optimize, and it's one more than other apps have, then that means something to some people. Even if my framerates are simply better with the same feature set, it means something.

Optimization will always be a choice... it's not the easy path.

New features and technologies (shaders, etc.) will often require certain optimizations just to be implemented correctly.

Are you looking to push limits and boundaries, or just get by? Many times you can accomplish your goals by doing the minimum; if everything works fine, optimization is a choice.

-Ozz
Hmmm. You haven't really answered my question, bro. I am not afraid to spend time optimizing; I'm asking whether anyone can demonstrate a useful technique for optimizing a portion of the pipeline whose state is shared less and less between mesh objects.

As objects in a scene become more realistic, they share less and less in their visual properties, just like in real life. There remains less and less to sort by. If you can show me a way to optimize despite this fact, please do so. Can you suggest a useful optimization strategy for textures/render states/shaders that balances better performance against keeping a flexible structure in the engine?

Or, for that matter, can you suggest any optimization at all, given a scene made up mostly of objects with completely different visual properties, even if it means forgoing the engine's elegance?

Just as asm optimizations have become increasingly rare, it's looking to me like the render/rasterization portion of the pipeline is becoming less of a candidate for sorting optimizations.

It's looking like visibility detection is becoming the best bang for your buck in terms of where to spend time optimizing an engine, without losing elegance in the engine's structure and interfaces. In fact, the DirectX docs even state this in the optimizations section, albeit without much detail.
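To be concrete about what I mean by visibility detection, even the simplest coarse test pays off: a sphere-vs-frustum check per object before anything is submitted. A minimal sketch, assuming the six frustum planes have already been extracted from the view-projection matrix and normalized (the plane extraction and the bounding-sphere bookkeeping are assumptions, not shown):

// Sphere-vs-frustum cull: reject an object if its bounding sphere lies
// entirely behind any one of the six (normalized) frustum planes.
struct Plane  { float a, b, c, d; };          // ax + by + cz + d = 0
struct Sphere { float x, y, z, radius; };     // hypothetical bounding volume

bool SphereVisible(const Sphere& s, const Plane planes[6])
{
    for (int i = 0; i < 6; ++i) {
        float dist = planes[i].a * s.x + planes[i].b * s.y +
                     planes[i].c * s.z + planes[i].d;
        if (dist < -s.radius)                 // completely outside this plane
            return false;
    }
    return true;                              // inside or intersecting the frustum
}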

If someone can show me this is not the case, I'm all ears, but please provide actual technical details.

Edited by - Hewson on December 30, 2001 11:42:25 PM
OK, I see what you're saying in that respect, man.

Aside from the added functionality of vertex and pixel shaders, with greater ease of use, there aren't any ground-breaking optimizations that I use offhand. Low-level optimizations that yield significant performance gains are becoming less common with each new advancement. Unless you're going to write your own implementation of a rendering pipeline, so to speak, you're not going to gain significant boosts, like you mention.

As far as visibility is concerned I am going to agree with your general proposal, and I think the shader approach appears to be the wiser route at this point.

I wouldn't really worry much about optimizing the render stage itself. It has come to a point where you can focus on optimizing your algorithms instead. I'm not sure exactly where any kind of optimization could take place; DX itself has kind of bridged the gaps in the render stage. The only real optimizations I see taking place are in apps built specifically for a piece of hardware, as opposed to a general set of instructions that may not take full advantage of the given capabilities.

I was on the wrong track there.

I guess I was trying to say that there really should be a way, and for lack of an example I made general statements. Regardless, I would just avoid optimizing in that area if everything is comfortable as it is. I just don't see any significant gain to be had there as things stand.


-Ozz
Why don't you use a profiler (like VTune from Intel)?
See which part of your code is taking the major part of the execution time, and optimize only that part of the code.
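Even without VTune, a crude timer around the suspect sections will point at the hot spots. A minimal sketch using only the standard C clock (coarse, CPU time only, nothing D3D-specific about it):

// Crude manual timing of a code section, as a stand-in for a real profiler.
#include <cstdio>
#include <ctime>

int main()
{
    std::clock_t start = std::clock();

    // ... the suspect section (culling pass, submission loop, etc.) ...

    double ms = 1000.0 * (std::clock() - start) / CLOCKS_PER_SEC;
    std::printf("section took %.3f ms\n", ms);
    return 0;
}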
Oz: yeah, that's more along the lines of what I was talking about. So I think that at least you and I agree that the render/rasterization stage in a DX8 app with highly varied visual properties probably doesn't offer as many optimization routes as it used to, and that visibility is an even more important area for optimization. But please, I would love to hear others' opinions so I know, moving forward, that we're on solid ground here.

psicor: of course running a profiling tool will be on the list for any app's development, but considering that I'm writing a lower-level engine architecture on which my apps will be built, and also considering that what I'm doing has been done tons of times before, there should be a general consensus on this issue by now, one which will help serve as the starting ground for an engine architecture.

Running VTune would be good to do on top of this, but it will probably give me more useful information about the portion of the engine running on the host CPU, like the visibility algorithms. While it can tell me how long my engine spends in any render optimization path, it won't give me much detail, because a good portion of that stage is being done on the GPU.

I think Oz helped round out my estimation by saying that DX itself has made most of the optimization strides in its rendering phase, making it less of an issue than it used to be.

-Hewson
The DX FAQs (from like 5 on) describe DX as a state engine in which certain state changes must be minimized.

Specifically, the cost of changing textures is still reportedly high in DX8. This warning existed in the optimization documentation for game consoles like the Dreamcast as well, so unless your hardware's documentation says otherwise, I'd go with Microsoft's advice. On the other hand, moving between vertex buffers (warned against under previous versions) is no longer considered expensive in DX8.
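One cheap way to act on that advice, regardless of how the scene is sorted, is a redundant-state filter in front of the device. A minimal sketch; the TextureCache class, its unsigned texture ids, and ApplyTexture() are hypothetical placeholders wrapping the real SetTexture call:

// Minimal redundant-state filter: remember what was last set per texture
// stage and skip duplicate sets, so only genuine changes reach the driver.
class TextureCache {
public:
    TextureCache()
    {
        for (int i = 0; i < kStages; ++i)
            m_current[i] = kNone;
    }

    void SetTexture(int stage, unsigned tex)
    {
        if (m_current[stage] == tex)
            return;                   // already bound: skip the call entirely
        m_current[stage] = tex;
        ApplyTexture(stage, tex);     // forward to the device here
    }

private:
    static const int kStages = 8;     // D3D8 exposes up to 8 texture stages
    static const unsigned kNone = ~0u;
    unsigned m_current[kStages];

    void ApplyTexture(int /*stage*/, unsigned /*tex*/) { /* device call */ }
};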

There was a great article by an MS lead programmer on why certain optimization techniques succeed and others fail in DX, but I was unable to locate it. The article is VERY specific about what the rendering engine likes as far as the way polygons are listed and the order they're passed to the pipeline.

I'd consider the MS articles on optimization a must-read no matter what new hardware comes out, and I will see if I can track down my paper copy and post the URL.

Sorry, I have no advance knowledge of what the chipsets will handle in the future, but keep in mind that MS wrote DirectX to take advantage of features that may never have hardware implementations, so the MS optimization advice is still our best bet.
hundel: I agree that polygon-sorting considerations are also a good place to optimize, as described in the DX docs (somewhere, if I recall). Regarding texture changes, if that's still a good thing to minimize, that's all fine and dandy, but can you see that in cases where objects share fewer visual properties, especially where special effects are used, it will be impossible to do this?

Especially on the current and forthcoming crop of next-gen cards, which offer single-pass multitexturing. Even if two different meshes share one texture in their visual style, it may still need to be set twice, depending on the stage at which each mesh's "shader/render state settings" use it.

Now, as I was reading your post, something emerged from memory that I think I read a long time ago. It was something to the effect that even 40-50 render state changes was acceptable, so long as you didn't get near the 100 range. That was several years ago. As far as texture changes, I'm still not sure.

So let's rephrase the question: aside from minimizing the number of multitextured effects in a scene, how does one best optimize the number of texture stage changes in a scene that uses many multitexturing effects?

This still remains the unanswered question, and "...you need to minimize texture stage changes..." doesn't really answer it. We know, I guess, that this is bad, but what do we do about it given ever more complex scenes?
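The closest thing to an answer I can sketch myself is to key the sort on the entire texture-stage set rather than any single texture, then rebind only the stages that actually differ between consecutive draws. A rough sketch; TextureSet is hypothetical engine-side bookkeeping (each visible mesh is reduced here to its texture set), not a D3D type, and the device calls are only indicated in comments:

// Key the sort on a mesh's *entire* texture-stage set, so identical
// multitexture setups land next to each other in the draw order.
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

const int kStages = 8;                  // D3D8 exposes up to 8 texture stages

struct TextureSet {
    uint32_t stage[kStages];            // texture id per stage, 0 = none
};

// Cheap signature over the whole stage set, used only to group the sort.
uint64_t Signature(const TextureSet& t)
{
    uint64_t h = 0xcbf29ce484222325ULL;           // FNV-1a style mix
    for (int i = 0; i < kStages; ++i)
        h = (h ^ t.stage[i]) * 0x100000001b3ULL;
    return h;
}

bool BySignature(const TextureSet* a, const TextureSet* b)
{
    return Signature(*a) < Signature(*b);
}

void Submit(std::vector<TextureSet*>& visible)
{
    std::sort(visible.begin(), visible.end(), BySignature);

    TextureSet current = {};                      // what is currently bound
    for (std::size_t i = 0; i < visible.size(); ++i) {
        const TextureSet* t = visible[i];
        for (int s = 0; s < kStages; ++s) {
            if (current.stage[s] != t->stage[s]) {   // rebind only what changed
                current.stage[s] = t->stage[s];
                // device->SetTexture(s, ...) would go here
            }
        }
        // draw the mesh here
    }
}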

Thx for all the comments so far.

Hewson

