DirectX 9: General Optimisation Tips?

10 comments, last by rubicondev 16 years, 11 months ago
Let's forget for a moment about demos and the ideal-world applications that a tech guy from nVidia would rustle up to show off; I'm talking about actual real-world engines. How badly do DrawPrimitive counts count against everything else, for example? My batches are fairly small, but they have to be: there's not much point putting a trillion faces into a telegraph pole, yet all those poles need rendering separately so they can be positioned.

I've sort of reached a point where I feel that I'm missing something major, as my engine feels kinda slow. It does all lighting per-pixel, but using ubershaders so it's all single pass. In a scene I have with around 250 DrawPrimitive calls (and no lighting atm) I'm getting a framerate from my 1950 that feels more in line with a PlayStation. I've commented out chunks of stuff and I never seem to get much of a speed-up.

I guess I'm looking for general optimising advice. It's no good suggesting playing with PIX, as I'm at a stage that's way before that level of scrutiny. I'm considering an overhaul of the pipeline at a gross scale, but I'm not quite sure which way to go because I don't really know what's wrong with what I have now. It's driving me crazy.

Is there a decent *current* paper on general performance guidelines for real-world apps?
------------------------------Great Little War Game
This presentation is practically legendary when it comes to this stuff. And there's a lot more that is worth reading.
SlimDX | Ventspace Blog | Twitter | Diverse teams make better games. I am currently hiring capable C++ engine developers in Baltimore, MD.
Direct hit. Many thanks...
------------------------------Great Little War Game
From my own (recent) experience: shader switches are the devil. Try to switch only when you absolutely have to. I got a 300% increase in FPS just from reducing shader switches. Also, I noticed that while a lot of people tell you to sort by shader, grouping by shader is really all you need. The difference is that grouping can be done in O(n), while a comparison sort is at best an O(n log n) operation.
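To make the grouping-versus-sorting point concrete, here is a minimal sketch of O(n) grouping with a hash map. The `DrawItem` struct, the integer shader ids, and both function names are made up for illustration; a real engine would key on whatever handle it uses for shaders or effects.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical draw record: which shader it needs and which mesh to draw.
struct DrawItem {
    int shaderId;  // assumed to be non-negative
    int meshId;
};

// Reorders draws so that all items sharing a shader are adjacent.
// One pass to bucket, one pass to flatten: O(n) on average, no sort.
// Shaders keep the order in which they first appear.
std::vector<DrawItem> groupByShader(const std::vector<DrawItem>& items) {
    std::unordered_map<int, std::size_t> bucketOf;  // shaderId -> bucket index
    std::vector<std::vector<DrawItem>> buckets;

    for (const DrawItem& item : items) {
        auto it = bucketOf.find(item.shaderId);
        if (it == bucketOf.end()) {
            it = bucketOf.emplace(item.shaderId, buckets.size()).first;
            buckets.emplace_back();
        }
        buckets[it->second].push_back(item);
    }

    std::vector<DrawItem> ordered;
    ordered.reserve(items.size());
    for (const std::vector<DrawItem>& bucket : buckets) {
        ordered.insert(ordered.end(), bucket.begin(), bucket.end());
    }
    assert(ordered.size() == items.size());  // no draw dropped or duplicated
    return ordered;
}

// Counts how many times the shader would have to be set for a given order.
std::size_t countShaderSwitches(const std::vector<DrawItem>& order) {
    std::size_t switches = 0;
    int current = -1;  // sentinel: no shader bound yet
    for (const DrawItem& item : order) {
        if (item.shaderId != current) {
            current = item.shaderId;
            ++switches;
        }
    }
    return switches;
}
```

Grouping only guarantees that equal shaders end up next to each other; it says nothing about the order between groups, which is usually all you need to cut the switches.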
I'm already doing that (the sorting part anyway). In fact I'm doing quite a lot of things like that, which is why I don't get why my engine feels so slow. If I turn off the rendering completely then it rips, so I know the problem's not in the rest of my game.

Keep the tips coming though, handy thread already. :)
------------------------------Great Little War Game
In a non-pipelined single-processor architecture, optimizing the wrong place gains you little extra performance. The beauty of programming pipelined multiple-processor architectures is that optimizing the wrong place gains you ZERO extra performance! A chain is only as strong as its weakest link. A pipeline is only as fast as its slowest stage.

If the part you're currently optimizing is not where the bottleneck is, all your efforts will be in vain. If your application is not CPU-bound, for instance, reducing the number of batches per frame won't help at all.

So, the key to the wonderland is:
Quote:
FIRST locate the bottleneck, THEN try to optimize it. This won't remove the bottleneck entirely; it simply moves it to another part of your application and boosts your performance to some extent. Locate the new bottleneck, then optimize again. Do this as many times as you need until you reach the sweet spot. You may then exploit the idle time in the other stages and give those stages more work to do for free!

PROFILE and OPTIMIZE
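
The "slowest stage" argument can be shown with a toy model. This is only an idealised sketch: it assumes the stages overlap perfectly, and the stage names and millisecond figures in the usage note are invented, not measurements from any real engine.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Idealised pipeline: every stage works in parallel on different data,
// so the time per frame is set by the slowest stage alone.
double pipelinedFrameMs(const std::vector<double>& stageMs) {
    assert(!stageMs.empty());
    return *std::max_element(stageMs.begin(), stageMs.end());
}

// Non-pipelined case for comparison: the stages run one after another,
// so every stage contributes to the frame time.
double serialFrameMs(const std::vector<double>& stageMs) {
    return std::accumulate(stageMs.begin(), stageMs.end(), 0.0);
}
```

With made-up costs of 20 ms CPU submission, 5 ms vertex work and 8 ms pixel work, the pipelined frame takes 20 ms. Halving the pixel cost to 4 ms leaves it at 20 ms, while halving the CPU cost instead drops it to 10 ms. In the serial model the pixel optimisation does help, but only a little (33 ms down to 29 ms).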

I'm all for general profiling, trouble is I don't know how to do it properly anymore.

My feelings about VTune are expressed quite concisely elsewhere, but I don't know what other products are available that actually work. I used Metrowerks CATS a long time ago but it seems to be discontinued. Dev no longer has its simple one built in.

PIX on 360 is okay, but the bottlenecks on that beast are in totally different places so I've gone as far as I can with that. The PC version of PIX is grim by comparison and doesn't seem to show much about system-wide clashes and bottlenecks.

I'm currently downloading the beta of ATI's PerfHUD equivalent. Hopefully that'll give me some insight, but my download seems to be running at a byte an hour so who knows....

I do strongly suspect I'm CPU-bound. I'm using a lot of CPU for my game (it has a simple fluid dynamics system in it), and turning off rendering completely makes it fly. I have about 250 smallish batches to draw 80K polys, all of which are single pass. I wouldn't expect this to be bound by anything tbh. Have we gone backwards?

I'm certainly no newb at this stuff, but I do think I must have a schoolboy error somewhere. I just can't find it!
------------------------------Great Little War Game
Quote:Original post by RubiconMobile
How badly do drawprim counts count against everything else for example ? My batches are fairly small, but they have to be - there's not much point putting a trillion faces into a telegraph pole, yet all those poles need rendering separately so they can be positioned.


Total War doesn't draw each soldier with its own call to DrawIndexedPrimitive.

see here
NVPerfHUD is a great profiling tool to look into, although you already seem to be aware of it. You also need an NVIDIA card if you are to use it to its full extent. I suggest you first test your application with that (or its ATI equivalent) to validate your guess and make sure you really are CPU-bound. If you are, go for algorithmic optimizations first rather than coding hacks, since they yield much bigger improvements unless you're making some BIG coding mistake. Go for coding optimizations next.

The documents that come with NVPerfHUD are also very interesting. They will help a lot if you feel kind of lost. There was one guide of optimization how-tos in particular, whose name I've unfortunately forgotten, but I'm sure you'll be able to find it. The "NVIDIA GPU Programming Guide" is also a good read. You can find that on NVIDIA's developer website.
I think I'm currently downloading the entire nVidia website now, thanks :)

Instancing is at the top of my engine wishlist but it won't help my current projects as they don't have lots of repeats; the telegraph poles were just a classic example of batching problems. (The lead Xbox programmer for Spartan is a colleague of mine btw)
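
For static objects like the telegraph poles, one way to cut the call count without hardware instancing is to bake each copy's placement into a single merged mesh at load time, so the whole lot goes out in one draw. The sketch below is an assumption-heavy illustration, not anything from this thread: `Vec3`, `Mesh` and `mergeStatic` are made-up names, placement is translation only (a real version would apply a full world matrix and transform normals too), and it uses 32-bit indices to sidestep the 16-bit vertex limit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Vec3 {
    float x, y, z;
};

// Hypothetical CPU-side mesh: positions plus a triangle index list.
struct Mesh {
    std::vector<Vec3> positions;
    std::vector<std::uint32_t> indices;
};

// Builds one mesh containing a translated copy of `source` per placement.
// The result can be uploaded once and drawn with a single call.
Mesh mergeStatic(const Mesh& source, const std::vector<Vec3>& placements) {
    Mesh merged;
    merged.positions.reserve(source.positions.size() * placements.size());
    merged.indices.reserve(source.indices.size() * placements.size());

    for (const Vec3& offset : placements) {
        // Indices of this copy must point at the vertices appended for it.
        const std::uint32_t base =
            static_cast<std::uint32_t>(merged.positions.size());
        for (const Vec3& p : source.positions) {
            merged.positions.push_back(
                {p.x + offset.x, p.y + offset.y, p.z + offset.z});
        }
        for (std::uint32_t index : source.indices) {
            merged.indices.push_back(base + index);
        }
    }
    assert(merged.indices.size() == source.indices.size() * placements.size());
    return merged;
}
```

The trade-off is memory and flexibility: every copy's vertices are stored separately, and nothing in the merged mesh can move or be culled on its own, so this only suits geometry that shares a material and never changes.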
------------------------------Great Little War Game
