DirectX Performance

10 comments, last by LotusExigeS1 14 years, 2 months ago
I am writing this game (http://www.entombed.co.uk). The performance of my 'engine' is good, but I want to increase the fps. Does anyone know of any "dos and don'ts" documents for DirectX?

I have tried hardware instancing for the walls and this hurt performance (I think the overhead of setting it up outweighed the render speed increase). Also, I think instancing is meant for rendering *lots* of items, not just 200 or so.

I have played with rendering front to back and back to front and see very little difference. Is the hardware just sorting them for me?

I use the DrawPrimitive call, which takes a vertex buffer per mesh (SlimDX, by the way). Is batching going to help? I make lots of draw calls for the walls, which are only 2 triangles each (one call per wall). I have read conflicting reports on whether batching gives a performance increase or not.

Any tips or gotchas from anyone? Thanks.
Making a lot of DrawPrimitive() calls is the main culprit. If you're only drawing 2 triangles per call, then you're likely going to be thrashing the driver a bit. What does your algorithm for drawing the walls look like? You'll probably get much better performance by putting all the wall triangles into a single buffer and drawing them all with one Draw[Indexed]Primitive call, or by moving to a dynamic vertex buffer and filling it with the walls to draw each frame.
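
To make the "single buffer" suggestion concrete, here is a minimal CPU-side sketch of merging wall quads into one vertex list and one index list. The names Vertex, WallQuad, Batch and buildWallBatch are made up for illustration and the vertex layout is an assumption; nothing here comes from the poster's engine, and the upload to an actual vertex/index buffer is left out.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical vertex layout; a real one must match your vertex declaration.
struct Vertex {
    float x, y, z;  // position
    float u, v;     // texture coordinates
};

// One wall is one quad: 4 corners, drawn as 2 triangles.
struct WallQuad {
    Vertex corners[4];  // assumed order: going around the quad
};

// CPU-side copy of what would be uploaded to one vertex and one index buffer.
struct Batch {
    std::vector<Vertex> vertices;
    std::vector<std::uint16_t> indices;
};

// Append every wall to a single vertex/index list so that all of them
// can be submitted with one indexed draw call instead of one call per wall.
Batch buildWallBatch(const std::vector<WallQuad>& walls) {
    // 16-bit indices can address at most 65536 vertices (16384 walls).
    assert(walls.size() * 4 <= 65536);

    Batch batch;
    batch.vertices.reserve(walls.size() * 4);
    batch.indices.reserve(walls.size() * 6);

    static const std::uint16_t quadIndices[6] = {0, 1, 2, 0, 2, 3};

    for (const WallQuad& wall : walls) {
        // Index of this wall's first vertex inside the merged list.
        const std::size_t base = batch.vertices.size();
        for (const Vertex& corner : wall.corners) {
            batch.vertices.push_back(corner);
        }
        // Two triangles per quad, offset by the wall's base vertex.
        for (std::uint16_t i : quadIndices) {
            batch.indices.push_back(static_cast<std::uint16_t>(base + i));
        }
    }
    return batch;
}
```

With about 200 walls this gives 800 vertices and 1200 indices, which would then go out in one indexed draw per texture rather than 200 separate calls. The walls need to be in a common space (e.g. world space) for this to work, since a single call shares one world transform.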
Correction - I do currently use the indexed DrawPrimitive call to draw, but still with one call per wall.

I use portal rendering and store the results. I then sort them into mesh and texture order, set the mesh and texture on the device, and render each wall in turn. Repeat until nothing is left to render. It is set up this way because I was trying hardware instancing. But to be honest, whatever I try doesn't really affect performance. It might just be that my graphics card (ATI 4890) is a monster and doesn't care.

What is considered 'a lot' of draw calls? I can currently render at 1920x1080 with 6 omni shadow-casting lights and still get about 60 fps. I don't know whether this is good or not.
In addition to what Steve said:

Most performance optimizations have a tipping point in balance - when the optimization's benefits outweigh the negative impact of the setup needed.

You can't simultaneously see all the effects for all possible optimizations, since each and every system has its own bottlenecks. For example, front to back rendering is an incredibly good performance booster if your bottleneck is the pixel shader complexity, but if the geometry or scene processing becomes a bottleneck as a result of having to sort the geometry, the overall performance wouldn't rise as much as in isolation.
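
As a rough illustration of the front-to-back idea for opaque geometry: drawing the nearest objects first lets the depth test reject hidden pixels before the pixel shader runs on them. The sketch below sorts draw items by squared distance from the camera. DrawItem, Vec3 and sortFrontToBack are hypothetical names, and using the object's centre point as the sort key is an assumption; it is only an approximation for large objects.

```cpp
#include <algorithm>
#include <vector>

struct Vec3 {
    float x, y, z;
};

// Hypothetical draw item: an identifier plus a world-space centre point.
struct DrawItem {
    int id;
    Vec3 center;
};

// Squared distance is enough for ordering and avoids a square root.
inline float distanceSquared(const Vec3& a, const Vec3& b) {
    const float dx = a.x - b.x;
    const float dy = a.y - b.y;
    const float dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}

// Order opaque items nearest-first relative to the camera position.
void sortFrontToBack(std::vector<DrawItem>& items, const Vec3& cameraPos) {
    std::sort(items.begin(), items.end(),
              [&cameraPos](const DrawItem& a, const DrawItem& b) {
                  return distanceSquared(a.center, cameraPos) <
                         distanceSquared(b.center, cameraPos);
              });
}
```

Note that this sort is itself CPU work done every frame, which is exactly the kind of setup cost that can cancel out the gain if the pixel shader was not the bottleneck to begin with.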

Hardware instancing is good if the CPU time or bus bandwidth required to draw your stuff is the bottleneck, but it comes at the cost of increased geometry setup work on the GPU, as well as possible overdraw. Instancing's main use case is a moderately complex mesh (way more than two tris) that you want to duplicate with a small set of per-instance variables, using as few device calls and as small a total data set as possible.

The main point is that you should find these balances by profiling. PIX can help you see what the bottlenecks are in your particular app.

Niko Suni

I have heard about PIX. Should I invest some time in learning it? Is it intuitive or a pain to use? Any hints on best practices when using it?
You might find PerfHUD helpful if you have an Nvidia card. The documentation for that provides some good tips on performance analysis.

The first step is normally to find out if the CPU or the GPU is the limit on frame rate, as that will make a big difference to what optimizations you should look at.
Go through the tutorials in the SDK to get a general feel for PIX. I find it relatively easy to use - analysing the data and drawing conclusions from it is the difficult part :)

Niko Suni

Thanks guys for your posts.

I have an ATI card, so I can't use the Nvidia app. I will give PIX a go and find out what I can.

Any other tips on performance killers ?
Quote:Original post by LotusExigeS1
I use portal rendering and store the results. I then sort them into mesh and texture order. Set the mesh and texture to the device and then render each wall in turn. Repeat until nothing left to render.
If you're already sorting by mesh and texture, then you're nearly as efficient as possible, and if you don't change any device state between two DrawIndexedPrimitive calls, then it only counts as one batch to the driver (but it still means a not-so-cheap transition to kernel mode to submit the draw call). If you're able to draw multiple walls in a single call instead of one wall at a time, that would be a win.
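
A small sketch of the sort-by-state step, to show how many state changes a frame really needs: sort the visible walls by (mesh, texture), then collapse consecutive walls with the same pair into runs. WallDraw, StateRun and groupByState are hypothetical names, and integer ids stand in for real mesh and texture handles.

```cpp
#include <algorithm>
#include <cstddef>
#include <tuple>
#include <vector>

// Hypothetical record for one visible wall.
struct WallDraw {
    int meshId;
    int textureId;
    int wallIndex;
};

// A run of consecutive walls (after sorting) sharing mesh and texture.
struct StateRun {
    int meshId;
    int textureId;
    std::size_t first;  // index of the run's first wall in the sorted list
    std::size_t count;  // number of walls in the run
};

// Sort by (mesh, texture) and collapse equal neighbours into runs.
// Device state only has to be set once per run.
std::vector<StateRun> groupByState(std::vector<WallDraw>& draws) {
    std::sort(draws.begin(), draws.end(),
              [](const WallDraw& a, const WallDraw& b) {
                  return std::tie(a.meshId, a.textureId) <
                         std::tie(b.meshId, b.textureId);
              });

    std::vector<StateRun> runs;
    for (std::size_t i = 0; i < draws.size(); ++i) {
        const bool startsNewRun =
            runs.empty() ||
            runs.back().meshId != draws[i].meshId ||
            runs.back().textureId != draws[i].textureId;
        if (startsNewRun) {
            runs.push_back({draws[i].meshId, draws[i].textureId, i, 0});
        }
        ++runs.back().count;
    }
    return runs;
}
```

The number of runs is the number of state changes per frame. Each run is also a natural candidate for a single draw call covering all of its walls, if their geometry sits contiguously in one buffer.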

Quote:Original post by LotusExigeS1
What is considered 'a lot' of draw calls? I can currently render at 1920x1080 with 6 omni shadow-casting lights and still get about 60 fps. I don't know whether this is good or not.
More than a few hundred (~500) batches per frame is usually going to be a problem. There's not really any fixed value; it depends on a lot of factors.

It's not really worth profiling this sort of thing if the card isn't under much load; there's too much noise to get anything meaningful. I'd recommend testing on a lower-spec CPU / GPU, or making the GPU do more work (e.g. try drawing the map 10 times or something like that) before you try to do any profiling.
Thanks everyone.

This topic is closed to new replies.
